Statistics

New submissions

[ total of 66 entries: 1-66 ]

New submissions for Thu, 2 May 24

[1]  arXiv:2405.00118 [pdf, other]
Title: Causal Inference with High-dimensional Discrete Covariates
Comments: 64 pages, 5 figures
Subjects: Statistics Theory (math.ST); Methodology (stat.ME)

When estimating causal effects from observational studies, researchers often need to adjust for many covariates to deconfound the non-causal relationship between exposure and outcome, and many of these covariates are discrete. The behavior of commonly used estimators in the presence of many discrete covariates is not well understood, since their properties are often analyzed under structural assumptions, such as sparsity and smoothness, that do not apply in discrete settings. In this work, we study the estimation of causal effects in a model where the covariates required for confounding adjustment are discrete but high-dimensional, meaning the number of categories $d$ is comparable with or even larger than the sample size $n$. Specifically, we show that the mean squared error of commonly used regression, weighting, and doubly robust estimators is bounded by $\frac{d^2}{n^2}+\frac{1}{n}$. We then prove that the minimax lower bound for the average treatment effect is of order $\frac{d^2}{n^2 \log^2 n}+\frac{1}{n}$, which characterizes the fundamental difficulty of causal effect estimation in the high-dimensional discrete setting and shows that the estimators mentioned above are rate-optimal up to log factors. We further consider additional structures that can be exploited, namely effect homogeneity and prior knowledge of the covariate distribution, and propose new estimators that enjoy faster convergence rates of order $\frac{d}{n^2} + \frac{1}{n}$, achieving consistency in a broader regime. The results are illustrated empirically via simulation studies.
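
As a concrete reference point, the doubly robust estimator alluded to is typically the augmented inverse-propensity-weighted (AIPW) form below, written in standard notation ($A_i$ the treatment indicator, $Y_i$ the outcome, $\hat e$ the estimated propensity score, $\hat\mu_a$ the estimated outcome regressions); the exact variant analyzed in the paper may differ.

```latex
\hat{\tau}_{\mathrm{DR}} = \frac{1}{n}\sum_{i=1}^{n}\left[
  \hat{\mu}_1(X_i) - \hat{\mu}_0(X_i)
  + \frac{A_i\,\{Y_i - \hat{\mu}_1(X_i)\}}{\hat{e}(X_i)}
  - \frac{(1-A_i)\,\{Y_i - \hat{\mu}_0(X_i)\}}{1-\hat{e}(X_i)}
\right].
```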

[2]  arXiv:2405.00158 [pdf, other]
Title: BayesBlend: Easy Model Blending using Pseudo-Bayesian Model Averaging, Stacking and Hierarchical Stacking in Python
Subjects: Methodology (stat.ME); Machine Learning (cs.LG); Machine Learning (stat.ML)

Averaging predictions from multiple competing inferential models frequently outperforms predictions from any single model, provided that the models are optimally weighted to maximize predictive performance. This is particularly the case in so-called $\mathcal{M}$-open settings, where the true model is not among the candidate models and may be neither mathematically reifiable nor precisely known. This practice of model averaging has a rich history in statistics and machine learning, and there are currently a number of methods to estimate the weights for constructing model-averaged predictive distributions. Nonetheless, few existing software packages can estimate model weights from the full variety of methods available, and none blend model predictions into a coherent predictive distribution according to the estimated weights. In this paper, we introduce the BayesBlend Python package, which provides a user-friendly programming interface to estimate weights and blend multiple (Bayesian) models' predictive distributions. BayesBlend implements pseudo-Bayesian model averaging, stacking and, uniquely, hierarchical Bayesian stacking to estimate model weights. We demonstrate the usage of BayesBlend with examples of insurance loss modeling.
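
In their simplest form, pseudo-Bayesian model averaging weights are softmax-normalized expected log pointwise predictive densities (ELPDs). The sketch below computes such weights and blends per-model pointwise predictive densities; it illustrates the underlying computation only and deliberately does not assume the BayesBlend API.

```python
import numpy as np

def pseudo_bma_weights(elpds):
    """Softmax-normalize per-model ELPD estimates into model weights."""
    elpds = np.asarray(elpds, dtype=float)
    w = np.exp(elpds - elpds.max())  # subtract the max for numerical stability
    return w / w.sum()

def blend_densities(pred_densities, weights):
    """Mix per-model pointwise predictive densities with the given weights.

    pred_densities: array of shape (n_models, n_points).
    """
    return np.asarray(weights) @ np.asarray(pred_densities)

# Hypothetical ELPD estimates for three candidate models:
weights = pseudo_bma_weights([-512.3, -509.8, -515.1])
```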

[3]  arXiv:2405.00179 [pdf, other]
Title: A Bayesian joint longitudinal-survival model with a latent stochastic process for intensive longitudinal data
Comments: Main text is 32 pages with 6 figures. Supplementary material is 21 pages
Subjects: Methodology (stat.ME)

The availability of mobile health (mHealth) technology has enabled increased collection of intensive longitudinal data (ILD). ILD have the potential to capture rapid fluctuations in outcomes that may be associated with changes in the risk of an event. However, existing methods for jointly modeling longitudinal and event-time outcomes are not well-equipped to handle ILD due to the high computational cost. We propose a joint longitudinal and time-to-event model suitable for analyzing ILD. In this model, we summarize a multivariate longitudinal outcome as a smaller number of time-varying latent factors. These latent factors, which are modeled using an Ornstein-Uhlenbeck stochastic process, capture the risk of a time-to-event outcome in a parametric hazard model. We take a Bayesian approach to fit our joint model and conduct simulations to assess its performance. We use it to analyze data from an mHealth study of smoking cessation. We summarize the longitudinal self-reported intensity of nine emotions as the psychological states of positive and negative affect. These time-varying latent states capture the risk of the first smoking lapse after a quit attempt. Understanding factors associated with smoking lapse is of keen interest to smoking cessation researchers.
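
For reference, an Ornstein-Uhlenbeck latent factor $X(t)$ satisfies the standard mean-reverting stochastic differential equation below, with mean-reversion rate $\theta$, long-run mean $\mu$, volatility $\sigma$, and Wiener process $W(t)$; the paper's exact parameterization may differ.

```latex
dX(t) = \theta\,\{\mu - X(t)\}\,dt + \sigma\,dW(t)
```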

[4]  arXiv:2405.00185 [pdf, ps, other]
Title: Finite-sample adjustments for comparing clustered adaptive interventions using data from a clustered SMART
Subjects: Methodology (stat.ME)

Adaptive interventions, also known as dynamic treatment regimens, are sequences of pre-specified decision rules that guide the provision of treatment for an individual given information about their baseline and evolving needs, including in response to prior intervention. Clustered adaptive interventions (cAIs) extend this idea by guiding the provision of intervention at the level of clusters (e.g., clinics), but with the goal of improving outcomes at the level of individuals within the cluster (e.g., clinicians or patients within clinics). A clustered sequential multiple-assignment randomized trial (cSMART) is a multistage, multilevel randomized trial design used to construct high-quality cAIs. In a cSMART, clusters are randomized at multiple intervention decision points; at each decision point, the randomization probability can depend on prior response data. A challenge in cluster-randomized trials, including cSMARTs, is the deleterious effect of small samples of clusters on statistical inference, particularly via estimation of standard errors. This manuscript develops finite-sample adjustment (FSA) methods for making improved statistical inference about the causal effects of cAIs in a cSMART. The paper develops FSA methods that (i) scale variance estimators using a degree-of-freedom adjustment, (ii) reference a t distribution (instead of a normal), and (iii) employ a "bias-corrected" variance estimator. Method (iii) requires extensions that are unique to the analysis of cSMARTs. Extensive simulation experiments are used to test the performance of the methods. The methods are illustrated using the Adaptive School-based Implementation of CBT (ASIC) study, a cSMART designed to construct a cAI for improving the delivery of cognitive behavioral therapy (CBT) by school mental health professionals within high schools in Michigan.

[5]  arXiv:2405.00188 [pdf, other]
Title: A Revisit of the Optimal Excess-of-Loss Contract
Subjects: Applications (stat.AP); Theoretical Economics (econ.TH)

It is well-known that Excess-of-Loss reinsurance has more marketability than Stop-Loss reinsurance, though Stop-Loss reinsurance is the most prominent setting discussed in the optimal (re)insurance design literature. We point out that the optimal reinsurance policy under Stop-Loss leads to a zero insolvency probability, which motivates our paper. We provide a remedy to this peculiar property of the optimal Stop-Loss reinsurance contract by investigating the optimal Excess-of-Loss reinsurance contract instead. We also provide estimators for the optimal Excess-of-Loss and Stop-Loss contracts and investigate their statistical properties under many premium principle assumptions and various risk preferences, which, to our knowledge, have never been investigated in the literature. Simulated data and real-life data are used to illustrate our main theoretical findings.

[6]  arXiv:2405.00294 [pdf, other]
Title: Conformal inference for random objects
Subjects: Methodology (stat.ME)

We develop an inferential toolkit for analyzing object-valued responses, which correspond to data situated in general metric spaces, paired with Euclidean predictors within the conformal framework. To this end, we introduce conditional profile average transport costs: distance profiles, the one-dimensional distributions of probability mass falling into balls of increasing radius around each object, are compared through the optimal transport cost of moving one distance profile to another. The average transport cost to transport a given distance profile to all others is crucial for statistical inference in metric spaces and underpins the proposed conditional profile scores. A key feature of the proposed approach is to utilize the distribution of conditional profile average transport costs as a conformity score for general metric space-valued responses, which facilitates the construction of prediction sets by the split conformal algorithm. We derive the uniform convergence rate of the proposed conformity score estimators and establish asymptotic conditional validity for the prediction sets. The finite sample performance for synthetic data in various metric spaces demonstrates that the proposed conditional profile score outperforms existing methods in terms of both coverage level and size of the resulting prediction sets, even in the special case of scalar and thus Euclidean responses. We also demonstrate the practical utility of conditional profile scores for network data from New York taxi trips and for compositional data reflecting energy sourcing of U.S. states.
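
The split conformal step the paper plugs its score into is generic: calibrate a quantile of conformity scores on held-out data, then keep every candidate response whose score does not exceed it. A minimal sketch under the standard split conformal recipe (the profile-based score itself is the paper's contribution and is not reproduced here):

```python
import numpy as np

def split_conformal_set(cal_scores, candidate_scores, alpha=0.1):
    """Return a boolean mask of candidates inside the (1 - alpha) prediction set.

    cal_scores: conformity scores on a held-out calibration set.
    candidate_scores: scores of candidate responses at a new predictor value.
    """
    n = len(cal_scores)
    # Finite-sample-corrected quantile index of the split conformal algorithm.
    k = min(n, int(np.ceil((n + 1) * (1 - alpha))))
    q = np.sort(np.asarray(cal_scores))[k - 1]  # k-th smallest calibration score
    return np.asarray(candidate_scores) <= q
```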

[7]  arXiv:2405.00364 [pdf, other]
Title: Object detection under the linear subspace model with application to cryo-EM images
Subjects: Statistics Theory (math.ST); Probability (math.PR)

Detecting multiple unknown objects in noisy data is a key problem in many scientific fields, such as electron microscopy imaging. A common model for the unknown objects is the linear subspace model, which assumes that the objects can be expanded in some known basis (such as the Fourier basis). In this paper, we develop an object detection algorithm that, under the linear subspace model, is asymptotically guaranteed to detect all objects while controlling the family-wise error rate or the false discovery rate. Numerical simulations show that the algorithm also controls the error rate with high power in the non-asymptotic regime, even in highly challenging settings. We apply the proposed algorithm to an experimental electron microscopy data set and show that it outperforms existing standard software.

[8]  arXiv:2405.00385 [pdf, other]
Title: Variational Bayesian Methods for a Tree-Structured Stick-Breaking Process Mixture of Gaussians
Authors: Yuta Nakahara
Subjects: Machine Learning (stat.ML); Information Theory (cs.IT); Machine Learning (cs.LG)

The Bayes coding algorithm for context tree sources is a successful example of Bayesian tree estimation in text compression in information theory. This algorithm provides an efficient parametric representation of the posterior tree distribution and exact updating of its parameters. We apply this algorithm to a clustering task in machine learning. More specifically, we apply it to Bayesian estimation of tree-structured stick-breaking process (TS-SBP) mixture models. For TS-SBP mixture models, only Markov chain Monte Carlo methods have been proposed so far; no variational Bayesian methods have yet been proposed. In this paper, we propose a variational Bayesian method with a subroutine similar to the Bayes coding algorithm for context tree sources. We confirm its behavior through a numerical experiment on a toy example.

[9]  arXiv:2405.00397 [pdf, other]
Title: Posterior exploration for computationally intensive forward models
Comments: To appear in the Handbook of Markov Chain Monte Carlo (2nd edition)
Subjects: Computation (stat.CO)

In this chapter, we address the challenge of exploring the posterior distributions of Bayesian inverse problems with computationally intensive forward models. We consider various multivariate proposal distributions, and compare them with single-site Metropolis updates. We show how fast, approximate models can be leveraged to improve the MCMC sampling efficiency.
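
One standard way to exploit a fast approximate model in this setting is delayed-acceptance Metropolis: a cheap surrogate log-posterior screens each proposal before the expensive forward model is run, and a second-stage correction keeps the exact posterior invariant. A sketch with a symmetric multivariate Gaussian proposal (whether the chapter uses exactly this scheme is an assumption; it illustrates the general idea):

```python
import numpy as np

def delayed_acceptance_step(x, log_post, log_post_cheap, rng, step=0.5):
    """One delayed-acceptance random-walk Metropolis update.

    log_post: expensive exact log-posterior; log_post_cheap: fast surrogate.
    The two-stage acceptance rule preserves the exact posterior.
    """
    y = x + step * rng.standard_normal(x.shape)  # symmetric Gaussian proposal
    # Stage 1: screen the proposal with the cheap surrogate only.
    log_a1 = log_post_cheap(y) - log_post_cheap(x)
    if np.log(rng.uniform()) >= log_a1:
        return x  # rejected cheaply; the expensive model is never evaluated
    # Stage 2: correct with the expensive model so the exact posterior is kept.
    log_a2 = (log_post(y) - log_post(x)) - log_a1
    return y if np.log(rng.uniform()) < log_a2 else x
```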

[10]  arXiv:2405.00442 [pdf, other]
Title: Geometric Insights into Focal Loss: Reducing Curvature for Enhanced Model Calibration
Comments: This paper is under consideration at Pattern Recognition Letters
Subjects: Machine Learning (stat.ML); Artificial Intelligence (cs.AI); Machine Learning (cs.LG)

The key factor in implementing machine learning algorithms in decision-making situations is not only the accuracy of the model but also its confidence level. The confidence level of a model in a classification problem is often given by the output vector of a softmax function for convenience. However, these values are known to deviate significantly from the actual expected model confidence. This problem is called model calibration and has been studied extensively. One of the simplest techniques for this task is focal loss, a generalization of cross-entropy obtained by introducing a single positive parameter. Although many related studies exist owing to the simplicity of the idea and its formalization, theoretical analysis of its behavior is still insufficient. In this study, our objective is to understand the behavior of focal loss by reinterpreting the function geometrically. Our analysis suggests that focal loss reduces the curvature of the loss surface during training. This indicates that curvature may be one of the essential factors in achieving model calibration. We design numerical experiments that support this conjecture and reveal the behavior of focal loss and the relationship between calibration performance and curvature.
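
For concreteness, focal loss evaluates the predicted probability $p_t$ of the true class as

```latex
\mathrm{FL}(p_t) = -(1 - p_t)^{\gamma}\,\log p_t, \qquad \gamma \ge 0,
```

so $\gamma = 0$ recovers the cross-entropy, while larger $\gamma$ down-weights well-classified examples and flattens the loss surface.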

[11]  arXiv:2405.00535 [pdf, other]
Title: Bayesian Varying-Effects Vector Autoregressive Models for Inference of Brain Connectivity Networks and Covariate Effects in Pediatric Traumatic Brain Injury
Subjects: Methodology (stat.ME)

In this paper, we develop an analytical approach for estimating brain connectivity networks that accounts for subject heterogeneity. More specifically, we consider a novel extension of a multi-subject Bayesian vector autoregressive model that estimates group-specific directed brain connectivity networks and accounts for the effects of covariates on the network edges. We adopt a flexible approach, allowing for (possibly) non-linear effects of the covariates on edge strength via a novel Bayesian nonparametric prior that employs a weighted mixture of Gaussian processes. For posterior inference, we achieve computational scalability by implementing a variational Bayes scheme. Our approach enables simultaneous estimation of group-specific networks and selection of relevant covariate effects. We show improved performance over competing two-stage approaches on simulated data. We apply our method on resting-state fMRI data from children with a history of traumatic brain injury and healthy controls to estimate the effects of age and sex on the group-level connectivities. Our results highlight differences in the distribution of parent nodes. They also suggest alteration in the relation of age, with peak edge strength in children with traumatic brain injury (TBI), and differences in effective connectivity strength between males and females.

[12]  arXiv:2405.00581 [pdf, other]
Title: Conformalized Tensor Completion with Riemannian Optimization
Authors: Hu Sun, Yang Chen
Subjects: Methodology (stat.ME); Computation (stat.CO)

Tensor data, or multi-dimensional arrays, are a data format popular in multiple fields such as social network analysis, recommender systems, and brain imaging. It is not uncommon to observe tensor data containing missing values, and tensor completion aims at estimating the missing values given the partially observed tensor. Substantial effort has been devoted to devising scalable tensor completion algorithms, but little to quantifying the uncertainty of the estimator. In this paper, we cast the uncertainty quantification (UQ) of tensor completion in a split conformal prediction framework and connect the UQ problem to the problem of estimating the missing propensity of each tensor entry. We model the missingness of the tensor with a tensor Ising model parameterized by a low-rank tensor parameter, and propose to estimate this parameter by maximum pseudo-likelihood estimation (MPLE) with a Riemannian gradient descent algorithm. Extensive simulation studies justify the validity of the resulting conformal interval. We apply our method to the regional total electron content (TEC) reconstruction problem.

[13]  arXiv:2405.00582 [pdf, ps, other]
Title: Implementing Bayesian inference on a stochastic CO2-based grey-box model for assessing indoor air quality in Canadian primary schools
Subjects: Applications (stat.AP)

The COVID-19 pandemic brought global attention to indoor air quality (IAQ), which is intrinsically linked to clean air change rates. Estimating the air change rate in indoor environments, however, remains challenging. This is primarily due to the uncertainties associated with air change rate estimation, such as pollutant generation rates and dynamics including weather and occupancy, and the limitations of deterministic approaches in accommodating these factors. In this study, Bayesian inference was implemented on a stochastic CO2-based grey-box model to infer model parameters and quantify uncertainties. The accuracy and robustness of the ventilation rate and CO2 emission rate estimated by the model were confirmed with CO2 tracer gas experiments conducted in an airtight chamber. Both prior and posterior predictive checks (PPC) were performed to demonstrate the advantage of this approach. In addition, uncertainties in real-life contexts were quantified with an incremental variance $\sigma$ for the Wiener process. The approach was then applied to evaluate the ventilation conditions within two primary school classrooms in Montreal. The Equivalent Clean Airflow Rate (ECAi) was calculated following ASHRAE 241, and an insufficient clean air supply was identified in both classrooms. A supplemental clean air delivery rate (CADR) of 800 cfm from air-cleaning devices is recommended to achieve a sufficient ECAi. Finally, steady-state CO2 thresholds (Climit, Ctarget, and Cideal) were derived to indicate when ECAi requirements could be achieved under various mitigation strategies, such as portable air cleaners and in-room ultraviolet light, with CADR values ranging from 200 to 1000 cfm.

[14]  arXiv:2405.00592 [pdf, other]
Title: Scaling and renormalization in high-dimensional regression
Comments: 64 pages, 16 figures
Subjects: Machine Learning (stat.ML); Disordered Systems and Neural Networks (cond-mat.dis-nn); Machine Learning (cs.LG)

This paper presents a succinct derivation of the training and generalization performance of a variety of high-dimensional ridge regression models using the basic tools of random matrix theory and free probability. We provide an introduction and review of recent results on these topics, aimed at readers with backgrounds in physics and deep learning. Analytic formulas for the training and generalization errors are obtained in a few lines of algebra directly from the properties of the $S$-transform of free probability. This allows for a straightforward identification of the sources of power-law scaling in model performance. We compute the generalization error of a broad class of random feature models. We find that in all models, the $S$-transform corresponds to the train-test generalization gap, and yields an analogue of the generalized-cross-validation estimator. Using these techniques, we derive fine-grained bias-variance decompositions for a very general class of random feature models with structured covariates. These novel results allow us to discover a scaling regime for random feature models where the variance due to the features limits performance in the overparameterized setting. We also demonstrate how anisotropic weight structure in random feature models can limit performance and lead to nontrivial exponents for finite-width corrections in the overparameterized setting. Our results extend and provide a unifying perspective on earlier models of neural scaling laws.

[15]  arXiv:2405.00619 [pdf, other]
Title: One-Bit Total Variation Denoising over Networks with Applications to Partially Observed Epidemics
Subjects: Methodology (stat.ME)

This paper introduces a novel approach for epidemic nowcasting and forecasting over networks using total variation (TV) denoising, a method inspired by classical signal processing techniques. Considering a network that models a population as a set of $n$ nodes characterized by their infection statuses $Y_i$ and that represents contacts as edges, we prove the consistency of graph-TV denoising for estimating the underlying infection probabilities $\{p_i\}_{i \in \{1,\ldots,n\}}$ in the presence of Bernoulli noise. Our results provide an important extension of existing bounds derived in the Gaussian case to the study of binary variables -- an approach hereafter referred to as one-bit total variation denoising. The methodology is further extended to handle incomplete observations, thereby expanding its relevance to various real-world situations where observations over the full graph may not be accessible. Focusing on the context of epidemics, we establish that one-bit total variation denoising enhances both nowcasting and forecasting accuracy in networks, as further evidenced by comprehensive numerical experiments and two real-world examples. The contributions of this paper lie in its theoretical developments, particularly in addressing the incomplete data case, thereby paving the way for more precise epidemic modelling and enhanced surveillance strategies in practical settings.
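
For orientation, the Gaussian-case graph-TV denoiser that the paper's results extend solves a fused-lasso-type program over the contact network $G = (V, E)$; the one-bit variant replaces the squared loss with a Bernoulli likelihood (the precise formulation is the paper's):

```latex
\hat{p} = \operatorname*{arg\,min}_{p \in \mathbb{R}^n}
  \sum_{i=1}^{n} (Y_i - p_i)^2
  + \lambda \sum_{(i,j) \in E} |p_i - p_j|.
```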

[16]  arXiv:2405.00626 [pdf, other]
Title: SARMA: Scalable Low-Rank High-Dimensional Autoregressive Moving Averages via Tensor Decomposition
Subjects: Methodology (stat.ME)

Existing models for high-dimensional time series are overwhelmingly developed within the finite-order vector autoregressive (VAR) framework, whereas the more flexible vector autoregressive moving averages (VARMA) have been much less considered. This paper introduces a high-dimensional model for capturing VARMA dynamics, namely the Scalable ARMA (SARMA) model, by combining novel reparameterization and tensor decomposition techniques. To ensure identifiability and computational tractability, we first consider a reparameterization of the VARMA model and discover that this interestingly amounts to a Tucker-low-rank structure for the AR coefficient tensor along the temporal dimension. Motivated by this finding, we further consider Tucker decomposition across the response and predictor dimensions of the AR coefficient tensor, enabling factor extraction across variables and time lags. Additionally, we consider sparsity assumptions on the factor loadings to accomplish automatic variable selection and greater estimation efficiency. For the proposed model, we develop both rank-constrained and sparsity-inducing estimators. Algorithms and model selection methods are also provided. Simulation studies and empirical examples confirm the validity of our theory and advantages of our approaches over existing competitors.

[17]  arXiv:2405.00642 [pdf, other]
Title: From Empirical Observations to Universality: Dynamics of Deep Learning with Inputs Built on Gaussian mixture
Comments: 19 pages, 9 figures
Subjects: Machine Learning (stat.ML); Disordered Systems and Neural Networks (cond-mat.dis-nn); Statistical Mechanics (cond-mat.stat-mech); Machine Learning (cs.LG)

This study broadens the scope of theoretical frameworks in deep learning by delving into the dynamics of neural networks with inputs that exhibit the structural characteristics of Gaussian mixtures (GM). We analyzed how the dynamics of neural networks under GM-structured inputs diverge from the predictions of conventional theories based on simple Gaussian structures. A key finding of our work is the observed convergence of neural network dynamics towards conventional theory even with standardized GM inputs, highlighting an unexpected universality. We found that standardization, especially in conjunction with certain nonlinear functions, plays a critical role in this phenomenon. Consequently, despite the complex and varied nature of GM distributions, we demonstrate that neural networks exhibit asymptotic behaviors in line with predictions under simple Gaussian frameworks.

Cross-lists for Thu, 2 May 24

[18]  arXiv:2405.00017 (cross-list from cs.DC) [pdf, other]
Title: Queuing dynamics of asynchronous Federated Learning
Subjects: Distributed, Parallel, and Cluster Computing (cs.DC); Machine Learning (cs.LG); Machine Learning (stat.ML)

We study asynchronous federated learning mechanisms with nodes having potentially different computational speeds. In such an environment, each node is allowed to work on models with potential delays and contribute updates to the central server at its own pace. Existing analyses of such algorithms typically depend on intractable quantities such as the maximum node delay and do not consider the underlying queuing dynamics of the system. In this paper, we propose a non-uniform sampling scheme for the central server that allows for lower delays with better complexity, taking into account the closed Jackson network structure of the associated computational graph. Our experiments clearly show a significant improvement of our method over current state-of-the-art asynchronous algorithms on an image classification problem.

[19]  arXiv:2405.00058 (cross-list from math.OC) [pdf, ps, other]
Title: Gaussianity and the Kalman Filter: A Simple Yet Complicated Relationship
Journal-ref: Journal de Ciencia e Ingenier\'ia, vol. 14, no. 1, pp. 21-26, 2022
Subjects: Optimization and Control (math.OC); Statistics Theory (math.ST)

One of the most common misconceptions made about the Kalman filter when applied to linear systems is that it requires an assumption that all error and noise processes are Gaussian. This misconception has frequently led to the Kalman filter being dismissed in favor of complicated and/or purely heuristic approaches that are supposedly "more general" in that they can be applied to problems involving non-Gaussian noise. The fact is that the Kalman filter provides rigorous and optimal performance guarantees that do not rely on any distribution assumptions beyond mean and error covariance information. These guarantees even apply to use of the Kalman update formula when applied with nonlinear models, as long as its other required assumptions are satisfied. Here we discuss misconceptions about its generality that are often found and reinforced in the literature, especially outside the traditional fields of estimation and control.
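
The point is easy to state in code: the predict/update recursion below uses only means and covariances, so its minimum-variance guarantee among linear unbiased estimators holds without any Gaussian assumption. A minimal numpy sketch, with matrix names $F, Q, H, R$ following the usual textbook convention:

```python
import numpy as np

def kalman_step(x, P, z, F, Q, H, R):
    """One predict/update cycle of the Kalman filter.

    Requires only the first two moments of the noise processes;
    no Gaussianity assumption enters these formulas.
    """
    # Predict: propagate the state mean and error covariance.
    x_pred = F @ x
    P_pred = F @ P @ F.T + Q
    # Update: optimal linear gain from the innovation covariance.
    S = H @ P_pred @ H.T + R
    K = P_pred @ H.T @ np.linalg.inv(S)
    x_new = x_pred + K @ (z - H @ x_pred)
    P_new = (np.eye(len(x)) - K @ H) @ P_pred
    return x_new, P_new
```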

[20]  arXiv:2405.00065 (cross-list from math.OC) [pdf, other]
Title: From Linear to Linearizable Optimization: A Novel Framework with Applications to Stationary and Non-stationary DR-submodular Optimization
Subjects: Optimization and Control (math.OC); Computational Complexity (cs.CC); Machine Learning (cs.LG); Machine Learning (stat.ML)

This paper introduces the notion of upper linearizable/quadratizable functions, a class that extends concavity and DR-submodularity in various settings, including monotone and non-monotone cases over different convex sets. A general meta-algorithm is devised to convert algorithms for linear/quadratic maximization into ones that optimize upper quadratizable functions, offering a unified approach to tackling concave and DR-submodular optimization problems. The paper extends these results to multiple feedback settings, facilitating conversions between semi-bandit/first-order feedback and bandit/zeroth-order feedback, as well as between first/zeroth-order feedback and semi-bandit/bandit feedback. Leveraging this framework, new projection-free algorithms are derived using Follow The Perturbed Leader (FTPL) and other algorithms as base algorithms for linear/convex optimization, improving upon state-of-the-art results in various cases. Dynamic and adaptive regret guarantees are obtained for DR-submodular maximization, marking the first algorithms to achieve such guarantees in these settings. Notably, the paper achieves these advancements with fewer assumptions compared to existing state-of-the-art results, underscoring its broad applicability and theoretical contributions to non-convex optimization.

[21]  arXiv:2405.00081 (cross-list from math.PR) [pdf, other]
Title: Imprecise Markov Semigroups and their Ergodicity
Authors: Michele Caprio
Subjects: Probability (math.PR); Statistics Theory (math.ST); Machine Learning (stat.ML)

We introduce the concept of imprecise Markov semigroup. It allows us to see Markov chains and processes with imprecise transition probabilities as (a collection of diffusion) operators, and thus to unlock techniques from geometry, functional analysis, and (high dimensional) probability to study their ergodic behavior. We show that, if the initial distribution of an imprecise Markov semigroup is known and invariant, under some conditions that also involve the geometry of the state space, eventually the ambiguity around the transition probability fades. We call this property ergodicity of the imprecise Markov semigroup, and we relate it to the classical (Birkhoff's) notion of ergodicity. We prove ergodicity both when the state space is Euclidean or a Riemannian manifold, and when it is an arbitrary measurable space. The importance of our findings for the fields of machine learning and computer vision is also discussed.

[22]  arXiv:2405.00129 (cross-list from cs.SI) [pdf, other]
Title: Complex contagions can outperform simple contagions for network reconstruction with dense networks or saturated dynamics
Comments: 8 pages, 5 figures
Subjects: Social and Information Networks (cs.SI); Populations and Evolution (q-bio.PE); Machine Learning (stat.ML)

Network scientists often use complex dynamic processes to describe network contagions, but tools for fitting contagion models typically assume simple dynamics. Here, we address this gap by developing a nonparametric method to reconstruct a network and dynamics from a series of node states, using a model that breaks the dichotomy between simple pairwise and complex neighborhood-based contagions. We then show that a network is more easily reconstructed when observed through the lens of complex contagions if it is dense or the dynamic saturates, and that simple contagions are better otherwise.

[23]  arXiv:2405.00161 (cross-list from econ.EM) [pdf, other]
Title: Estimating Heterogeneous Treatment Effects with Item-Level Outcome Data: Insights from Item Response Theory
Subjects: Econometrics (econ.EM); Methodology (stat.ME)

Analyses of heterogeneous treatment effects (HTE) are common in applied causal inference research. However, when outcomes are latent variables assessed via psychometric instruments such as educational tests, standard methods ignore the potential HTE that may exist among the individual items of the outcome measure. Failing to account for "item-level" HTE (IL-HTE) can lead to both estimated standard errors that are too small and identification challenges in the estimation of treatment-by-covariate interaction effects. We demonstrate how Item Response Theory (IRT) models that estimate a treatment effect for each assessment item can both address these challenges and provide new insights into HTE generally. This study articulates the theoretical rationale for the IL-HTE model and demonstrates its practical value using data from 20 randomized controlled trials in economics, education, and health. Our results show that the IL-HTE model reveals item-level variation masked by average treatment effects, provides more accurate statistical inference, allows for estimates of the generalizability of causal effects, resolves identification problems in the estimation of interaction effects, and provides estimates of standardized treatment effect sizes corrected for attenuation due to measurement error.
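
For intuition, one plausible specification of an IL-HTE model (a notational sketch under standard IRT conventions, not necessarily the paper's exact model) is a Rasch-type item response model in which the treatment effect carries an item-specific deviation:

```latex
\operatorname{logit} P(Y_{ij} = 1) = \theta_i + b_j + (\beta + \zeta_j)\,T_i,
\qquad \zeta_j \sim \mathcal{N}(0, \tau^2),
```

where $\theta_i$ is person ability, $b_j$ item easiness, $T_i$ the treatment indicator, $\beta$ the average treatment effect, and $\tau$ the item-level effect heterogeneity.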

[24]  arXiv:2405.00172 (cross-list from cs.LG) [pdf, other]
Title: Re-visiting Skip-Gram Negative Sampling: Dimension Regularization for More Efficient Dissimilarity Preservation in Graph Embeddings
Subjects: Machine Learning (cs.LG); Social and Information Networks (cs.SI); Machine Learning (stat.ML)

A wide range of graph embedding objectives decompose into two components: one that attracts the embeddings of nodes that are perceived as similar, and another that repels embeddings of nodes that are perceived as dissimilar. Because real-world graphs are sparse and the number of dissimilar pairs grows quadratically with the number of nodes, Skip-Gram Negative Sampling (SGNS) has emerged as a popular and efficient repulsion approach. SGNS repels each node from a sample of dissimilar nodes, as opposed to all dissimilar nodes. In this work, we show that node-wise repulsion is, in aggregate, an approximate re-centering of the node embedding dimensions. Such dimension operations are much more scalable than node operations. The dimension approach, in addition to being more efficient, yields a simpler geometric interpretation of the repulsion. Our result extends findings from the self-supervised learning literature to the skip-gram model, establishing a connection between skip-gram node contrast and dimension regularization. We show that in the limit of large graphs, under mild regularity conditions, the original node repulsion objective converges to optimization with dimension regularization. We use this observation to propose an algorithm augmentation framework that speeds up any existing algorithm, supervised or unsupervised, using SGNS. The framework prioritizes node attraction and replaces SGNS with dimension regularization. We instantiate this generic framework for LINE and node2vec and show that the augmented algorithms preserve downstream performance while dramatically increasing efficiency.
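
The aggregate equivalence described here is computationally striking: per-node negative sampling touches $O(nk)$ node pairs, while its aggregate effect is (approximately) a re-centering of each embedding dimension, a single $O(nd)$ pass. A toy numpy contrast (the shapes and the isolated repulsion term are illustrative assumptions; the paper's augmented LINE and node2vec algorithms involve more than this):

```python
import numpy as np

rng = np.random.default_rng(0)
X = rng.standard_normal((1_000, 128))  # node embeddings: n nodes x d dims

# SGNS-style repulsion: each node is pushed away from k sampled "negatives".
k = 5
negatives = X[rng.integers(0, len(X), size=(len(X), k))]
repulsion_direction = negatives.mean(axis=1)     # O(n * k * d) per pass

# Dimension regularization: in aggregate, the repulsion is approximately
# a re-centering of every embedding dimension at zero.
X_centered = X - X.mean(axis=0, keepdims=True)   # O(n * d), no sampling
```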

[25]  arXiv:2405.00190 (cross-list from quant-ph) [pdf, other]
Title: Distribution of lowest eigenvalue in $k$-body bosonic random matrix ensembles
Comments: 18 pages, 9 figures
Subjects: Quantum Physics (quant-ph); Data Analysis, Statistics and Probability (physics.data-an); Applications (stat.AP)

We numerically study the distribution of the lowest eigenvalue of finite many-boson systems with $k$-body interactions modeled by Bosonic Embedded Gaussian Orthogonal [BEGOE($k$)] and Unitary [BEGUE($k$)] random matrix Ensembles. Following the recently established result that the $q$-normal describes the smooth form of the eigenvalue density of the $k$-body embedded ensembles, the first four moments of the distribution of lowest eigenvalues have been analyzed as a function of the $q$ parameter, with $q \sim 1$ for $k = 1$ and $q = 0$ for $k = m$, $m$ being the number of bosons. Our results show that the distribution exhibits a smooth transition from Gaussian-like for $q$ close to 1, to a modified Gumbel-like form for intermediate values of $q$, to the well-known Tracy-Widom distribution for $q = 0$.

[26]  arXiv:2405.00202 (cross-list from cs.LG) [pdf, other]
Title: Leveraging Active Subspaces to Capture Epistemic Model Uncertainty in Deep Generative Models for Molecular Design
Subjects: Machine Learning (cs.LG); Quantitative Methods (q-bio.QM); Machine Learning (stat.ML)

Deep generative models have been accelerating the inverse design process in material and drug design. Unlike their counterpart property predictors in typical molecular design frameworks, generative molecular design models have seen fewer efforts on uncertainty quantification (UQ) due to computational challenges in Bayesian inference posed by their large number of parameters. In this work, we focus on the junction-tree variational autoencoder (JT-VAE), a popular model for generative molecular design, and address this issue by leveraging the low-dimensional active subspace (AS) to capture the uncertainty in the model parameters. Specifically, we approximate the posterior distribution over the active subspace parameters to estimate the epistemic model uncertainty in an extremely high dimensional parameter space. The proposed UQ scheme does not require alteration of the model architecture, making it readily applicable to any pre-trained model. Our experiments demonstrate the efficacy of the AS-based UQ and its potential impact on molecular optimization by exploring the model diversity under epistemic uncertainty.

[27]  arXiv:2405.00308 (cross-list from cs.CR) [pdf, ps, other]
Title: FPGA Digital Dice using Pseudo Random Number Generator
Comments: 15 pages, 5 figures
Subjects: Cryptography and Security (cs.CR); Applications (stat.AP)

The goal of this project is to design a digital dice that displays dice numbers in real-time. The numbers are generated by a pseudo-random number generator (PRNG) implementing the XORshift algorithm in Verilog HDL on an FPGA. The digital dice is equipped with a tilt sensor, display, power management circuit, and rechargeable battery, hosted in a 3D-printed dice casing. Shaking the digital dice causes the tilt sensor signal to produce a seed for the PRNG. The digital dice can simulate dice with 2, 4, 6, 8, 10, 12, 20, or 100 sides. The kit is named SUTDicey.
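
For reference, XORshift is a three-shift linear recurrence on a fixed-width word; the Python sketch below uses Marsaglia's classic 32-bit constants (13, 17, 5), which the project's Verilog implementation may or may not share, with a stand-in seed playing the role of the tilt-sensor reading.

```python
def xorshift32(state):
    """One step of Marsaglia's 32-bit XORshift PRNG."""
    state ^= (state << 13) & 0xFFFFFFFF
    state ^= state >> 17
    state ^= (state << 5) & 0xFFFFFFFF
    return state

def roll(state, sides=20):
    """Advance the PRNG and map its output to a dice face in 1..sides."""
    state = xorshift32(state)
    return state, 1 + state % sides

seed = 0xBEEF  # in hardware, a nonzero value derived from the tilt-sensor signal
state, face = roll(seed, sides=20)
```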

[28]  arXiv:2405.00333 (cross-list from q-bio.PE) [pdf, other]
Title: Reevaluating coexistence and stability in ecosystem networks to address ecological transients: methods and implications
Subjects: Populations and Evolution (q-bio.PE); Applications (stat.AP)

Representing ecosystems at equilibrium has been foundational for building ecological theories, forecasting species populations and planning conservation actions. The equilibrium "balance of nature" ideal suggests that populations will eventually stabilise to a coexisting balance of species. However, a growing body of literature argues that the equilibrium ideal is inappropriate for ecosystems. Here, we develop and demonstrate a new framework for representing ecosystems without considering equilibrium dynamics. Instead, far more pragmatic ecosystem models are constructed by considering population trajectories, regardless of whether they exhibit equilibrium or transient (i.e. non-equilibrium) behaviour. This novel framework maximally utilises readily available, but often overlooked, knowledge from field observations and expert elicitation, rather than relying on theoretical ecosystem properties. We developed innovative Bayesian algorithms to generate ecosystem models in this new statistical framework, without excessive computational burden. Our results reveal that our pragmatic framework could have a dramatic impact on conservation decision-making and enhance the realism of ecosystem models and forecasts.

[29]  arXiv:2405.00357 (cross-list from q-fin.RM) [pdf, other]
Title: Optimal nonparametric estimation of the expected shortfall risk
Subjects: Risk Management (q-fin.RM); Probability (math.PR); Statistics Theory (math.ST); Mathematical Finance (q-fin.MF)

We address the problem of estimating the expected shortfall risk of a financial loss using a finite number of i.i.d. data points. It is well known that the classical plug-in estimator suffers from poor statistical performance when faced with (heavy-tailed) distributions that are commonly used in financial contexts. Further, it lacks robustness, as the modification of even a single data point can cause a significant distortion. We propose a novel procedure for the estimation of the expected shortfall and prove that it recovers the best possible statistical properties (dictated by the central limit theorem) under minimal assumptions and for all finite numbers of data. Further, this estimator is adversarially robust: even if a (small) proportion of the data is maliciously modified, the procedure continues to optimally estimate the true expected shortfall risk. We demonstrate that our estimator outperforms the classical plug-in estimator through a variety of numerical experiments across a range of standard loss distributions.
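
For contrast, the classical plug-in estimator that the paper improves upon is just the empirical tail mean: average the largest $\lceil (1-\alpha) n \rceil$ losses beyond the empirical value-at-risk. A sketch of this baseline (conventions for the level $\alpha$ vary; here larger $\alpha$ means a further tail):

```python
import numpy as np

def expected_shortfall_plugin(losses, alpha=0.95):
    """Classical plug-in estimator of expected shortfall at level alpha.

    Averages the largest ceil((1 - alpha) * n) losses, i.e. the empirical
    mean of the tail beyond the empirical value-at-risk.
    """
    losses = np.sort(np.asarray(losses, dtype=float))
    n = len(losses)
    k = int(np.ceil((1 - alpha) * n))  # number of tail observations
    return losses[-k:].mean()

# A heavy-tailed example of the kind where the plug-in estimator behaves poorly:
rng = np.random.default_rng(1)
sample = rng.pareto(2.5, size=10_000)
print(expected_shortfall_plugin(sample, alpha=0.95))
```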

[30]  arXiv:2405.00366 (cross-list from cs.ET) [pdf, ps, other]
Title: L0-regularized compressed sensing with Mean-field Coherent Ising Machines
Comments: 19 pages, 7 figures
Subjects: Emerging Technologies (cs.ET); Quantum Physics (quant-ph); Applications (stat.AP); Computation (stat.CO)

The Coherent Ising Machine (CIM) is a network of optical parametric oscillators that solves combinatorial optimization problems by finding the ground state of an Ising Hamiltonian. As a practical application of the CIM, Aonishi et al. proposed a quantum-classical hybrid system to solve optimization problems in L0-regularization-based compressed sensing (L0RBCS), and Gunathilaka et al. further enhanced the accuracy of the system. However, the high computational cost of the CIM's stochastic differential equations (SDEs) limits digital hardware implementations. As an alternative to the CIM SDEs used by Gunathilaka et al., we propose using the mean-field CIM (MF-CIM) model, a physics-inspired heuristic solver without quantum noise. The MF-CIM surmounts the high computational cost owing to the simple nature of its differential equations (DEs). Furthermore, our results indicate that the proposed model achieves performance similar to the physically accurate SDEs on both artificial and magnetic resonance imaging data, paving the way for implementing CIM-based L0RBCS on digital hardware such as Field Programmable Gate Arrays (FPGAs).

[31]  arXiv:2405.00417 (cross-list from cs.LG) [pdf, other]
Title: Conformal Risk Control for Ordinal Classification
Comments: 17 pages, 8 figures, 2 tables; 1 supplementary page
Journal-ref: In UAI 2023: The 39th Conference on Uncertainty in Artificial Intelligence
Subjects: Machine Learning (cs.LG); Methodology (stat.ME); Machine Learning (stat.ML)

As a natural extension of the standard conformal prediction method, several conformal risk control methods have recently been developed and applied to various learning problems. In this work, we seek to control the conformal risk in expectation for ordinal classification tasks, which have broad applications to many real problems. For this purpose, we first formulate the ordinal classification task within the conformal risk control framework and provide theoretical risk bounds for the risk control method. We then propose two types of loss functions specially designed for ordinal classification tasks and develop corresponding algorithms to determine the prediction set for each case, controlling the risk at a desired level. We demonstrate the effectiveness of our proposed methods and analyze the difference between the two types of risks on three different datasets: a simulated dataset, the UTKFace dataset, and the diabetic retinopathy detection dataset.

[32]  arXiv:2405.00424 (cross-list from econ.EM) [pdf, other]
Title: Optimal Bias-Correction and Valid Inference in High-Dimensional Ridge Regression: A Closed-Form Solution
Authors: Zhaoxing Gao
Comments: 53 pages, 10 figures
Subjects: Econometrics (econ.EM); Methodology (stat.ME); Machine Learning (stat.ML)

Ridge regression is an indispensable tool in big data econometrics but suffers from bias issues affecting both statistical efficiency and scalability. We introduce an iterative strategy to correct the bias effectively when the dimension $p$ is less than the sample size $n$. For $p>n$, our method optimally reduces the bias to a level unachievable through linear transformations of the response. We employ a Ridge-Screening (RS) method to handle the remaining bias when $p>n$, creating a reduced model suitable for bias-correction. Under certain conditions, the selected model nests the true one, making RS a novel variable selection approach. We establish the asymptotic properties and valid inferences of our de-biased ridge estimators for both $p< n$ and $p>n$, where $p$ and $n$ may grow towards infinity, along with the number of iterations. Our method is validated using simulated and real-world data examples, providing a closed-form solution to bias challenges in ridge regression inferences.

[33]  arXiv:2405.00454 (cross-list from cs.LG) [pdf, ps, other]
Title: Robust Semi-supervised Learning via $f$-Divergence and $α$-Rényi Divergence
Comments: Accepted in ISIT 2024
Subjects: Machine Learning (cs.LG); Information Theory (cs.IT); Machine Learning (stat.ML)

This paper investigates a range of empirical risk functions and regularization methods suitable for self-training approaches in semi-supervised learning. These approaches draw inspiration from divergence measures such as $f$-divergences and $\alpha$-R\'enyi divergences, whose theoretical foundations also provide valuable insights into our empirical risk functions and regularization techniques. In pseudo-labeling and entropy minimization, two self-training techniques for effective semi-supervised learning, the self-training process involves an inherent mismatch between the true labels and the pseudo-labels (noisy pseudo-labels), and some of our empirical risk functions are robust to such noisy pseudo-labels. Under some conditions, our empirical risk functions demonstrate better performance compared to traditional self-training methods.

[34]  arXiv:2405.00489 (cross-list from cs.LG) [pdf, other]
Title: Explainable Automatic Grading with Neural Additive Models
Subjects: Machine Learning (cs.LG); Computation and Language (cs.CL); Applications (stat.AP)

The use of automatic short answer grading (ASAG) models may help alleviate the time burden of grading while encouraging educators to frequently incorporate open-ended items in their curriculum. However, current state-of-the-art ASAG models are large neural networks (NN) often described as "black boxes", providing no explanation for which characteristics of an input are important for the produced output. This inexplicable nature can be frustrating to teachers and students when trying to interpret, or learn from, an automatically generated grade. To create a powerful yet intelligible ASAG model, we experiment with the Neural Additive Model (NAM), which combines the performance of an NN with the explainability of an additive model. We use a Knowledge Integration (KI) framework from the learning sciences to guide feature engineering, creating inputs that reflect whether a student includes certain ideas in their response. We hypothesize that indicating the inclusion (or exclusion) of predefined ideas as features will be sufficient for the NAM to have good predictive power and interpretability, as this may guide a human scorer using a KI rubric. We compare the performance of the NAM with another explainable model, logistic regression, using the same features, and with a non-explainable neural model, DeBERTa, that does not require feature engineering.

[35]  arXiv:2405.00576 (cross-list from q-fin.RM) [pdf, other]
Title: Calibration of the rating transition model for high and low default portfolios
Subjects: Risk Management (q-fin.RM); Methodology (stat.ME)

In this paper we develop maximum likelihood (ML) based algorithms to calibrate the model parameters in credit rating transition models. Since credit rating transition models are not Gaussian linear models, the celebrated Kalman filter is not suitable for computing the likelihood of observed migrations. We therefore develop a Laplace approximation of the likelihood function, after which the Kalman filter can be used to compute the likelihood. This approach is applied to so-called high-default portfolios, in which the number of migrations (defaults) is large enough to obtain high accuracy of the Laplace approximation. By contrast, low-default portfolios have a limited number of observed migrations (defaults). To calibrate low-default portfolios, we therefore develop an ML algorithm using a particle filter (PF) and Gaussian process regression. Experiments show that both algorithms are efficient and produce accurate approximations of the likelihood function and the ML estimates of the model parameters.

[36]  arXiv:2405.00669 (cross-list from astro-ph.CO) [pdf, other]
Title: Euclid preparation. LensMC, weak lensing cosmic shear measurement with forward modelling and Markov Chain Monte Carlo sampling
Authors: Euclid Collaboration: G. Congedo (1), L. Miller (2), A. N. Taylor (1), N. Cross (1), C. A. J. Duncan (3 and 2), T. Kitching (4), N. Martinet (5), S. Matthew (1), T. Schrabback (6), M. Tewes (7), N. Welikala (1), N. Aghanim (8), A. Amara (9), S. Andreon (10), N. Auricchio (11), M. Baldi (12 and 11 and 13), S. Bardelli (11), R. Bender (14 and 15), C. Bodendorf (14), D. Bonino (16), E. Branchini (17 and 18 and 10), M. Brescia (19 and 20 and 21), J. Brinchmann (22), S. Camera (23 and 24 and 16), V. Capobianco (16), C. Carbone (25), V. F. Cardone (26 and 27), J. Carretero (28 and 29), S. Casas (30), F. J. Castander (31 and 32), M. Castellano (26), S. Cavuoti (20 and 21), A. Cimatti (33), C. J. Conselice (3), L. Conversi (34 and 35), Y. Copin (36), F. Courbin (37), H. M. Courtois (38), M. Cropper (4), et al. (202 additional authors not shown)
Comments: 28 pages, 18 figures, 2 tables
Subjects: Cosmology and Nongalactic Astrophysics (astro-ph.CO); Data Analysis, Statistics and Probability (physics.data-an); Computation (stat.CO)

LensMC is a weak lensing shear measurement method developed for Euclid and Stage-IV surveys. It is based on forward modelling to deal with convolution by a point spread function with comparable size to many galaxies; sampling the posterior distribution of galaxy parameters via Markov Chain Monte Carlo; and marginalisation over nuisance parameters for each of the 1.5 billion galaxies observed by Euclid. The scientific performance is quantified through high-fidelity images based on the Euclid Flagship simulations and emulation of the Euclid VIS images; realistic clustering with a mean surface number density of 250 arcmin$^{-2}$ ($I_{\rm E}<29.5$) for galaxies, and 6 arcmin$^{-2}$ ($I_{\rm E}<26$) for stars; and a diffraction-limited chromatic point spread function with a full width at half maximum of $0.^{\!\prime\prime}2$ and spatial variation across the field of view. Objects are measured with a density of 90 arcmin$^{-2}$ ($I_{\rm E}<26.5$) in 4500 deg$^2$. The total shear bias is broken down into measurement (our main focus here) and selection effects (which will be addressed elsewhere). We find: measurement multiplicative and additive biases of $m_1=(-3.6\pm0.2)\times10^{-3}$, $m_2=(-4.3\pm0.2)\times10^{-3}$, $c_1=(-1.78\pm0.03)\times10^{-4}$, $c_2=(0.09\pm0.03)\times10^{-4}$; a large detection bias with a multiplicative component of $1.2\times10^{-2}$ and an additive component of $-3\times10^{-4}$; and a measurement PSF leakage of $\alpha_1=(-9\pm3)\times10^{-4}$ and $\alpha_2=(2\pm3)\times10^{-4}$. When model bias is suppressed, the obtained measurement biases are close to Euclid requirement and largely dominated by undetected faint galaxies ($-5\times10^{-3}$). Although significant, model bias will be straightforward to calibrate given the weak sensitivity.

[37]  arXiv:2405.00675 (cross-list from cs.LG) [pdf, other]
Title: Self-Play Preference Optimization for Language Model Alignment
Comments: 25 pages, 4 figures, 5 tables
Subjects: Machine Learning (cs.LG); Artificial Intelligence (cs.AI); Computation and Language (cs.CL); Machine Learning (stat.ML)

Traditional reinforcement learning from human feedback (RLHF) approaches relying on parametric models like the Bradley-Terry model fall short in capturing the intransitivity and irrationality in human preferences. Recent advancements suggest that directly working with preference probabilities can yield a more accurate reflection of human preferences, enabling more flexible and accurate language model alignment. In this paper, we propose a self-play-based method for language model alignment, which treats the problem as a constant-sum two-player game aimed at identifying the Nash equilibrium policy. Our approach, dubbed \textit{Self-Play Preference Optimization} (SPPO), approximates the Nash equilibrium through iterative policy updates and enjoys theoretical convergence guarantee. Our method can effectively increase the log-likelihood of the chosen response and decrease that of the rejected response, which cannot be trivially achieved by symmetric pairwise loss such as Direct Preference Optimization (DPO) and Identity Preference Optimization (IPO). In our experiments, using only 60k prompts (without responses) from the UltraFeedback dataset and without any prompt augmentation, by leveraging a pre-trained preference model PairRM with only 0.4B parameters, SPPO can obtain a model from fine-tuning Mistral-7B-Instruct-v0.2 that achieves the state-of-the-art length-controlled win-rate of 28.53% against GPT-4-Turbo on AlpacaEval 2.0. It also outperforms the (iterative) DPO and IPO on MT-Bench and the Open LLM Leaderboard. Notably, the strong performance of SPPO is achieved without additional external supervision (e.g., responses, preferences, etc.) from GPT-4 or other stronger language models.

Replacements for Thu, 2 May 24

[38]  arXiv:1902.09608 (replaced) [pdf, other]
Title: On Binscatter
Journal-ref: American Economic Review, 114(5) 1488-1514, 2024
Subjects: Econometrics (econ.EM); Methodology (stat.ME); Machine Learning (stat.ML)
[39]  arXiv:2003.06002 (replaced) [pdf, other]
Title: Bayesian Posterior Interval Calibration to Improve the Interpretability of Observational Studies
Subjects: Applications (stat.AP)
[40]  arXiv:2207.04248 (replaced) [pdf, other]
Title: A Statistical-Modelling Approach to Feedforward Neural Network Model Selection
Subjects: Methodology (stat.ME); Machine Learning (cs.LG)
[41]  arXiv:2209.05557 (replaced) [pdf, other]
Title: Blurring Diffusion Models
Subjects: Machine Learning (cs.LG); Computer Vision and Pattern Recognition (cs.CV); Machine Learning (stat.ML)
[42]  arXiv:2304.01698 (replaced) [pdf, other]
Title: Inverse Unscented Kalman Filter
Comments: 20 pages, 5 figures. arXiv admin note: text overlap with arXiv:2210.00359
Subjects: Optimization and Control (math.OC); Signal Processing (eess.SP); Systems and Control (eess.SY); Machine Learning (stat.ML)
[43]  arXiv:2305.11774 (replaced) [pdf, other]
Title: Multi-objective optimisation via the R2 utilities
Comments: The code is available at: this https URL
Subjects: Optimization and Control (math.OC); Machine Learning (cs.LG); Machine Learning (stat.ML)
[44]  arXiv:2306.16308 (replaced) [pdf, other]
Title: Gaussian random field approximation via Stein's method with applications to wide random neural networks
Comments: To appear in Applied and Computational Harmonic Analysis
Subjects: Probability (math.PR); Machine Learning (cs.LG); Statistics Theory (math.ST); Machine Learning (stat.ML)
[45]  arXiv:2307.10503 (replaced) [pdf, other]
Title: Regularizing threshold priors with sparse response patterns in Bayesian factor analysis with categorical indicators
Subjects: Methodology (stat.ME); Applications (stat.AP)
[46]  arXiv:2307.15330 (replaced) [pdf, other]
Title: Group integrative dynamic factor models with application to multiple subject brain connectivity
Subjects: Methodology (stat.ME); Applications (stat.AP)
[47]  arXiv:2308.12820 (replaced) [pdf, other]
Title: Prediction without Preclusion: Recourse Verification with Reachable Sets
Comments: ICLR 2024 Spotlight. The first two authors contributed equally
Subjects: Machine Learning (cs.LG); Computers and Society (cs.CY); Machine Learning (stat.ML)
[48]  arXiv:2308.14830 (replaced) [pdf, other]
Title: COVID anomaly in the correlation analysis of S&P 500 market states
Comments: 12 pages, 6 figures, supplemental materials (6 figures, 3 videos)
Journal-ref: (2024) PLoS ONE 19(4): e0301238
Subjects: Applications (stat.AP); Computation (stat.CO)
[49]  arXiv:2309.13640 (replaced) [pdf, ps, other]
Title: Visualizing periodic stability in studies: the moving average meta-analysis (MA2)
Comments: 10 pages, 2 figures, 1 table
Subjects: Methodology (stat.ME)
[50]  arXiv:2310.01012 (replaced) [pdf, other]
Title: Unconstrained Stochastic CCA: Unifying Multiview and Self-Supervised Learning
Subjects: Machine Learning (cs.LG); Artificial Intelligence (cs.AI); Machine Learning (stat.ML)
[51]  arXiv:2310.14661 (replaced) [pdf, other]
Title: Tractable MCMC for Private Learning with Pure and Gaussian Differential Privacy
Subjects: Machine Learning (cs.LG); Machine Learning (stat.ML)
[52]  arXiv:2310.16207 (replaced) [pdf, ps, other]
Title: Propensity weighting plus adjustment in proportional hazards model is not doubly robust
Subjects: Methodology (stat.ME)
[53]  arXiv:2311.11054 (replaced) [pdf, other]
Title: Modern extreme value statistics for Utopian extremes
Subjects: Methodology (stat.ME)
[54]  arXiv:2311.17271 (replaced) [pdf, other]
Title: Spatial-Temporal Extreme Modeling for Point-to-Area Random Effects (PARE)
Comments: 20 pages, 9 tables, 4 figures
Subjects: Methodology (stat.ME)
[55]  arXiv:2311.17831 (replaced) [pdf, other]
Title: Confidence Regions for Filamentary Structures
Authors: Wanli Qiao
Subjects: Statistics Theory (math.ST)
[56]  arXiv:2312.03871 (replaced) [pdf, other]
Title: Hidden yet quantifiable: A lower bound for confounding strength using randomized trials
Comments: Accepted for presentation at the International Conference on Artificial Intelligence and Statistics (AISTATS) 2024
Subjects: Machine Learning (stat.ML); Machine Learning (cs.LG)
[57]  arXiv:2312.11456 (replaced) [pdf, other]
Title: Iterative Preference Learning from Human Feedback: Bridging Theory and Practice for RLHF under KL-Constraint
Comments: 53 pages; theoretical study and algorithmic design of iterative RLHF and DPO
Subjects: Machine Learning (cs.LG); Artificial Intelligence (cs.AI); Machine Learning (stat.ML)
[58]  arXiv:2312.14191 (replaced) [pdf, ps, other]
Title: Noisy Measurements Are Important, the Design of Census Products Is Much More Important
Authors: John M. Abowd
Journal-ref: Harvard Data Science Review, Volume 6, Number 2 (Spring, 2024)
Subjects: Cryptography and Security (cs.CR); Econometrics (econ.EM); Applications (stat.AP)
[59]  arXiv:2401.11974 (replaced) [pdf, other]
Title: Cross-Validation Conformal Risk Control
Comments: accepted for presentation at 2024 IEEE International Symposium on Information Theory (ISIT 2024)
Subjects: Machine Learning (cs.LG); Machine Learning (stat.ML)
[60]  arXiv:2402.07712 (replaced) [pdf, other]
Title: Model Collapse Demystified: The Case of Regression
Subjects: Machine Learning (cs.LG); Artificial Intelligence (cs.AI); Machine Learning (stat.ML)
[61]  arXiv:2404.15378 (replaced) [pdf, other]
Title: Hierarchical Hybrid Sliced Wasserstein: A Scalable Metric for Heterogeneous Joint Distributions
Authors: Khai Nguyen, Nhat Ho
Comments: 28 pages, 11 figures, 4 tables
Subjects: Computer Vision and Pattern Recognition (cs.CV); Artificial Intelligence (cs.AI); Graphics (cs.GR); Machine Learning (cs.LG); Machine Learning (stat.ML)
[62]  arXiv:2404.17429 (replaced) [pdf, other]
Title: Separation capacity of linear reservoirs with random connectivity matrix
Authors: Youness Boutaib
Subjects: Machine Learning (stat.ML); Machine Learning (cs.LG); Probability (math.PR)
[63]  arXiv:2404.18444 (replaced) [pdf, other]
Title: U-Nets as Belief Propagation: Efficient Classification, Denoising, and Diffusion in Generative Hierarchical Models
Authors: Song Mei
Comments: v2 updated discussions of related literature
Subjects: Machine Learning (cs.LG); Artificial Intelligence (cs.AI); Statistics Theory (math.ST); Machine Learning (stat.ML)
[64]  arXiv:2404.18702 (replaced) [pdf, other]
Title: Why You Should Not Trust Interpretations in Machine Learning: Adversarial Attacks on Partial Dependence Plots
Subjects: Machine Learning (cs.LG); Cryptography and Security (cs.CR); Applications (stat.AP); Machine Learning (stat.ML)
[65]  arXiv:2404.19145 (replaced) [pdf, other]
Title: Orthogonal Bootstrap: Efficient Simulation of Input Uncertainty
Subjects: Methodology (stat.ME); Machine Learning (cs.LG); Econometrics (econ.EM); Statistics Theory (math.ST); Machine Learning (stat.ML)
[66]  arXiv:2404.19242 (replaced) [pdf, other]
Title: A Minimal Set of Parameters Based Depth-Dependent Distortion Model and Its Calibration Method for Stereo Vision Systems
Comments: This paper has been accepted for publication in IEEE Transactions on Instrumentation and Measurement
Subjects: Computer Vision and Pattern Recognition (cs.CV); Image and Video Processing (eess.IV); Methodology (stat.ME)