
Methodology

New submissions

[ total of 39 entries: 1-39 ]

New submissions for Tue, 14 May 24

[1]  arXiv:2405.06763 [pdf, other]
Title: Post-selection inference for causal effects after causal discovery
Subjects: Methodology (stat.ME)

Algorithms for constraint-based causal discovery select graphical causal models among a space of possible candidates (e.g., all directed acyclic graphs) by executing a sequence of conditional independence tests. These may be used to inform the estimation of causal effects (e.g., average treatment effects) when there is uncertainty about which covariates ought to be adjusted for, or which variables act as confounders versus mediators. However, naively using the data twice, for model selection and estimation, would lead to invalid confidence intervals. Moreover, if the selected graph is incorrect, the inferential claims may apply to a selected functional that is distinct from the actual causal effect. We propose an approach to post-selection inference that is based on a resampling and screening procedure, which essentially performs causal discovery multiple times with randomly varying intermediate test statistics. Then, an estimate of the target causal effect and corresponding confidence sets are constructed from a union of individual graph-based estimates and intervals. We show that this construction has asymptotically correct coverage for the true causal effect parameter. Importantly, the guarantee holds for a fixed population-level effect, not a data-dependent or selection-dependent quantity. Most of our exposition focuses on the PC-algorithm for learning directed acyclic graphs and the multivariate Gaussian case for simplicity, but the approach is general and modular, so it may be used with other conditional independence based discovery algorithms and distributional families.
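
A toy sketch of the resample-and-union idea (an illustration only, not the authors' procedure): the hypothetical estimate_effect helper below uses a single partial-correlation test to decide the adjustment set, bootstrap resampling stands in for the randomly perturbed intermediate test statistics, and the reported confidence set is the union of the per-selection intervals.

    # Toy illustration: resample, rerun a crude "discovery" step, estimate the
    # effect under the selected adjustment set, and union the intervals.
    import numpy as np
    from scipy import stats

    rng = np.random.default_rng(0)
    n = 500
    X = rng.normal(size=n)                      # candidate confounder
    A = 0.8 * X + rng.normal(size=n)            # treatment
    Y = 1.0 * A + 0.5 * X + rng.normal(size=n)  # outcome; true effect = 1.0

    def estimate_effect(idx):
        """Toy 'discovery' plus OLS estimate; returns (estimate, CI half-width)."""
        x, a, y = X[idx], A[idx], Y[idx]
        _, p = stats.pearsonr(x, a)             # crude conditional-independence proxy
        cols = [np.ones_like(a), a, x] if p < 0.05 else [np.ones_like(a), a]
        Z = np.column_stack(cols)
        beta, *_ = np.linalg.lstsq(Z, y, rcond=None)
        resid = y - Z @ beta
        sigma2 = resid @ resid / (len(y) - Z.shape[1])
        se = np.sqrt(sigma2 * np.linalg.inv(Z.T @ Z)[1, 1])
        return beta[1], 1.96 * se

    intervals = []
    for _ in range(200):                        # resampling step
        idx = rng.integers(0, n, size=n)
        est, hw = estimate_effect(idx)
        intervals.append((est - hw, est + hw))

    lo, hi = min(l for l, _ in intervals), max(u for _, u in intervals)
    print(f"union confidence set: [{lo:.3f}, {hi:.3f}]")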

[2]  arXiv:2405.06796 [pdf, other]
Title: The Multiple Change-in-Gaussian-Mean Problem
Comments: This is a draft chapter from the forthcoming book "Change-Point Detection and Data Segmentation" by Paul Fearnhead and Piotr Fryzlewicz. Comments, particularly regarding the history of work in this area, are welcome
Subjects: Methodology (stat.ME); Statistics Theory (math.ST); Computation (stat.CO)

A manuscript version of the chapter "The Multiple Change-in-Gaussian-Mean Problem" from the book "Change-Point Detection and Data Segmentation" by Fearnhead and Fryzlewicz, currently in preparation. All R code and data to accompany this chapter and the book are gradually being made available through https://github.com/pfryz/cpdds.

[3]  arXiv:2405.06799 [pdf, other]
Title: Riemannian Statistics for Any Type of Data
Subjects: Methodology (stat.ME); Statistics Theory (math.ST)

This paper introduces a novel approach to statistics and data analysis, departing from the conventional assumption of data residing in Euclidean space to consider a Riemannian manifold. The challenge lies in the absence of vector space operations on such manifolds. Pennec et al., in their book Riemannian Geometric Statistics in Medical Image Analysis, proposed analyzing data on Riemannian manifolds through geometry; this approach is effective with structured data such as medical images, where the intrinsic manifold structure is apparent. Yet its applicability to general data lacking an implicit notion of local distance is limited. We propose a solution that generalizes Riemannian statistics to any type of data.

[4]  arXiv:2405.06813 [pdf, other]
Title: A note on distance variance for categorical variables
Authors: Qingyang Zhang
Comments: 3 figures
Subjects: Methodology (stat.ME); Statistics Theory (math.ST)

This study investigates the extension of distance variance, a validated spread metric for continuous and binary variables [Edelmann et al., 2020, Ann. Stat., 48(6)], to quantify the spread of general categorical variables. We provide both geometric and algebraic characterizations of distance variance, revealing its connections to some commonly used entropy measures, and the variance-covariance matrix of the one-hot encoded representation. However, we demonstrate that distance variance fails to satisfy the Schur-concavity axiom for categorical variables with more than two categories, leading to counterintuitive results. This limitation hinders its applicability as a universal measure of spread.
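
A minimal sketch of the quantity being studied: the sample distance variance of a categorical variable under the discrete (0/1) metric, computed from the double-centred pairwise distance matrix (a generic illustration, not the authors' code).

    # Sample distance variance of a categorical variable, discrete metric d(a,b) = 1[a != b].
    import numpy as np

    def distance_variance_categorical(x):
        x = np.asarray(x)
        n = len(x)
        D = (x[:, None] != x[None, :]).astype(float)     # pairwise 0/1 distances
        # Double-centre the distance matrix, then average the squared entries.
        A = D - D.mean(axis=0, keepdims=True) - D.mean(axis=1, keepdims=True) + D.mean()
        return (A ** 2).sum() / n ** 2

    x = np.array(list("aaabbbbccde"))
    print("distance variance:", round(distance_variance_categorical(x), 4))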

[5]  arXiv:2405.06866 [pdf, other]
Title: Dynamic Contextual Pricing with Doubly Non-Parametric Random Utility Models
Subjects: Methodology (stat.ME)

In the evolving landscape of digital commerce, adaptive dynamic pricing strategies are essential for gaining a competitive edge. This paper introduces novel {\em doubly nonparametric random utility models} that eschew traditional parametric assumptions used in estimating consumer demand's mean utility function and noise distribution. Existing nonparametric methods like multi-scale {\em Distributional Nearest Neighbors (DNN and TDNN)}, initially designed for offline regression, face challenges in dynamic online pricing due to design limitations, such as the indirect observability of utility-related variables and the absence of uniform convergence guarantees. We address these challenges with innovative population equations that facilitate nonparametric estimation within decision-making frameworks and establish new analytical results on the uniform convergence rates of DNN and TDNN, enhancing their applicability in dynamic environments.
Our theoretical analysis confirms that the statistical learning rates for the mean utility function and noise distribution are minimax optimal. We also derive a regret bound that illustrates the critical interaction between model dimensionality and noise distribution smoothness, deepening our understanding of dynamic pricing under varied market conditions. These contributions offer substantial theoretical insights and practical tools for implementing effective, data-driven pricing strategies, advancing the theoretical framework of pricing models and providing robust methodologies for navigating the complexities of modern markets.
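
The DNN and TDNN estimators referenced above admit a closed form as weighted averages of order statistics; the following sketch evaluates them in a plain offline regression setting with ad hoc subsampling scales, not in the pricing model.

    # DNN(s): average of the 1-NN prediction over all size-s subsamples, computed
    # exactly via order statistics; TDNN: two-scale combination cancelling the
    # leading bias term.  Toy nonparametric regression, illustrative scales only.
    import numpy as np
    from scipy.special import comb

    rng = np.random.default_rng(9)
    n, d = 500, 2
    X = rng.uniform(size=(n, d))
    Y = np.sin(2 * np.pi * X[:, 0]) + X[:, 1] ** 2 + 0.3 * rng.normal(size=n)

    def dnn(x0, s):
        order = np.argsort(np.linalg.norm(X - x0, axis=1))    # sort by distance to x0
        i = np.arange(1, n + 1)
        w = comb(n - i, s - 1) / comb(n, s)   # P(i-th closest point is the subsample's 1-NN)
        return np.dot(w, Y[order])

    def tdnn(x0, s1, s2):
        # Weights solve w1 + w2 = 1 and w1*s1^(-2/d) + w2*s2^(-2/d) = 0.
        A = np.array([[1.0, 1.0], [s1 ** (-2 / d), s2 ** (-2 / d)]])
        w1, w2 = np.linalg.solve(A, np.array([1.0, 0.0]))
        return w1 * dnn(x0, s1) + w2 * dnn(x0, s2)

    x0 = np.array([0.3, 0.6])
    truth = np.sin(2 * np.pi * 0.3) + 0.6 ** 2
    print("truth:", round(truth, 3), "DNN:", round(dnn(x0, 50), 3), "TDNN:", round(tdnn(x0, 30, 60), 3))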

[6]  arXiv:2405.06889 [pdf, other]
Title: Tuning parameter selection for the adaptive nuclear norm regularized trace regression
Subjects: Methodology (stat.ME); Optimization and Control (math.OC)

Regularized models have been applied in many areas, particularly to high-dimensional data sets. Because the tuning parameter determines both the theoretical performance and the computational efficiency of a regularized model, tuning parameter selection is a basic and important issue. We consider tuning parameter selection for the adaptive nuclear norm regularized trace regression, which we carry out via the Bayesian information criterion (BIC). The proposed BIC is established with the help of an unbiased estimator of the degrees of freedom. Under some regularity conditions, this BIC is proved to achieve rank consistency of the tuning parameter selection; that is, the model solution under the selected tuning parameter converges to the true solution and, with probability tending to one, has the same rank as the true solution. Some numerical results are presented to evaluate the performance of the proposed BIC for tuning parameter selection.
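
For intuition, a generic sketch of BIC-based tuning over a grid for a (non-adaptive) nuclear-norm regularized trace regression, fitted by singular-value soft-thresholding; the rank-based degrees-of-freedom count below is a naive stand-in for the paper's unbiased estimator.

    # Proximal-gradient fit of nuclear-norm trace regression over a lambda grid,
    # with a generic BIC used to pick the tuning parameter.
    import numpy as np

    rng = np.random.default_rng(6)
    n, p, q, r = 300, 8, 8, 2
    B_true = rng.normal(size=(p, r)) @ rng.normal(size=(r, q))    # low-rank signal
    Xs = rng.normal(size=(n, p, q))
    y = np.einsum('ipq,pq->i', Xs, B_true) + rng.normal(size=n)

    def fit(lam, steps=500, lr=0.1):
        B = np.zeros((p, q))
        for _ in range(steps):
            grad = np.einsum('i,ipq->pq', np.einsum('ipq,pq->i', Xs, B) - y, Xs) / n
            U, s, Vt = np.linalg.svd(B - lr * grad, full_matrices=False)
            B = (U * np.maximum(s - lr * lam, 0.0)) @ Vt          # singular-value soft-threshold
        return B

    best = None
    for lam in np.geomspace(0.01, 2.0, 15):
        B = fit(lam)
        rss = np.mean((np.einsum('ipq,pq->i', Xs, B) - y) ** 2)
        rank = np.linalg.matrix_rank(B, tol=1e-6)
        df = rank * (p + q - rank)              # crude df proxy, NOT the paper's unbiased estimator
        bic = n * np.log(rss) + np.log(n) * df
        if best is None or bic < best[0]:
            best = (bic, lam, rank)
    print("selected lambda:", round(best[1], 3), "selected rank:", best[2])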

[7]  arXiv:2405.07026 [pdf, other]
Title: Selective Randomization Inference for Adaptive Experiments
Subjects: Methodology (stat.ME)

Adaptive experiments use preliminary analyses of the data to inform further course of action and are commonly used in many disciplines including medical and social sciences. Because the null hypothesis and experimental design are not pre-specified, it has long been recognized that statistical inference for adaptive experiments is not straightforward. Most existing methods only apply to specific adaptive designs and rely on strong assumptions. In this work, we propose selective randomization inference as a general framework for analyzing adaptive experiments. In a nutshell, our approach applies conditional post-selection inference to randomization tests. By using directed acyclic graphs to describe the data generating process, we derive a selective randomization p-value that controls the selective type-I error without requiring independent and identically distributed data or any other modelling assumptions. We show how rejection sampling and Markov Chain Monte Carlo can be used to compute the selective randomization p-values and construct confidence intervals for a homogeneous treatment effect. To mitigate the risk of disconnected confidence intervals, we propose the use of hold-out units. Lastly, we demonstrate our method and compare it with other randomization tests using synthetic and real-world data.
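
A toy sketch of the rejection-sampling mechanics behind a selective randomization p-value; the two-stage design, the selection rule and the test statistic below are illustrative placeholders rather than the authors' framework.

    # Re-randomize under the sharp null, keep only draws that reproduce the observed
    # selection event, and compute the p-value from the accepted draws.
    import numpy as np

    rng = np.random.default_rng(1)
    n1 = n2 = 40
    z1 = rng.permutation(np.repeat([0, 1], n1 // 2))          # stage-1 assignment
    y1 = 0.3 * z1 + rng.normal(size=n1)
    selected = y1[z1 == 1].mean() - y1[z1 == 0].mean() > 0    # recorded selection event
    z2 = rng.permutation(np.repeat([0, 1], n2 // 2))          # stage-2 assignment
    y2 = 0.3 * z2 + rng.normal(size=n2)

    y = np.concatenate([y1, y2])
    z_obs = np.concatenate([z1, z2])
    obs_stat = y[z_obs == 1].mean() - y[z_obs == 0].mean()

    keep = []
    for _ in range(20000):
        z1s = rng.permutation(z1)
        if ((y1[z1s == 1].mean() - y1[z1s == 0].mean()) > 0) != selected:
            continue                                          # reject: selection not reproduced
        zs = np.concatenate([z1s, rng.permutation(z2)])
        keep.append(y[zs == 1].mean() - y[zs == 0].mean())

    p_sel = (1 + sum(s >= obs_stat for s in keep)) / (1 + len(keep))
    print(f"accepted draws: {len(keep)}, selective randomization p-value: {p_sel:.3f}")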

[8]  arXiv:2405.07102 [pdf, other]
Title: Nested Instrumental Variables Design: Switcher Average Treatment Effect, Identification, Efficient Estimation and Generalizability
Subjects: Methodology (stat.ME); Applications (stat.AP); Other Statistics (stat.OT)

Instrumental variables (IV) are a commonly used tool to estimate causal effects from non-randomized data. A prototype of an IV is a randomized trial with non-compliance where the randomized treatment assignment serves as an IV for the non-ignorable treatment received. Under a monotonicity assumption, a valid IV non-parametrically identifies the average treatment effect among a non-identifiable complier subgroup, whose generalizability is often under debate. In many studies, there could exist multiple versions of an IV, for instance, different nudges to take the same treatment in different study sites in a multi-center clinical trial. These different versions of an IV may result in different compliance rates and offer a unique opportunity to study IV estimates' generalizability. In this article, we introduce a novel nested IV assumption and study identification of the average treatment effect among two latent subgroups: always-compliers and switchers, who are defined based on the joint potential treatment received under two versions of a binary IV. We derive the efficient influence function for the SWitcher Average Treatment Effect (SWATE) and propose efficient estimators. We then propose formal statistical tests of the generalizability of IV estimates based on comparing the conditional average treatment effect among the always-compliers and that among the switchers under the nested IV framework. We apply the proposed framework and method to the Prostate, Lung, Colorectal and Ovarian (PLCO) Cancer Screening Trial and study the causal effect of colorectal cancer screening and its generalizability.

[9]  arXiv:2405.07109 [pdf, other]
Title: Bridging Binarization: Causal Inference with Dichotomized Continuous Treatments
Subjects: Methodology (stat.ME)

The average treatment effect (ATE) is a common parameter estimated in causal inference literature, but it is only defined for binary treatments. Thus, despite concerns raised by some researchers, many studies seeking to estimate the causal effect of a continuous treatment create a new binary treatment variable by dichotomizing the continuous values into two categories. In this paper, we affirm binarization as a statistically valid method for answering causal questions about continuous treatments by showing the equivalence between the binarized ATE and the difference in the average outcomes of two specific modified treatment policies. These policies impose cut-offs corresponding to the binarized treatment variable and assume preservation of relative self-selection. Relative self-selection is the ratio of the probability density of an individual having an exposure equal to one value of the continuous treatment variable versus another. The policies assume that, for any two values of the treatment variable with non-zero probability density after the cut-off, this ratio will remain unchanged. Through this equivalence, we clarify the assumptions underlying binarization and discuss how to properly interpret the resulting estimator. Additionally, we introduce a new target parameter that can be computed after binarization that considers the status-quo world. We argue that this parameter addresses more relevant causal questions than the traditional binarized ATE parameter. Finally, we present a simulation study to illustrate the implications of these assumptions when analyzing data and to demonstrate how to correctly implement estimators of the parameters discussed.

[10]  arXiv:2405.07138 [pdf, other]
Title: Large-dimensional Robust Factor Analysis with Group Structure
Subjects: Methodology (stat.ME)

In this paper, we focus on exploiting the group structure for large-dimensional factor models, which captures the homogeneous effects of common factors on individuals within the same group. In view of the fact that datasets in macroeconomics and finance are typically heavy-tailed, we propose to identify the unknown group structure using the agglomerative hierarchical clustering algorithm and an information criterion with the robust two-step (RTS) estimates as initial values. The loadings and factors are then re-estimated conditional on the identified groups. Theoretically, we demonstrate the consistency of the estimators for both group membership and the number of groups determined by the information criterion. Under a finite second moment condition, we provide the convergence rate for the newly estimated factor loadings with group information, which are shown to achieve efficiency gains compared to those obtained without group structure information. Numerical simulations and real data analysis demonstrate the good finite sample performance of our proposed approach in the presence of both group structure and heavy-tailedness.

[11]  arXiv:2405.07186 [pdf, other]
Title: Adaptive-TMLE for the Average Treatment Effect based on Randomized Controlled Trial Augmented with Real-World Data
Subjects: Methodology (stat.ME); Machine Learning (stat.ML)

We consider the problem of estimating the average treatment effect (ATE) when both randomized control trial (RCT) data and real-world data (RWD) are available. We decompose the ATE estimand as the difference between a pooled-ATE estimand that integrates RCT and RWD and a bias estimand that captures the conditional effect of RCT enrollment on the outcome. We introduce an adaptive targeted minimum loss-based estimation (A-TMLE) framework to estimate them. We prove that the A-TMLE estimator is root-n-consistent and asymptotically normal. Moreover, in finite samples, it achieves the super-efficiency one would obtain had one known the oracle model for the conditional effect of RCT enrollment on the outcome. Consequently, the smaller the working model for the bias induced by the RWD, the greater our estimator's efficiency, while our estimator will always be at least as efficient as an efficient estimator that uses the RCT data only. In simulations, A-TMLE outperforms existing methods, achieving smaller mean squared error and narrower 95% confidence intervals. A-TMLE could help utilize RWD to improve the efficiency of randomized trial results without biasing the estimates of intervention effects. This approach could allow for smaller, faster trials, decreasing the time until patients can receive effective treatments.

[12]  arXiv:2405.07294 [pdf, ps, other]
Title: Factor Strength Estimation in Vector and Matrix Time Series Factor Models
Subjects: Methodology (stat.ME); Statistics Theory (math.ST)

Most factor modelling research in vector- or matrix-valued time series assumes all factors are pervasive/strong and leaves weaker factors and their corresponding series to the noise. Weaker factors can in fact be important to a group of observed variables; for instance, a sector factor in a large portfolio of stocks may only affect particular sectors, but can be important both for interpretation and for prediction for those stocks. While more recent factor modelling research does consider ``local'' factors, which are weak factors with sparse corresponding factor loadings, there are real data examples in the literature where factors are weak because of weak influence on most/all observed variables, so that the corresponding factor loadings are not sparse (non-local). As a first in the literature, we propose estimators of factor strengths for both local and non-local weak factors, and prove their consistency with rates of convergence spelt out for both vector and matrix-valued time series factor models. Factor strength is an important indicator of which estimation procedure to follow for factor models, as well as of the estimation accuracy of various estimators (Chen and Lam, 2024). Simulation results show that our estimators have good performance in recovering the true factor strengths, and an analysis of the NYC taxi traffic data indicates the existence of weak factors in the data which may not be localized.

[13]  arXiv:2405.07397 [pdf, other]
Title: The Spike-and-Slab Quantile LASSO for Robust Variable Selection in Cancer Genomics Studies
Subjects: Methodology (stat.ME)

Data irregularity in cancer genomics studies has been widely observed in the form of outliers and heavy-tailed distributions in the complex traits. In the past decade, robust variable selection methods have emerged as powerful alternatives to the non-robust ones to identify important genes associated with heterogeneous disease traits and build superior predictive models. In this study, to keep the remarkable features of the quantile LASSO and fully Bayesian regularized quantile regression while overcoming their disadvantages in the analysis of high-dimensional genomics data, we propose the spike-and-slab quantile LASSO through a fully Bayesian spike-and-slab formulation under the robust likelihood by adopting the asymmetric Laplace distribution (ALD). The proposed robust method inherits the prominent properties of selective shrinkage and self-adaptivity to the sparsity pattern from the spike-and-slab LASSO (Ro\v{c}kov\'a and George, 2018). Furthermore, the spike-and-slab quantile LASSO has a computational advantage in locating the posterior modes via soft-thresholding rule guided Expectation-Maximization (EM) steps in the coordinate descent framework, a phenomenon rarely observed for robust regularization with non-differentiable loss functions. We have conducted comprehensive simulation studies with a variety of heavy-tailed errors in both homogeneous and heterogeneous model settings to demonstrate the superiority of the spike-and-slab quantile LASSO over its competing methods. The advantage of the proposed method is further demonstrated in case studies of the lung adenocarcinoma (LUAD) and skin cutaneous melanoma (SKCM) data from The Cancer Genome Atlas (TCGA).
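
As a quick illustration of the motivation only (not the proposed spike-and-slab method), a plain LASSO can be compared with an L1-penalized median regression under heavy-tailed noise; the penalty levels below are arbitrary.

    # Support recovery with heavy-tailed errors: plain LASSO vs. quantile (median) LASSO.
    import numpy as np
    from sklearn.linear_model import Lasso, QuantileRegressor

    rng = np.random.default_rng(7)
    n, p = 150, 30
    X = rng.normal(size=(n, p))
    beta = np.zeros(p); beta[:4] = [2.0, -1.5, 1.0, 0.8]      # sparse truth
    y = X @ beta + rng.standard_t(df=1.5, size=n)             # heavy-tailed noise

    lasso = Lasso(alpha=0.3).fit(X, y)
    qlasso = QuantileRegressor(quantile=0.5, alpha=0.05, solver="highs").fit(X, y)

    print("LASSO support         :", np.flatnonzero(np.abs(lasso.coef_) > 1e-6))
    print("quantile LASSO support:", np.flatnonzero(np.abs(qlasso.coef_) > 1e-6))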

[14]  arXiv:2405.07408 [pdf, other]
Title: Bayesian Spatially Clustered Compositional Regression: Linking intersectoral GDP contributions to Gini Coefficients
Subjects: Methodology (stat.ME); Applications (stat.AP)

The Gini coefficient is a universally used measure of income inequality. Intersectoral GDP contributions reveal the economic development of different sectors of the national economy. Linking intersectoral GDP contributions to Gini coefficients provides a better understanding of how the Gini coefficient is influenced by different industries. In this paper, a compositional regression with spatially clustered coefficients is proposed to explore heterogeneous effects over spatial locations under a nonparametric Bayesian framework. Specifically, a Markov random field constrained mixture of finite mixtures prior is designed for Bayesian log contrast regression with compositional covariates, which allows for both spatially contiguous clusters and discontinuous clusters. In addition, an efficient Markov chain Monte Carlo algorithm for posterior sampling that enables simultaneous inference on both cluster configurations and cluster-wise parameters is designed. The compelling empirical performance of the proposed method is demonstrated via extensive simulation studies and an application to the 51 states of the United States using 2019 Bureau of Economic Analysis data.
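
The log-contrast core of such a model can be sketched without the spatial clustering or the Bayesian machinery: coefficients on log shares are constrained to sum to zero, implemented here as constrained least squares on simulated compositions.

    # Log-contrast regression with compositional covariates (sum-to-zero constraint).
    import numpy as np

    rng = np.random.default_rng(10)
    n, K = 120, 4
    raw = rng.gamma(shape=2.0, size=(n, K))
    Z = np.log(raw / raw.sum(axis=1, keepdims=True))          # log of compositions
    beta_true = np.array([1.0, -0.5, -1.0, 0.5])              # sums to zero
    y = Z @ beta_true + 0.2 * rng.normal(size=n)

    # Minimise ||y - Z b||^2 subject to 1'b = 0 (restricted least squares).
    G, g, ones = Z.T @ Z, Z.T @ y, np.ones(K)
    b_ols = np.linalg.solve(G, g)
    Ginv_ones = np.linalg.solve(G, ones)
    b = b_ols - Ginv_ones * (ones @ b_ols) / (ones @ Ginv_ones)
    print("log-contrast coefficients:", np.round(b, 3), " sum:", round(b.sum(), 6))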

[15]  arXiv:2405.07504 [pdf, other]
Title: Hierarchical inference of evidence using posterior samples
Comments: 18 pages, 7 figures, 1 table. Comments welcome
Subjects: Methodology (stat.ME)

The Bayesian evidence, a crucial ingredient for model selection, is arguably the most important quantity in Bayesian data analysis: at the same time, however, it is also one of the most difficult to compute. In this paper we present a hierarchical method that leverages a multivariate normalised approximant for the posterior probability density to infer the evidence for a model in a hierarchical fashion from a set of posterior samples drawn using an arbitrary sampling scheme.
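
The general principle, turning posterior samples plus a normalised approximant into an evidence estimate, can be sketched with a Gelfand-Dey-style identity in a toy conjugate model; this illustrates the ingredient only, not the paper's hierarchical construction.

    # E_posterior[ g(theta) / (likelihood * prior) ] = 1 / evidence, for any normalised g.
    import numpy as np
    from scipy import stats
    from scipy.special import logsumexp

    rng = np.random.default_rng(2)
    y = rng.normal(loc=0.7, size=20)            # toy data: y_i ~ N(mu, 1), prior mu ~ N(0, 1)
    n = len(y)
    log_Z_exact = stats.multivariate_normal.logpdf(y, mean=np.zeros(n), cov=np.eye(n) + np.ones((n, n)))

    # Posterior samples (drawn exactly here; in practice from any sampler).
    post_var = 1.0 / (n + 1.0)
    mu_s = rng.normal(post_var * y.sum(), np.sqrt(post_var), size=5000)

    # Normalised approximant g: a Gaussian fitted to the samples.
    log_g = stats.norm.logpdf(mu_s, mu_s.mean(), mu_s.std())
    log_like_prior = (stats.norm.logpdf(y[:, None], loc=mu_s, scale=1.0).sum(axis=0)
                      + stats.norm.logpdf(mu_s, 0.0, 1.0))
    log_Z_hat = -(logsumexp(log_g - log_like_prior) - np.log(mu_s.size))

    print("exact log-evidence    :", round(log_Z_exact, 4))
    print("estimated log-evidence:", round(log_Z_hat, 4))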

[16]  arXiv:2405.07631 [pdf, other]
Title: Improving prediction models by incorporating external data with weights based on similarity
Subjects: Methodology (stat.ME); Applications (stat.AP)

In clinical settings, we often face the challenge of building prediction models based on small observational data sets. For example, such a data set might be from a medical center in a multi-center study. Differences between centers might be large, thus requiring specific models based on the data set from the target center. Still, we want to borrow information from the external centers, to deal with small sample sizes. There are approaches that either assign weights to each external data set or each external observation. To incorporate information on differences between data sets and observations, we propose an approach that combines both into weights that can be incorporated into a likelihood for fitting regression models. Specifically, we suggest weights at the data set level that incorporate information on how well the models that provide the observation weights distinguish between data sets. Technically, this takes the form of inverse probability weighting. We explore different scenarios where covariates and outcomes differ among data sets, informing our simulation design for method evaluation. The concept of effective sample size is used for understanding the effectiveness of our subgroup modeling approach. We demonstrate our approach through a clinical application, predicting applied radiotherapy doses for cancer patients. Generally, the proposed approach provides improved prediction performance when external data sets are similar. We thus provide a method for quantifying similarity of external data sets to the target data set and use this similarity to include external observations for improving performance in a target data set prediction modeling task with small data.
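
A stripped-down sketch of the observation-weighting idea: a membership classifier separates target from external data and its predicted probabilities become weights in a weighted regression; the data-set-level weights described above are omitted, and all names and sizes are illustrative.

    # Similarity weights for external observations via a logistic membership model.
    import numpy as np
    from sklearn.linear_model import LogisticRegression, LinearRegression

    rng = np.random.default_rng(3)
    beta = np.array([1.0, -0.5, 0.3])
    X_t = rng.normal(0.0, 1.0, size=(40, 3));  y_t = X_t @ beta + rng.normal(size=40)    # small target set
    X_e = rng.normal(0.8, 1.2, size=(400, 3)); y_e = X_e @ beta + rng.normal(size=400)   # shifted external set

    X_all = np.vstack([X_t, X_e])
    member = np.r_[np.ones(len(X_t)), np.zeros(len(X_e))]
    clf = LogisticRegression().fit(X_all, member)
    w_ext = clf.predict_proba(X_e)[:, 1]          # probability of "looking like" the target data

    w = np.r_[np.ones(len(X_t)), w_ext]           # target rows keep weight 1
    fit = LinearRegression().fit(X_all, np.r_[y_t, y_e], sample_weight=w)
    print("coefficients:", np.round(fit.coef_, 3))
    print("effective external sample size:", round(float(w_ext.sum()), 1))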

[17]  arXiv:2405.07979 [pdf, other]
Title: Low-order outcomes and clustered designs: combining design and analysis for causal inference under network interference
Subjects: Methodology (stat.ME); Statistics Theory (math.ST)

Variance reduction for causal inference in the presence of network interference is often achieved through either outcome modeling, which is typically analyzed under unit-randomized Bernoulli designs, or clustered experimental designs, which are typically analyzed without strong parametric assumptions. In this work, we study the intersection of these two approaches and consider the problem of estimation in low-order outcome models using data from a general experimental design. Our contributions are threefold. First, we present an estimator of the total treatment effect (also called the global average treatment effect) in a low-degree outcome model when the data are collected under general experimental designs, generalizing previous results for Bernoulli designs. We refer to this estimator as the pseudoinverse estimator and give bounds on its bias and variance in terms of properties of the experimental design. Second, we evaluate these bounds for the case of cluster randomized designs with both Bernoulli and complete randomization. For clustered Bernoulli randomization, we find that our estimator is always unbiased and that its variance scales like the smaller of the variance obtained from a low-order assumption and the variance obtained from cluster randomization, showing that combining these variance reduction strategies is preferable to using either individually. For clustered complete randomization, we find a notable bias-variance trade-off mediated by specific features of the clustering. Third, when choosing a clustered experimental design, our bounds can be used to select a clustering from a set of candidate clusterings. Across a range of graphs and clustering algorithms, we show that our method consistently selects clusterings that perform well on a range of response models, suggesting that our bounds are useful to practitioners.

[18]  arXiv:2405.07985 [pdf, ps, other]
Title: Improved LARS algorithm for adaptive LASSO in the linear regression model
Comments: 10 pages, 3 figures
Subjects: Methodology (stat.ME)

The adaptive LASSO has been used for consistent variable selection in place of LASSO in the linear regression model. In this article, we propose a modified LARS algorithm to combine adaptive LASSO with some biased estimators, namely the Almost Unbiased Ridge Estimator (AURE), Liu Estimator (LE), Almost Unbiased Liu Estimator (AULE), Principal Component Regression Estimator (PCRE), r-k class estimator, and r-d class estimator. Furthermore, we examine the performance of the proposed algorithm using a Monte Carlo simulation study and real-world examples.
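
For context, a standard (non-LARS-modified) route to an adaptive LASSO fit is to rescale covariates by weights from an initial estimator and run an ordinary LARS-based LASSO solver; below, a ridge estimator stands in for the biased estimators listed above, so this sketches the construction rather than the proposed algorithm.

    # Adaptive LASSO via covariate rescaling with weights from an initial estimator.
    import numpy as np
    from sklearn.linear_model import Ridge, LassoLarsCV

    rng = np.random.default_rng(4)
    n, p = 200, 10
    X = rng.normal(size=(n, p))
    beta_true = np.array([2.0, 0.0, -1.5, 0.0, 0.0, 1.0, 0.0, 0.0, 0.0, 0.0])
    y = X @ beta_true + rng.normal(size=n)

    beta_init = Ridge(alpha=1.0).fit(X, y).coef_          # initial (biased) estimator
    w = 1.0 / (np.abs(beta_init) + 1e-8)                  # adaptive weights, gamma = 1
    lasso = LassoLarsCV(cv=5).fit(X / w, y)               # ordinary LASSO-LARS on rescaled design
    beta_adaptive = lasso.coef_ / w                       # map back to the original scale

    print("selected variables:", np.flatnonzero(np.abs(beta_adaptive) > 1e-8))
    print("estimates:", np.round(beta_adaptive, 3))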

Cross-lists for Tue, 14 May 24

[19]  arXiv:2405.07292 (cross-list from econ.EM) [pdf, other]
Title: Kernel Three Pass Regression Filter
Comments: Keywords: Forecasting, High dimension, Approximate factor model, Reproducing Kernel Hilbert space, Three-pass regression filter
Subjects: Econometrics (econ.EM); Statistical Finance (q-fin.ST); Methodology (stat.ME)

We forecast a single time series using a high-dimensional set of predictors. When these predictors share common underlying dynamics, an approximate latent factor model provides a powerful characterization of their co-movements (Bai, 2003). These latent factors succinctly summarize the data and can also be used for prediction, alleviating the curse of dimensionality in high-dimensional prediction exercises; see Stock & Watson (2002a). However, forecasting using these latent factors suffers from two potential drawbacks. First, not all pervasive factors among the set of predictors may be relevant, and using all of them can lead to inefficient forecasts. The second shortcoming is the assumption of linear dependence of predictors on the underlying factors. The first issue can be addressed by using some form of supervision, which leads to the omission of irrelevant information. One example is the three-pass regression filter proposed by Kelly & Pruitt (2015). We extend their framework to cases where the form of dependence might be nonlinear by developing a new estimator, which we refer to as the Kernel Three-Pass Regression Filter (K3PRF). This alleviates the aforementioned second shortcoming. The estimator is computationally efficient and performs well empirically. The short-term performance matches or exceeds that of established models, while the long-term performance shows significant improvement.
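
A simplified sketch of the classic linear three-pass regression filter with the target itself as proxy; the kernel extension (K3PRF) replaces these linear passes with kernel regressions and is not shown.

    # Three passes: (1) loadings from time-series regressions of each predictor on the
    # proxy, (2) factors from cross-section regressions on the loadings, (3) a
    # predictive regression of the target on the estimated factor.
    import numpy as np

    rng = np.random.default_rng(8)
    T, N = 200, 100
    f = rng.normal(size=T)                                    # latent factor
    X = np.outer(f, rng.normal(size=N)) + 0.5 * rng.normal(size=(T, N))
    y = 0.8 * f[:-1] + 0.3 * rng.normal(size=T - 1)           # y[t] is the target at time t+1

    Xc = X - X.mean(axis=0)
    z = y - y.mean()                                          # proxy: the demeaned target
    phi = Xc[:-1].T @ z / (z @ z)                             # pass 1: predictor loadings
    F = Xc @ phi / (phi @ phi)                                # pass 2: factor estimates
    A = np.column_stack([np.ones(T - 1), F[:-1]])             # pass 3: predictive regression
    coef, *_ = np.linalg.lstsq(A, y, rcond=None)
    print("one-step-ahead forecast:", round(coef[0] + coef[1] * F[-1], 3))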

[20]  arXiv:2405.07343 (cross-list from eess.SY) [pdf, other]
Title: Graph neural networks for power grid operational risk assessment under evolving grid topology
Comments: Manuscript submitted to Applied Energy
Subjects: Systems and Control (eess.SY); Machine Learning (cs.LG); Methodology (stat.ME)

This article investigates the ability of graph neural networks (GNNs) to identify risky conditions in a power grid over the subsequent few hours, without explicit, high-resolution information regarding future generator on/off status (grid topology) or power dispatch decisions. The GNNs are trained using supervised learning, to predict the power grid's aggregated bus-level (either zonal or system-level) or individual branch-level state under different power supply and demand conditions. The variability of the stochastic grid variables (wind/solar generation and load demand), and their statistical correlations, are rigorously considered while generating the inputs for the training data. The outputs in the training data, obtained by solving numerous mixed-integer linear programming (MILP) optimal power flow problems, correspond to system-level, zonal and transmission line-level quantities of interest (QoIs). The QoIs predicted by the GNNs are used to conduct hours-ahead, sampling-based reliability and risk assessment w.r.t. zonal and system-level (load shedding) as well as branch-level (overloading) failure events. The proposed methodology is demonstrated for three synthetic grids with sizes ranging from 118 to 2848 buses. Our results demonstrate that GNNs are capable of providing fast and accurate prediction of QoIs and can be good proxies for computationally expensive MILP algorithms. The excellent accuracy of GNN-based reliability and risk assessment suggests that GNN models can substantially improve situational awareness by quickly providing rigorous reliability and risk estimates.

[21]  arXiv:2405.07359 (cross-list from cs.LG) [pdf, other]
Title: Forecasting with an N-dimensional Langevin Equation and a Neural-Ordinary Differential Equation
Comments: 26 pages, 7 figures
Journal-ref: Chaos, 34, 043105 (2024)
Subjects: Machine Learning (cs.LG); Dynamical Systems (math.DS); Data Analysis, Statistics and Probability (physics.data-an); Methodology (stat.ME)

Accurate prediction of electricity day-ahead prices is essential in competitive electricity markets. Although stationary electricity-price forecasting techniques have received considerable attention, research on non-stationary methods is comparatively scarce, despite the prevalence of non-stationary features in electricity markets. Specifically, existing non-stationary techniques will often aim to address individual non-stationary features in isolation, leaving aside the exploration of concurrent multiple non-stationary effects. Our overarching objective here is the formulation of a framework to systematically model and forecast non-stationary electricity-price time series, encompassing the broader scope of non-stationary behavior. For this purpose we develop a data-driven model that combines an N-dimensional Langevin equation (LE) with a neural-ordinary differential equation (NODE). The LE captures fine-grained details of the electricity-price behavior in stationary regimes but is inadequate for non-stationary conditions. To overcome this inherent limitation, we adopt a NODE approach to learn, and at the same time predict, the difference between the actual electricity-price time series and the simulated price trajectories generated by the LE. By learning this difference, the NODE reconstructs the non-stationary components of the time series that the LE is not able to capture. We exemplify the effectiveness of our framework using the Spanish electricity day-ahead market as a prototypical case study. Our findings reveal that the NODE nicely complements the LE, providing a comprehensive strategy to tackle both stationary and non-stationary electricity-price behavior. The framework's dependability and robustness are demonstrated through different non-stationary scenarios by comparing it against a range of basic naive methods.

[22]  arXiv:2405.07552 (cross-list from stat.ML) [pdf, other]
Title: Distributed High-Dimensional Quantile Regression: Estimation Efficiency and Support Recovery
Comments: Forty-first International Conference on Machine Learning (ICML 2024)
Subjects: Machine Learning (stat.ML); Machine Learning (cs.LG); Methodology (stat.ME)

In this paper, we focus on distributed estimation and support recovery for high-dimensional linear quantile regression. Quantile regression is a popular alternative to least squares regression for robustness against outliers and data heterogeneity. However, the non-smoothness of the check loss function poses significant challenges to both computation and theory in the distributed setting. To tackle these problems, we transform the original quantile regression into the least-squares optimization. By applying a double-smoothing approach, we extend a previous Newton-type distributed approach without the restrictive assumption of independence between the error term and the covariates. An efficient algorithm is developed, which enjoys high computation and communication efficiency. Theoretically, the proposed distributed estimator achieves a near-oracle convergence rate and high support recovery accuracy after a constant number of iterations. Extensive experiments on synthetic examples and a real data application further demonstrate the effectiveness of the proposed method.

[23]  arXiv:2405.07836 (cross-list from cs.LG) [pdf, other]
Title: Forecasting with Hyper-Trees
Comments: Forecasting, Gradient Boosting, Hyper-Networks, LightGBM, Parameter Non-Stationarity, Time Series, XGBoost
Subjects: Machine Learning (cs.LG); Methodology (stat.ME)

This paper introduces the concept of Hyper-Trees and offers a new direction in applying tree-based models to time series data. Unlike conventional applications of decision trees that forecast time series directly, Hyper-Trees are designed to learn the parameters of a target time series model. Our framework leverages the gradient-based nature of boosted trees, which allows us to extend the concept of Hyper-Networks to Hyper-Trees and to induce a time-series inductive bias to tree models. By relating the parameters of a target time series model to features, Hyper-Trees address the challenge of parameter non-stationarity and enable tree-based forecasts to extend beyond their initial training range. With our research, we aim to explore the effectiveness of Hyper-Trees across various forecasting scenarios and to expand the application of gradient boosted decision trees past their conventional use in time series forecasting.

[24]  arXiv:2405.07910 (cross-list from math.ST) [pdf, ps, other]
Title: A Unification of Exchangeability and Continuous Exposure and Confounder Measurement Errors: Probabilistic Exchangeability
Authors: Honghyok Kim
Subjects: Statistics Theory (math.ST); Methodology (stat.ME)

Exchangeability concerning a continuous exposure, X, implies no confounding bias when identifying average exposure effects of X, AEE(X). When X is measured with error (Xep), two challenges arise in identifying AEE(X). Firstly, exchangeability regarding Xep does not equal exchangeability regarding X. Secondly, the necessity of the non-differential error assumption (NDEA), overly stringent in practice, remains uncertain. To address these challenges, this article proposes unifying exchangeability and exposure and confounder measurement errors with three novel concepts. The first, Probabilistic Exchangeability (PE), states that the outcomes of those with Xep=e are probabilistically exchangeable with the outcomes of those truly exposed to X=eT. The relationship between AEE(Xep) and AEE(X) in risk difference and ratio scales is mathematically expressed as a probabilistic certainty, termed exchangeability probability (Pe). Squared Pe (Pe.sq) quantifies the extent to which AEE(Xep) differs from AEE(X) due to exposure measurement error not akin to confounding mechanisms. In realistic settings, the coefficient of determination (R.sq) in the regression of X against Xep may be sufficient to measure Pe.sq. The second concept, Emergent Pseudo Confounding (EPC), describes the bias introduced by exposure measurement error, akin to confounding mechanisms. PE can hold when EPC is controlled for, which is weaker than NDEA. The third, Emergent Confounding, describes when bias due to confounder measurement error arises. Adjustment for E(P)C can be performed like confounding adjustment to ensure PE. This paper provides justification for using AEE(Xep) and insight into the potential divergence of AEE(Xep) from AEE(X) and how to measure it. Differential errors do not necessarily compromise causal inference.

[25]  arXiv:2405.07971 (cross-list from stat.ML) [pdf, other]
Title: Sensitivity Analysis for Active Sampling, with Applications to the Simulation of Analog Circuits
Comments: 7 pages
Subjects: Machine Learning (stat.ML); Machine Learning (cs.LG); Applications (stat.AP); Methodology (stat.ME)

We propose an active sampling flow, with the use-case of simulating the impact of combined variations on analog circuits. In such a context, given the large number of parameters, it is difficult to fit a surrogate model and to efficiently explore the space of design features.
By combining a drastic dimension reduction using sensitivity analysis and Bayesian surrogate modeling, we obtain a flexible active sampling flow. On synthetic and real datasets, this flow outperforms the usual Monte-Carlo sampling which often forms the foundation of design space exploration.
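
A compact sketch of the two ingredients, with a hypothetical response function standing in for the circuit simulator, a crude correlation screen standing in for a full sensitivity analysis, and a maximum-predictive-uncertainty rule standing in for the paper's acquisition strategy.

    # Screen inputs, fit a GP surrogate on the retained dimensions, propose the next sample.
    import numpy as np
    from sklearn.gaussian_process import GaussianProcessRegressor
    from sklearn.gaussian_process.kernels import RBF, WhiteKernel

    rng = np.random.default_rng(5)

    def response(x):                        # stand-in for an expensive analog-circuit simulation
        return 2.0 * x[:, 0] + np.sin(3.0 * x[:, 1]) + 0.01 * x[:, 2:].sum(axis=1)

    d, n0 = 20, 60
    X = rng.uniform(-1, 1, size=(n0, d))
    y = response(X)

    # (i) Cheap screening: squared correlation with the output as a crude sensitivity proxy.
    scores = np.array([np.corrcoef(X[:, j], y)[0, 1] ** 2 for j in range(d)])
    keep = np.argsort(scores)[::-1][:2]

    # (ii) GP surrogate on the reduced space; next sample where predictive sd is largest.
    gp = GaussianProcessRegressor(kernel=RBF() + WhiteKernel(), normalize_y=True)
    gp.fit(X[:, keep], y)
    cand = rng.uniform(-1, 1, size=(5000, d))
    _, sd = gp.predict(cand[:, keep], return_std=True)
    print("retained dimensions:", keep, "| next sample (retained coords):", np.round(cand[np.argmax(sd), keep], 3))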

Replacements for Tue, 14 May 24

[26]  arXiv:2008.13087 (replaced) [pdf, other]
Title: Efficient Nested Simulation Experiment Design via the Likelihood Ratio Method
Subjects: Methodology (stat.ME); Risk Management (q-fin.RM)
[27]  arXiv:2102.10778 (replaced) [pdf, other]
Title: Interactive identification of individuals with positive treatment effect while controlling false discoveries
Comments: 44 pages, 15 figures
Subjects: Methodology (stat.ME)
[28]  arXiv:2301.11472 (replaced) [pdf, other]
Title: Fast Bayesian inference for spatial mean-parameterized Conway-Maxwell-Poisson models
Subjects: Methodology (stat.ME); Applications (stat.AP); Computation (stat.CO)
[29]  arXiv:2311.08691 (replaced) [pdf, ps, other]
Title: On Doubly Robust Estimation with Nonignorable Missing Data Using Instrumental Variables
Comments: 29 pages
Subjects: Methodology (stat.ME)
[30]  arXiv:2312.05802 (replaced) [pdf, other]
Title: Enhancing Scalability in Bayesian Nonparametric Factor Analysis of Spatiotemporal Data
Authors: Yifan Cheng, Cheng Li
Comments: added a VAR(1) process for the latent temporal factors (the new Appendix C)
Subjects: Methodology (stat.ME); Computation (stat.CO)
[31]  arXiv:2312.06204 (replaced) [pdf, ps, other]
Title: Multilayer Network Regression with Eigenvector Centrality and Community Structure
Subjects: Methodology (stat.ME)
[32]  arXiv:2401.00097 (replaced) [pdf, other]
Title: Recursive identification with regularization and on-line hyperparameters estimation
Comments: this https URL
Subjects: Methodology (stat.ME); Systems and Control (eess.SY); Optimization and Control (math.OC)
[33]  arXiv:2401.03881 (replaced) [pdf, other]
Title: Density regression via Dirichlet process mixtures of normal structured additive regression models
Subjects: Methodology (stat.ME); Applications (stat.AP)
[34]  arXiv:2401.07018 (replaced) [pdf, other]
Title: Graphical models for cardinal paired comparisons data
Comments: 63 pages, 6 figures
Subjects: Methodology (stat.ME); Statistics Theory (math.ST)
[35]  arXiv:2403.16906 (replaced) [pdf, ps, other]
Title: Comparing statistical likelihoods with diagnostic probabilities based on directly observed proportions to help understand the replication crisis
Authors: Huw Llewelyn
Comments: 11 pages. 2 figures
Subjects: Methodology (stat.ME)
[36]  arXiv:2203.02605 (replaced) [pdf, other]
Title: Reinforcement Learning in Modern Biostatistics: Constructing Optimal Adaptive Interventions
Comments: 57 pages
Subjects: Machine Learning (stat.ML); Machine Learning (cs.LG); Applications (stat.AP); Methodology (stat.ME)
[37]  arXiv:2404.15440 (replaced) [pdf, ps, other]
Title: Exploring Convergence in Relation using Association Rules Mining: A Case Study in Collaborative Knowledge Production
Subjects: Human-Computer Interaction (cs.HC); Applications (stat.AP); Methodology (stat.ME)
[38]  arXiv:2404.16583 (replaced) [pdf, other]
Title: Fast Machine-Precision Spectral Likelihoods for Stationary Time Series
Subjects: Numerical Analysis (math.NA); Computation (stat.CO); Methodology (stat.ME)
[39]  arXiv:2404.17483 (replaced) [pdf, other]
Title: Differentiable Pareto-Smoothed Weighting for High-Dimensional Heterogeneous Treatment Effect Estimation
Comments: Accepted to the 40th Conference on Uncertainty in Artificial Intelligence (UAI2024). 14 pages, 4 figures
Subjects: Machine Learning (stat.ML); Machine Learning (cs.LG); Methodology (stat.ME)

