Yang Ning
Incorporating External Controls for Estimating the Average Treatment Effect on the Treated with High-Dimensional Data: Retaining Double Robustness and Ensuring Double Safety
Randomized controlled trials (RCTs) are widely regarded as the gold standard for causal inference in biomedical research. For instance, when estimating the average treatment effect on the treated (ATT), a doubly robust estimation procedure…
Privacy-protected Retrieval-Augmented Generation for Knowledge Graph Question Answering
LLMs often suffer from hallucinations and outdated or incomplete knowledge. RAG is proposed to address these issues by integrating external knowledge like that in KGs into LLMs. However, leveraging private KGs in RAG systems poses signific…
Smart business generats dynamic econometric method of optimizing sales and revenues of processing enterprises
The article formulates the methodological limitations of the econometric practical application of quantitative calculations of balanced proportions of sales volumes and incomes of processing enterprises in the context of digitalization and…
Reasoning based on symbolic and parametric knowledge bases: a survey
Reasoning is fundamental to human intelligence, and critical for problem-solving, decision-making, and critical thinking. Reasoning refers to drawing new conclusions based on existing knowledge, which can support various applications like …
DisC²o-HD: Distributed causal inference with covariates shift for analyzing real-world high-dimensional data.
High-dimensional healthcare data, such as electronic health records (EHR) data and claims data, present two primary challenges due to the large number of variables and the need to consolidate data from multiple clinical sites. The third ke…
Optimal Sampling for Generalized Linear Model under Measurement Constraint with Surrogate Variables
Measurement-constrained datasets, often encountered in semi-supervised learning, arise when data labeling is costly, time-intensive, or hindered by confidentiality or ethical concerns, resulting in a scarcity of labeled data. In certain ca…
Active Subsampling for Measurement-Constrained M-Estimation of Individualized Thresholds with High-Dimensional Data
In measurement-constrained problems, despite the availability of large datasets, we may only be able to afford to observe the labels on a small portion of the large dataset. This poses the critical question of which data points are most be…
Communication‐Efficient Distributed Estimation of Causal Effects With High‐Dimensional Data
We propose a communication‐efficient algorithm to estimate the average treatment effect (ATE), when the data are distributed across multiple sites and the number of covariates is possibly much larger than the sample size in each site. Our …
Inference with non-differentiable surrogate loss in a general high-dimensional classification framework
Penalized empirical risk minimization with a surrogate loss function is often used to derive a high-dimensional linear decision rule in classification problems. Although much of the literature focuses on the generalization error, there is …
Discussion of ‘Statistical inference for streamed longitudinal data’
Yang Ning, Department of Statistics and Data Science, Cornell University, 1188 Comstock Hall, Ithaca, New York 14853, U.S.A. Email: yn…
Optimal and Safe Estimation for High-Dimensional Semi-Supervised Learning
We consider the estimation problem in high-dimensional semi-supervised learning. Our goal is to investigate when and how the unlabeled data can be exploited to improve the estimation of the regression parameters of linear model in light of…
Test of Significance for High-Dimensional Thresholds with Application to Individualized Minimal Clinically Important Difference
This work is motivated by learning the individualized minimal clinically important difference, a vital concept to assess clinical importance in various biomedical studies. We formulate the scientific question into a high-dimensional statis…
Inference in High-Dimensional Multivariate Response Regression with Hidden Variables
This article studies the inference of the regression coefficient matrix under multivariate response linear regressions in the presence of hidden variables. A novel procedure for constructing confidence intervals of entries of the coefficie…
Optimal Variable Clustering for High-Dimensional Matrix Valued Data
Matrix valued data has become increasingly prevalent in many applications. Most of the existing clustering methods for this type of data are tailored to the mean model and do not account for the dependence structure of the features, which …
Optimal Covariate Balancing Conditions in Propensity Score Estimation
Inverse probability of treatment weighting (IPTW) is a popular method for estimating the average treatment effect (ATE). However, empirical studies show that the IPTW estimators can be sensitive to the misspecification of the propensity sc…
On the global identifiability of logistic regression models with misclassified outcomes
In the last decade, the secondary use of large data from health systems, such as electronic health records, has demonstrated great promise in advancing biomedical discoveries and improving clinical decision making. However, there is an inc…
PLEMT: A Novel Pseudolikelihood-Based EM Test for Homogeneity in Generalized Exponential Tilt Mixture Models
Motivated by analyses of DNA methylation data, we propose a semiparametric mixture model, namely, the generalized exponential tilt mixture model, to account for heterogeneity between differentially methylated and nondifferentially methylat…
Optimal Sampling for Generalized Linear Models Under Measurement Constraints
Under “measurement constraints,” responses are expensive to measure and initially unavailable on most of records in the dataset, but the covariates are available for the entire dataset. Our goal is to sample a relatively small portion of t…
Optimal Semi-supervised Estimation and Inference for High-dimensional Linear Regression
There are many scenarios such as the electronic health records where the outcome is much more difficult to collect than the covariates. In this paper, we consider the linear regression problem with such a data structure under the high dime…
Test of significance for high-dimensional longitudinal data
This paper concerns statistical inference for longitudinal data with ultrahigh dimensional covariates. We first study the problem of constructing confidence intervals and hypothesis tests for a low dimensional parameter of interest. The ma…
Doubly Robust Semiparametric Difference-in-Differences Estimators with High-Dimensional Data
This paper proposes a doubly robust two-stage semiparametric difference-in-differences estimator for estimating heterogeneous treatment effects with high-dimensional data. Our new estimator is robust to model misspecifications and allows …
Adaptive estimation in structured factor models with applications to overlapping clustering
This work introduces a novel estimation method, called LOVE, of the entries and structure of a loading matrix $A$ in a latent factor model $X=AZ+E$, for an observable random vector $X \in \mathbb{R}^{p}$, with correlated unobservable fact…
Estimation and inference on high-dimensional individualized treatment rule in observational data using split-and-pooled de-correlated score
With the increasing adoption of electronic health records, there is an increasing interest in developing individualized treatment rules, which recommend treatments according to patients' characteristics, from large observational data. Howe…
Nonlinear aeroelastic analysis of the folding fin with freeplay under thermal environment
The nonlinear aeroelastic behavior of a folding fin in supersonic flow is investigated in this paper. The finite element model of the fin is established and the deployable hinges are represented by three torsion springs with the freeplay n…
On specification tests for composite likelihood inference
Composite likelihood functions are often used for inference in applications where the data have a complex structure. While inference based on the composite likelihood can be more robust than inference based on the full likelihood, …
Adaptive Estimation of Multivariate Regression with Hidden Variables
This paper studies the estimation of the coefficient matrix $\Theta^*$ in multivariate regression with hidden variables, $Y = (\Theta^*)^T X + (B^*)^T Z + E$, where $Y$ is an $m$-dimensional response vector, $X$ is a $p$-dimensional vector of o…
A fast score test for generalized mixture models
In biomedical studies, testing for homogeneity between two groups, where one group is modeled by mixture models, is often of great interest. This paper considers the semiparametric exponential family mixture model proposed by Hong et al. …
Heterogeneity-aware and communication-efficient distributed statistical inference
In multicenter research, individual-level data are often protected against sharing across sites. To overcome the barrier of data sharing, many distributed algorithms, which only require sharing aggregated information, have been developed. …
Regression Discontinuity Design under Self-selection
In Regression Discontinuity (RD) design, self-selection leads to different distributions of covariates on two sides of the policy intervention, which essentially violates the continuity of potential outcome assumption. The standard RD esti…