arXiv (Cornell University)
Optimal Semi-supervised Estimation and Inference for High-dimensional Linear Regression
November 2020 • Siyi Deng, Yang Ning, Jiwei Zhao, Heping Zhang
In many scenarios, such as electronic health records, the outcome is much more difficult to collect than the covariates. In this paper, we consider the linear regression problem with such a data structure in the high-dimensional setting. Our goal is to investigate when and how unlabeled data can be exploited to improve estimation and inference of the regression parameters in linear models, especially given that such linear models may be misspecified in data analysis. In particul…