Learning with Missing Data
December 2020 • Carlos A. Escobar, Jorge Arinez, Daniela Macias, Rubén Morales-Menéndez
Many real-world data sets contain missing values; therefore, learning with incomplete data sets is a common challenge faced by data scientists. Because there is no perfect approach to compensate for missing values, handling them intelligently is important for developing robust data models. Deleting the rows with empty cells is a commonly used approach, but this naive method may lead to estimates with larger standard errors due to the reduced sample size. On the other hand, imputing the missing records is a better ap…
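
The contrast between row deletion and imputation can be illustrated with a minimal sketch. The toy data, the column meanings, and the helper names `listwise_delete` and `mean_impute` are hypothetical. Column-mean imputation is used here only because it is the simplest option to show; it is not necessarily the method the authors propose.

```python
from statistics import mean

# Hypothetical toy data: each row is (temperature, pressure).
# None marks a missing cell.
rows = [
    (20.0, 1.0),
    (22.0, None),
    (None, 1.4),
    (26.0, 1.2),
    (24.0, 1.6),
]


def listwise_delete(data):
    """Drop every row that contains at least one missing cell."""
    return [r for r in data if all(v is not None for v in r)]


def mean_impute(data):
    """Replace each missing cell with the mean of the observed
    values in the same column (an assumed, simple imputation rule)."""
    n_cols = len(data[0])
    col_means = [
        mean(r[j] for r in data if r[j] is not None)
        for j in range(n_cols)
    ]
    return [
        tuple(col_means[j] if r[j] is None else r[j] for j in range(n_cols))
        for r in data
    ]


deleted = listwise_delete(rows)
imputed = mean_impute(rows)

# 5 rows originally, 3 after deletion, 5 after imputation
print(len(rows), len(deleted), len(imputed))
```

In this sketch, deletion discards two of the five rows, which is the sample-size reduction the abstract warns about, while imputation keeps all five rows at the cost of inserting estimated values.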