How to use priors to impute values at an individual level and replicate a distribution of the population?

Question

I am trying to correct a survey variable that has measurement error. To do this, I have been treating the column as if it were missing and imputing new values from the predictions of an XGBoost model trained on surveys from other years that share common variables. I did this in R, using RStudio.

I know three things about the variable with measurement error: (1) that it is normally distributed, (2) its true mean, and (3) its true standard deviation.

However, I don't know how to incorporate this knowledge into the model so that individual imputation results in a population distribution with these characteristics.

So far, without giving this information to the model, the imputed values do follow a roughly Gaussian shape and their mean is close to what it is supposed to be, but the standard deviation is too low. This is true even when I vary the parameters of the model.

I have also tried multiple imputation (with the predictive mean matching method), propensity score matching, and linear regression imputation, but I have not obtained good results.

Any ideas?

prijatelj · Accepted Answer · 2023-10-28 05:42:26Z

You could model the column as a latent variable and use MCMC to infer it, giving the latent variable a Gaussian prior informed by your true mean and standard deviation.

The data used to train your XGBoost model could then be used to further inform the latent variable, based on the relationship of that column to the other variables.
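To see why a Gaussian prior on the latent variable helps with the spread, here is a closed-form special case that needs no MCMC. It is only a sketch, not the PyMC or NumPyro model from the tutorials: it assumes each individual has a noisy signal (for example the mismeasured value) equal to the true value plus independent Gaussian noise with a known standard deviation. The function `latent_posterior` and the numbers `true_mean`, `true_sd`, and `noise_sd` are made up for illustration. Under those assumptions the posterior of each latent value is normal, and drawing from it, rather than using its mean, reproduces the prior's spread across the population.

```python
import numpy as np


def latent_posterior(noisy, prior_mean, prior_sd, noise_sd):
    """Normal-normal update: posterior mean and sd of each latent value.

    Assumes latent ~ N(prior_mean, prior_sd^2) and
    noisy | latent ~ N(latent, noise_sd^2), with noise_sd known.
    """
    noisy = np.asarray(noisy, dtype=float)
    prior_prec = 1.0 / prior_sd**2
    noise_prec = 1.0 / noise_sd**2
    post_var = 1.0 / (prior_prec + noise_prec)
    post_mean = post_var * (prior_prec * prior_mean + noise_prec * noisy)
    return post_mean, np.sqrt(post_var)


rng = np.random.default_rng(0)

# Illustrative values only; replace with the known population mean and sd.
true_mean, true_sd = 50.0, 10.0
noise_sd = 8.0  # assumed measurement-error sd (hypothetical)
n = 200_000

# Simulated data that follows the assumptions above.
latent = rng.normal(true_mean, true_sd, n)
noisy = latent + rng.normal(0.0, noise_sd, n)

post_mean, post_sd = latent_posterior(noisy, true_mean, true_sd, noise_sd)

# Imputing with the posterior mean shrinks the spread;
# imputing with one posterior draw per individual restores it.
imputed = post_mean + post_sd * rng.standard_normal(n)

print("sd of posterior means:", post_mean.std())
print("sd of posterior draws:", imputed.std())
print("mean of posterior draws:", imputed.mean())
```

In this setup the posterior means alone have a standard deviation of `true_sd**2 / sqrt(true_sd**2 + noise_sd**2)`, which is smaller than `true_sd`, while the posterior draws recover `true_sd`. That mirrors the too-low spread you get when imputing with point predictions. A full MCMC model generalizes this by letting the other survey variables inform each individual's latent value.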

A PyMC tutorial on Bayesian imputation using MCMC: https://www.pymc.io/projects/examples/en/latest/case_studies/Missing_Data_Imputation.html#bayesian-imputation

A NumPyro tutorial on imputation using MCMC and NUTS: https://num.pyro.ai/en/stable/tutorials/bayesian_imputation.html
