I am trying to correct a survey variable that has measurement error. To do this, I have been treating the column as if it were missing and imputing new values from the predictions of an XGBoost model trained on surveys from other years that share variables with this one. I did this in R, using RStudio.
I know three things about the variable with measurement error: (1) that it is normally distributed, (2) its true mean, and (3) its true standard deviation.
However, I don't know how to incorporate this knowledge into the model so that the individually imputed values produce a population distribution with these characteristics.
So far, without giving this information to the model, the imputed values do replicate the shape of a Gaussian distribution and their mean is close to what it is supposed to be, but their standard deviation is too low. This is true even when I vary the model's hyperparameters.
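To make the symptom concrete, here is a minimal sketch on simulated data (not my survey). It is written in Python with only the standard library, and it uses an ordinary least-squares fit as a stand-in for XGBoost. The target mean and standard deviation, the correlation `r`, and all variable names are assumptions chosen for illustration. I suspect the same thing happens in my case: the predictions are conditional means, so their spread only reflects the part of the variance that the predictors explain.

```python
import random
import statistics

random.seed(0)

# Hypothetical "known" population values, chosen only for this illustration.
TRUE_MEAN, TRUE_SD = 50.0, 10.0
r = 0.6   # assumed correlation between the predictor and the target
n = 5000

# Simulate one predictor x and a normally distributed target y
# whose marginal mean and SD are TRUE_MEAN and TRUE_SD.
x = [random.gauss(0.0, 1.0) for _ in range(n)]
y = [
    TRUE_MEAN + TRUE_SD * (r * xi + (1 - r ** 2) ** 0.5 * random.gauss(0.0, 1.0))
    for xi in x
]

# Ordinary least-squares fit of y on x (stand-in for the XGBoost model).
mean_x, mean_y = statistics.fmean(x), statistics.fmean(y)
sxx = sum((xi - mean_x) ** 2 for xi in x)
sxy = sum((xi - mean_x) * (yi - mean_y) for xi, yi in zip(x, y))
slope = sxy / sxx
intercept = mean_y - slope * mean_x

# "Imputed" values are the model predictions, i.e. estimated conditional means.
pred = [intercept + slope * xi for xi in x]
resid = [yi - pi for yi, pi in zip(y, pred)]

sd_y = statistics.pstdev(y)
sd_pred = statistics.pstdev(pred)
sd_resid = statistics.pstdev(resid)

print(f"mean of target:      {mean_y:.2f}")
print(f"mean of predictions: {statistics.fmean(pred):.2f}")
print(f"SD of target:        {sd_y:.2f}")
print(f"SD of predictions:   {sd_pred:.2f}")
print(f"SD of residuals:     {sd_resid:.2f}")
```

In this toy setup the mean of the predictions matches the mean of the target, but the standard deviation of the predictions is smaller, and the missing variance is exactly the residual variance. That matches what I see with my imputed values.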
I have also tried multiple imputation (with the predictive mean matching method), propensity score matching, and linear regression imputation, but I have not obtained good results with any of them.
Any ideas?