How to use pooled results from multiple imputation?

I've been reading some posts about data imputation using multiple imputation, specifically the MICE R package. I get the main idea of creating multiple datasets with imputed data. The part that is not clear to me is the linear regressions + pooling results. At the end I'd do something like this:

# data must be the mids object returned by mice(), not a plain data frame
modelFit1 <- with(data, lm(var3 ~ var1 + var2))
pool(modelFit1)
summary(pool(modelFit1))

The summary shows some coefficient estimates. Should I use those to predict the final values for the missing data (var3)? If so, what should I do if I also want to impute data for var2? Can I use var3 and var1 as predictors, even though that seems circular?

asked Dec 22, 2017 at 8:14

paipaipai

The usual situation is like this: you want to fit a model, say a regression but you have missing data. So you impute them and use the imputed, completed datasets to fit the model. Because you did multiple imputations, you have to pool the regression results from all imputed datasets. This pooled result is an estimate of the regression model for the complete dataset with no missings. So at the step of fitting and pooling, all the imputation has already been done. I'm not aware of using the pooled results for further imputations.

– COOLSerdash, Dec 22, 2017 at 9:46
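
The workflow described in this comment can be sketched as follows. This is a minimal illustration, not the asker's actual analysis: it assumes the `nhanes` example data that ships with mice, and the columns `chl`, `age` and `bmi` merely stand in for var3, var1 and var2.

```r
library(mice)

# Assumption: nhanes (bundled with mice) stands in for the asker's data.
# Step 1: impute. All variables with missing values are imputed here,
# each one conditional on the others.
imp <- mice(nhanes, m = 5, seed = 123, printFlag = FALSE)

# Step 2: fit the analysis model in each of the m completed datasets.
fit <- with(imp, lm(chl ~ age + bmi))

# Step 3: combine the m sets of estimates into one pooled result.
pooled <- pool(fit)
summary(pooled)

# The imputed values come from step 1, not from the pooled coefficients.
completed1 <- complete(imp, 1)  # first completed dataset
```

Note that the pooled coefficients describe the analysis model only; the imputed values are already stored in `imp` before any model is fitted.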
Oh ok I got it. What if I just need to use the imputed data to calculate a mean across var1, var2, and var3? Should I average the n imputed datasets?

– paipaipai, Dec 22, 2017 at 15:32

If you just want to estimate the univariate mean of var1, var2 and var3, you could use with(data, lm(var1~1)) and then pool (the same for var2 and var3). Another possibility is to use the function pool.scalar. The help file of pool.scalar has an example of how to estimate the mean with the imputed datasets.

– COOLSerdash, Dec 22, 2017 at 16:23
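
Both routes mentioned in this comment might look like the sketch below. It again assumes the `nhanes` data from mice, with `bmi` standing in for one of the asker's variables; the within-imputation variance of the mean is taken as var(x) / n, which is an assumption of this illustration.

```r
library(mice)

# Assumption: nhanes and its bmi column stand in for the asker's data.
imp <- mice(nhanes, m = 5, seed = 123, printFlag = FALSE)

# Route 1: an intercept-only regression estimates the mean; pool as usual.
fit_mean <- with(imp, lm(bmi ~ 1))
pooled_lm <- summary(pool(fit_mean))

# Route 2: compute the mean and its variance per completed dataset,
# then combine them with pool.scalar().
completed <- complete(imp, "all")  # list of m completed data frames
Q <- sapply(completed, function(d) mean(d$bmi))           # point estimates
U <- sapply(completed, function(d) var(d$bmi) / nrow(d))  # squared standard errors
res <- pool.scalar(Q, U, n = nrow(nhanes))

res$qbar  # pooled mean
res$t     # total variance (within plus between imputations)
```

The two routes give the same point estimate, since the intercept of `lm(bmi ~ 1)` is the sample mean in each completed dataset.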

@COOLSerdash thanks for your help. Actually I want something like (var1 + var2 + var3) / 3, mean using the 3 variables where two of them had some missing values before imputation.

– paipaipai, Dec 22, 2017 at 16:27

Adding the three variables and dividing by 3 gives you a new variable of length $n$ for each of the imputed datasets (basically a row-mean). Pooling comes into play when you calculate a statistic using this new variable, say the mean or variance.

– COOLSerdash, Dec 22, 2017 at 16:35
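
A minimal sketch of that distinction, again assuming the `nhanes` data from mice, where `bmi`, `hyp` and `chl` are placeholders for var1, var2 and var3: the row mean is simply computed within each completed dataset, and pooling only enters once a statistic of that new variable is estimated.

```r
library(mice)

# Assumption: nhanes columns bmi, hyp, chl stand in for var1, var2, var3.
imp <- mice(nhanes, m = 5, seed = 123, printFlag = FALSE)
completed <- complete(imp, "all")  # list of m completed data frames

# Derived variable: one row mean per observation, per completed dataset.
# Nothing is pooled at this stage.
row_means <- lapply(completed, function(d) rowMeans(d[, c("bmi", "hyp", "chl")]))

# A statistic of the derived variable (here its mean) is what gets pooled.
fit_row <- with(imp, lm(I((bmi + hyp + chl) / 3) ~ 1))
pooled_row <- summary(pool(fit_row))
pooled_row
```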
