Questions tagged [multiple-imputation]
Use this tag for questions involving multiple imputation, which refers to a set of stochastic imputation routines aimed at preserving the multivariate features of the data.
557 questions
1
vote
1
answer
1k
views
Asterisks next to variable names in R MICE
I have been doing multiple imputation using MICE in R. When running, some variables in the console appeared to have asterisks next to their name, e.g., x1* x2* x3*. These asterisks started to appear ...
1
vote
1
answer
215
views
Best way to handle missing data for Network Modelling?
I'm planning to undertake Network Modelling. However, I've been told that multiple imputation is a problem for Network Modelling. I have a lot of missing data. Any suggestions?
Thanks!
1
vote
0
answers
94
views
What are the implications of a low coverage in multiple imputation?
When testing multiple imputation algorithms in simulations, the bias of the examined estimates and the 95% coverage rate are often used as a quality metric. I understand that it is generally ...
2
votes
1
answer
809
views
How to best impute missing values of county-level time series data using R?
I have a dataset consisting of mobility data at the county-level for the US for about one year. So the number of observations is >1m. Apart from the county code, the date, and the mobility index, ...
2
votes
1
answer
112
views
Survey Analysis: Weighted Randomization to fill Missing Values
I am developing a Survey Analysis Application that works with Categorical Data and am looking for a way to treat missing data fairly without losing the original data variance.
Thus, I came up with a way ...
0
votes
1
answer
853
views
How to use the mice package to impute subject-level variables?
I am trying to do multiple imputation on a data set like below. This is a dummy data set. There are no missing values for variables idnum and eye. But a specific idnum should have the same bmi and gender ...
3
votes
1
answer
762
views
Correct pooling of shared frailty models fitted on multiply imputed datasets
I'm wondering whether the pool() function from R's mice package correctly pools shared frailty models fitted on multiply imputed ...
4
votes
1
answer
420
views
Combining imputations of generalized linear model regression coefficients - same as linear multiple regression?
I'm doing some imputation with the MICE package. The outcome variables I am using are zero-inflated, and in the absence of imputation, I would analyze them with a zero-inflated negative binomial ...
4
votes
1
answer
2k
views
How does the kNN imputer actually work?
I've understood that the kNN imputer, being a multivariate imputer, is "better" than univariate approaches like SimpleImputer in the sense that it takes multiple variables into account, ...
1
vote
0
answers
109
views
Training Individual-Level Predictor using Distribution of Group-level Data
I have a problem in which I'm looking to train an individual-level predictor for outcomes. I have information on individual-level covariates, but I don't have individual-level outcome variables. ...
3
votes
1
answer
2k
views
Multiple imputation for non-parametric tests on small sample?
I am new to multiple imputation, so apologies if the question is naive. I have been exploring multiple imputation for use on a small data set where there are just a couple of data points. I would like ...
0
votes
0
answers
60
views
Best method of imputation
I am very new to SPSS and the Statistics package.
I have bi-weekly data on precipitation, air temperature, and several well water temperatures (n~75). However, due to various reasons I have about 60-70%...
2
votes
1
answer
445
views
Should multiple imputation be used before or after analytic sample inclusion/exclusion criteria are applied?
I have a question about using multiple imputation. I have a dataset of approximately 16,000 people with over 15 years of follow-up. To address missing data and loss-to-follow-up, I'm using multiple ...
0
votes
0
answers
97
views
Verify whether data is missing at random (for multiple imputation)
I'm currently working with a data set from the medical domain, in which several parameters contain missing values (up to 40% of the data). The missing data is mostly caused by some hospitals not ...
0
votes
1
answer
105
views
Is it possible to pool standardized differences across multiple imputations after matching in R?
Are there any statistical procedures to pool standardized differences across multiple imputations after matching in R? I am aware of pooled estimates using mira but I just want the pooled standardized ...
0
votes
1
answer
509
views
With multiply imputed data, how do you probe a categorical-by-categorical interaction in logistic regression?
I am working with 10 multiply imputed datasets in SAS. I used the command PROC SURVEYLOGISTIC to fit a multivariate model with 6 predictors (3 dichotomous and 3 categorical) and their interaction ...
2
votes
1
answer
341
views
Does multiple imputation (MI) introduce bias in estimates?
I am trying to use MI to deal with missing values in my data set. If I understand correctly, MI is about simulating multiple data sets from a given initial data set and imputing possible values ...
5
votes
1
answer
2k
views
Machine Learning on MICE-imputed data
I'm working on a project with medical data where some of it is missing. We decided to impute the data using MICE and I found enough literature about how to choose $m$ (the number of imputations) and $...
9
votes
4
answers
2k
views
Best way to combine MCMC inference with multiple imputation? [duplicate]
I can derive an MCMC algorithm for sampling from the posterior distribution of a parameter vector of interest, but only starting with a dataset that has no missing values. The actual dataset that I ...
5
votes
3
answers
5k
views
Is it possible to impute values using the mice package, reshape and perform GEE in R?
I have a longitudinal dataset in which more than 50% of the data are missing, of the MAR type.
This amount of missing values was a surprise to me because I did not foresee this in the study design, and ...
0
votes
1
answer
229
views
Can I run glm after MI with Elastic-Net non-zeroed coefficients from 'miselect'?
I have data with n = 80 and 10 predictors, and ran MI using MICE, followed by Variable Selection for Multiply Imputed Data using ‘miselect’ and finally have 4 non-zeroed coefficients.
Since ...
0
votes
1
answer
443
views
How complex can the substantive model be in multiple imputation with smcfcs
We are interested in using a substantive model compatible fully conditional specification (smcfcs) for multiple imputations of missing data. I believe this approach will lead to more unbiased ...
1
vote
1
answer
1k
views
How to correct for multiple testing when using multiple mixed effects models on imputed data
I have a dataset that has 6 metabolites that were measured over time in two groups and using a mixed-effects linear model I would like to investigate the group differences for each metabolite. Since ...
1
vote
1
answer
451
views
Impute with only the variables involved in the analysis, or with the whole data set, when using multiple imputation to handle missing data?
I have a dataframe with 6 columns, X1_t1, X2_t1, X3_t1, X1_t2, X2_t2, X3_t2, each of which has some missing values. Which one of the following options is recommended if I want to examine the change of ...
1
vote
1
answer
468
views
Calculating pooled estimates after manipulating datasets
After I imputed two datasets, the two datasets are synthesized via statistical matching. With the synthesized data, I tried to pool estimates but got errors as below.
...
3
votes
2
answers
1k
views
Impute missing values of dummy variables, using R's {caret} package: predicted values in between {0;1}?
I'm using {caret} to impute missing data resulting from non-response to survey questions. All of these variables are defined as numeric, though most are dummies. ...
0
votes
0
answers
415
views
How can I include variables as predictors but not variables to be imputed in multiple imputation with Amelia?
I am using Amelia for multiple imputation, and have a very large list of variables to predict missingness in my exposure and outcome. However, I don't want to impute values in these predictors, I only ...
0
votes
0
answers
106
views
Multiple Imputation: How to only impute certain predictor & auxiliary variables
I've imputed variables to conduct a multiple regression analysis in SPSS but unfortunately SPSS doesn't provide certain pooled values, like R², standardized betas, etc. It looks like I have to conduct ...
3
votes
1
answer
1k
views
Imputing panel data in the wide format, obtaining pooled standard errors after using lmer
I have a longitudinal data set with missing values. I want to multiply impute (let us say $m$ = 20 times) the missing values in the wide format using the R-package mice. Thereafter, I would like to ...
0
votes
0
answers
67
views
What are some Multivariate Imputation by Chained Equations (MICE) diagnostic plots?
The mice package says, "Many diagnostic plots are implemented to inspect the quality of the imputations."
What are those plots? Which are useful? How do you use them?
0
votes
1
answer
171
views
Multiple Imputation: Include Original Dataset?
I am using SPSS and I'm currently working with a datafile that contains multiple imputations (40 imputations) because of missing data.
My question: Should I keep the original dataset (with missing ...
3
votes
0
answers
113
views
Theory behind Multivariate Imputation with Chained Equations
Can anyone provide a reference to the theory that supports multivariate imputation with chained equations (MICE). I know Rubin has provided this for MI but MICE is a Gibbs sampler (I have never seen ...
2
votes
0
answers
471
views
Combining random forest variable importance p-values from multiply imputed datasets
I am using the ranger package in R to construct random forests on 10 imputed datasets after implementing MICE to fill in missing values. The ranger package provides not only a variable importance ...
4
votes
1
answer
710
views
Multiple imputation with deletion of response variables and model selection?
I am using the MICE package in R to impute my dataset to deal with missing values. I have missing values in both the response and the predictor variables. I am interested in following Paul T. Von ...
2
votes
1
answer
2k
views
Will Multiple Imputation (MICE) work on dataset with missing data on only one feature?
Based on this article, it is apparent that MICE works with the following logic:
Fill missing values in every column apart from the column in question with either random or the mean of the given ...
1
vote
1
answer
971
views
Multiple imputation in R with mice package
I have conducted a multiple imputation in R with 5 imputations and 50 iterations using the function mice() from the corresponding mice package.
Now that I have ...
0
votes
0
answers
50
views
Can I impute a variable using MICE so that I can use the value(s) from this imputed variable to then code another variable?
I am working with pregnancy data where I would like to impute a variable called LABOR PRESENTATION (nominal variable with 5 categories) from several other variables but then create a variable called ...
4
votes
1
answer
675
views
Margins after mice?
I would like to apply the margins function to imputed data (I used mice), but it seems not possible. Do you know if a function exists that calculates marginal effects with imputed data?
Thank you!
...
0
votes
1
answer
213
views
Missing data in Structural Equation Modeling
I am new to the topic of SEM, so my question may seem a bit naive.
I have about 60 observed variables that will be grouped in some latent variables to explain one outcome measure.
One of the variables ...
1
vote
1
answer
759
views
Multiple imputation and normality assumption
I am reading an online book by Stef van Buuren (link at bottom) regarding multiple imputation. In Section 3.2.1 he lists 4 different approaches to multiple imputation:
Later on in Section 3.3 he ...
1
vote
1
answer
46
views
Imputation approaches for records with completely missing dimensions
I have two sets of data provided by the government - one spans the years 2016-2020, while the other only covers 2018-2020. Data from each dataset, for each year, are used to predict some outcome in ...
0
votes
1
answer
617
views
How to present results with Missing values?
When a data set contains a fraction of missing values, which strategy should be chosen:
first impute data before providing population descriptives, or
give the population insights and cross-tabs ...
0
votes
0
answers
554
views
Is the pooled AUC calculation for imputed data in (psfmi package) mivalext_lr() correct?
I have imputed data with several non-missing, non-imputed variables. However, I realised when I use mivalext_lr() to obtain pooled AUC and 95% CI of my ...
0
votes
1
answer
87
views
Multiple imputation in prognostic studies
I have 2 questions about Multiple imputation (MI) in the assessment of the prognostic performance of a test. This test acts as a predictor of a specific outcome, 3 years in the future. I have 26 % of ...
0
votes
0
answers
47
views
Missing data roughly proportional to the clusters, does this indicate MAR?
I have data in which the number of missing values per cluster (in this case, zip code) is proportional to the population. Does this indicate Missing at Random (MAR)?
Third column with missing ...
5
votes
0
answers
2k
views
R plm cluster robust standard errors with multiple imputations
I am looking for a way to implement (country) clustered standard errors on a panel regression with individual fixed effects. That is, in plm() I want to define some ...
2
votes
1
answer
338
views
How to use multiply imputed data for survey estimation?
I'm trying to calculate the population mean, median, etc. (descriptive analysis) using multiply imputed data. However, the examples that I found in sources were regressions whose results are then pooled into one ...
2
votes
2
answers
615
views
When is it OK not to keep a testing/holdout set?
I am performing data imputation on a large matrix [100000,34] of past measurements that contains missing values (rows are time-steps and columns are stations).
So far I've used several machine-...
0
votes
0
answers
5k
views
How to check whether the missing data is MCAR, MAR, or MNAR? [duplicate]
I read a few responses to similar questions, which suggested using a t-test or chi-squared test. However, the pattern between variables can also involve more than 2 variables (e.g. data at x tend to be ...
10
votes
1
answer
11k
views
Rubin's rule from scratch for multiple imputations
I have multiple sets of imputations generated from multiple instances of random forest (such that the predictors are all the variables except the one column to impute). I was referred to Rubin's rule ...