Questions tagged [multiple-imputation]
Use this tag for questions involving multiple imputation, which refers to a set of stochastic imputation routines aimed at preserving the multivariate features of the data.
200 questions with no upvoted or accepted answers
9
votes
0
answers
238
views
Generalization of degrees of freedom for t distribution for coefficients after multiple imputation
Donald Rubin has shown that regression coefficient estimates have fatter tails after multiple imputation and has provided a formula for the degrees of freedom to use as a t-distribution approximation ...
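For reference, the degrees-of-freedom approximation usually attributed to Rubin (1987) can be written as below. The notation is an assumption made here, not taken from the question: $m$ is the number of imputed datasets, $\hat{Q}_i$ and $U_i$ are the coefficient estimate and its squared standard error from dataset $i$.

$$
\bar{Q} = \frac{1}{m}\sum_{i=1}^{m}\hat{Q}_i,\qquad
\bar{U} = \frac{1}{m}\sum_{i=1}^{m}U_i,\qquad
B = \frac{1}{m-1}\sum_{i=1}^{m}\left(\hat{Q}_i-\bar{Q}\right)^2
$$

$$
T = \bar{U} + \left(1+\frac{1}{m}\right)B,\qquad
\nu = (m-1)\left[1+\frac{\bar{U}}{\left(1+\frac{1}{m}\right)B}\right]^2
$$

Under this approximation, $(\bar{Q}-Q)/\sqrt{T}$ is referred to a $t$ distribution with $\nu$ degrees of freedom.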
5
votes
0
answers
2k
views
R plm cluster robust standard errors with multiple imputations
I am looking for a way to implement (country) clustered standard errors on a panel regression with individual fixed effects. That is, in plm() I want to define some ...
5
votes
0
answers
688
views
Multicollinearity in structural equation modeling with multiple imputation?
Using R, I created a structural equation model and fit it to multiple datasets using the 'sem.mi()' function from the semTools package. I know multicollinearity tends to be a concern for structural ...
5
votes
0
answers
1k
views
Descriptive statistics (frequencies, counts, proportions) after multiple imputation
I recently ran a multiple imputation using the mice package in R to generate imputed datasets. I have no problems with running inferential statistics on the pooled data (logistic and Cox regressions) ...
5
votes
0
answers
470
views
How to do multiple imputation for spatial models?
I'm trying to estimate various spatial models such as spatial autoregressive regression (SAR), Spatial Durbin Model (SDM), and Spatial Error Model (SEM) but have missing data throughout my variables. ...
5
votes
0
answers
666
views
Permuting the formula argument to Hmisc:aregImpute
In Frank Harrell's RMS Short Course today, I became aware that multiple imputation with Hmisc:aregImpute is not invariant to the ordering of terms in its formula ...
5
votes
0
answers
1k
views
Multiple imputation of time variables -- which step to impute?
Let's assume I have a survival analysis study with an exposure, two covariates, and two time-related variables, say date of diagnosis and date of death. Combined, the two time-related variables will be ...
4
votes
0
answers
522
views
Combining Gradient boosted trees after multiple imputation
Currently I am working with a gradient boosted tree model fit to a multiply imputed dataset.
For those who don't know multiple imputation:
It predicts missing values and imputes that value with ...
4
votes
0
answers
664
views
Multiple imputation of glm binomial size parameter
Suppose we have a generalized linear model with a binomial response $y_i\sim \mathrm{bin}(n_i,p_i)$ where $p_i$ is determined by the linear predictor in the usual way via some link function. Is there ...
4
votes
0
answers
3k
views
perform quality check for imputed data with MICE in R
I'm currently working with the MICE algorithm to impute missing data.
After I did the imputation I wanted to do some kind of quality check of the imputed data set.
There are some suggestions here
...
4
votes
0
answers
1k
views
Checking Cox model assumptions with multiple imputation
I have run multiple imputation using MICE. I would now like to run a Cox model on it (using with,pool), and make sure that is justified. That is, I need to make sure that the proportional hazards ...
4
votes
0
answers
838
views
Imputation with mice: recode variables before or after imputation?
I am using mice in R, a chained equations (sequential regression) algorithm, to impute a series of polytomous variables (e.g. ...
3
votes
0
answers
92
views
How to pool estimates from multiply-imputed datasets with complex sampling designs?
Analysts often use Rubin's rules (RR) to obtain a pooled estimate of a population quantity from multiple (imputed) datasets. While popular statistical software (such as the R ...
3
votes
0
answers
81
views
Mice package for imputation - chains not intermingling
I'm running an imputation using the mice package in R (imputing 7 variables with missing values on the basis of 10 total variables). The imputation runs fine, and ...
3
votes
0
answers
113
views
Theory behind Multivariate Imputation with Chained Equations
Can anyone provide a reference to the theory that supports multivariate imputation with chained equations (MICE)? I know Rubin has provided this for MI but MICE is a Gibbs sampler (I have never seen ...
3
votes
0
answers
539
views
How to use MICE in R to fill missing values in test set?
It seems that MICE does not have a "predict" function that allows using a fitted mids object to predict the missing values in a test data set. I can certainly ...
3
votes
1
answer
268
views
Validity of tobit estimates after multiple imputation
I want to estimate tobit marginal effects using multiply imputed data; however, I see that tobit is not among the estimation commands supported by Stata's MI prefix - I understand that the validity of ...
3
votes
0
answers
2k
views
How to use pooled results from multiple imputation?
I've been reading some posts about data imputation using multiple imputation, specifically the MICE R package. I get the main idea of creating multiple datasets with imputed data. The part that is not ...
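As a minimal sketch of what "pooling" usually means here, the following applies Rubin's rules to one coefficient that was estimated separately on each imputed dataset. The function name `pool_rubin` and its inputs are hypothetical, chosen for illustration; it is not the mice package's own implementation, and it assumes the per-dataset variances are strictly positive.

```python
from statistics import mean, variance


def pool_rubin(estimates, variances):
    """Pool one parameter across m imputed datasets (hypothetical helper).

    estimates: point estimate from each imputed dataset
    variances: squared standard error from each imputed dataset (must be > 0)
    Returns (pooled_estimate, total_variance, degrees_of_freedom).
    """
    m = len(estimates)
    if m < 2 or m != len(variances):
        raise ValueError("need at least two estimates and matching variances")

    q_bar = mean(estimates)   # pooled point estimate
    u_bar = mean(variances)   # within-imputation variance
    b = variance(estimates)   # between-imputation variance (m - 1 denominator)
    total = u_bar + (1 + 1 / m) * b

    if b == 0:
        # No between-imputation variability: the t reference becomes normal.
        df = float("inf")
    else:
        r = (1 + 1 / m) * b / u_bar   # relative increase in variance
        df = (m - 1) * (1 + 1 / r) ** 2
    return q_bar, total, df


if __name__ == "__main__":
    est, var, df = pool_rubin([1.0, 2.0, 3.0], [0.5, 0.5, 0.5])
    print(est, var ** 0.5, df)
```

The pooled estimate and the square root of the total variance are then used as the coefficient and its standard error, with a t reference distribution on the returned degrees of freedom.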
3
votes
0
answers
922
views
General practice to impute missing values
There are multiple resources and answers on types of imputation and packages that can help in imputing the missing values or how to use a particular package. But there are few to no resources ...
3
votes
0
answers
120
views
Can someone give me an intuition of congeniality in multiple imputation?
As the title says. I read a lot about congeniality of Bayesian models (e.g. Meng, 1994) and I do know some definitions, but I don't feel I can get a grip on what happens when models are congenial or ...
3
votes
0
answers
89
views
What statistical models / approaches can I use to estimate missing hourly values?
My dataset consists of hourly values by weekday across several sites, where the sites vary by spatial location and by other common characteristics, such as type (e.g. 'cafe,' 'restaurant,' and 'bar').
...
3
votes
0
answers
47
views
Imputation of a (weird) multivariate outcome
I am working with a dataset in which the outcome of interest is a vector of dates of particular events: (date_1,date_2,date_3,...,date_n). Some of these outcome vectors are completely missing, but I ...
3
votes
0
answers
549
views
How to generate longitudinal binary data with values missing at random (MAR)?
I want to test the performance of a multiple imputation algorithm for longitudinal binary data. Right now I have applied the algorithm on some real data sets and it turned out promising and then I ...
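One common way to set up such a test is monotone dropout whose probability depends only on the previously observed response, which satisfies MAR by construction. The sketch below is an assumption-laden illustration, not the asker's method: the function name, the random-intercept logistic model, and the dropout probabilities (0.4 after a 1, 0.1 after a 0) are all hypothetical choices.

```python
import math
import random


def simulate_mar_dropout(n_subjects=200, n_times=4, seed=0):
    """Simulate longitudinal binary data with MAR monotone dropout.

    Returns a list of per-subject lists; missing values are None.
    All model settings here are illustrative assumptions.
    """
    rng = random.Random(seed)
    data = []
    for _ in range(n_subjects):
        # Subject-level random intercept induces within-subject correlation.
        u = rng.gauss(0.0, 1.0)
        p = 1.0 / (1.0 + math.exp(-(-0.5 + u)))
        y = [1 if rng.random() < p else 0 for _ in range(n_times)]

        observed = [y[0]]  # the first visit is always observed
        dropped = False
        for t in range(1, n_times):
            if not dropped:
                # Dropout depends only on y[t-1], which is still observed
                # because the subject has not dropped out yet (MAR).
                p_drop = 0.4 if y[t - 1] == 1 else 0.1
                dropped = rng.random() < p_drop
            observed.append(None if dropped else y[t])
        data.append(observed)
    return data


if __name__ == "__main__":
    rows = simulate_mar_dropout(n_subjects=5, n_times=4, seed=1)
    for row in rows:
        print(row)
```

Because the complete data are generated before values are deleted, estimates from the imputation algorithm can be compared against the full-data estimates across repeated simulations.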
3
votes
0
answers
539
views
Multiple Imputation and Matrix Completion
It is quite common for data sets to contain missing values. Suppose we want to try to fill in the missing values. For this we have techniques such as single/multiple imputation and matrix ...
3
votes
0
answers
740
views
How to compare and validate imputation models?
I've seen a lot of interesting questions here about multiple imputation and also great answers that helped me a lot to impute my data. I've used Predictive Mean Matching, EMB and I would like to use ...
2
votes
0
answers
53
views
Is a MICE approach to multivariable imputation well controlled for survival data?
When imputing data by an algorithm such as "mice", it occurs to me that the algorithm takes no account of the structure or representation of a survival outcome which is stored as an event ...
2
votes
0
answers
135
views
Variable selection in multiply imputed data
I have a dataset with approximately 1800 observations and I'm trying to fit a multivariable logistic regression model (250 cases, 1550 controls). There are 19 covariates (mix of continuous, ordinal ...
2
votes
0
answers
65
views
Multiple imputations generate values distributed differently from original dataset... does this mean my data is MNAR? Imputations still usable?
Quick question. I'm using the mice R package to impute missing data. I go by the presumption that the missing data are MAR, but I wouldn't be surprised if a few binary variables were MNAR. I followed ...
2
votes
0
answers
60
views
Theoretical Results for MICE Imputation
Is there any literature exploring convergence guarantees of the MICE imputation method for missing data? In practice, the method seems to work pretty reliably with different regressors, but I can't seem ...
2
votes
0
answers
42
views
Can Rubin's pooling method (multiple imputation) be combined with Kenward-Roger or Satterthwaite degrees of freedom?
I would like to use a multiple imputation algorithm with generalized least squares and Kenward-Roger or Satterthwaite degrees of freedom. Does the commonly implemented Rubin's method account for those ...
2
votes
1
answer
567
views
What is the limit of missing values for multiple imputation in the mice package?
I have two questions about the mice package.
The first is about the mincor argument of quickpred. The documentation on CRAN says it is the minimum absolute correlation threshold. Does this mean that if I set ...
2
votes
0
answers
191
views
How to implement Rubin's Rules to assess model fit on imputed test data with continuous outcome? (e.g. RMSE and 95% CI)
I'm working on a project now which involves the use of multiple imputation while developing machine learning models (using a training/test split, ~7000 observations total) for a continuous outcome. I ...
2
votes
1
answer
292
views
Partial eta squared calculation with multiple imputation data
I have 10 multiple imputation datasets ($N = 97$, two groups) and am running ANCOVA (controlling for pre-test values) to look at post-test group differences. Working in SPSS and can't really invest ...
2
votes
0
answers
471
views
Combining random forest variable importance p-values from multiply imputed datasets
I am using the ranger package in R to construct random forests on 10 imputed datasets after implementing MICE to fill in missing values. The ranger package provides not only a variable importance ...
2
votes
0
answers
44
views
Imputation that takes into account both relationships among variables and spatial adjacency?
I have a dataset with 13 variables and 50 observations representing the U.S. states. The variables represent the land use intensity of different agricultural industries in each state. Of those 650 ...
2
votes
0
answers
456
views
Interpolation versus imputation for time series on chemical profiles of water wells
So I am working with some data on water wells and time series of chemical pollutant tests on those wells. There are 10 chemicals and 10 years in the data. My goal is to do some clustering on the wells ...
2
votes
0
answers
147
views
Obtaining measures of effect for contingency tables with multiply imputed data
The epi.2by2 function in the epiR package computes a chi-square test and provides measures of effect when count data are ...
2
votes
0
answers
183
views
How to test multiple regression assumptions when multiple imputation has been used?
I used multiple imputation in SPSS to deal with missing data in my study. I then carried out multiple regression on the imputed and original datasets, using a split file. I now have output for each ...
2
votes
0
answers
74
views
Choosing Among Multiply Imputed Datasets
I am using multiple imputation to estimate treatment effects in a dataset that contains missing data. In some of my imputed datasets, the algorithm used in the analysis fails to converge; it's not ...
2
votes
0
answers
74
views
How to deal with undetectable outcome values? (data missing not at random)
I conducted a sound propagation experiment in which recorded maned wolf calls were broadcast at different sites (x3), hours (x6: 17h, 18h, 23h, 05h, 06h, 11h), and with different speaker positions (x2: ...
2
votes
0
answers
143
views
To impute or not - community consensus for reporting accuracy of an imputed model
I have a model generated using an imputed data set with imputation accuracy of 75%.
If the model using imputed data has an accuracy of 80%
What would be the community consensus to report the ...
2
votes
0
answers
635
views
Compute the power of Tukey's Honest Significance Difference (or cognates)?
I've got a simple computational model I can run experiments with. Experiments are "free" but I don't want to run it more times than necessary because it still takes time.
All the simulations use the ...
2
votes
0
answers
814
views
Analyzing Multiply imputed datasets rich in categorical data
My original dataset with 48 subjects has a considerable number of missing items. The majority of the data is categorical (dichotomous) and some of it is ranked (ordinal). I performed a multiple imputation ...
2
votes
0
answers
444
views
Multiple imputation of time-varying dataset for Cox model in R
I've seen others searching for similar issues, but have not yet come across an example that explains how to actually do this:
I have a dataset with both time varying and non-time varying variables ...
2
votes
0
answers
120
views
Method for predicting price based on Geographical market, Product, and Company
I have a dataset which tracks the prices of 21 products, charged by 24 companies, in 150 different cities across the globe. However, the data set has missing values--that is, I might have Company X's ...
2
votes
0
answers
134
views
dividing a multiply imputed dataset into derivation and validation cohorts
R/statistics noob. Mac OSX 10.11, RStudio 0.99.842.
I'm developing a clinical prediction tool as part of my PhD. I have missing data (23k cases, 24 variables, 70% of variables have at least one ...
2
votes
0
answers
305
views
Unequal timepoints longitudinal data with missing values
I have longitudinal data with unequal time points and missing values. I am looking for methods to impute the missing data. I looked at R packages NORM and AMELIA II and SAS procedures PROC MI. All ...
2
votes
0
answers
1k
views
how to remove outliers prior to multiple imputation
A colleague came to me with the following problem. She has a complex, multivariate data set, in which respondents completed a number of measures with anywhere from 6 to 30 Likert type items for each ...
2
votes
0
answers
307
views
Using entropy to impute missing values based on grey relational analysis and clustering
This algorithm contains three techniques:
1. Fuzzy c-means clustering
2. Grey relational theory
3. Entropy multiple imputation
The framework of this algorithm is as follows:
My questions are about the ...
2
votes
0
answers
262
views
Multiply imputing data, but using just one of the imputed data sets
All,
I have a question about what's practical when it comes to presenting results of multiply imputed data. I'm well-versed in the differences among MCAR/MAR/MNAR and approaches to imputing the data ...