Questions tagged [multiple-imputation]
Use this tag for questions involving multiple imputation, which refers to a set of stochastic imputation routines aimed at preserving the multivariate features of the data.
557 questions
2
votes
0
answers
135
views
Variable selection in multiply imputed data
I have a dataset with approximately 1800 observations and I'm trying to fit a multivariable logistic regression model (250 cases, 1550 controls). There are 19 covariates (mix of continuous, ordinal ...
2
votes
0
answers
65
views
Multiple imputations generate values distributed differently from original dataset... does this mean my data is MNAR? Imputations still usable?
Quick question. I'm using the mice R package to impute missing data. I go by the presumption that the missing data are MAR, but I wouldn't be surprised if a few binary variables were MNAR. I followed ...
7
votes
2
answers
854
views
Multiple imputation of binary endpoint using underlying continuous variable
I have a response variable (Yes/No) by visit with some missing values. I am considering imputing the underlying continuous variable in SAS using proc MI. After this process, I will have, let's say, M ...
0
votes
0
answers
79
views
What is the best method for imputing binary (or integral/count) data?
I have a longitudinal dataset, and I want to create a composite score from five healthy lifestyle factors to measure overall lifestyle over time (used as a predictor). Each lifestyle is a binary ...
2
votes
1
answer
153
views
How to use priors to impute values at an individual level and replicate a distribution of the population?
I am trying to correct a variable from a survey that has measurement error. To do this, I have been treating this column as if it were missing and imputing new values based on the predictions of an ...
2
votes
3
answers
111
views
Would it be preferable to use statistical imputation instead of a subject matter expert's subjective estimate for missing data?
I'm working on a project where I need a variable for the total number of medications a patient is on. The PI is a clinician and I feel they would be able to use the resources at hand - case note and ...
0
votes
0
answers
317
views
How to run mediation analysis after multiple imputation in R?
I have run a multiple imputation model and after it I used the with() and pool() functions to pool all my results using linear regression to get one estimate.
I want to run mediation analysis on this ...
1
vote
1
answer
635
views
How to combine results from Wilcoxon Rank Sum Test for multiple imputed data sets from proc MI in SAS
Endpoint information:
We have seizure counts collected every day, so some days will have missing values.
We computed the average seizure frequency per 28 days for an interval. That is, (...
0
votes
2
answers
372
views
How to determine which imputed data to use in R
I have a dataset of almost 100 variables. These variables are Likert scale questions from 1 to 5 or 1 to 3.
I converted the variables that I wanted to impute to categorical variables. Then I used this ...
0
votes
1
answer
99
views
How can I perform an analysis of the NHIS imputed income variables?
I have downloaded five family income variables from https://nhis.ipums.org/nhis-action/variables/group?id=economic_income (INCPPOINT1, INCPPOINT2, INCPPOINT3, INCPPOINT4, INCPPOINT5) for the years ...
0
votes
0
answers
78
views
Multiple imputation - complicated variable selection in linked datasets
I attended a course on multiple imputation, where it was stated (to my understanding) that when imputing the missing data on some predictors we should use all variables that will be fitted in the ...
1
vote
0
answers
503
views
When should one set variables to 0 in the predictor matrix for multiple imputation?
Background. I am performing multiple imputation with the "mice" package in R (https://cran.r-project.org/web/packages/mice/mice.pdf) to handle missing data in a large public dataset I am ...
2
votes
0
answers
173
views
Multilevel multiple imputation in practice using R [closed]
I'm currently involved in a project where I want to address missing data using multiple imputation. I'm using healthcare data in a longitudinal setting with 16 time points, where observations are ...
1
vote
0
answers
113
views
MICE for longitudinal data - shall I include both id and time variable for imputing outcome
The variable with missing values in my longitudinal data set is the outcome variable. I am trying to use mice in R to do multiple imputation. The final model is a mixed-effects model fitted with lmer. The data set contains ...
0
votes
0
answers
120
views
Pooling methods after multiple imputation
I am a beginner with multiple imputation. I am now trying to pool all the results, but I am wondering how to do so. I need to make a table of the number of patients in each category, the percentage, and the OR and 95% CI for ...
1
vote
1
answer
512
views
Multiple imputation for subgroup analysis
Suppose I am interested in fitting a linear regression model as follows:
Y = a + b1 * age(continuous) + b2 * sex + b3 * income
This model will be run in both the whole sample and subgroups (defined by ...
1
vote
0
answers
344
views
Data contains missing values after multiple imputation using mice without logged events (i.e., no evidence for constant values or multicollinearity) [closed]
After the multiple imputation (pmm method) using the mice package, there are still missing values in my dataset (although the number of missing values was reduced).
I have checked that there was no ...
2
votes
2
answers
316
views
Using Multiple Imputation Techniques in Data Analysis
I recently worked with two different statisticians who both suggested different strategies for dealing with imputation of missing data. For the sake of this discussion, I'll call them Statistician A ...
1
vote
1
answer
162
views
Properly Incorporating MICE into Predictive Modeling
I am currently making my way through Harrell's Regression Modeling Strategies and Van Buuren's Flexible Imputation so that I can apply rigorous imputation methods in our workflows. On p 95 of ...
1
vote
1
answer
102
views
Using multiple imputation on components of a derived variable
I have a time-to-event variable, where the occurrence of event is determined by 5 numerical components measured at pre-specified timepoints. Missing values are observed for some components at some ...
0
votes
1
answer
262
views
Model performance with multiply imputed data
I would like to know how to produce a calibration plot with the Hosmer-Lemeshow test and the AUC of the ROC curve after multiple imputation in R. I built a prediction model and tried to assess model performance but ...
1
vote
0
answers
69
views
Imputation method for missing values that are irrelevant
I have a data set $\mathbf X$, with around 20 predictors, which is a matrix of parameters of a surrogate model. For each observation $\mathbf i$ of $\mathbf X$, the surrogate model was trained to ...
0
votes
1
answer
130
views
Percentage of missing values for multiple imputation
I am running a planned missingness design to pilot some items for a questionnaire I am designing. Specifically, I want to test 80 items and every participant (N = 300+) receives a random 10-item ...
3
votes
1
answer
133
views
How can I pool Bayesian parameter estimates after multiple imputation?
After multiple imputation (20 imputed datasets), I would like to conduct Bayesian Model Estimation with Adaptive Metropolis Hastings Sampling (amh) -- using the MCMC method.
How can I pool the ...
0
votes
0
answers
86
views
How to deal with missing value in dependent variable of prediction model?
I am trying to build a prediction model from a longitudinal study after an intervention. After the intervention, we follow up with patients 1, 3, and 6 months later to see whether they are cured. So the dependent ...
3
votes
1
answer
950
views
Multiple Imputation in R package Amelia with wide vs. long data
I am imputing missing values in a longitudinal dataset using the Amelia package in R. Does it matter if I have the data in long format (with id, time, and value in each row) or in wide format (with id,...
0
votes
1
answer
136
views
MICE multiple imputation in R - imputation number
I'm running MICE for 100 imputations with big data (~600k rows). Due to storage restrictions at work (which I am not permitted to change), I can't save all 100 imputations in one go, and I'd hit ...
1
vote
0
answers
59
views
How to create one pooled datafile after (5) multiple imputations in order to fill in the missing values in SPSS? [closed]
I used the multiple imputation method to fill in my missing data points in a big dataset. My dataset now contains values for 5 imputations. I know there is an option to analyze with the pooled value ...
1
vote
1
answer
650
views
Multiple imputation for prediction
I am doing some experimentation with multiple imputation (MI) for prediction, more specifically in the context of binary classification.
I'm doing this because there is not much to be found with ...
3
votes
1
answer
542
views
When working with missing data, what percentage of data is considered too much missing before implementing something like imputation?
I am looking for advice (I do not have a specific example regarding data) but am wondering, when working with any dataset that has missing values, at what point/percentage would you consider using something ...
3
votes
0
answers
92
views
How to pool estimates from multiply-imputed datasets with complex sampling designs?
Analysts often use Rubin's rule (RR) to obtain a pooled estimate of a population quantity from multiple (imputed) datasets. While popular statistical software (such as the R ...
0
votes
1
answer
998
views
Is this way of pooling Kaplan-Meier estimates correct? Example made with R mice::pool_scalar
This is my data. It has no gaps in survival or in the predictor used, for the sake of simplicity in this example. I want to see whether the same dataset, generated multiple times, will give (after pooling) the same ...
2
votes
1
answer
990
views
Use the target variable during imputation?
There is a quite old yet very good question about the proper way for using rfImpute but to me the question raised by Doug7 (whether the target variable y gets used for the imputation of the features ...
2
votes
0
answers
60
views
Theoretical Results for MICE Imputation
Is there any literature exploring convergence guarantees of the MICE imputation method for missing data? In practice, the method seems to work pretty reliably with different regressors but I can't seem ...
1
vote
1
answer
833
views
Should you impute missing event and time-to-event variables for survival data that has missing and censored data?
I am working on a database that looks at progression-free survival and includes event and time-to-event data. It is missing about 40% of both time-to-event data and event data. I am wondering if I ...
0
votes
1
answer
301
views
Generalized linear model after multiple imputation on survey data
I am using a generalized linear mixed model after multiple imputation on survey data. However, after performing the analysis, I cannot extract random effects and confidence intervals for the estimates....
0
votes
2
answers
2k
views
Cox Regression on multiple imputed datasets (R)
I fit a Cox regression using the coxph function of the survival package. Now I want to do the same on a multiply imputed data set (which I already have, generated in other software). I found some ...
8
votes
2
answers
282
views
Choosing $m$ value when using multiple imputation (MI)
Multiple imputation creates $m$ new imputed datasets by taking each missing value and replacing it by analyzing the $m$ imputed values (for example: using the mean). Is there a rule of thumb or a ...
2
votes
0
answers
42
views
Can Rubin's pooling method (multiple imputation) be combined with Kenward-Roger or Satterthwaite degrees of freedom?
I would like to use a multiple imputation algorithm with generalized least squares and Kenward-Roger or Satterthwaite degrees of freedom. Does the commonly implemented Rubin's method account for those ...
0
votes
0
answers
183
views
Is there a way to fit a machine learning model to MICE imputed datasets and pool the results?
I have a medical dataset that has a lot of missing values. I imputed five datasets using MICE in R. I want to fit a classification machine learning model to the dataset. I want to identify the most ...
0
votes
0
answers
83
views
Data Imputation Without Ground-truth
Recently I have been working on a data imputation project. I use probabilistic imputation (multiple imputation) methods. As is known, the real data do not contain the REAL values for the missing ...
5
votes
1
answer
204
views
Is there a way to impute chi-square data?
I've tried looking here, as well as the go-to book Flexible Imputation of Missing Data, but cannot seem to find any reliable information on how to simulate chi-square missingness (as well as imputing ...
0
votes
0
answers
551
views
Multiple logistic regression odds ratio on multiply imputed data in R
I am running a hierarchical logistic regression analysis using multiply imputed data in R (using the mice and miceafter packages). I am unable to get the odds ratio and 95% CI per variable adjusted ...
1
vote
1
answer
386
views
Which variables could I impute using the multiple imputation?
Suppose I have a dataset, and I want to use it to analyse the association between BMI and stroke. The dataset has some missingness for BMI (independent variable) and some missingness for covariates ...
1
vote
0
answers
322
views
Mice imputation with a small number of missing values - test/train set may have no missing values
When performing resampling, I wish to perform the same imputation on the test set as I performed on the training set, which is accepted practice. So, when imputing with MICE, I generate a predictor ...
1
vote
0
answers
113
views
To what extent does imputed data on item level have to respect the range of plausible values when one is interested in the aggregated scores? (MICE)
I’m aiming to impute data on Likert-Scale item level for a nested dataset using the MICE package.
The data is nested in the sense that participants (>2000) belong to different clusters (around 100 ...
2
votes
1
answer
567
views
What is the limit of missing values for multiple imputation in the mice package?
I have two questions about the mice package.
The first is about the mincor argument of quickpred. On CRAN it says this is the absolute minimum correlation compared. Does this mean that if I set ...
1
vote
1
answer
576
views
Interpolate / Impute time series (sparse measurements)
I observed a manufacturing process that yielded ~40,000 parts
I sampled 200 of these parts (every 200th part) and measured their properties
My ultimate goal is to show that sensor data, that ...
3
votes
1
answer
432
views
Combining multiple imputation and survey non-response adjustments (IPW)
Imagine the following scenario:
A population cohort (assume no or equal sampling weights) of say 10000 people had various demographics and health factors measured at baseline $X_{base}$ (with some ...
3
votes
0
answers
81
views
Mice package for imputation - chains not intermingling
I'm running an imputation using the mice package in R (imputing 7 variables with missing values on the basis of 10 total variables). The imputation runs fine, and ...