
Questions tagged [multiple-imputation]

Use this tag for questions involving multiple imputation, which refers to a set of stochastic imputation routines aimed at preserving the multivariate features of the data.
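Many of the questions below ask how to combine results across the imputed datasets. As a rough illustration, here is a minimal sketch of Rubin's rules in plain Python. It assumes you already have one point estimate and one squared standard error (sampling variance) per imputed dataset. The function name `pool_rubin` is hypothetical and is not part of mice or any other package. The degrees of freedom use Rubin's original large-sample formula, not the Barnard-Rubin small-sample adjustment that some software applies.

```python
import math
from statistics import mean, variance


def pool_rubin(estimates, variances):
    """Pool m per-imputation estimates with Rubin's rules (hypothetical helper).

    estimates: point estimates, one per imputed dataset
    variances: squared standard errors, one per imputed dataset
    Returns (pooled estimate, total variance, pooled SE, Rubin's df).
    """
    m = len(estimates)
    if m < 2 or len(variances) != m:
        raise ValueError("need at least two imputations and one variance per estimate")

    q_bar = mean(estimates)        # pooled point estimate
    w_bar = mean(variances)        # average within-imputation variance
    b = variance(estimates)        # between-imputation variance (divisor m - 1)
    t = w_bar + (1 + 1 / m) * b    # total variance

    # Rubin's (1987) degrees of freedom; infinite when imputations agree exactly.
    if b == 0:
        df = math.inf
    else:
        r = (1 + 1 / m) * b / w_bar  # relative increase in variance due to missingness
        df = (m - 1) * (1 + 1 / r) ** 2

    return q_bar, t, math.sqrt(t), df


if __name__ == "__main__":
    est, total_var, se, df = pool_rubin([1.0, 1.2, 0.8, 1.1, 0.9], [0.04] * 5)
    print(f"estimate={est:.3f}  SE={se:.3f}  df={df:.1f}")
```

For the five hypothetical estimates above, the between-imputation variance is 0.025, so the total variance is 0.04 + 1.2 × 0.025 = 0.07. The pooled standard error is therefore larger than any single-dataset standard error, which is the point of the rule.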

2 votes
0 answers
135 views

I have a dataset with approximately 1800 observations and I'm trying to fit a multivariable logistic regression model (250 cases, 1550 controls). There are 19 covariates (mix of continuous, ordinal ...
donm79 (reputation 51)
2 votes
0 answers
65 views

Quick question. I'm using the mice R package to impute missing data. I go by the presumption that the missing data are MAR, but I wouldn't be surprised if a few binary variables were MNAR. I followed ...
awastus (reputation 61)
7 votes
2 answers
854 views

I have a response variable (Yes/No) by visit with some missing values. I am considering imputing the underlying continuous variable in SAS using proc MI. After this process, I will have, let's say, M ...
Kate (reputation 347)
0 votes
0 answers
79 views

I have a longitudinal dataset, and I want to create a composite score combining five healthy lifestyle factors to measure overall lifestyle over time (to be used as a predictor). Each lifestyle factor is a binary ...
zjppdozen (reputation 543)
2 votes
1 answer
153 views

I am trying to correct a variable from a survey that has measurement error. To do this, I have been treating this column as if it were missing and imputing new values based on the predictions of an ...
Santiago Valdivieso
2 votes
3 answers
111 views

I'm working on a project where I need a variable for the total number of medications a patient is on. The PI is a clinician and I feel they would be able to use the resources at hand - case note and ...
Geoff (reputation 863)
0 votes
0 answers
317 views

I have run a multiple imputation model and after it I used the with() and pool() functions to pool all my results using linear regression to get one estimate. I want to run mediation analysis on this ...
yusefsoliman
1 vote
1 answer
635 views

Endpoint information: We collected seizure counts every day, so some days will have missing values. We computed the average seizure frequency per 28 days for an interval. That is, (...
Janet Xu
0 votes
2 answers
372 views

I have a dataset of almost 100 variables. These variables are Likert scale questions from 1 to 5 or 1 to 3. I converted the variables that I wanted to impute to categorical variables. Then I used this ...
yusefsoliman
0 votes
1 answer
99 views

I have downloaded five family income variables from https://nhis.ipums.org/nhis-action/variables/group?id=economic_income (INCPPOINT1, INCPPOINT2, INCPPOINT3, INCPPOINT4, INCPPOINT5) for the years ...
tryingtogetsmth
0 votes
0 answers
78 views

I attended a course on multiple imputation, where it was stated (as I understood it) that when imputing missing data on some predictors we should use all variables that will be fitted in the ...
Milo (reputation 327)
1 vote
0 answers
503 views

Background. I am using multiple imputation using the "mice" package in R (https://cran.r-project.org/web/packages/mice/mice.pdf) to handle missing data in a large public dataset I am ...
jumbo-owl
2 votes
0 answers
173 views

I'm currently involved in a project where I want to address missing data using multiple imputation. I'm using healthcare data in a longitudinal setting with 16 time points, where observations are ...
Malik (reputation 218)
1 vote
0 answers
113 views

The missing variable in my longitudinal data set is the outcome variable. I am trying to use mice in R to do multiple imputation. The final model is a mixed-effects model fitted by lmer. The data set contains ...
Charlotte
0 votes
0 answers
120 views

I am a beginner at multiple imputation. I am now trying to pool all the results but am unsure how to do so. I need to make a table of the number of patients in each category, percentages, and ORs with 95% CIs for ...
Haruka Hayashi
1 vote
1 answer
512 views

Suppose I am interested in fitting a linear regression model as follows: $Y = a + b_1 \cdot \text{age (continuous)} + b_2 \cdot \text{sex} + b_3 \cdot \text{income}$. This model will be run in both the whole sample and subgroups (defined by ...
Willi Zhang
1 vote
0 answers
344 views

After the multiple imputation (pmm method) using the mice package, there are still missing values in my dataset (although the number of missing values was reduced). I have checked that there was no ...
NessD (reputation 191)
2 votes
2 answers
316 views

I recently worked with two different statisticians who both suggested different strategies for dealing with imputation of missing data. For the sake of this discussion, I'll call them Statistician A ...
alliecat966
1 vote
1 answer
162 views

I am currently making my way through Harrell's Regression Modeling Strategies and Van Buuren's Flexible Imputation so that I can apply rigorous imputation methods in our workflows. On p 95 of ...
user7351362
1 vote
1 answer
102 views

I have a time-to-event variable, where the occurrence of event is determined by 5 numerical components measured at pre-specified timepoints. Missing values are observed for some components at some ...
Will_Zhang
0 votes
1 answer
262 views

I would like to know how to make a calibration plot with the Hosmer-Lemeshow test and compute the AUC of the ROC curve after multiple imputation in R. I built one prediction model and tried to assess its performance, but ...
Haruka Hayashi
1 vote
0 answers
69 views

I have a data set $\mathbf X$, with around 20 predictors, which is a matrix of parameters of a surrogate model. For each observation $\mathbf i$ of $\mathbf X$, the surrogate model was trained to ...
Florent H (reputation 153)
0 votes
1 answer
130 views

I am running a planned missingness design to pilot some items for a questionnaire I am designing. Specifically, I want to test 80 items and every participant (N = 300+) receives a random 10-item ...
Felix (reputation 65)
3 votes
1 answer
133 views

After multiple imputation (imputed dataset = 20), I would like to conduct Bayesian Model Estimation with Adaptive Metropolis Hastings Sampling (amh) -- using the MCMC method. How can I pool the ...
conner (reputation 31)
0 votes
0 answers
86 views

I am trying to build a prediction model from a longitudinal study conducted after an intervention. After the intervention, we follow up patients 1, 3, and 6 months later to see whether they are cured. So the dependent ...
Haruka Hayashi
3 votes
1 answer
950 views

I am imputing missing values in a longitudinal dataset using the Amelia package in R. Does it matter if I have the data in long format (with id, time, and value in each row) or in wide format (with id,...
Benji (reputation 423)
0 votes
1 answer
136 views

I'm running MICE for 100 imputations with big data (~600k rows). Due to storage restrictions at work (which I am not permitted to change), I can't save all 100 imputations in one go, and I'd hit ...
MICE man
1 vote
0 answers
59 views

I used the multiple imputation method to fill in my missing data points in a big dataset. My dataset now contains values for 5 imputations. I know there is an option to analyze with the pooled value ...
Anne (reputation 11)
1 vote
1 answer
650 views

I am doing some experimentation with multiple imputation (MI) for prediction, more specifically in the context of binary classification. I'm doing this because there is not much to be found with ...
cliffhanger-be
3 votes
1 answer
542 views

I am looking for advice (I do not have a specific example regarding data), but I am wondering, when working with any dataset that has missing values, at what point/percentage would you consider using something ...
ineedhelp (reputation 409)
3 votes
0 answers
92 views

Analysts often use Rubin's rule (RR) to obtain a pooled estimate of a popular quantity from multiple (imputed) datasets. While popular statistical software (such as the R ...
socialscientist
0 votes
1 answer
998 views

This is my data. It has no gaps in survival or the predictor used, for the sake of simplicity in this example. I want to see whether the same dataset, generated multiple times, will give (after pooling) the same ...
GeneralizedLM
2 votes
1 answer
990 views

There is a quite old yet very good question about the proper way for using rfImpute but to me the question raised by Doug7 (whether the target variable y gets used for the imputation of the features ...
MarkH (reputation 207)
2 votes
0 answers
60 views

Is there any literature exploring convergence guarantees of the MICE imputation method for missing data? In practice, the method seems to work pretty reliably with different regressors, but I can't seem ...
Doc Stories
1 vote
1 answer
833 views

I am working on a database that looks at progression-free survival and includes event and time-to-event data. It is missing about 40% of both time-to-event data and event data. I am wondering if I ...
mepstein1218
0 votes
1 answer
301 views

I am using a generalized linear mixed model after multiple imputation on survey data. However, after performing the analysis, I cannot extract random effects and confidence intervals for the estimates....
Flad (reputation 1)
0 votes
2 answers
2k views

I fit a cox regression using the coxph function of the survival package. Now I wanted to do the same on a multiple imputed data set (which I already have, generated in another software). I found some ...
Sebastian (reputation 133)
8 votes
2 answers
282 views

Multiple imputation creates $m$ new imputed datasets by taking each missing value and replacing it by analyzing the $m$ imputed values (for example: using the mean). Is there a rule of thumb or a ...
Amit S (reputation 77)
2 votes
0 answers
42 views

I would like to use a multiple imputation algorithm with Generalized Least Squares and Kenward-Roger or Satterthwaite degrees of freedom. Does the commonly implemented Rubin's method account for those ...
Nikaraguien
0 votes
0 answers
183 views

I have a medical dataset that has a lot of missing values. I imputed five datasets using MICE in R. I want to fit a classification machine learning model to the dataset. I want to identify the most ...
Just a stat student
0 votes
0 answers
83 views

Recently I have been working on a data imputation project. I use probabilistic imputation (multiple imputation) methods. As is well known, real data do not contain the true values for the missing ...
stander Qiu
5 votes
1 answer
204 views

I've tried looking here, as well as the go-to book Flexible Imputation of Missing Data, but cannot seem to find any reliable information on how to simulate chi-square missingness (as well as imputing ...
Shawn Hemelstrand
0 votes
0 answers
551 views

I am running a hierarchical logistic regression analysis using multiply imputed data in R (using the mice and miceafter packages). I am unable to get the odds ratio and 95% CI per variable adjusted ...
Mona (reputation 1)
1 vote
1 answer
386 views

Suppose I have a dataset, and I want to use it to analyse the association between BMI and stroke. The dataset has some missingness for BMI (independent variable) and some missingness for covariates ...
li jiaqi
1 vote
0 answers
322 views

When performing resampling, I wish to perform the same imputation on the test set as I performed on the training set, which is accepted practice. So, when imputing with MICE, I generate a predictor ...
panda (reputation 121)
1 vote
0 answers
113 views

I’m aiming to impute data at the Likert-scale item level for a nested dataset using the MICE package. The data is nested in the sense that participants (>2000) belong to different clusters (around 100 ...
Rasul89 (reputation 163)
2 votes
1 answer
567 views

I have two questions about the mice package. The first is the mincor argument of quickpred(). On CRAN it says it is the absolute minimum correlation compared. Does this mean that if I set ...
Kledson Lemes
1 vote
1 answer
576 views

I observed a manufacturing process that yielded ~40,000 parts. I sampled 200 of these parts (every 200th part) and measured their properties. My ultimate goal is to show that sensor data, that ...
derkurt (reputation 11)
3 votes
1 answer
432 views

Imagine the following scenario: A population cohort (assume no or equal sampling weights) of say 10000 people had various demographics and health factors measured at baseline $X_{base}$(with some ...
nstjhp (reputation 135)
3 votes
0 answers
81 views

I'm running an imputation using the mice package in R (imputing 7 variables with missing values on the basis of 10 total variables). The imputation runs fine, and ...
Henry Brice
