
Questions tagged [multiple-imputation]

Use this tag for questions involving multiple imputation, which refers to a set of stochastic imputation routines aimed at preserving the multivariate features of the data.

200 questions with no upvoted or accepted answers
9 votes
0 answers
238 views

Donald Rubin has shown that regression coefficient estimates have fatter tails after multiple imputation and has provided a formula for the degrees of freedom to use as a t-distribution approximation ...
Frank Harrell's user avatar
5 votes
0 answers
2k views

I am looking for a way to implement (country) clustered standard errors on a panel regression with individual fixed effects. That is, in plm() I want to define some ...
Andreas Chmielowski's user avatar
5 votes
0 answers
688 views

Using R, I created a structural equation model and fit it to multiple datasets using the 'sem.mi()' function from the semTools package. I know multicollinearity tends to be a concern for structural ...
poe's user avatar
  • 111
5 votes
0 answers
1k views

I recently ran a multiple imputation using the mice package in R to generate imputed datasets. I have no problems with running inferential statistics on the pooled data (logistic and Cox regressions) ...
somesurgeon's user avatar
5 votes
0 answers
470 views

I'm trying to estimate various spatial models such as spatial autoregressive regression (SAR), Spatial Durbin Model (SDM), and Spatial Error Model (SEM) but have missing data throughout my variables. ...
LJB's user avatar
  • 211
5 votes
0 answers
666 views

In Frank Harrell's RMS Short Course today, I became aware that multiple imputation with Hmisc::aregImpute is not invariant to the ordering of terms in its formula ...
David C. Norris's user avatar
5 votes
0 answers
1k views

Let's assume I have a survival analysis study with an exposure, two covariates, and two time-related variables, say date of diagnosis and date of death. Combined, the two time-related variables will be ...
Fomite's user avatar
  • 24.8k
4 votes
0 answers
522 views

Currently I am working with a gradient boosted tree model fit to a multiply imputed dataset. For those who don't know multiple imputation: it predicts missing values and imputes that value with ...
SK4ndal's user avatar
  • 81
4 votes
0 answers
664 views

Suppose we have a generalized linear model with a binomial response $y_i\sim \mathrm{bin}(n_i,p_i)$ where $p_i$ is determined by the linear predictor in the usual way via some link function. Is there ...
Jarle Tufto's user avatar
  • 13.2k
4 votes
0 answers
3k views

I'm currently working with the MICE algorithm to impute missing data. After I did the imputation I wanted to do some kind of quality check of the imputed data set. There are some suggestions here ...
ching's user avatar
  • 807
4 votes
0 answers
1k views

I have run multiple imputation using MICE. I would now like to run a Cox model on it (using with and pool), and make sure that is justified. That is, I need to make sure that the proportional hazards ...
RayVelcoro's user avatar
  • 1,277
4 votes
0 answers
838 views

I am using mice in R, a chained equations (sequential regression) algorithm, to impute a series of polytomous variables (e.g. ...
tomka's user avatar
  • 7,004
3 votes
0 answers
92 views

Analysts often use Rubin's rule (RR) to obtain a pooled estimate of a popular quantity from multiple (imputed) datasets. While popular statistical software (such as the R ...
socialscientist's user avatar
3 votes
0 answers
81 views

I'm running an imputation using the mice package in R (imputing 7 variables with missing values on the basis of 10 total variables). The imputation runs fine, and ...
Henry Brice's user avatar
3 votes
0 answers
113 views

Can anyone provide a reference to the theory that supports multivariate imputation with chained equations (MICE). I know Rubin has provided this for MI but MICE is a Gibbs sampler (I have never seen ...
Robert's user avatar
  • 31
3 votes
0 answers
539 views

It seems that MICE does not have a "predict" function that allows using a fitted mids object to predict the missing values in a test data set. I can certainly ...
Catiger3331's user avatar
3 votes
1 answer
268 views

I want to estimate tobit marginal effects using multiply imputed data; however, I see that tobit is not among the estimation commands supported by Stata's MI prefix. I understand that the validity of ...
MartinQLD's user avatar
  • 565
3 votes
0 answers
2k views

I've been reading some posts about data imputation using multiple imputation, specifically the MICE R package. I get the main idea of creating multiple datasets with imputed data. The part that is not ...
paipaipai's user avatar
3 votes
0 answers
922 views

There are multiple resources and answers on types of imputation and packages that can help in imputing the missing values or how to use a particular package. But there are few to no resources ...
Manraj Singh's user avatar
3 votes
0 answers
120 views

As the title says. I read a lot about congeniality of Bayesian models (e.g. Meng, 1994) and I do know some definitions, but I don't feel I can get a grip on what happens when models are congenial or ...
Suzanne's user avatar
  • 31
3 votes
0 answers
89 views

My dataset consists of hourly values by weekday across several sites, where the sites vary by spatial location and by other common characteristics, such as type, or 'cafe,' 'restaurant,' and 'bar.' ...
gallygator's user avatar
3 votes
0 answers
47 views

I am working with a dataset in which the outcome of interest is a vector of dates of particular events: (date_1,date_2,date_3,...,date_n). Some of these outcome vectors are completely missing, but I ...
Plem's user avatar
  • 31
3 votes
0 answers
549 views

I want to test the performance of a multiple imputation algorithm for longitudinal binary data. Right now I have applied the algorithm on some real data sets and it turned out promising and then I ...
David Z's user avatar
  • 1,628
3 votes
0 answers
539 views

It is quite common for data sets to contain missing values. Suppose we want to try to fill in the missing values. For this we have techniques such as single/multiple imputation and matrix ...
GXR's user avatar
  • 31
3 votes
0 answers
740 views

I've seen a lot of interesting questions here about multiple imputation and also great answers that helped me a lot to impute my data. I've used Predictive Mean Matching, EMB and I would like to use ...
psoares's user avatar
  • 606
2 votes
0 answers
53 views

When imputing data by an algorithm such as "mice", it occurs to me that the algorithm takes no account of the structure or representation of a survival outcome which is stored as an event ...
AdamO's user avatar
  • 67.5k
2 votes
0 answers
135 views

I have a dataset with approximately 1800 observations and I'm trying to fit a multivariable logistic regression model (250 cases, 1550 controls). There are 19 covariates (mix of continuous, ordinal ...
donm79's user avatar
  • 51
2 votes
0 answers
65 views

Quick question. I'm using the mice R package to impute missing data. I go by the presumption that the missing data are MAR, but I wouldn't be surprised if a few binary variables were MNAR. I followed ...
awastus's user avatar
  • 61
2 votes
0 answers
60 views

Is there any literature exploring convergence guarantees of the MICE imputation method for missing data? In practice, the method seems to work pretty reliably with different regressors, but I can't seem ...
Doc Stories's user avatar
2 votes
0 answers
42 views

I would like to use a multiple imputation algorithm with generalized least squares and Kenward-Roger or Satterthwaite degrees of freedom. Does the commonly implemented Rubin's method account for those ...
Nikaraguien's user avatar
2 votes
1 answer
567 views

I have two questions about the mice package. The first is about the mincor argument of quickpred. The CRAN documentation says it is the absolute minimum correlation compared. Does this mean that if I set ...
Kledson Lemes's user avatar
2 votes
0 answers
191 views

I'm working on a project now which involves the use of multiple imputation while developing machine learning models (using a training/test split, ~7000 observations total) for a continuous outcome. I ...
NB3's user avatar
  • 25
2 votes
1 answer
292 views

I have 10 multiple imputation datasets ($N = 97$, two groups) and am running ANCOVA (controlling for pre-test values) to look at post-test group differences. Working in SPSS and can't really invest ...
Freddie's user avatar
  • 21
2 votes
0 answers
471 views

I am using the ranger package in R to construct random forests on 10 imputed datasets after implementing MICE to fill in missing values. The ranger package provides not only a variable importance ...
Geoffrey Kahn's user avatar
2 votes
0 answers
44 views

I have a dataset with 13 variables and 50 observations representing the U.S. states. The variables represent the land use intensity of different agricultural industries in each state. Of those 650 ...
qdread's user avatar
  • 449
2 votes
0 answers
456 views

So I am working with some data on water wells and time series of chemical pollutant tests on those wells. There are 10 chemicals and 10 years in the data. My goal is to do some clustering on the wells ...
krishnab's user avatar
  • 1,782
2 votes
0 answers
147 views

The epi.2by2 function in the epiR package computes a chi-square test and provides measures of effect when count data are ...
C_H's user avatar
  • 125
2 votes
0 answers
183 views

I used multiple imputation in SPSS to deal with missing data in my study. I then carried out multiple regression on the imputed and original datasets, using a split file. I now have output for each ...
Charlie Hart's user avatar
2 votes
0 answers
74 views

I am using multiple imputation to estimate treatment effects in a dataset that contains missing data. In some of my imputed datasets, the algorithm used in the analysis fails to converge; it's not ...
Noah's user avatar
  • 40.2k
2 votes
0 answers
74 views

I conducted a sound propagation experiment in which recorded maned wolf calls were broadcast at different sites (x3), hours (x6: 17h, 18h, 23h, 05h, 06h, 11h), and with different speaker positions (x2: ...
Luane's user avatar
  • 21
2 votes
0 answers
143 views

I have a model generated using an imputed data set with imputation accuracy of 75%. If the model using imputed data has an accuracy of 80%, what would be the community consensus to report the ...
Khader Shameer's user avatar
2 votes
0 answers
635 views

I've got a simple computational model I can run experiments with. Experiments are "free" but I don't want to run it more times than necessary because it still takes time. All the simulations use the ...
CarrKnight's user avatar
  • 1,108
2 votes
0 answers
814 views

My original dataset with 48 subjects has a considerable number of missing items. The majority of the data is categorical (dichotomous) and some of it is ranked (ordinal). I performed a multiple imputation ...
Kelvin Mogesa's user avatar
2 votes
0 answers
444 views

I've seen others searching for similar issues, but have not yet come across an example that explains how to actually do this: I have a dataset with both time-varying and non-time-varying variables ...
Simen Buodd's user avatar
2 votes
0 answers
120 views

I have a dataset which tracks the prices of 21 products, charged by 24 companies, in 150 different cities across the globe. However, the data set has missing values--that is, I might have Company X's ...
Sam's user avatar
  • 71
2 votes
0 answers
134 views

R/statistics noob. Mac OSX 10.11, RStudio 0.99.842. I'm developing a clinical prediction tool as part of my PhD. I have missing data (23k cases, 24 variables, 70% of variables have at least one ...
mike's user avatar
  • 21
2 votes
0 answers
305 views

I have longitudinal data with unequal time points and missing values. I am looking for methods to impute the missing data. I looked at the R packages NORM and AMELIA II and the SAS procedure PROC MI. All ...
user24318's user avatar
  • 215
2 votes
0 answers
1k views

A colleague came to me with the following problem. She has a complex, multivariate data set, in which respondents completed a number of measures with anywhere from 6 to 30 Likert-type items for each ...
Placidia's user avatar
  • 14.6k
2 votes
0 answers
307 views

This algorithm contains three techniques: (1) fuzzy c-means clustering, (2) grey relational theory, (3) entropy multiple imputation. The framework of this algorithm is as follows: My questions are about the ...
zhyan's user avatar
  • 335
2 votes
0 answers
262 views

All, I have a question about what's practical when it comes to presenting results of multiply imputed data. I'm well-versed in the differences among MCAR/MAR/MNAR and approaches to imputing the data ...
steve's user avatar
  • 221