Unanswered 'multiple-imputation' Questions

9 votes

0 answers

238 views

Generalization of degrees of freedom for t distribution for coefficients after multiple imputation

Donald Rubin has shown that regression coefficient estimates have fatter tails after multiple imputation and has provided a formula for the degrees of freedom to use as a t-distribution approximation ...

Frank Harrell

105k

asked Jun 21, 2016 at 15:41

5 votes

0 answers

2k views

R plm cluster robust standard errors with multiple imputations

I am looking for a way to implement (country) clustered standard errors on a panel regression with individual fixed effects. That is, in plm() I want to define some ...

Andreas Chmielowski

51

asked Jul 15, 2020 at 17:20

5 votes

0 answers

688 views

Multicollinearity in structural equation modeling with multiple imputation?

Using R, I created a structural equation model and fit it to multiple datasets using the 'sem.mi()' function from the semTools package. I know multicollinearity tends to be a concern for structural ...

poe

111

asked Apr 15, 2020 at 17:19

5 votes

0 answers

1k views

Descriptive statistics (frequencies, counts, proportions) after multiple imputation

I recently ran a multiple imputation using the mice package in R to generate imputed datasets. I have no problems with running inferential statistics on the pooled data (logistic and Cox regressions) ...

somesurgeon

81

asked Jan 15, 2019 at 22:07

5 votes

0 answers

470 views

How to do multiple imputation for spatial models?

I'm trying to estimate various spatial models such as spatial autoregressive regression (SAR), Spatial Durbin Model (SDM), and Spatial Error Model (SEM) but have missing data throughout my variables. ...

LJB

211

asked Dec 9, 2014 at 22:30

5 votes

0 answers

666 views

Permuting the formula argument to Hmisc:aregImpute

In Frank Harrell's RMS Short Course today, I became aware that multiple imputation with Hmisc:aregImpute is not invariant to the ordering of terms in its formula ...

David C. Norris

2,207

asked Mar 6, 2014 at 0:58

5 votes

0 answers

1k views

Multiple imputation of time variables -- which step to impute?

Let's assume I have a survival analysis study with an exposure, two covariates, and two time-related variables. Say date of diagnosis and date of death. Combined, the two time-related variables will be ...

Fomite

24.8k

asked Mar 19, 2012 at 21:17

4 votes

0 answers

522 views

Combining Gradient boosted trees after multiple imputation

Currently I am working with a gradient boosted tree model fit to a multiply imputed dataset. For those who don't know multiple imputation: It predicts missing values and imputes that value with ...

SK4ndal

81

asked Jun 18, 2018 at 10:29

4 votes

0 answers

664 views

Multiple imputation of glm binomial size parameter

Suppose we have a generalized linear model with a binomial response $y_i\sim \mathrm{bin}(n_i,p_i)$ where $p_i$ is determined by the linear predictor in the usual way via some link function. Is there ...

Jarle Tufto

13.2k

asked May 23, 2017 at 14:01

4 votes

0 answers

3k views

perform quality check for imputed data with MICE in R

I'm currently working with the MICE algorithm to impute missing data. After I did the imputation I wanted to do some kind of quality check of the imputed data set. There are some suggestions here ...

ching

807

asked Aug 5, 2016 at 15:42

4 votes

0 answers

1k views

Checking Cox model assumptions with multiple imputation

I have run multiple imputation using MICE. I would now like to run a Cox model on it (using with,pool), and make sure that is justified. That is, I need to make sure that the proportional hazards ...

RayVelcoro

1,277

asked Aug 24, 2015 at 21:51

4 votes

0 answers

838 views

Imputation with mice: recode variables before or after imputation?

I am using mice in R, a chained equations (sequential regression) algorithm, to impute a series of polytomous variables (e.g. ...

tomka

7,004

asked Mar 21, 2014 at 10:49

3 votes

0 answers

92 views

How to pool estimates from multiply-imputed datasets with complex sampling designs?

Analysts often use Rubin's rule (RR) to obtain a pooled estimate of a popular quantity from multiple (imputed) datasets. While popular statistical software (such as the R ...

socialscientist

889

asked Apr 6, 2023 at 19:58

3 votes

0 answers

81 views

Mice package for imputation - chains not intermingling

I'm running an imputation using the mice package in R (imputing 7 variables with missing values on the basis of 10 total variables). The imputation runs fine, and ...

Henry Brice

171

asked Oct 17, 2022 at 14:43

3 votes

0 answers

113 views

Theory behind Multivariate Imputation with Chained Equations

Can anyone provide a reference to the theory that supports multivariate imputation with chained equations (MICE)? I know Rubin has provided this for MI but MICE is a Gibbs sampler (I have never seen ...

Robert

31

asked Nov 22, 2020 at 21:16

3 votes

0 answers

539 views

How to use MICE in R to fill missing values in test set?

It seems that MICE does not have a "predict" function that allows using a fitted mids object to predict the missing values in a test data set. I can certainly ...

Catiger3331

145

asked Oct 18, 2018 at 20:36

3 votes

1 answer

268 views

Validity of tobit estimates after multiple imputation

I want to estimate tobit marginal effects using multiply imputed data, however I see that tobit is not among the estimation commands supported by Stata's MI prefix - I understand that the validity of ...

MartinQLD

565

asked May 23, 2018 at 23:43

3 votes

0 answers

2k views

How to use pooled results from multiple imputation?

I've been reading some posts about data imputation using multiple imputation, specifically the MICE R package. I get the main idea of creating multiple datasets with imputed data. The part that is not ...

paipaipai

31

asked Dec 22, 2017 at 8:14

3 votes

0 answers

922 views

General practice to impute missing values

There are multiple resources and answers on types of imputation and packages that can help in imputing missing values or on how to use a particular package. But there are few to no resources ...

Manraj Singh

31

asked Aug 2, 2017 at 6:42

3 votes

0 answers

120 views

Can someone give me an intuition of congeniality in multiple imputation?

As the title says. I read a lot about congeniality of Bayesian models (e.g. Meng, 1994) and I do know some definitions, but I don't feel I can get a grip on what happens when models are congenial or ...

Suzanne

31

asked Feb 21, 2017 at 13:40

3 votes

0 answers

89 views

What statistical models / approaches can I use to estimate missing hourly values?

My dataset consists of hourly values by weekday across several sites, where the sites vary by spatial location and by other common characteristics, such as type, or 'cafe,' 'restaurant,' and 'bar.' ...

gallygator

31

asked Jan 29, 2016 at 23:44

3 votes

0 answers

47 views

Imputation of a (weird) multivariate outcome

I am working with a dataset in which the outcome of interest is a vector of dates of particular events: (date_1,date_2,date_3,...,date_n). Some of these outcome vectors are completely missing, but I ...

Plem

31

asked Jan 12, 2016 at 15:00

3 votes

0 answers

549 views

How to generate a longitudinal binary data with missing at random (MAR)?

I want to test the performance of a multiple imputation algorithm for longitudinal binary data. Right now I have applied the algorithm on some real data sets and it turned out promising and then I ...

David Z

1,628

asked Jun 11, 2015 at 17:42

3 votes

0 answers

539 views

Multiple Imputation and Matrix Completion

It is quite common that data sets will contain missing values in them. Suppose we want to try to fill in the missing values. For this we have techniques such as single/multiple imputation and matrix ...

GXR

31

asked Dec 27, 2014 at 3:16

3 votes

0 answers

740 views

How to compare and validate imputation models?

I've seen a lot of interesting questions here about multiple imputation and also great answers that helped me a lot to impute my data. I've used Predictive Mean Matching, EMB and I would like to use ...

psoares

606

asked Jun 18, 2014 at 7:55

2 votes

0 answers

53 views

Is a MICE approach to multivariable imputation well controlled for survival data?

When imputing data by an algorithm such as "mice", it occurs to me that the algorithm takes no account of the structure or representation of a survival outcome which is stored as an event ...

AdamO

67.5k

asked Jul 24 at 4:55

2 votes

0 answers

135 views

Variable selection in multiply imputed data

I have a dataset with approximately 1800 observations and I'm trying to fit a multivariable logistic regression model (250 cases, 1550 controls). There are 19 covariates (mix of continuous, ordinal ...

donm79

51

asked Dec 12, 2023 at 14:51

2 votes

0 answers

65 views

Multiple imputations generate values distributed differently from original dataset... does this mean my data is MNAR? Imputations still usable?

Quick question. I'm using the mice R package to impute missing data. I go by the presumption that the missing data are MAR, but I wouldn't be surprised if a few binary variables were MNAR. I followed ...

awastus

61

asked Nov 24, 2023 at 3:11

2 votes

0 answers

60 views

Theoretical Results for MICE Imputation

Is there any literature exploring convergence guarantees of the MICE imputation method for missing data? In practice, the method seems to work pretty reliably with different regressors but I can't seem ...

Doc Stories

21

asked Mar 23, 2023 at 19:41

2 votes

0 answers

42 views

Can Rubin's pooling method (multiple imputation) be combined with Kenward-Roger or Satterthwaite degrees of freedom?

I would like to use a multiple imputation algorithm with Generalized Least Squares and Kenward-Roger or Satterthwaite degrees of freedom. Does the commonly implemented Rubin's method account for those ...

Nikaraguien

41

asked Jan 22, 2023 at 1:26

2 votes

1 answer

567 views

What is the limit of missing values for multiple imputation in the mice package?

I have two questions about the mice package. The first is about mincor in the quickpred argument. The CRAN documentation says it is the absolute minimum correlation compared. Does this mean that if I set ...

Kledson Lemes

175

asked Nov 10, 2022 at 3:32

2 votes

0 answers

191 views

How to implement Rubin's Rules to assess model fit on imputed test data with continuous outcome? (e.g. RMSE and 95% CI)

I'm working on a project now which involves the use of multiple imputation while developing machine learning models (using a training/test split, ~7000 observations total) for a continuous outcome. I ...

NB3

25

asked Apr 5, 2022 at 15:52

2 votes

1 answer

292 views

Partial eta squared calculation with multiple imputation data

I have 10 multiple imputation datasets ($N = 97$, two groups) and am running ANCOVA (controlling for pre-test values) to look at post-test group differences. Working in SPSS and can't really invest ...

Freddie

21

asked Nov 2, 2021 at 18:03

2 votes

0 answers

471 views

Combining random forest variable importance p-values from multiply imputed datasets

I am using the ranger package in R to construct random forests on 10 imputed datasets after implementing MICE to fill in missing values. The ranger package provides not only a variable importance ...

Geoffrey Kahn

21

asked Nov 6, 2020 at 4:01

2 votes

0 answers

44 views

Imputation that takes into account both relationships among variables and spatial adjacency?

I have a dataset with 13 variables and 50 observations representing the U.S. states. The variables represent the land use intensity of different agricultural industries in each state. Of those 650 ...

qdread

449

asked Oct 4, 2019 at 13:09

2 votes

0 answers

456 views

Interpolation versus imputation for time series on chemical profiles of water wells

So I am working with some data on water wells and time series of chemical pollutant tests on those wells. There are 10 chemicals and 10 years in the data. My goal is to do some clustering on the wells ...

krishnab

1,782

asked Jun 24, 2019 at 20:15

2 votes

0 answers

147 views

Obtaining measures of effect for contingency tables with multiply imputed data

The epi.2by2 function in the epiR package computes a chi-square test and provides measures of effect when count data are ...

C_H

125

asked Feb 1, 2019 at 23:22

2 votes

0 answers

183 views

How to test multiple regression assumptions when multiple imputation has been used?

I used multiple imputation in SPSS to deal with missing data in my study. I then carried out multiple regression on the imputed and original datasets, using a split-file. I now have output for each ...

Charlie Hart

21

asked Nov 6, 2018 at 11:05

2 votes

0 answers

74 views

Choosing Among Multiply Imputed Datasets

I am using multiple imputation to estimate treatment effects in a dataset that contains missing data. In some of my imputed datasets, the algorithm used in the analysis fails to converge; it's not ...

Noah

40.2k

asked Aug 16, 2018 at 20:57

2 votes

0 answers

74 views

How to deal with undetectable outcome values? (data missing not at random)

I conducted a sound propagation experiment in which recorded maned wolf calls were broadcast at different sites (x3), hours (x6: 17h, 18h, 23h, 05h, 06h, 11h), and with different speaker positions (x2: ...

Luane

21

asked May 4, 2018 at 20:24

2 votes

0 answers

143 views

To impute or not - community consensus for reporting accuracy of an imputed model

I have a model generated using an imputed data set with an imputation accuracy of 75%. If the model using imputed data has an accuracy of 80%, what would be the community consensus to report the ...

Khader Shameer

673

asked Feb 27, 2018 at 15:43

2 votes

0 answers

635 views

Compute the power of Tukey's Honest Significance Difference (or cognates)?

I've got a simple computational model I can run experiments with. Experiments are "free" but I don't want to run it more times than necessary because it still takes time. All the simulations use the ...

CarrKnight

1,108

asked Oct 26, 2017 at 8:15

2 votes

0 answers

814 views

Analyzing Multiply imputed datasets rich in categorical data

My original dataset with 48 subjects has a considerable number of missing items. The majority of the data is categorical (dichotomous) and some of it is ranked (ordinal). I performed a multiple imputation ...

Kelvin Mogesa

21

asked Feb 25, 2017 at 12:53

2 votes

0 answers

444 views

Multiple imputation of time-varying dataset for Cox model in R

I've seen others searching for similar issues, but have not yet come across an example that explains how to actually do this: I have a dataset with both time-varying and non-time-varying variables ...

Simen Buodd

43

asked Dec 30, 2016 at 20:07

2 votes

0 answers

120 views

Method for predicting price based on Geographical market, Product, and Company

I have a dataset which tracks the prices of 21 products, charged by 24 companies, in 150 different cities across the globe. However, the data set has missing values--that is, I might have Company X's ...

Sam

71

asked Aug 5, 2016 at 20:46

2 votes

0 answers

134 views

dividing a multiply imputed dataset into derivation and validation cohorts

R/statistics noob. Mac OSX 10.11, RStudio 0.99.842. I'm developing a clinical prediction tool as part of my PhD. I have missing data (23k cases, 24 variables, 70% of variables have at least one ...

mike

21

asked May 5, 2016 at 9:23

2 votes

0 answers

305 views

Unequal timepoints longitudinal data with missing values

I have longitudinal data with unequal time points and missing values. I am looking for methods to impute the missing data. I looked at the R packages NORM and AMELIA II and the SAS procedure PROC MI. All ...

user24318

215

asked May 2, 2016 at 20:22

2 votes

0 answers

1k views

how to remove outliers prior to multiple imputation

A colleague came to me with the following problem. She has a complex, multivariate data set, in which respondents completed a number of measures with anywhere from 6 to 30 Likert-type items for each ...

Placidia

14.6k

asked Feb 8, 2016 at 17:23

2 votes

0 answers

307 views

Using entropy to impute missing values based on grey relational analysis and clustering

This algorithm contains three techniques: (1) fuzzy c-means clustering, (2) grey relational theory, (3) entropy multiple imputation. The framework of this algorithm is as follows: My questions are about the ...

zhyan

335

asked Jan 9, 2016 at 20:29

2 votes

0 answers

262 views

Multiply imputing data, but using just one of the imputed data sets

All, I have a question about what's practical when it comes to presenting results of multiply imputed data. I'm well-versed on the difference among MCAR/MAR/MNAR and approaches to imputing the data ...

steve

221

asked Feb 23, 2015 at 1:07
