Questions tagged [multiple-imputation]

Use this tag for questions involving multiple imputation, which refers to a set of stochastic imputation routines aimed at preserving the multivariate features of the data.

1 vote
1 answer
1k views

I have been doing multiple imputation using MICE in R. When running, some variables in the console appeared to have asterisks next to their names, e.g., x1* x2* x3*. These asterisks started to appear ...
Amanda Lemon's user avatar
1 vote
1 answer
215 views

I'm planning to undertake Network Modelling. However, I've been told that multiple imputation is a problem for Network Modelling. I have a lot of missing data. Any suggestions? Thanks!
Sofia's user avatar
  • 11
1 vote
0 answers
94 views

When testing multiple imputation algorithms in simulations, the bias of the examined estimates and the 95% coverage rate are often used as a quality metric. I understand that it is generally ...
joacim022's user avatar
2 votes
1 answer
809 views

I have a dataset consisting of mobility data at the county-level for the US for about one year. So the number of observations is >1m. Apart from the county code, the date, and the mobility index, ...
Tea Tree's user avatar
  • 280
2 votes
1 answer
112 views

I am developing a Survey Analysis Application that works with Categorical Data and looking for a way to treat missing data fairly without losing the original data variance. Thus, I came up with a way ...
My Name's user avatar
  • 131
0 votes
1 answer
853 views

I am trying to do multiple imputation on a data set like the one below. This is a dummy data set. There are no missing values for the variables idnum and eye. But each idnum should have the same bmi and gender ...
ACHD's user avatar
  • 13
3 votes
1 answer
762 views

I'm wondering whether the pool() function from R's mice package correctly pools shared frailty models fitted on multiply imputed ...
schotti's user avatar
  • 600
4 votes
1 answer
420 views

I'm doing some imputation with the MICE package. The outcome variables I am using are zero-inflated, and in the absence of imputation, I would analyze them with a zero-inflated negative binomial ...
cjfcjf's user avatar
  • 439
4 votes
1 answer
2k views

I've understood that the kNN imputer, being a multivariate imputer, is "better" than univariate approaches like SimpleImputer in the sense that it takes multiple variables into account, ...
LeLuc's user avatar
  • 711
1 vote
0 answers
109 views

I have a problem in which I'm looking to train an individual-level predictor for outcomes. I have information on individual-level covariates, but I don't have individual-level outcome variables. ...
cowabunga's user avatar
3 votes
1 answer
2k views

I am new to multiple imputation, so apologies if the question is naive. I have been exploring multiple imputation for use on a small data set where there are just a couple of data points. I would like ...
Null's user avatar
  • 31
0 votes
0 answers
60 views

I am very new to SPSS and the Statistics package. I have bi-weekly data on precipitation, air temperature, and several well water temperatures (n~75). However, for various reasons I have about 60-70%...
JackWassik's user avatar
2 votes
1 answer
445 views

I have a question about using multiple imputation. I have a dataset of approximately 16,000 people with over 15 years of follow-up. To address missing data and loss-to-follow-up, I'm using multiple ...
PotterFan's user avatar
0 votes
0 answers
97 views

I'm currently working with a data set from the medical domain, in which several parameters contain missing values (up to 40% of the data). The missing data is mostly caused by some hospitals not ...
sander's user avatar
  • 153
0 votes
1 answer
105 views

Are there any statistical procedures to pool standardized differences across multiple imputations after matching in R? I am aware of pooled estimates using mira but I just want the pooled standardized ...
Ese's user avatar
  • 15
0 votes
1 answer
509 views

I am working with 10 multiply imputed datasets in SAS. I used PROC SURVEYLOGISTIC to fit a multivariate model with 6 predictors (3 dichotomous and 3 categorical) and their interaction ...
Ahinoa's user avatar
2 votes
1 answer
341 views

I am trying to use MI to deal with missing values in my data set. If I understand correctly, MI is about simulating multiple data sets from a given initial data set and imputing possible values ...
ezcollins's user avatar
5 votes
1 answer
2k views

I'm working on a project with medical data where some of it is missing. We decided to impute the data using MICE and I found enough literature about how to choose $m$ (the number of imputations) and $...
Hauptideal's user avatar
9 votes
4 answers
2k views

I can derive an MCMC algorithm for sampling from the posterior distribution of a parameter vector of interest, but only starting with a dataset that has no missing values. The actual dataset that I ...
frelk's user avatar
  • 1,507
5 votes
3 answers
5k views

I have a longitudinal database in which more than 50% of the data are missing, of the MAR type. This amount of missing values was a surprise to me because I did not foresee this in the study design, and ...
Kledson Lemes's user avatar
0 votes
1 answer
229 views

I have data with n = 80 and 10 predictors, and ran MI using MICE, followed by Variable Selection for Multiply Imputed Data using ‘miselect’, and finally have 4 nonzero coefficients. Since ...
User1121's user avatar
0 votes
1 answer
443 views

We are interested in using substantive model compatible fully conditional specification (smcfcs) for multiple imputation of missing data. I believe this approach will lead to less biased ...
MaxIRADS's user avatar
1 vote
1 answer
1k views

I have a dataset that has 6 metabolites that were measured over time in two groups and using a mixed-effects linear model I would like to investigate the group differences for each metabolite. Since ...
CST's user avatar
  • 349
1 vote
1 answer
451 views

I have a dataframe with 6 columns, X1_t1, X2_t1, X3_t1, X1_t2, X2_t2, X3_t2, each of which has some missing values. Which one of the following options is recommended if I want to examine the change of ...
Ashley's user avatar
  • 11
1 vote
1 answer
468 views

After I imputed two datasets, the two datasets are synthesized via statistical matching. With the synthesized data, I tried to pool estimates but got errors as below. ...
pltr's user avatar
  • 11
3 votes
2 answers
1k views

I'm using {caret} to impute missing data resulting from non-response to survey questions. All of these variables are defined as numeric, though most are dummies. ...
Dr. Fabian Habersack's user avatar
0 votes
0 answers
415 views

I am using Amelia for multiple imputation, and have a very large list of variables to predict missingness in my exposure and outcome. However, I don't want to impute values in these predictors, I only ...
JRB's user avatar
  • 501
0 votes
0 answers
106 views

I've imputed variables to conduct a multiple regression analysis in SPSS but unfortunately SPSS doesn't provide certain pooled values, like R², standardized betas, etc. It looks like I have to conduct ...
r_newbie's user avatar
3 votes
1 answer
1k views

I have a longitudinal data set with missing values. I want to multiply impute (let us say $m$ = 20 times) the missing values in the wide format using the R-package mice. Thereafter, I would like to ...
Benykō-Zamurai's user avatar
0 votes
0 answers
67 views

The mice package says, "Many diagnostic plots are implemented to inspect the quality of the imputations." What are those plots? Which are useful? How do you use them?
dfrankow's user avatar
  • 3,506
0 votes
1 answer
171 views

I am using SPSS and I'm currently working with a datafile that contains multiply imputed data (40 imputations) because of missing data. My question: Should I keep the original dataset (with missing ...
StateOn's user avatar
3 votes
0 answers
113 views

Can anyone provide a reference to the theory that supports multivariate imputation by chained equations (MICE)? I know Rubin has provided this for MI but MICE is a Gibbs sampler (I have never seen ...
Robert's user avatar
  • 31
2 votes
0 answers
471 views

I am using the ranger package in R to construct random forests on 10 imputed datasets after implementing MICE to fill in missing values. The ranger package provides not only a variable importance ...
Geoffrey Kahn's user avatar
4 votes
1 answer
710 views

I am using the MICE package in R to impute my dataset to deal with missing values. I have missing values in both the response and the predictor variables. I am interested in following Paul T. Von ...
Amanda Goldberg's user avatar
2 votes
1 answer
2k views

Based on this article, it is apparent that MICE works with the following logic: Fill missing values in every column apart from the column in question with either random or the mean of the given ...
Ryan Brown's user avatar
1 vote
1 answer
971 views

I have conducted a multiple imputation in R with 5 imputations and 50 iterations using the function mice() from the corresponding mice package. Now that I have ...
user301312's user avatar
0 votes
0 answers
50 views

I am working with pregnancy data where I would like to impute a variable called LABOR PRESENTATION (nominal variable with 5 categories) from several other variables but then create a variable called ...
Fouzia Farooq's user avatar
4 votes
1 answer
675 views

I would like to apply the margins function to imputed data (I used mice), but it does not seem possible. Do you know if a function exists that calculates marginal effects with imputed data? Thank you! ...
Ele_456's user avatar
  • 41
0 votes
1 answer
213 views

I am new to the topic of SEM, so my question may seem a bit naive. I have about 60 observed variables that will be grouped in some latent variables to explain one outcome measure. One of the variables ...
Aldebaran's user avatar
1 vote
1 answer
759 views

I am reading an online book by Stef van Buuren (link at bottom) regarding multiple imputation. In Section 3.2.1 he lists 4 different approaches to multiple imputation: Later on in Section 3.3 he ...
Sean's user avatar
  • 654
1 vote
1 answer
46 views

I have two sets of data provided by the government - one spans the years 2016-2020, while the other only covers 2018-2020. Data from each dataset, for each year, are used to predict some outcome in ...
boot-scootin's user avatar
0 votes
1 answer
617 views

When a data set contains a fraction of missing values, which strategy should be chosen: first impute data before providing population descriptives, or give the population insights and cross-tabs ...
aman rastogi's user avatar
0 votes
0 answers
554 views

I have an imputed dataset with several non-missing, non-imputed variables. However, I realised when I use mivalext_lr() to obtain pooled AUC and 95% CI of my ...
YY Shi's user avatar
  • 1
0 votes
1 answer
87 views

I have 2 questions about Multiple imputation (MI) in the assessment of the prognostic performance of a test. This test acts as a predictor of a specific outcome, 3 years in the future. I have 26 % of ...
Gil77's user avatar
  • 1
0 votes
0 answers
47 views

I have data in which the number of missing values per cluster (in this case, zip code) is proportional to the population. Does this indicate Missing at Random (MAR)? Third column with missing ...
user291195's user avatar
5 votes
0 answers
2k views

I am looking for a way to implement (country) clustered standard errors on a panel regression with individual fixed effects. That is, in plm() I want to define some ...
Andreas Chmielowski's user avatar
2 votes
1 answer
338 views

I'm trying to calculate the population mean, median, etc. (descriptive analysis) using multiply imputed data. However, the examples that I found in sources were regressions that are then pooled into one ...
user291195's user avatar
2 votes
2 answers
615 views

I am performing data imputation on a large matrix [100000,34] of past measurements that contains missing values (rows are time-steps and columns are stations). So far I've used several machine-...
iditbela's user avatar
  • 147
0 votes
0 answers
5k views

I read a few responses close to this question, which suggested using a t-test or chi-squared test. However, the pattern between variables can also involve more than 2 variables (e.g. data at x tend to be ...
user291195's user avatar
10 votes
1 answer
11k views

I have multiple sets of imputations generated from multiple instances of random forest (such that the predictors are all the variables except the one column to impute). I was referred to Rubin's rule ...
daddymaterial's user avatar
