
Questions tagged [multiple-imputation]

Use this tag for questions involving multiple imputation, which refers to a set of stochastic imputation routines aimed at preserving the multivariate features of the data.

1 vote
1 answer
66 views

In clinical trial design, we often make assumptions about clinical endpoints. For example, we may assume that blood pressure follows a normal distribution, such as $X \sim N(\theta, \sigma^2)$, $...
wikichung's user avatar
2 votes
1 answer
386 views

I'm developing a scale for my dissertation project. I need help with how to handle missing data before I start the factor analysis procedure. Recently I ran a multiple imputation in SPSS (20 ...
stephan_phd's user avatar
3 votes
1 answer
99 views

In a paper by Atem et al. (2018; DOI: 10.1002/bimj.201800275), the authors claim the following in section 3 regarding the so-called "imputer/imputation model" - i.e. the model used to impute ...
AdamO's user avatar
  • 67.5k
1 vote
1 answer
304 views

I have two variables that I intend to use for creating prediction models for which I'm unsure how to handle missing values. The reason is that both are separated into multiple columns. All ...
floubert's user avatar
1 vote
0 answers
49 views

I have performed chained multiple imputation in Stata and performed 2 robustness checks to compare the distribution of the original and imputed values: visual exploration; Kolmogorov-Smirnov test of ...
Stephen Okiya's user avatar
1 vote
0 answers
218 views

I have multiply imputed data and will be fitting an identical lightGBM model with the same input features in each of the imputed datasets. My aim is to calculate SHAP values (SHapley Additive ...
Austin's user avatar
  • 11
1 vote
0 answers
78 views

For balanced longitudinal data, the missing data can be handled by multiple imputation in wide format, where I assume MAR. However, for unbalanced longitudinal data (i.e. number of measurements varying ...
user45765's user avatar
  • 1,465
0 votes
0 answers
308 views

I used the multiple imputation function integrated in SPSS (method: auto [meaning Markov Chain Monte Carlo or in case of monotonicity SPSS reverts to Monotone]; 5 imputations) and now I'm running into ...
Ben's user avatar
  • 23
0 votes
0 answers
119 views

I have 70 imputations of my original data set. I want to choose 50 of them which converged after less than 120 iterations on my CFA model: ...
juliawwu's user avatar
0 votes
0 answers
466 views

I have a data set with baseline and t1 values in an RCT. Is a Jump to Reference (J2R) imputation possible here? The implementations I have found so far all require t2 values next to t1, which I do not ...
Gurkenkönig's user avatar
0 votes
0 answers
40 views

I have the value of production waste (m³/day) and also the consumption (kWh/day) of some locations. You can see in the output ...
Carlo Soares's user avatar
1 vote
1 answer
85 views

Hi I have used a set of data to estimate the relationship between v0 (response variable) and v1-v6 (independent variables) through a non-linear model using GAM. The model is fitted with data between ...
Elizabeth's user avatar
  • 271
3 votes
2 answers
831 views

I wonder how to draw survival curves (Kaplan-Meier) when there is no missing information on the survival variables but on the stratification covariate. For example, we know for all patients the follow-...
Flora Grappelli's user avatar
2 votes
1 answer
831 views

I'm struggling when trying to understand some aspects of multiple imputation when intending to do Cox regressions with my data. First of all, my dataset is not adapted for survival analysis yet. I ...
floubert's user avatar
3 votes
1 answer
414 views

I'm currently working on an event study to examine abnormal returns. In the first step, I've calculated abnormal returns in regards to a certain type of company event, consisting of roughly 13,000 ...
LeCV's user avatar
  • 31
2 votes
1 answer
659 views

My supervisor suggests that I impute the missing data in my dataset through various methods (namely complete cases, k-nearest neighbors, last observation carried forward and multiple imputation with ...
olke's user avatar
  • 115
0 votes
1 answer
2k views

Despite reading this other StatsExchange post, I am still struggling to understand what iterations do in multiple imputation, i.e. the parameter "maxit" in the mice() function. My ...
Ator's user avatar
  • 5
1 vote
1 answer
75 views

Research problem: Comparison of means (t-test, ANOVA) between 90 countries. Analytical problem: I have a large data set of over 120,000 observations. Each observation was measured by 8 variables: 2 ...
KamilB1988's user avatar
1 vote
1 answer
616 views

I am fitting a restricted cubic spline (Cox proportional hazards model) after imputing 10 datasets using the mice package. My variables are as follows: outcome: DM; exposure: BMI; time to event: time; covariates:...
Bkry's user avatar
  • 37
0 votes
0 answers
252 views

I am attempting to generate a predictive model for a continuous outcome using machine learning models. However, some observations in the original dataset have missing outcome data (and missing ...
NB3's user avatar
  • 25
1 vote
1 answer
1k views

I have run multiple imputations on my data and now need to export a final dataset from which I can calculate the mean of each imputed variable. How would one do this? I can't pool the data straight ...
Paige Cox's user avatar
2 votes
0 answers
191 views

I'm working on a project now which involves the use of multiple imputation while developing machine learning models (using a training/test split, ~7000 observations total) for a continuous outcome. I ...
NB3's user avatar
  • 25
3 votes
1 answer
99 views

I have a time-series dataset that has 120 missing rows due to consecutive network issues and I am trying to impute these values using MICE in Python. As the source of missingness is a total ...
Hanna's user avatar
  • 145
0 votes
0 answers
88 views

I know MICE can be used for imputation of multiple variables simultaneously. The expectation maximization approach (EM) can be used to impute missing data. Typically, one should only be using ...
StatsBio's user avatar
  • 113
1 vote
2 answers
534 views

Multiple Imputation (MI) for estimating a desired statistic with missing data. Following ^Shafer (page 4), and ^Austin et al. (section "Analyses in the M imputed data sets"), ...
travelingbones's user avatar
0 votes
1 answer
836 views

I have read and watched several tutorials about MICE. My confusion is about step 1: creating several copies of the original dataset and imputing different values in each copy. In some tutorials, I ...
Hanna's user avatar
  • 145
1 vote
0 answers
59 views

Context of problem: In some situations researchers face high-dimensional problems with $p > n$, where $p$ is the number of covariates to be considered in a regression model and $n$ is the sample ...
timm's user avatar
  • 327
0 votes
1 answer
1k views

I am working with a dataset containing ~300 predictors and ~3000 observations and building a predictive model using elastic net (and hoping to generalize to an external validation set). While the ...
NB3's user avatar
  • 25
1 vote
0 answers
926 views

I am trying to create new variables after multiple imputation. I have the following variables: data: mydata; total number of observations = 500; HDL: continuous (no missing values); Physical activity: ...
Bkry's user avatar
  • 37
1 vote
0 answers
60 views

I have a time series of binary outcomes (case yes/no) and several covariates. The goal is to estimate incidence and prevalence of the outcome. I would like to perform multiple imputation for both ...
user167591's user avatar
  • 1,173
1 vote
0 answers
480 views

How to calculate the standard deviation of the residuals for a pooled regression? (The idea, formula and/or R code are all welcome.) As far as I understand, it is the standard deviation of the ...
Samuel Saari's user avatar
2 votes
1 answer
275 views

I am trying to run an interval regression using the survival R package (as described here https://stats.oarc.ucla.edu/r/dae/interval-regression/), but I am running into difficulties when trying to ...
Rachel's user avatar
  • 33
0 votes
0 answers
72 views

This is a general question for R users who are familiar with interval censoring in survival analyses. I have clinical registry data at hand and aim to compare the one-year incidence for two groups (...
Fred's user avatar
  • 1
1 vote
1 answer
1k views

Most imputation methods require either MAR or MCAR. How do we check the MAR or MCAR assumption in general? In "how to check missing data is missing at random or not?", Turgeon said $H_0:MCAR$ ...
user45765's user avatar
  • 1,465
1 vote
0 answers
80 views

I am analyzing data from a clinical trial. I used multiple imputation to impute the (binary) outcome variable, which is the only variable with missing data. All of the covariates are categorical and ...
Simone's user avatar
  • 11
0 votes
1 answer
753 views

Hello, I would like to fit a restricted cubic spline (Cox adjusted) after multiple imputation with mice. I use the rms package, but after imputation, when I use the function ...
Bkry's user avatar
  • 37
1 vote
0 answers
991 views

Update: It looks like I misspecified my prediction matrix. By setting the columns of covariates I did not want imputed to 0, the problem was solved. I had set other columns such as IDs to 0, but I was ...
user348629's user avatar
1 vote
1 answer
1k views

I am analyzing observational study data. My predictor variable is tg4 with 4 categories (0,1,2,3) and my response variable is dm ...
Bkry's user avatar
  • 37
1 vote
0 answers
303 views

I am currently using the mice package to impute missing values for a simulation project. I use different performance metrics (e.g. bias, precision, etc.) to compare different methods (e.g. complete cases, ...
ChrisPol's user avatar
0 votes
0 answers
3k views

Okay, so please bear in mind I am very new to R and statistics in general! My professor has drilled into us that multiple imputation is the best way to deal with missing data (haven't been shown how ...
James L's user avatar
  • 61
1 vote
1 answer
2k views

Rubin's Rule for multiple imputation states that you are to construct a single interval after pooling into a single set of estimates and standard errors: $$ \bar{\theta} \pm t_{df,1-\frac{\alpha}{2}}*...
aiorr's user avatar
  • 147
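
For reference, the pooling step behind Rubin's rules can be sketched in a few lines of Python. This is only a minimal illustration, not the asker's code: the function name `pool_rubin` and its inputs are hypothetical, it uses the classic large-sample degrees of freedom rather than a small-sample adjustment, and it stops before the t quantile needed to form the interval itself.

```python
import math
from statistics import mean, variance


def pool_rubin(estimates, variances):
    """Pool m point estimates and their squared standard errors.

    `estimates` and `variances` are hypothetical inputs: one point
    estimate and one squared standard error per imputed dataset.
    Returns (pooled estimate, pooled standard error, degrees of freedom).
    """
    m = len(estimates)
    if m < 2 or m != len(variances):
        raise ValueError("need at least two imputations and matching lengths")

    q_bar = mean(estimates)      # pooled point estimate
    w = mean(variances)          # within-imputation variance
    b = variance(estimates)      # between-imputation variance (m - 1 denominator)
    t = w + (1 + 1 / m) * b      # total variance

    if b == 0:
        df = math.inf            # no between-imputation variability
    elif w == 0:
        df = m - 1               # all variability is between imputations
    else:
        r = (1 + 1 / m) * b / w  # relative increase in variance due to missingness
        df = (m - 1) * (1 + 1 / r) ** 2

    return q_bar, math.sqrt(t), df


# Example call with made-up numbers from three imputed datasets.
pooled, se, df = pool_rubin([1.0, 2.0, 3.0], [0.5, 0.5, 0.5])
```

The interval then takes the form pooled estimate plus or minus a t quantile (with `df` degrees of freedom) times the pooled standard error; obtaining that quantile requires a statistics library and is left out here.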
0 votes
1 answer
278 views

I have a large dataset with a large amount of missing data, so I used the mice package to create multiple imputations to fill in the missing values. Now I am trying ...
Daniel's user avatar
  • 1
0 votes
1 answer
518 views

I am working on a multi-center project (3 centers; recruit ~20 participants in each center with a total of 60 participants). One of my independent variables (depression scale; continuous variable) was ...
R Beginner's user avatar
1 vote
1 answer
2k views

I am performing survival analysis (Kaplan-Meier, Cox-regression) for 1500 patients. I noticed that I have missing values for 15 patients in regards to their time-to-event and their dichotomous outcome-...
Fabian's user avatar
  • 51
2 votes
1 answer
292 views

I have 10 multiple imputation datasets ($N = 97$, two groups) and am running ANCOVA (controlling for pre-test values) to look at post-test group differences. Working in SPSS and can't really invest ...
Freddie's user avatar
  • 21
3 votes
1 answer
1k views

When using multiple imputation, what is the best way to run model diagnostics? In a related post here (Multiple Imputation and Regression Model Diagnostics), one option in the accepted answer was ...
Andy's user avatar
  • 45
1 vote
0 answers
244 views

I have a question regarding the homogeneity of variance in three regression models of different datasets belonging to the same multiply imputed data. As I used multiple imputation I have to check ...
Ecidem's user avatar
  • 11
0 votes
1 answer
706 views

I have a dataset in which 45% of participants have missing data. Given the high proportion of missingness, I planned to conduct survival analyses on both an imputed and non-imputed dataset. I am ...
user333336's user avatar
0 votes
1 answer
208 views

I'm working with a very large amount of data, using the PMM method via the R mice package. The data has a healthy number of continuous variables. I'm removing all the duplicate entries before starting the ...
Bobby O's user avatar
2 votes
2 answers
114 views

I am working with a dataset for which I have generated three hypotheses, i.e. I built three substantive models (two logistic regressions and one Cox regression with a different dependent variable). ...
ptolemy28's user avatar
