
Questions tagged [multiple-imputation]

Use this tag for questions involving multiple imputation, which refers to a set of stochastic imputation routines aimed at preserving the multivariate features of the data.

2 votes
0 answers
38 views

My training data is mostly missing values for the feature that I know will be the most important variable. This missingness is semi-random. For example, I know the value is missing for this feature ...
mdrishan's user avatar
  • 237
1 vote
0 answers
38 views

I've got a binary outcome, 10-20 predictors (some numeric, some binary). There is one focal predictor. I would like to present the effect as a marginal effect (i.e., an adjusted prevalence difference),...
WhatIsTuningInGLASSO's user avatar
1 vote
0 answers
34 views

I am working on a project to estimate the total fuel use for a fleet of delivery trucks that have each completed multiple trips. For many trips, the exact fuel usage is recorded, but for a significant ...
user1403856's user avatar
2 votes
0 answers
53 views

When imputing data by an algorithm such as "mice", it occurs to me that the algorithm takes no account of the structure or representation of a survival outcome which is stored as an event ...
AdamO's user avatar
  • 67.5k
0 votes
0 answers
55 views

I want to simulate data with missing values and use them to compare the predictive performance of several machine learning algorithms, including LASSO. All analyses will be performed in R, using the ...
Benykō-Zamurai's user avatar
2 votes
1 answer
124 views

Suppose there are two groups: a treatment and a control. There are two covariates, say age and treatment. 100 participants per group. Age is observed for all 200 participants. However, the response ...
John L's user avatar
  • 2,786
1 vote
0 answers
51 views

I ran multiple imputation in R using mice. Only one categorical variable had missingness and I specified the imputation model to impute it using ...
cheddar97's user avatar
5 votes
2 answers
321 views

I've used multiple imputation on a dataset and run a logistic regression model (using mice in R). This is my output ...
llewmills's user avatar
  • 2,345
3 votes
1 answer
180 views

I'm currently working with a dataset from a molecular epidemiology study involving controls and cases for a cardiovascular event. The dataset includes several categorical health and lifestyle-...
Javier Hernando's user avatar
4 votes
1 answer
118 views

I'm using the mice and miceadds packages in R to perform multiple imputation and then analyze the results. Here's what I did: I performed multiple imputation on my dataset using the mice package. For ...
Danilo Calero Sequeira's user avatar
1 vote
0 answers
60 views

I am performing binary prediction on a dataset which contains missingness, and so I am leveraging Multiple Imputation (MI). For example, creating a train-test split, I perform MI on the training data ...
benedictjones's user avatar
0 votes
0 answers
83 views

I want to impute data in an RCT using the mice package in R and have some questions regarding the imputation of missing outcomes. Outcomes were assessed at 5 assessment points, T1-T5. Scale-level or ...
Sebastian's user avatar
  • 133
0 votes
1 answer
127 views

I am missing data on demographic variables such as age, gender, ethnicity. I have used stochastic regression to impute the missing data on all other variables of interest, such as psychological ...
Lee Zhiyuan's user avatar
8 votes
2 answers
734 views

I am in the situation where I have multiple variables, containing missing values, measured at time $t_0$, and some others measured at time $t_1$, which can be several years later. I need to impute ...
wrong_path's user avatar
0 votes
0 answers
78 views

I’ve been struggling with this question for a while, so any help is much appreciated! I’m trying to calculate an effect size (partial eta squared or $\eta^{2}_p$) for an ANCOVA model using pooled data ...
Andy's user avatar
  • 1
0 votes
4 answers
262 views

I am analyzing a dataset with variables such as Age, Sex, and Education, where some variables have missing values. One of the variables (Education) has over 60% missing data. For my analyses, I am ...
eshuns's user avatar
  • 15
2 votes
2 answers
273 views

I’m learning different approaches to impute a tabular dataset of mixed continuous and categorical variables, and with data assumed to be missing completely at random. I converted the categorical data ...
hiu's user avatar
  • 77
0 votes
0 answers
127 views

I am trying to estimate the prevalence of a binary variable "x" and its confidence interval after multiple imputations (using mice) and applying weights in R. I use Rubin's rules for the ...
Elodie L's user avatar
0 votes
0 answers
46 views

I'm interested in multiple imputation by joint modeling when all variables are incomplete. van Buuren describes the algorithm as follows: In the critical step 10, I am confused because for any ...
half-pass's user avatar
  • 3,850
1 vote
0 answers
38 views

I have health records of immunodepressed patients who may have event histories like [high risk demographics] -> [low lymphocyte count] -> [high viral load] -> [clinical events] From those data ...
Helene Hoegsbro Thygesen's user avatar
0 votes
0 answers
69 views

I have a very big dataset of around 3 million rows and 50 variables of different types. The dataset is longitudinal in long format (around 350 000 unique individuals). I want to impute missing data ...
Tasosmav's user avatar
3 votes
1 answer
176 views

I'm conducting a study measuring happiness across 4 time points, aiming to determine if there's an increase in overall happiness. The required sample size is 24 for four time points and 28 for three. ...
anna eyre's user avatar
  • 141
1 vote
0 answers
52 views

What would be the best approach to deal with missing data in a dataset when we want to run a PCA and then use the participant component scores extracted from the PCA as predictors in a mediation model?...
CatM's user avatar
  • 526
2 votes
1 answer
615 views

I have a very big dataset (around 10 million rows) with repeated measures of around 500 000 individuals, irregularly spaced through time. My final goal is to do IPTW and fit a weighted Cox regression ...
Tasosmav's user avatar
3 votes
1 answer
101 views

I have a linear mixed model, which uses a multiply imputed dataset. I saw that an LRT could be used to assess fixed-effect significance in a linear mixed model. I used ...
Alexandra Chapdelaine's user avatar
12 votes
5 answers
2k views

A question is how many missing values are too many to be handled. It has been asked in the context of applying specific software and method (MICE). I am interested in understanding a bit better what ...
Johan's user avatar
  • 346
0 votes
1 answer
150 views

Study goal: estimate the proportion of patients who experience outcome Y (1=Yes, 0=No) within max 5 years of follow-up. Missing data issue: Outcome Y is missing for a large proportion of people (96% ...
mmaliniak's user avatar
1 vote
2 answers
83 views

I would like to fit a statistical model where the dependent (response) variable is a validated scale score from a questionnaire. For each subject, this dependent variable is calculated from the values ...
user167591's user avatar
  • 1,173
0 votes
0 answers
133 views

I want to run a survival analysis (say, Cox model) with time of origin at birth and a disease (say, cancer) as the event. Covariates are around 5 demographic variables (age, sex, etc.). The problem ...
processing_statistician's user avatar
1 vote
1 answer
239 views

I'm using MICE to impute a small data set. I am going to use a type II ANCOVA through the Anova function of the R package car. However, ...
wdg's user avatar
  • 335
4 votes
2 answers
677 views

I'm wondering if there is any established method for assessing model fit in logistic regression conducted with multiple imputed datasets. To the best of my knowledge, there are two primary approaches ...
JuBe96's user avatar
  • 43
1 vote
1 answer
163 views

How shall I impute the data in the following situation: I have some baseline covariates collected and longitudinal data. Both baseline covariates and longitudinal data have some missing data. Shall I ...
Kate's user avatar
  • 347
2 votes
2 answers
111 views

I have a longitudinal data set with 2 dependent variables (couple) - a husband and a wife. There were 2 waves for the husbands and 3 waves for the wives. Since there is a lot of missing data, I ...
eagersquirrel's user avatar
3 votes
1 answer
238 views

I have a data.frame named mydata with 6 columns: status, times, t1, t2, t3, t4. However, t1, t2, t3, and t4 contain missing values in this dataset. I intend to impute these missing values using the ...
dbcoffee's user avatar
  • 219
1 vote
0 answers
58 views

I have spent an extensive amount of time trying to understand the possible role of MICE in helping to "fill in" missing outcome data. I am relatively new to both multiple imputation and ...
R Har's user avatar
  • 11
3 votes
1 answer
220 views

Following Rubin's rules for multiple imputation, I've calculated pooled estimates, group means in this case, with pooled standard errors. I checked this with a bootstrap and, assuming pooled standard ...
jay.sf's user avatar
  • 1,049
2 votes
1 answer
115 views

We decided to use the multiple imputation method in an RCT to solve the problem of some missing follow-up data (for completely random reasons). I was planning on using the Multiple Imputation method ...
Mai's user avatar
  • 21
1 vote
0 answers
64 views

Some background: I imputed and weighted data from two groups of people, one group in a certain organization and one outside of it. Ultimately, I want to compare how they develop psychological trait X ...
MHx01's user avatar
  • 33
2 votes
2 answers
433 views

Imagine an RCT with a time-to-event outcome which is analyzed using a Cox regression. There are four assessments (T1=before randomization, T2=3 weeks, T3=6 weeks, T4=12 weeks). Under the censoring at ...
Survival's user avatar
  • 149
2 votes
1 answer
87 views

Whether it is a method for dealing with monotonic or arbitrary missing data (FCS or MICE), there is a process I do not understand. Let's take the example of linear regression for continuous variables: ...
Guillaume's user avatar
0 votes
0 answers
100 views

I'm attempting to analyse a longitudinal, retrospective dataset with measurements at various time-points. The data-set has a significant amount of missing data, up to 30% for the main outcome variable,...
R.A. Been's user avatar
0 votes
0 answers
84 views

I have a complex survey dataset with a response (dependent variable) bounded between 0 and 1, where I have applied multiple imputation to the dataset to account for missing data. The response formally ...
user45765's user avatar
  • 1,465
7 votes
3 answers
2k views

I am researching predictors of dropout from a training program. I want to see if personality traits add incremental variance above well-established predictors like age, fitness, and education. So, ...
E_H's user avatar
  • 351
2 votes
2 answers
315 views

I've got 2 nested Cox models, which I fit to 10 imputed datasets. Pooling the regression coefficient estimates and associated p-values I've done already. I'm trying to work out if adding one extra ...
Isaac Allen's user avatar
2 votes
1 answer
132 views

I need your help with my problem. After imputing the missing data with the MICE method, I got multiple imputed datasets. Then, I pooled the estimates and coefficients with a mixed-effects Cox ...
Hoang-Giang Pham's user avatar
6 votes
4 answers
1k views

I posted this question a few days ago on datascience.SE because I thought it was more relevant there: Why is multiple imputation not used more widely in Data Science? I have a background in ...
Joe King's user avatar
  • 4,192
1 vote
0 answers
72 views

Note: The question has been edited to make it more focused, and the title has been changed to make it clearer. I have read questions/answers about how to select variables for imputation. This question ...
Verity's user avatar
  • 11
2 votes
1 answer
298 views

I want to use the R package MICE for Multiple Imputation and I have a question concerning the order of my dataset - regarding the order of my variables on the one hand and the order of my cases on the ...
rNewbie's user avatar
  • 23
1 vote
0 answers
174 views

I have a relatively large data set with around 12000 samples and 550 variables. Originally, I had around 800 variables; I used a rule that if the missing rate in a variable is larger than 80% I will ...
Steven Xu's user avatar
2 votes
2 answers
1k views

I'm working on a project that is using some more advanced statistical methods and coding than I'm normally used to and would appreciate some help. The project required me to do multiple imputation, ...
smirza's user avatar
  • 31
