I'm currently working on a project where I want to handle missing data with multiple imputation. The data are longitudinal healthcare data with 16 time points and two levels of nesting. Each row represents one therapy at one time point (i.e., each therapy ID appears 16 times unless it was censored), and each therapy is nested within a person (i.e., a person can contribute multiple therapies to the data set). The analysis itself is fully planned, but I'm struggling to set up a multiple imputation that respects this data structure, and I'm only somewhat familiar with MI theory. Briefly, there are ~10 variables that need imputation. I want to impute each of them using fixed effects for $t$, random intercepts for both ID variables, and ~20 other variables as predictors. Only confounders have missing values; treatment and outcome need no imputation. However, I've run into the following issues with different R packages:
- MI using MICE never worked. Every imputation model using the multilevel methods failed to converge, even with a single predictor. I'm not sure why, and I've more or less given up on it for now, even though I would have preferred FCS (fully conditional specification) MI.
- JOMO takes an eternity to run. In a small test run with a reasonable imputation model on my ~180,000 observations, it had not finished even 100 burn-in iterations after more than an hour.
- Imputation with panImpute from the mitml package is reasonably fast, and my models converge. My worry is that pan treats the variables as continuous and draws imputations from a multivariate normal model, but most of my variables are either heavily skewed (e.g., they have floor effects that rule out a suitable transformation) or binary/categorical. I've seen some studies suggesting that imputation can still perform reasonably under these conditions, but I'm wary of just going with it and hoping the imputation model turns out well.
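For concreteness, here is a minimal sketch of the imputation model I have in mind for one incomplete variable $y$. The index names are my own labels, not anything from a package: $p$ indexes persons, $i$ therapies within a person, and $t = 1, \dots, 16$ time points. I read "fixed effects for $t$" as time dummies; a single linear term $\beta_1 t$ would be the simpler alternative.

$$
y_{tip} = \beta_0 + \sum_{s=2}^{16} \beta_s \, \mathbb{1}[t = s] + \mathbf{x}_{tip}^\top \boldsymbol{\gamma} + u_p + v_{ip} + \varepsilon_{tip},
\qquad u_p \sim N(0, \tau_u^2), \quad v_{ip} \sim N(0, \tau_v^2), \quad \varepsilon_{tip} \sim N(0, \sigma^2)
$$

Here $\mathbf{x}_{tip}$ holds the ~20 other predictors, $u_p$ is the person-level and $v_{ip}$ the therapy-level random intercept. In a joint-model approach like pan, the ~10 incomplete variables are modelled together as a vector $\mathbf{y}_{tip}$, so the random effects and the residual become multivariate normal (e.g., $\boldsymbol{\varepsilon}_{tip} \sim N(\mathbf{0}, \boldsymbol{\Sigma})$). That normality assumption is exactly where skewed and binary variables clash with the model.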
What I'd like to ask is:
- Has anyone else run into the same convergence problems when using MICE for multilevel imputation?
- Would you trust MI datasets imputed with pan when hardly any of the imputed variables are normally distributed?
- Are there other packages I should look at? I know about Amelia, but I'm not sure whether it has any advantages over pan.
I'd appreciate help with any of these questions. Thanks!