I have a dataset of approximately 1800 observations (250 cases, 1550 controls) and I'm trying to fit a multivariable logistic regression model. Nineteen covariates (a mix of continuous, ordinal and categorical) had P < 0.2 on univariable regression, and I plan to include them in an initial full model. Missingness is low to moderate (1-10%) for most covariates and high (70%) for one covariate, so I have created 70 multiply imputed datasets using the mice package in R.
This is my first time modelling multiply imputed data. I have previously used purposeful selection of covariates as described by Hosmer and Lemeshow, but I am not sure how to apply it to multiply imputed data, as I don't think it is possible to compare fits using partial likelihood ratio tests. Would it be reasonable to use fit.mult.impute at each stage of the purposeful selection process (which I understand fits the model in each imputed dataset and then combines the coefficients using Rubin's rules)? Is there an optimal way to assess and compare each model fit against the previous one?
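For reference, this is my understanding of the pooling step under Rubin's rules, written for a single coefficient. The notation is my own and not taken from any package documentation: $m = 70$ is the number of imputed datasets, $\hat\beta_i$ is the coefficient estimate from dataset $i$, and $U_i$ is its squared standard error.

$$
\bar\beta = \frac{1}{m}\sum_{i=1}^{m}\hat\beta_i, \qquad
\bar U = \frac{1}{m}\sum_{i=1}^{m} U_i, \qquad
B = \frac{1}{m-1}\sum_{i=1}^{m}\bigl(\hat\beta_i - \bar\beta\bigr)^2, \qquad
T = \bar U + \Bigl(1 + \frac{1}{m}\Bigr)B
$$

As far as I understand, the pooled estimate is $\bar\beta$ and its standard error is $\sqrt{T}$, where $\bar U$ is the within-imputation variance and $B$ is the between-imputation variance.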
Are there other selection procedures that are likely to be simpler or produce better results for my data? I have seen a package called "miselect: variable selection for multiply imputed data", which provides procedures for LASSO and elastic net regression in multiply imputed data. Is this worth exploring?
Many thanks for any suggestions.