I would really appreciate some advice.
I have two sequential measurements of severity of illness $(s_1, s_2)$ that are incompletely observed along with an outcome measure ($y$). I am trying to estimate whether the change in the measurement $(d = s_2 - s_1)$ adds anything to a model based on the second measurement $s_2$ alone.
There is an extra complication: the severity of illness measurement is a score derived by categorising 10 continuous measurements of physiology ($x_1, x_2, \ldots, x_{10}$), assigning a weight to each category, and summing the weights. This is helpful because it is the standard clinical way of reporting severity, and it handles variables that are not linearly associated with survival (e.g. very high or very low blood pressure is a bad thing; you're best off in the middle).
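To make the scoring concrete, here is a minimal sketch in Python (not my actual Stata code). The cut-points, the weights and the names `component_weight` and `severity_score` are all made up for illustration and do not correspond to any real clinical score; only one of the ten variables is shown.

```python
from bisect import bisect_right

# Hypothetical cut-points and weights for systolic blood pressure (mmHg).
# Illustrative values only, not taken from a real severity score.
SBP_CUTS = [70, 100, 180]    # boundaries between the four categories
SBP_WEIGHTS = [4, 2, 0, 3]   # one weight per category; lowest in the middle


def component_weight(value, cuts, weights):
    """Return the weight of the category that `value` falls into."""
    return weights[bisect_right(cuts, value)]


def severity_score(values, rules):
    """Sum the category weights over all measured variables."""
    return sum(component_weight(values[name], *rules[name]) for name in values)


rules = {"sbp": (SBP_CUTS, SBP_WEIGHTS)}
for sbp in (60, 120, 200):
    print(sbp, severity_score({"sbp": sbp}, rules))
```

Both the very low and the very high blood pressure get a non-zero weight while the middle value gets zero, which is the non-linear behaviour described above.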
I have been trying to use multiple imputation assuming MAR in Stata 12.
The final model would therefore be $y=\alpha + \beta_1s_2+\beta_2d+e$.
However, I think there are several different imputation strategies and I am not sure which is correct.
1. I use the raw physiology in the imputation model, and then derive the weights afterwards.
2. I derive the individual weights (one for each of $x_1, x_2, \ldots, x_{10}$) first, and then use these in the imputation model.
3. I use the aggregate score in the imputation model.
I am aware that using a transformation of a variable in the final model after the imputation (e.g. $x^2$ when you only included $x$ in the imputation) biases the effect of $x^2$ towards zero. I am not sure whether this is also the case with categorisation and summation. If it is, I assume (2) would be the best option; otherwise (1) seems best to me. I have already discounted (3) because it seems to lose a great deal of the richness of the data.
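For clarity, this is roughly what I mean by option (1), sketched in Python rather than Stata. The imputation step itself is not shown: the two "completed" datasets are invented numbers for a single patient, and the scoring rule, the variable names `sbp_t1` and `sbp_t2` and the helpers `score` and `derive` are hypothetical.

```python
from bisect import bisect_right

# Hypothetical scoring rule for one variable (illustrative values only).
CUTS = [70, 100, 180]
WEIGHTS = [4, 2, 0, 3]


def score(sbp):
    """Categorise a raw value and return the weight of its category."""
    return WEIGHTS[bisect_right(CUTS, sbp)]


def derive(completed_rows):
    """Add s1, s2 and d = s2 - s1 to rows whose raw values are complete."""
    out = []
    for row in completed_rows:
        s1, s2 = score(row["sbp_t1"]), score(row["sbp_t2"])
        out.append({**row, "s1": s1, "s2": s2, "d": s2 - s1})
    return out


# Two made-up completed datasets for the same patient, whose first
# measurement was missing and has been imputed differently in each.
imputations = [
    [{"sbp_t1": 65.0, "sbp_t2": 110.0}],
    [{"sbp_t1": 75.0, "sbp_t2": 110.0}],
]

# The score and the change are derived separately within each dataset,
# after imputation; the final model would then be fitted to each one.
derived = [derive(rows) for rows in imputations]
for rows in derived:
    print(rows[0]["s1"], rows[0]["s2"], rows[0]["d"])
```

A small difference in the imputed raw value moves the patient across a cut-point, so $s_1$ and $d$ differ between the two datasets. This is the categorisation step I am unsure about.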
Thanks in advance.