12
$\begingroup$

I am reading Multivariable Model Building: A Pragmatic Approach to Regression Analysis Based on Fractional Polynomials for Modelling Continuous Variables by Patrick Royston and Willie Sauerbrei. So far, I am impressed and it's an interesting approach I had not considered before.

But the authors do not deal with missing data. Indeed, on p. 17 they say that missing data "introduces many additional problems. Not considered here."

Does multiple imputation work with fractional polynomials?

FP is, in some ways (but not all), an alternative to splines. Is it easier to deal with missing data for spline regression?

$\endgroup$
6
  • $\begingroup$ Are you dealing with missing x's or missing y's or both? $\endgroup$ Commented Sep 1, 2017 at 23:49
  • 2
    $\begingroup$ +1 (!) I am really glad to see someone else ask a similar question. Recently I posted this question: stats.stackexchange.com/questions/295977/… about how to use restricted cubic splines in R's mice. I would specifically opt for splines as they do not require specifying a fractional polynomial while splines are flexible enough for a lot of functional forms. I do not know whether this answers your question though (hence this comment). $\endgroup$ Commented Sep 4, 2017 at 8:09
  • 2
    $\begingroup$ This is an interesting question, opening up (as one dimension of a possible answer) the possibility of effecting a criticism of these several smoothing/interpolation techniques by contrasting their ability to accommodate missing data. (To some extent, fragility to missingness is an 'embarrassment' to a modern method.) I note only in passing the obvious point that a Bayesian implementation would get you your imputation 'for free'. $\endgroup$ Commented Sep 9, 2017 at 12:04
  • 2
    $\begingroup$ @DavidC.Norris Your comment intrigues me! Could you elaborate on how Bayesian methods accommodate missing data 'for free' (by which I assume you mean that the analysis handles it appropriately, 'automatically' and by default)? (Or point me to a reference.) $\endgroup$ Commented Sep 11, 2017 at 6:46
  • 2
    $\begingroup$ The no-free-lunch part of "free" here is that you must write down a Bayesian model, which implies thinking explicitly about the data generating process (DGP). Once you've done that, you treat the missing values as [nuisance] parameters. (In Bayesian, "everything is a parameter". See also latent variable.) Your MCMC then essentially exploits the DGP you've specified to 'impute' the missing values "for free" while it chugs along. $\endgroup$ Commented Sep 11, 2017 at 11:16

1 Answer

1
$\begingroup$

Multiple imputation can be used with both fractional polynomials and splines. Let $f(x)$ denote your functional form (e.g., $f(x) = x + x^{0.5}$), and let $f_m(x)$ be the function estimated in the $m$-th of $M$ imputed data sets. The pooled estimate of the function is then $\frac{1}{M}\sum_{m=1}^{M} f_m(x)$.
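As a rough illustration of the pointwise averaging, here is a minimal Python sketch. It assumes the example form with powers 1 and 0.5 plus an intercept, fitted by ordinary least squares with NumPy. The helper names (`fp_design`, `fit_fp_curve`, `pooled_curve`) are hypothetical, and the simulated data sets are only a stand-in for the completed data sets a real imputation routine would produce.

```python
import numpy as np

def fp_design(x):
    """Design matrix for f(x) = b0 + b1*x + b2*x**0.5 (x must be positive)."""
    return np.column_stack([np.ones_like(x), x, np.sqrt(x)])

def fit_fp_curve(x, y, grid):
    """Least-squares fit of the FP model; returns fitted values on `grid`."""
    beta, *_ = np.linalg.lstsq(fp_design(x), y, rcond=None)
    return fp_design(grid) @ beta

def pooled_curve(imputed_sets, grid):
    """Average the M per-imputation curves f_m(x) pointwise over `grid`."""
    curves = np.array([fit_fp_curve(x, y, grid) for x, y in imputed_sets])
    return curves.mean(axis=0)

# Assumption: these simulated samples stand in for M = 5 completed data sets.
rng = np.random.default_rng(0)
grid = np.linspace(1.0, 10.0, 5)
imputed_sets = []
for _ in range(5):
    x = rng.uniform(1.0, 10.0, size=200)
    y = 2.0 + 0.5 * x + 3.0 * np.sqrt(x) + rng.normal(scale=0.5, size=200)
    imputed_sets.append((x, y))

f_bar = pooled_curve(imputed_sets, grid)  # one pooled value per grid point
```

Because the average is taken on a common grid of $x$ values, the same idea would apply to spline fits: only `fit_fp_curve` would change.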

Assuming your software can provide a standard error estimate at every unique value of $x$, you can use Rubin's formula (Multiple Imputation for Nonresponse in Surveys, 1987) to compute the pooled standard errors. There are small-sample and large-sample formulas for the degrees of freedom with multiple imputation. The large-sample formula (also in Rubin) takes the same inputs as the standard error, so it can be used here as well. The small-sample formula takes the degrees of freedom of the model as an input; it is not obvious to me whether it can be applied here.
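A minimal sketch of Rubin's rules applied pointwise, assuming you already have, for each of the $M$ imputed data sets, the fitted values and their standard errors on a common grid of $x$ values. The function name `rubin_pool` and the toy numbers are made up for illustration. Only the large-sample degrees of freedom are computed.

```python
import numpy as np

def rubin_pool(estimates, std_errors):
    """Pool M sets of pointwise estimates and standard errors (arrays of shape (M, G))."""
    est = np.asarray(estimates, dtype=float)
    se = np.asarray(std_errors, dtype=float)
    m = est.shape[0]
    q_bar = est.mean(axis=0)              # pooled estimate at each grid point
    within = (se ** 2).mean(axis=0)       # average within-imputation variance
    between = est.var(axis=0, ddof=1)     # between-imputation variance
    total = within + (1 + 1 / m) * between
    # Large-sample degrees of freedom; infinite where the between variance is zero.
    with np.errstate(divide="ignore", invalid="ignore"):
        df = (m - 1) * (1 + within / ((1 + 1 / m) * between)) ** 2
    return q_bar, np.sqrt(total), df

# Toy inputs: M = 3 imputations, G = 2 grid points (illustrative numbers only).
est = [[1.0, 2.0], [1.2, 2.0], [0.8, 2.0]]
se = [[0.1, 0.1], [0.1, 0.1], [0.1, 0.1]]
q_bar, se_pooled, df = rubin_pool(est, se)
```

At a grid point where the imputations agree exactly, the pooled standard error reduces to the within-imputation one and the degrees of freedom are infinite, i.e. a normal reference distribution.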

$\endgroup$
