
I’m fitting a binary logistic regression model that includes a continuous variable modeled using natural splines, and I’ve also included an interaction between that spline variable and another continuous covariate.

Conceptually, the model is something like:

glm(binary_outcome ~ ns(x1, df = 2) * x2 + other_covariates, family = "binomial", data = data)

Because ns(x1, df = 2) generates two spline basis terms, the model output includes:

  • two coefficients (and two p-values) for x1, and
  • two coefficients (and two p-values) for the x1 × x2 interaction.

Initially, I assumed that because the model uses two degrees of freedom (three knots), the two p-values corresponded to the effects between the first two and the last two knots. However, I’ve since realized this interpretation is probably incorrect, and I’m still unsure about their actual meaning and how they should be interpreted.

I’m unsure how to interpret these results:

  1. Do the individual p-values for the spline basis terms have any meaningful interpretation, or are they just mathematical components of a flexible curve?
  2. Similarly, what do the individual p-values for the interaction terms mean — do they correspond to specific parts of the relationship between x1 and x2, or are they not interpretable on their own?
  3. Should these effects (both for the spline and for the interaction) be evaluated jointly, for example with a Wald or likelihood-ratio test across all spline-related terms? For instance, when I use anova(model) I get one p-value per term instead of two.
  4. More generally, when modeling non-linear relationships with splines and interactions, can we meaningfully interpret regression coefficients or p-values at all, or does interpretation rely entirely on visualization of predicted probabilities or marginal effects?
  5. In the ANOVA table, spline terms and spline × continuous interactions each appear with a single p-value (despite representing multiple coefficients).
    • I am not sure what is the point of ANOVA in this case. What exactly does this p-value represent?
    • Does the ANOVA add value beyond joint hypothesis tests (like those from linearHypothesis() or model comparison)?
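To make question 3 concrete, here is a minimal sketch of a joint likelihood-ratio test: fit the full model, fit a reduced model that drops every term involving var1 (the two spline basis columns and the two interaction columns), and compare them. The data here are simulated stand-ins (in the real analysis you would use your own data frame; other_covariates is omitted for brevity).

```r
library(splines)

# Simulated stand-in data for illustration only
set.seed(1)
n    <- 500
var1 <- runif(n)
var2 <- rnorm(n)
binary_outcome <- rbinom(n, 1, plogis(sin(3 * var1) + 0.2 * var2))
d <- data.frame(binary_outcome, var1, var2)

full    <- glm(binary_outcome ~ ns(var1, df = 2) * var2, family = binomial, data = d)
# Reduced model drops ALL var1-related terms: 2 basis + 2 interaction columns
reduced <- glm(binary_outcome ~ var2, family = binomial, data = d)

# One chi-square test on 4 df, instead of four separate coefficient p-values
anova(reduced, full, test = "LRT")
```

This tests the null hypothesis that var1 contributes nothing to the model (neither a main effect nor an interaction), which is usually the substantive question.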

I’m trying to understand the correct conceptual interpretation of such models — whether one should rely mainly on joint hypothesis tests and plots rather than on individual coefficients and their p-values.

I apologize if this question appears elementary. I am not a statistician or mathematician and I am seeking to better understand the fundamental concepts behind using and interpreting non-linear relationships. My interest is primarily practical, and I would sincerely appreciate a clear and accessible explanation from an applied perspective.

############### Model ###############
model <- glm(
  binary_outcome ~ splines::ns(var1, df = 2) * var2 + other_covariates,
  data = data,
  family = "binomial"
)


############# Model results #############
summary(model)

Coefficients:
                         Estimate Std. Error z value Pr(>|z|)    
(Intercept)               1.0500     1.9700   0.53   0.59    
ns(var1, df = 2)1        -9.9100     3.3800  -2.93   0.003 ** 
ns(var1, df = 2)2        -3.0000     1.9400  -1.55   0.12    
var2                      0.0070     0.0320   0.22   0.82    
ns(var1, df = 2)1:var2    0.0900     0.0580   1.55   0.12    
ns(var1, df = 2)2:var2    0.0500     0.0350   1.46   0.15    
other_covariates ... (omitted)
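The two interaction rows above can also be tested jointly with a Wald test built from the fitted model's coefficient vector and covariance matrix; this is the same idea as passing both restrictions to linearHypothesis() at once, but in base R. The data below are simulated stand-ins so the snippet runs on its own.

```r
library(splines)

# Simulated stand-in data for illustration only
set.seed(1)
d <- data.frame(var1 = runif(300), var2 = rnorm(300))
d$y <- rbinom(300, 1, plogis(sin(3 * d$var1) * (1 + 0.3 * d$var2)))
fit <- glm(y ~ ns(var1, df = 2) * var2, family = binomial, data = d)

b   <- coef(fit)
V   <- vcov(fit)
idx <- grep(":var2$", names(b))   # the two spline-by-var2 interaction terms

# Wald statistic b' V^{-1} b for the selected coefficients, chi-square on 2 df
W <- as.numeric(t(b[idx]) %*% solve(V[idx, idx]) %*% b[idx])
p <- pchisq(W, df = length(idx), lower.tail = FALSE)  # one joint p-value
```

The single p-value answers "is there any interaction between var1 and var2?", which the two individual basis-term p-values cannot answer on their own.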


############# ANOVA #############
anova(model)

Analysis of Deviance Table
Model: binomial, link: logit
Response: outcome
Terms added sequentially (first to last)

                       Df Deviance Resid. Df Resid. Dev  Pr(>Chi)    
NULL                                    1848       2090             
ns(var1, df = 2)        2     36.1      1846       2054   1.4e-08 ***
var2                    1    109.2      1845       1945   < 2e-16 ***
other_covariates        .        .         .          .         .    
ns(var1, df = 2):var2   2      4.9      1828       1819     0.084 .  
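Regarding question 4, the usual way to interpret a spline-by-continuous interaction is to plot predicted probabilities across var1 at a few fixed values of var2, rather than reading the basis coefficients. A sketch on simulated stand-in data:

```r
library(splines)

# Simulated stand-in data for illustration only
set.seed(1)
d <- data.frame(var1 = runif(400), var2 = rnorm(400))
d$y <- rbinom(400, 1, plogis(sin(3 * d$var1) + 0.3 * d$var1 * d$var2))
fit <- glm(y ~ ns(var1, df = 2) * var2, family = binomial, data = d)

# Grid over var1, holding var2 at its quartiles
grid <- expand.grid(var1 = seq(min(d$var1), max(d$var1), length.out = 100),
                    var2 = quantile(d$var2, c(0.25, 0.50, 0.75)))
grid$p_hat <- predict(fit, newdata = grid, type = "response")

# One fitted probability curve per chosen var2 value
matplot(matrix(grid$var1, ncol = 3), matrix(grid$p_hat, ncol = 3),
        type = "l", lty = 1, xlab = "var1", ylab = "Pr(y = 1)")
```

If the three curves differ in shape (not just level), that is the interaction; confidence bands from predict(..., se.fit = TRUE) on the link scale can be added the same way.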
  • You might find that these questions have been addressed elsewhere: the general advice for spline basis functions is to not try to interpret the individual coefficients but instead the predictions arising from the whole equation. Likewise, when considering whether your var1 has a substantive relationship with the outcome, you would likely look at the joint hypothesis tests. Commented Nov 12 at 20:25
  • Here are a couple of good questions on top-level issues: stats.stackexchange.com/questions/652801/… and stats.stackexchange.com/questions/638916/… Commented Nov 12 at 20:26
  • Also, Frank Harrell's Regression Modeling Strategies book covers restricted cubic splines in detail (see also stats.stackexchange.com/questions/602838/… for how they differ from natural splines). Commented Nov 12 at 20:29
  • P.S. I think the specific questions you have could still all be answered in one place here. Commented Nov 12 at 20:30
  • Isn't anova(model) a likelihood-ratio test, and therefore itself a joint hypothesis test in this case? Commented Nov 13 at 11:50
