I’m fitting a binary logistic regression model that includes a continuous variable modeled using natural splines, and I’ve also included an interaction between that spline variable and another continuous covariate.
Conceptually, the model is something like:
glm(binary_outcome ~ ns(x1, df = 2) * x2 + other_covariates, family = "binomial", data = data)
Because ns(x1, df = 2) generates two spline basis terms, the model output includes:
- two coefficients (and two p-values) for the spline basis of x1, and
- two coefficients (and two p-values) for the x1 × x2 interaction.
Initially, I assumed that because the model uses two degrees of freedom (three knots), the two p-values corresponded to the effects between the first two and the last two knots. However, I’ve since realized this interpretation is probably incorrect, and I’m still unsure about their actual meaning and how they should be interpreted.
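For concreteness, here is how I understand the basis construction, checked on an arbitrary vector x (the numbers are made up purely for illustration):

library(splines)
x <- seq(0, 10, length.out = 7)
basis <- ns(x, df = 2)
basis                              # 7 x 2 matrix: one column per basis term
attr(basis, "knots")               # one interior knot, placed at the median
attr(basis, "Boundary.knots")      # two boundary knots, at the range of x

So df = 2 gives one interior knot plus the two boundary knots, and each fitted coefficient multiplies one of the two basis columns, which is what makes me doubt my original "between the knots" reading.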
I’m unsure how to interpret these results:
- Do the individual p-values for the spline basis terms have any meaningful interpretation, or are they just mathematical components of a flexible curve?
- Similarly, what do the individual p-values for the interaction terms mean? Do they correspond to specific parts of the relationship between x1 and x2, or are they not interpretable on their own?
- Should these effects (both for the spline and for the interaction) be evaluated jointly, for example with a Wald or likelihood-ratio test across all spline-related terms? For example, when I use anova(model) I get one p-value instead of two (see the sketch after this list).
- More generally, when modeling non-linear relationships with splines and interactions, can we meaningfully interpret regression coefficients or p-values at all, or does interpretation rely entirely on visualization of predicted probabilities or marginal effects? (A short plotting sketch of what I mean appears further below.)
- In the ANOVA table, spline terms and spline × continuous interactions each appear with a single p-value (despite representing multiple coefficients).
- I am not sure what the point of the ANOVA is in this case. What exactly does this single p-value represent?
- Does the ANOVA add value beyond joint hypothesis tests (like those from linearHypothesis() or model comparison)?
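To make the joint-test part of my question concrete, here is a minimal sketch of what I have in mind, on simulated data (var1, var2, the single stand-in covariate cov1, and the data-generating model are all made up for illustration; my real model has more covariates):

library(splines)
library(car)   # for linearHypothesis()

set.seed(1)
n   <- 500
sim <- data.frame(var1 = rnorm(n), var2 = rnorm(n), cov1 = rnorm(n))
lp  <- with(sim, -0.5 + sin(var1) + 0.3 * var2 + 0.2 * var1 * var2 + 0.2 * cov1)
sim$y <- rbinom(n, 1, plogis(lp))

# Full model, and the nested model without the spline-by-var2 interaction
full   <- glm(y ~ ns(var1, df = 2) * var2 + cov1, family = binomial, data = sim)
no_int <- glm(y ~ ns(var1, df = 2) + var2 + cov1, family = binomial, data = sim)

# Likelihood-ratio test: both interaction coefficients jointly (2 df)
anova(no_int, full, test = "Chisq")

# Wald version of the same joint test; I pass a numeric contrast matrix
# because the spline coefficient names are awkward to write as strings
int_idx <- grep(":", names(coef(full)))                 # interaction terms
K <- diag(length(coef(full)))[int_idx, , drop = FALSE]  # one row per term
linearHypothesis(full, K)

Is this the kind of joint evaluation that is recommended here?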
I’m trying to understand the correct conceptual interpretation of such models — whether one should rely mainly on joint hypothesis tests and plots rather than on individual coefficients and their p-values.
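Continuing the simulated example above, this is the kind of plot I currently rely on: predicted probabilities across var1 at a few fixed values of var2, with cov1 held at its mean (in my real model, the other covariates would be fixed at reference values):

# Prediction grid: var1 varies, var2 is fixed at its quartiles
grid <- expand.grid(
  var1 = seq(min(sim$var1), max(sim$var1), length.out = 100),
  var2 = unname(quantile(sim$var2, c(0.25, 0.50, 0.75))),
  cov1 = mean(sim$cov1)
)
grid$p_hat <- predict(full, newdata = grid, type = "response")

# One curve per var2 value; curves that diverge or cross as var1 moves
# are the visual signature of the spline-by-var2 interaction
matplot(matrix(grid$var1, ncol = 3), matrix(grid$p_hat, ncol = 3),
        type = "l", lty = 1, col = 1:3,
        xlab = "var1", ylab = "Predicted probability")
legend("topleft", legend = c("var2 = Q1", "var2 = median", "var2 = Q3"),
       lty = 1, col = 1:3, bty = "n")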
I apologize if this question appears elementary. I am not a statistician or mathematician and I am seeking to better understand the fundamental concepts behind using and interpreting non-linear relationships. My interest is primarily practical, and I would sincerely appreciate a clear and accessible explanation from an applied perspective.
############### Model ###############
model <- glm(
  binary_outcome ~ splines::ns(var1, df = 2) * var2 + other_covariates,
  data = data,
  family = "binomial"
)
############# Model results #############
summary(model)
Coefficients:
                        Estimate Std. Error z value Pr(>|z|)
(Intercept)               1.0500     1.9700    0.53    0.59
ns(var1, df = 2)1        -9.9100     3.3800   -2.93    0.003 **
ns(var1, df = 2)2        -3.0000     1.9400   -1.55    0.12
var2                      0.0070     0.0320    0.22    0.82
ns(var1, df = 2)1:var2    0.0900     0.0580    1.55    0.12
ns(var1, df = 2)2:var2    0.0500     0.0350    1.46    0.15
other_covariates             ...        ...     ...     ...  (omitted)
############# ANOVA #############
anova(model)
Analysis of Deviance Table
Model: binomial, link: logit
Response: binary_outcome
Terms added sequentially (first to last)
                      Df Deviance Resid. Df Resid. Dev Pr(>Chi)
NULL                                   1848       2090
ns(var1, df = 2)       2     36.1      1846       2054  1.4e-08 ***
var2                   1    109.2      1845       1945  < 2e-16 ***
other_covariates       .        .         .          .        .
ns(var1, df = 2):var2  2      4.9      1828       1819    0.084 .
Is anova(model) a likelihood-ratio test that can be considered a joint hypothesis test in this case?
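My tentative check of this, using the simulated example above: each row of the sequential analysis-of-deviance table appears to be a chi-squared likelihood-ratio test of all coefficients belonging to that term, and because the interaction enters last, its row matches the explicit nested-model comparison:

############# ANOVA vs nested LRT (sketch) #############
# Sequential ("Type I") deviance table: one joint LR test per term
anova(full, test = "Chisq")

# The interaction enters last, so its 2-df row reproduces this comparison
anova(no_int, full, test = "Chisq")

Is that the right way to think about it?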