Some of the smooths in my generalized additive mixed model (GAMM) indicate in mgcv::k.check(m) they want to be more wiggly, but I don't think I have enough data to capture the full complexity of the trend. I know we're supposed to increase the basis dimension if k.check indicates it's too small, but I think a smoother pattern is fine since my main goal is prediction. Can I ignore the low p-values and keep the basis dimension, k, at the default? Also, is there an issue with "using up" all the available EDF to max-out k (i.e. does setting k=15, the max, come with any issues)?
I'm afraid of over-fitting the model. It seems from this post that leaving k at the default, in my case, is fine given my data size constraint, no? Since the EDF and basis dimension are still kind of close when set to the max, is this an indication that I have insufficient data for the interaction and shouldn't bother with increasing k? Should I model the terms separately as CYR.std + fSeason +... instead? Only the full dataset reproduces the effect and it's too large to show here.
# num = abundance, sal = salinity (psu),
# fSite = factor site (N=47),
# fSeason = factor season,
# CYR.std = "0" is first year, "1" is 2nd, etc..
> with(shrimp2, nlevels(interaction(CYR.std, fSite, drop = TRUE)))
[1] 705
> gratia::model_edf(m)
# A tibble: 1 × 2
model edf
<chr> <dbl>
1 m 64.1
m <- gam(num ~ s(sal) +
s(water_depth) +
fSeason +
s(CYR.std, by=fSeason) +
# All sites change in the same way over (smooth) time within a season, but
# each annual trend per season is different from the other.
# One smooth (amount of smoothing) per season
# Structural components I landed on...
s(fSite, bs = "re") +
# Each site have a different (average) abundance
# Within fSite variation
# Captures repeated measures effect
s(fSite, by = fSeason, bs="re") +
# Sites in different seasons have different abundance
# Captures between fSite and fSeason variance
# I expect all sites in wet season to have a different variance than dry season
# https://stats.stackexchange.com/questions/331692/random-effect-in-gam-mgcv-package
offset(log(area_sampled)),
method = "REML",
select = TRUE,
family = nb(link = "log"),
data = shrimp2)
> k.check(m)
k' edf k-index p-value
s(sal) 9 1.7952789 0.8448504 0.1125
s(water_depth) 9 0.6680327 0.8348178 0.0625
s(CYR.std):fSeasonDRY 9 5.8789026 0.7878817 0.0000
s(CYR.std):fSeasonWET 9 8.1699374 0.7878817 0.0000
s(fSite) 47 19.6991443 NA NA
s(fSite):fSeasonDRY 47 14.3528138 NA NA
s(fSite):fSeasonWET 47 11.4865389 NA NA
When k=16
Error in smooth.construct.tp.smooth.spec(object, dk$data, dk$knots) :
A term has fewer unique covariate combinations than specified maximum degrees of freedom
Model summary and diagnostics:
> summary(m)
Family: Negative Binomial(1.215)
Link function: log
Formula:
num ~ s(sal) + s(water_depth) + fSeason + s(CYR.std, by = fSeason) +
s(fSite, bs = "re") + s(fSite, by = fSeason, bs = "re") +
offset(log(area_sampled))
Parametric coefficients:
Estimate Std. Error z value Pr(>|z|)
(Intercept) -1.74130 0.10465 -16.640 <2e-16 ***
fSeasonWET -0.03695 0.12892 -0.287 0.774
---
Signif. codes: 0 ‘***’ 0.001 ‘**’ 0.01 ‘*’ 0.05 ‘.’ 0.1 ‘ ’ 1
Approximate significance of smooth terms:
edf Ref.df Chi.sq p-value
s(sal) 1.795 9 15.332 0.00449 **
s(water_depth) 0.668 9 2.896 0.08532 .
s(CYR.std):fSeasonDRY 5.879 9 114.048 < 2e-16 ***
s(CYR.std):fSeasonWET 8.170 9 78.652 < 2e-16 ***
s(fSite) 19.699 46 52.488 0.00661 **
s(fSite):fSeasonDRY 14.353 46 29.145 0.03465 *
s(fSite):fSeasonWET 11.486 46 20.312 0.07910 .
---
Signif. codes: 0 ‘***’ 0.001 ‘**’ 0.01 ‘*’ 0.05 ‘.’ 0.1 ‘ ’ 1
R-sq.(adj) = 0.208 Deviance explained = 28.6%
-REML = 1439.7 Scale est. = 1 n = 1328
> mgcv::gam.vcomp(m)
Standard deviations and 0.95 confidence intervals:
std.dev lower upper
s(sal)1 9.671446e-03 2.935140e-03 3.186794e-02
s(sal)2 2.616725e-03 3.817691e-41 1.793558e+35
s(water_depth)1 2.315768e-05 4.064052e-33 1.319565e+23
s(water_depth)2 8.536465e-02 1.056212e-02 6.899303e-01
s(CYR.std):fSeasonDRY1 3.964975e-01 1.990276e-01 7.898918e-01
s(CYR.std):fSeasonDRY2 1.877162e+00 3.453046e-01 1.020473e+01
s(CYR.std):fSeasonWET1 1.310183e+00 7.074553e-01 2.426415e+00
s(CYR.std):fSeasonWET2 4.328930e+00 9.640227e-01 1.943900e+01
s(fSite) 3.329785e-01 1.854022e-01 5.980226e-01
s(fSite):fSeasonDRY 3.479135e-01 1.673902e-01 7.231237e-01
s(fSite):fSeasonWET 3.005410e-01 1.161431e-01 7.777036e-01
Rank: 11/11
E3NB.sqr <- simulateResiduals(fittedModel = m, plot = TRUE)
draw(m, unconditional = TRUE, parametric = T)
m2 <- gam(num ~ s(sal) +
s(water_depth) +
fSeason +
s(CYR.std, by=fSeason, k=15) + ...
> k.check(m2)
k' edf k-index p-value
s(sal) 9 1.8027635 0.8480571 0.1325
s(water_depth) 9 0.2832327 0.8387580 0.0850
s(CYR.std):fSeasonDRY 14 6.4565782 0.8012377 0.0000
s(CYR.std):fSeasonWET 14 11.1928764 0.8012377 0.0000
s(fSite) 47 19.7453617 NA NA
s(fSite):fSeasonDRY 47 14.3656911 NA NA
s(fSite):fSeasonWET 47 11.9452456 NA NA
draw(m2, unconditional = TRUE, parametric = T)


