I’ve been running Generalized Additive Models (GAMs) to explore temporal trends in my soil phosphorus data. I have 20 years of data at each site. I'm considering either fitting an individual GAM for each site or a hierarchical GAM with a global smooth and site-specific deviations from it. My workflow compares a “plain” GAM with a GAMM that includes an autocorrelation structure.
Workflow:
- Fit a GAM with gam() and check assumptions with gam.check().
- Fit a GAMM with autocorrelation using gamm(..., correlation = corCAR1(...)).
- Check the $\phi$ parameter (the continuous-time AR(1) correlation estimate) and its confidence interval.
- If $\phi$ indicates significant residual autocorrelation, I keep and report the autocorrelation model. If not, I stick with the simpler GAM.
I want to ensure I am modeling my data, and the replicates in particular, correctly.
Code:
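# Simple GAM
library(mgcv)   # gam(), gamm(); also attaches nlme for corCAR1() and intervals()
library(dplyr)  # filter()
m1 <- gam(Total_P ~ s(Year, k = 3),
          data = filter(soil_df, Site == "S1"),
          method = "REML")
gam.check(m1)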
# GAMM with CAR(1) autocorrelation
mod1 <- gamm(Total_P ~ s(Year, k = 3),
             data = filter(soil_df, Site == "S1"),
             correlation = corCAR1(form = ~ Year | Plot),
             method = "REML")
# Compare models
summary(mod1$gam)
# gamm() returns a list, so AIC() needs the $lme component; the gam() and lme()
# likelihoods are computed differently, so treat this comparison as rough
AIC(m1, mod1$lme)
# Estimate of autocorrelation parameter and CI
smallPhi <- intervals(mod1$lme, which = "var-cov")$corStruct
smallPhi
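Hierarchical GAMs:
# Site must be a factor for the "sz" and "fs" bases
m_tp_GI <- gam(
  Total_P ~
    s(Year, k = 3) +                  # global smooth
    s(Year, Site, k = 10, bs = "sz"), # site-specific deviations from the global smooth
  data = soil_df,
  method = "REML"
)
m2_ac <- gamm(
  Total_P ~
    s(Year, k = 6, m = 2) +                  # global smooth
    s(Year, Site, bs = "fs", k = 6, m = 2),  # site-level smooths sharing one smoothing parameter
  data = soil_df,
  correlation = corCAR1(form = ~ Year | Plot),
  method = "REML"
)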
Questions
In my dataset, I have three replicate soil collections per year. These were randomly sampled, not permanently marked plots. That means they are not repeated measures through time.
To make the correlation structure work in gamm(), I added a Plot column to uniquely identify each replicate. Since corCAR1(form = ~ Year|Plot) expects an ID for the grouping factor, I believe this setup treats residuals as correlated over time within each replicate.
Should I instead collapse replicates to yearly means and model autocorrelation across years? Or can I keep replicates modeled this way?
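To make the yearly-means option concrete, here is a minimal sketch on simulated data. Everything in it is an assumption for illustration: `sim_df` is a hypothetical stand-in for my `soil_df`, the trend and AR(1) noise are made up, and grouping the correlation by `Site` is one possible choice rather than something I have validated.

```r
library(mgcv)  # gamm(); also attaches nlme, which provides corCAR1()

set.seed(1)
years <- 2001:2020
sites <- paste0("S", 1:4)

# Hypothetical site-by-year grid (Year varies fastest, Site becomes a factor)
site_year <- expand.grid(Year = years, Site = sites)

# Simulated AR(1) year effect per site, shared by the replicates in that year
site_year$year_eff <- unlist(lapply(sites, function(s) {
  as.numeric(arima.sim(list(ar = 0.6), n = length(years), sd = 10))
}))

# Three randomly sampled replicates per site-year (no persistent plot identity)
sim_df <- site_year[rep(seq_len(nrow(site_year)), each = 3), ]
sim_df$Total_P <- 300 + 20 * sin((sim_df$Year - 2001) / 4) +
  sim_df$year_eff + rnorm(nrow(sim_df), sd = 5)

# Collapse the replicates to one mean per Site x Year
yearly <- aggregate(Total_P ~ Site + Year, data = sim_df, FUN = mean)

# With one value per year per site, residuals are correlated across years
# within a site instead of within an arbitrary replicate ID
m_mean <- gamm(Total_P ~ s(Year, k = 6),
               data = yearly,
               correlation = corCAR1(form = ~ Year | Site),
               method = "REML")

# Estimated CAR(1) parameter on its natural (0, 1) scale
phi <- coef(m_mean$lme$modelStruct$corStruct, unconstrained = FALSE)
phi
```

Averaging discards the within-year replicate variance, so this trades information about sampling noise for a correlation structure that matches how the samples were actually collected.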
Do hierarchical models with penalization give the same answer as site-by-site GAMs?
The gam.check() basis-dimension (k) check and the diagnostic plots for the hierarchical GAM don't look great right now; is my model structure missing anything?
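To show what I mean by the k check, here is a minimal sketch on simulated data comparing a small and a larger basis dimension. The data frame `sim` and the models `m_small` and `m_big` are hypothetical and only illustrate how I am reading the output, not my real fits.

```r
library(mgcv)

set.seed(42)
# Hypothetical data: a wiggly 20-year signal with 3 replicates per year
sim <- data.frame(Year = rep(2001:2020, each = 3))
sim$y <- 10 + 3 * sin(2 * pi * (sim$Year - 2001) / 10) +
  rnorm(nrow(sim), sd = 0.5)

# Same model with a restrictive and a more generous basis dimension
m_small <- gam(y ~ s(Year, k = 3),  data = sim, method = "REML")
m_big   <- gam(y ~ s(Year, k = 10), data = sim, method = "REML")

# k.check() returns the basis-dimension table that gam.check() prints,
# one row per smooth term
kc_small <- k.check(m_small)
kc_big   <- k.check(m_big)
kc_small
kc_big
```

My understanding is that an edf close to k' together with a low p-value suggests k may be too small, which is the pattern I would expect from `m_small` here.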
UPDATE: I fit an HGAM with a Tweedie distribution, which seemed to fit the data best.
# Gamma HGAM
m_TP_SRS <- gam(
  TP ~ s(Year, k = 8) +                # global trend (penalized smooth)
    s(Year, Site, bs = "sz", k = 8),   # site-specific deviations
  data = srs,
  family = Gamma(link = "log"),
  method = "REML")
# Tweedie HGAM
m_TP_SRS_tw <- gam(
  TP ~ s(Year, k = 8) +                # global smooth
    s(Year, Site, bs = "sz", k = 8),   # constrained factor smooth (includes site mean shifts)
  data = srs,
  family = tw(link = "log"),
  method = "REML"
)
AIC(m_TP_SRS_tw, m_TP_SRS)
