This seems like a perfect opportunity to use posterior simulation.
The general idea would be to:
1. create a data set of covariate values you want to predict at, including each week, for all combinations of the organisms and treatment levels you want to consider
2. simulate a bunch of new data from the model, say 10000 draws for each row of the data set created in step 1
3. for each draw, for each organism, and for each treatment level, form the weekly cumulative sum of the model-predicted values
4. summarise the posterior distributions you created in step 3: for each combination of organism and treatment level, compute, say, the median and the 0.025 and 0.975 probability quantiles of the posterior distribution to give an estimated cumulative sum and its uncertainty
This might sound complicated, but you can use, for each step above:
- gratia::data_slice() to generate the new data to predict at (step 1)
- gratia::fitted_samples() or gratia::posterior_samples() to do the posterior simulation in step 2. The difference is whether you want to consider only the uncertainty in the model parameters, i.e. only the uncertainty in the expected response $\mu_i$ (gratia::fitted_samples()), or both the uncertainty in the model parameters and the sampling uncertainty in the data (gratia::posterior_samples())
- your dplyr fu to handle step 3, computing the weekly cumulative summed counts, for example
  group_by(.draw, organism_id, treatment) |> arrange(week, .by_group = TRUE) |> mutate(cum_count = cumsum(.fitted))
- the ggdist package, which has several functions (e.g. ggdist::median_qi()) for summarising the posterior distributions in step 4 (see the sketch just below)
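Putting those pieces together, here is a minimal sketch of the whole recipe. It is only a sketch: it assumes your fitted model is stored in an object m (for example a gam() fit like the one shown further down), that your data frame is called data_egg_gam with columns week, organism_id, and treatment, and that fitted_samples() returns .row, .draw, and .fitted columns which we join back to the prediction grid. Swap in posterior_samples() if you also want the sampling uncertainty (its draws are named .response rather than .fitted). Adjust the names to match your own data.

```r
library("mgcv")
library("gratia")
library("dplyr")
library("ggdist")

## 1. a grid of covariate values to predict at: every observed week crossed
##    with every organism and treatment level
ds <- data_slice(m,
                 week        = sort(unique(data_egg_gam$week)),
                 organism_id = unique(data_egg_gam$organism_id),
                 treatment   = unique(data_egg_gam$treatment))

## 2. posterior draws of the expected weekly count at each row of `ds`
fs <- fitted_samples(m, data = ds, n = 10000, seed = 42)

## 3. within each draw, organism, and treatment, accumulate the expected
##    counts over weeks
cum <- fs |>
  select(.row, .draw, .fitted) |>
  left_join(mutate(ds, .row = row_number()), by = ".row") |>
  group_by(.draw, organism_id, treatment) |>
  arrange(week, .by_group = TRUE) |>
  mutate(cum_count = cumsum(.fitted)) |>
  ungroup()

## 4. summarise the posterior of each weekly cumulative sum: median plus a
##    95% credible interval per organism, treatment, and week
cum_summ <- cum |>
  group_by(organism_id, treatment, week) |>
  median_qi(cum_count, .width = 0.95)
```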
I have more details on the different things we might sample from in this article that accompanies my gratia R package: https://gavinsimpson.github.io/gratia/articles/posterior-simulation.html
We used this approach in a paper on muscid fly phenology (Gerlich et al 2025) that was published just a couple of weeks ago (at the time of writing), to do something very similar to what you want to do here.
You don't need to use a GAMM (gamm()) to model the autocorrelation if the $f(\mathtt{week}_i)$ term in the model (s(week)) sufficiently captures the temporal dependencies. I can almost guarantee that it won't in your current model, because you're assuming that all organisms follow the same trajectory over time. But there's nothing stopping you from including a smooth of week for each organism in your model, for example:
```r
gam(weekly_eggs ~ s(week) +                   # common smooth of week
      s(week, organism_id, bs = "fs") +       # organism-specific smooth of week
      s(treatment, pc = 0),                   # independent treatment effects
    method = "REML",
    data   = data_egg_gam,
    family = nb()
)
```
You could also allow the treatment effect to vary with week if needed. This model is an example of modelling the temporal dependence through the mean structure of the model, which is an alternative to the approach of modelling the dependencies through the covariance matrix. See Hefley et al (2017) for more on these ideas.
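If you do want the treatment effect to vary with week, one way to sketch it (assuming treatment is a factor; these terms are illustrative rather than taken from your model) is a by-factor smooth of week for each treatment level:

```r
library("mgcv")

## hypothetical sketch: treatment-specific smooths of week via a by-factor
## smooth; the parametric treatment term is needed because by-smooths are
## centred and so don't carry the treatment group means
gam(weekly_eggs ~ s(week) +                     # common smooth of week
      s(week, organism_id, bs = "fs") +         # organism-specific smooths
      s(week, by = treatment) + treatment,      # treatment-varying effect of week
    method = "REML",
    data   = data_egg_gam,
    family = nb()
)

## if treatment is instead a continuous dose, a tensor product
## te(week, treatment) would let the dose effect change smoothly over weeks
```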
Sørine (Gerlich) and I tried to document the mgcv code for the various GAMs we estimated in the paper I mentioned; I think this is all in the paper's Supplementary Materials. You can fit the organism-specific effects in several different ways, most of which I covered in an earlier paper (Pedersen et al., 2019) on what we called HGAMs (hierarchical GAMs).
Regarding your sub-questions:
(1), I presume I would need to use a GAMM to account for the obvious autocorrelation
Modelling the dependencies through the model's covariance matrix does get trickier with non-Gaussian models; gamm() uses the trick of glmmPQL() to essentially view the GAMM as a GLMM, which in turn estimates the model as an LMM with lme(), iterating back and forth between these different representations. For Poisson data with low mean counts, this PQL method for GLMMs is known not to work well (e.g. Bolker et al., 2009). If your data are regularly spaced in time, you can fit an AR(1) in the covariance matrix of the model using bam(), which, when combined with a non-Gaussian family, treats the AR(1) correlation matrix as a working correlation matrix in the same way as one would in a generalized estimating equation (GEE) model.
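If your data are regularly spaced and you want to try the bam() route, a rough sketch follows; the value of rho is a placeholder you would choose yourself (for example from the ACF of the residuals of a model fitted without the correlation), start_event is a hypothetical flag marking the first observation of each organism's series, and with a non-Gaussian family you also need discrete = TRUE so the AR(1) is applied to the working residuals:

```r
library("mgcv")

## order the data so the AR(1) runs over weeks within each organism, and
## flag the first observation of each organism's series
data_egg_gam <- data_egg_gam[order(data_egg_gam$organism_id, data_egg_gam$week), ]
data_egg_gam$start_event <- !duplicated(data_egg_gam$organism_id)

bam(weekly_eggs ~ s(week) +
      s(week, organism_id, bs = "fs") +
      s(treatment, pc = 0),
    data     = data_egg_gam,
    family   = nb(),
    method   = "fREML",
    discrete = TRUE,   # required so the AR(1) acts on the working residuals
    rho      = 0.6,    # assumed AR(1) parameter; try a few values
    AR.start = data_egg_gam$start_event)
```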
(2) not sure if the negative binomial error distribution would work for cumulative count data...
This is the wrong way to think about it; your actual data are not cumulative counts, and you should stick as close to the original data as possible when modelling (in this case). Better, in this case at least, is to do the cumulative summation after you have fitted a model to the observed count data, and then evaluate the uncertainty in those cumulative sums, either via math (which is hard, at least I find it hard) or via posterior sampling.
References
Bolker, B.M., Brooks, M.E., Clark, C.J., Geange, S.W., Poulsen, J.R., Stevens, M.H.H., White, J.-S.S., 2009. Generalized linear mixed models: a practical guide for ecology and evolution. Trends Ecol. Evol. 24, 127–135. https://doi.org/10.1016/j.tree.2008.10.008
Gerlich, H.S., Loboda, S., Simpson, G.L., Savage, J., Schmidt, N.M., Holmstrup, M., Høye, T.T., 2025. Species’ traits modulate rapid changes in flight time in high-Arctic muscid flies under climate change. Proc. Biol. Sci. 292, 20250970. https://doi.org/10.1098/rspb.2025.0970
Hefley, T.J., Broms, K.M., Brost, B.M., Buderman, F.E., Kay, S.L., Scharf, H.R., Tipton, J.R., Williams, P.J., Hooten, M.B., 2017. The basis function approach for modeling autocorrelation in ecological data. Ecology 98, 632–646. https://doi.org/10.1002/ecy.1674
Pedersen, E.J., Miller, D.L., Simpson, G.L., Ross, N., 2019. Hierarchical generalized additive models in ecology: an introduction with mgcv. PeerJ 7, e6876. https://doi.org/10.7717/peerj.6876