Whether to use GLM or GAM on Negative Binomial data with categorical and numeric predictor variables

Question

First ever question here, so I apologize if I miss any appropriate information. I'm working on some ecological count data of different vegetation classifications (oaks, pines, grasses, forbs, etc.) and attempting to determine how their distribution has been affected by variables related to past fire behavior. The independent variables I am investigating are: Aspect (continuous, transformed into degrees from North), Years since last fire (discrete numeric), Fire return interval (continuous, calculated as the number of fires in a given amount of time), Elevation (continuous), rdNBR of past fires (continuous measure of fire intensity), Geology (categorical factor with 4 levels), and Burn Category (categorical factor with 4 levels based on time since last burn; it can be dropped from models, as it is just an alternate way of measuring the effects).

I've already run GLMs for each of the response variables; in each case over-dispersion indicated that a negative binomial distribution should be used. I've looked through some other discussions and can't exactly find the answer to my question, but I think a GAM might do a better job of accounting for the non-linear effect that geology has on the successional patterns I'm seeing for some of my dependent variables. I've seen instances of people using categorical variables in GAMs, and people using NB distributions, but no instances of the two being used concurrently. Is a GAM a viable model for this sort of analysis?

My next step is determining global models for each DV, and I'm curious whether a GAM approach is warranted instead of or along with the GLMs I'm already doing. In the end my goal is to show the patterns of habitat variation across the landscape and determine how much past fire activity has influenced these patterns in concert with the typical biophysical variables that influence plant distributions. Thank you in advance, and please let me know what additional info I can provide to better help y'all answer my question.

Welcome to CV. In your question you say that geology is categorical with 4 levels. Given that, I'm not sure what a "nonlinear effect of geology" would be. Also, another choice is to use splines on continuous IVs within NB regression.
– Peter Flom, Commented Feb 28 at 18:32

EdM · Accepted Answer · 2025-02-28 20:15:18Z

Similar abbreviations like GAM (generalized additive model) and GLM (generalized linear model) can lead to confusion about what each of them means. Starting with the linear predictor in regression can help clarify how and why you can use both together.

In a standard linear regression model with outcome $y$, predictors $x_1$ and $x_2$, intercept $\beta_0$, regression coefficients $\beta_1$ and $\beta_2$, and random error $\epsilon$:

$$y = \beta_0 + \beta_1 x_1 + \beta_2 x_2 + \epsilon ,$$

the sum on the right (less the error $\epsilon$) is called the linear predictor. That gives the predicted outcome for any combination of predictor values. That generalizes to additive contributions from multiple predictors and their corresponding coefficients.

GAM

A GAM allows for sums of non-linear functions of the predictor variables $x$ on the right side, taking the place of that linear predictor. Section 7.7 of An Introduction to Statistical Learning explains:

Generalized additive models (GAMs) provide a general framework for extending a standard linear model by allowing non-linear functions of each of the variables, while maintaining additivity.

A corresponding GAM would be:

$$y = \beta_0 + f_1(x_1) + f_2(x_2) + \epsilon,$$

where the shapes of the functions $f()$ might be determined as part of the modeling. The functions can be complicated, and they are estimated by methods that provide the desired smoothness and avoid overfitting. So long as the terms are additive, the right side (without the error $\epsilon$) plays the same role in practice as the linear predictor of a linear regression model.

For example: if you've ever done a log transformation of a predictor you can think of that as a type of GAM, even though you probably just thought of it as a standard linear regression. A regression spline, as suggested by Peter Flom in a comment, is used as an example of a GAM in Section 7.7.1 of An Introduction to Statistical Learning. Those simple types of GAM can be fit with standard linear regression routines, however, while more complicated GAMs require different computational methods.
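As a minimal sketch of that point, the following Python snippet treats a log transformation as the non-linear function $f(x)$ and then fits the result with ordinary closed-form least squares. The data are made up and noise-free, and the helper name `fit_simple_ols` is hypothetical; it only illustrates that the model stays linear in its coefficients once the transformation has been applied.

```python
import math

def fit_simple_ols(z, y):
    """Closed-form least squares for y = b0 + b1 * z (hypothetical helper)."""
    n = len(z)
    z_mean = sum(z) / n
    y_mean = sum(y) / n
    sxy = sum((zi - z_mean) * (yi - y_mean) for zi, yi in zip(z, y))
    sxx = sum((zi - z_mean) ** 2 for zi in z)
    b1 = sxy / sxx
    b0 = y_mean - b1 * z_mean
    return b0, b1

# Made-up, noise-free data generated from y = 2 + 3 * log(x)
x = [1.0, 2.0, 4.0, 8.0, 16.0, 32.0]
y = [2.0 + 3.0 * math.log(xi) for xi in x]

# Apply the non-linear function f(x) = log(x) first ...
z = [math.log(xi) for xi in x]

# ... then an ordinary linear fit on z recovers the coefficients
b0, b1 = fit_simple_ols(z, y)
print(round(b0, 6), round(b1, 6))
```

The relationship between `y` and `x` is curved, but the fitting step is the same one used for a straight line.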

GLM

A GLM models a function (the "link function") of the expected outcome, the mean of $y$, as equal to the linear predictor. In a negative binomial model, the usual link function is the log (of the mean counts). Fitting a GLM requires more complicated computational methods than for standard linear regression.
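Written out for the two-predictor example above, a negative binomial GLM with a log link might look like the following. The symbols $\mu$ (mean count) and $\theta$ (dispersion parameter) are notation introduced here for illustration, and the variance shown assumes the common quadratic parameterization of the negative binomial:

$$\log(\mu) = \beta_0 + \beta_1 x_1 + \beta_2 x_2, \qquad y \sim \text{NB}(\mu, \theta), \qquad \text{Var}(y) = \mu + \mu^2/\theta .$$

The error term $\epsilon$ no longer appears on the right: the randomness is carried by the count distribution around $\mu$ rather than by an additive error.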

Putting them together

Thus there's no problem in principle with having a GLM (with its nonlinear link function for the outcome) along with a GAM (with its sum of potentially nonlinear functions of predictors). The R mgcv package often used for GAMs allows for a negative binomial family in a GAM.
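To make the combination concrete, here is a small Python sketch of how an additive predictor with a log link turns a smooth term and a categorical term into expected counts. Everything in it is an assumption for illustration: the intercept, the geology level names and effects, and the hump-shaped `f_elevation` are invented, not estimates from any data. In mgcv terms this would roughly correspond to a smooth term for elevation plus geology entered as an ordinary factor.

```python
import math

# Hypothetical values, for illustration only (not fitted to any data)
INTERCEPT = 1.0
GEOLOGY_EFFECT = {"A": 0.0, "B": 0.5, "C": -0.3, "D": 0.8}  # "A" is the reference level

def f_elevation(elev_km):
    """Stand-in for an estimated smooth: a hump peaking at 1.5 km."""
    return -((elev_km - 1.5) ** 2)

def expected_count(elev_km, geology):
    """Log link: the additive predictor is the log of the mean count."""
    eta = INTERCEPT + f_elevation(elev_km) + GEOLOGY_EFFECT[geology]
    return math.exp(eta)

# The factor shifts the log mean by a constant, so the ratio of expected
# counts between two geology levels is the same at every elevation.
for elev in (0.5, 1.5, 2.5):
    ratio = expected_count(elev, "B") / expected_count(elev, "A")
    print(elev, round(ratio, 4))
```

In a purely additive model like this, a categorical predictor contributes one constant per level on the log scale. A geology effect that changes the shape of the elevation curve would require an interaction between the factor and the smooth, which this sketch does not include.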
