I'm fitting a negative-binomial mixed model for count data in a repeated-measures setting.
library(GLMMadaptive)
library(ggplot2)
dat <- structure(list(subjectId = c("1", "1", "2", "2",
"3", "3", "4", "4", "5", "5", "6",
"6", "7", "7", "8", "8", "9", "9",
"10", "10", "11", "11", "12", "12", "13",
"13", "14", "14", "15", "15", "16", "16",
"17", "17"), timepoint = structure(c(1L, 2L, 1L, 2L, 2L,
1L, 2L, 1L, 1L, 2L, 2L, 1L, 2L, 1L, 1L, 2L, 2L, 1L, 2L, 1L, 2L,
1L, 1L, 2L, 1L, 2L, 2L, 1L, 2L, 1L, 1L, 2L, 1L, 2L), levels = c("pre",
"post"), class = "factor"), count = c(3L, 0L, 8L, 0L, 1L, 0L,
0L, 1L, 6L, 5L, 0L, 8L, 0L, 6L, 1L, 0L, 0L, 1L, 0L, 3L, 0L, 0L,
4L, 2L, 39L, 0L, 0L, 10L, 28L, 3L, 6L, 0L, 3L, 0L), binary_factor2 = c("FALSE",
"FALSE", "TRUE", "TRUE", "FALSE", "FALSE", "FALSE", "FALSE",
"FALSE", "FALSE", "FALSE", "FALSE", "TRUE", "TRUE", "FALSE",
"FALSE", "FALSE", "FALSE", "FALSE", "FALSE", "FALSE", "FALSE",
"FALSE", "FALSE", "TRUE", "TRUE", "TRUE", "TRUE", "TRUE", "TRUE",
"TRUE", "TRUE", "FALSE", "FALSE"), binary_factor1 = c("FALSE",
"FALSE", "TRUE", "TRUE", "TRUE", "TRUE", "TRUE", "TRUE", "FALSE",
"FALSE", "TRUE", "TRUE", "TRUE", "TRUE", "TRUE", "TRUE", "TRUE",
"TRUE", "FALSE", "FALSE", "TRUE", "TRUE", "FALSE", "FALSE", "FALSE",
"FALSE", "FALSE", "FALSE", "FALSE", "FALSE", "FALSE", "FALSE",
"FALSE", "FALSE"), offset = c(90, 90, 90, 90, 90, 90, 90, 90,
90, 90, 90, 90, 90, 90, 90, 90, 90, 90, 90, 90, 90, 90, 90, 90,
90, 90, 90, 90, 78, 90, 90, 90, 90, 90)), row.names = c(1L, 2L,
3L, 4L, 5L, 6L, 7L, 8L, 11L, 12L, 13L, 14L, 17L, 18L, 19L, 20L,
21L, 22L, 23L, 24L, 25L, 26L, 27L, 28L, 29L, 30L, 31L, 32L, 33L,
34L, 35L, 36L, 37L, 38L), class = "data.frame")
mod <- mixed_model(fixed = count ~ timepoint +
                     binary_factor1 +
                     binary_factor2 +
                     offset(log(offset)),
                   random = ~ 1 | subjectId, data = dat,
                   family = negative.binomial())
count is the number of events that occurred within our sampling window. timepoint is pre vs. post intervention. binary_factor1 and binary_factor2 are binary covariates. offset is the length of the window (in months) over which we measured: 90 months for all but one observation, for which it was 78.
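For context (purely descriptive, separate from the model), the crude events per month of follow-up within each timepoint can be computed like this:
# crude event rate per month of follow-up, averaged within timepoint
tapply(dat$count / dat$offset, dat$timepoint, mean)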
The model returns the following estimates and 95% CIs:
> summary(mod)
Call:
mixed_model(fixed = count ~ timepoint + binary_factor1 + binary_factor2 +
offset(log(offset)), random = ~1 | subjectId, data = dat,
family = negative.binomial())
Data Descriptives:
Number of Observations: 34
Number of Groups: 17
Model:
family: negative binomial
link: log
Fit statistics:
log.Lik AIC BIC
-69.81973 151.6395 156.6387
Random effects covariance matrix:
StdDev
(Intercept) 0.3208524
Fixed effects:
Estimate Std.Err z-value p-value
(Intercept) -2.8701 0.5320 -5.3953 < 1e-04
timepointpost -1.4944 0.6803 -2.1966 0.028048
binary_factor1TRUE -1.3824 0.5837 -2.3683 0.017872
binary_factor2TRUE 1.4035 0.5508 2.5480 0.010835
log(dispersion) parameter:
Estimate Std.Err
-0.3897 0.4709
Integration:
method: adaptive Gauss-Hermite quadrature rule
quadrature points: 11
Optimization:
method: hybrid EM and quasi-Newton
converged: TRUE
> exp(confint(mod))
2.5 % Estimate 97.5 %
(Intercept) 0.01998565 0.05669246 0.1608171
timepointpost 0.05914144 0.22438070 0.8512931
binary_factor1TRUE 0.07994693 0.25098662 0.7879513
binary_factor2TRUE 1.38248141 4.06924187 11.9775422
This all seems reasonable so far. As I read it, for a given subject the expected event rate at timepoint post is 0.224 times (22.4% of) the rate at timepoint pre, controlling for the other covariates.
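For reference, these rate ratios can also be read straight off the fitted object by exponentiating the fixed effects (assuming fixef() is the right accessor for a MixMod object, which is how I understand the GLMMadaptive methods):
# conditional (subject-specific) rate ratios; these should match the
# "Estimate" column of exp(confint(mod)) shown above
exp(fixef(mod))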
I then calculate the marginal coefficients and exponentiate, which gives the following:
mod_marg_coef <- marginal_coefs(mod, std_errors = TRUE, cores = 6)
> mod_marg_coef
Estimate Std.Err z-value p-value
(Intercept) 1.6819 12.2269 0.1376 0.890589
timepointpost -1.5028 0.6998 -2.1475 0.031752
binary_factor1TRUE -1.3769 0.8083 -1.7035 0.088477
binary_factor2TRUE 1.3967 0.8207 1.7019 0.088783
The 95% CIs of the marginal coefficients look unreasonable for the intercept, and I'm worried about the implications for the rest of the model. Is this a problem with my procedure, or with my model/data?
> exp(confint(mod_marg_coef))
2.5 % 97.5 %
(Intercept) 2.103310e-10 1.374028e+11
timepointpost 5.644968e-02 8.769968e-01
binary_factor1TRUE 5.176358e-02 1.230357e+00
binary_factor2TRUE 8.091053e-01 2.019021e+01
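To rule out a mistake in my own exponentiation step, I reproduced the intercept interval by hand; it looks like a plain Wald interval, estimate ± 1.96 × SE, pushed through exp():
# hand-built 95% Wald CI for the marginal intercept on the exp scale
exp(1.6819 + c(-1, 1) * qnorm(0.975) * 12.2269)
# roughly 2.1e-10 and 1.4e+11, matching the confint() output above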
I know there's an outlier in my data, so I hypothesized that it was the cause of the unreasonable standard error and recalculated using robust (sandwich) standard errors. That only made things worse.
mod_marg_coef <- marginal_coefs(mod, std_errors = TRUE, cores = 6, sandwich = TRUE)
> mod_marg_coef
Estimate Std.Err z-value p-value
(Intercept) 1.6819 31.9858 0.0526 0.95806
timepointpost -1.5028 0.9237 -1.6270 0.10374
binary_factor1TRUE -1.3769 2.2137 -0.6220 0.53395
binary_factor2TRUE 1.3967 1.7913 0.7797 0.43557
> exp(confint(mod_marg_coef))
2.5 % 97.5 %
(Intercept) 3.192320e-27 9.053001e+27
timepointpost 3.639881e-02 1.360105e+00
binary_factor1TRUE 3.294018e-03 1.933435e+01
binary_factor2TRUE 1.207237e-01 1.353174e+02
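One thing I haven't ruled out is Monte Carlo noise: as far as I can tell from ?marginal_coefs, both the marginalization and the standard errors are simulation-based. A stability check would look something like the following (the M and seed arguments are my reading of the help page and may be off):
# re-run with more Monte Carlo samples and a different seed to see whether
# the huge intercept SE is stable (argument names per my reading of
# ?marginal_coefs; they may not be exactly right)
marginal_coefs(mod, std_errors = TRUE, cores = 6, M = 10000, seed = 2024)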
GLMMadaptive v 0.9-7