I'm fitting a negative-binomial mixed model for count data in a repeated-measures setting.
library(GLMMadaptive)
library(ggplot2)
dat <- structure(list(subjectId = c("1", "1", "2", "2",
"3", "3", "4", "4", "5", "5", "6",
"6", "7", "7", "8", "8", "9", "9",
"10", "10", "11", "11", "12", "12", "13",
"13", "14", "14", "15", "15", "16", "16",
"17", "17"), timepoint = structure(c(1L, 2L, 1L, 2L, 2L,
1L, 2L, 1L, 1L, 2L, 2L, 1L, 2L, 1L, 1L, 2L, 2L, 1L, 2L, 1L, 2L,
1L, 1L, 2L, 1L, 2L, 2L, 1L, 2L, 1L, 1L, 2L, 1L, 2L), levels = c("pre",
"post"), class = "factor"), count = c(3L, 0L, 8L, 0L, 1L, 0L,
0L, 1L, 6L, 5L, 0L, 8L, 0L, 6L, 1L, 0L, 0L, 1L, 0L, 3L, 0L, 0L,
4L, 2L, 39L, 0L, 0L, 10L, 28L, 3L, 6L, 0L, 3L, 0L), binary_factor2 = c("FALSE",
"FALSE", "TRUE", "TRUE", "FALSE", "FALSE", "FALSE", "FALSE",
"FALSE", "FALSE", "FALSE", "FALSE", "TRUE", "TRUE", "FALSE",
"FALSE", "FALSE", "FALSE", "FALSE", "FALSE", "FALSE", "FALSE",
"FALSE", "FALSE", "TRUE", "TRUE", "TRUE", "TRUE", "TRUE", "TRUE",
"TRUE", "TRUE", "FALSE", "FALSE"), binary_factor1 = c("FALSE",
"FALSE", "TRUE", "TRUE", "TRUE", "TRUE", "TRUE", "TRUE", "FALSE",
"FALSE", "TRUE", "TRUE", "TRUE", "TRUE", "TRUE", "TRUE", "TRUE",
"TRUE", "FALSE", "FALSE", "TRUE", "TRUE", "FALSE", "FALSE", "FALSE",
"FALSE", "FALSE", "FALSE", "FALSE", "FALSE", "FALSE", "FALSE",
"FALSE", "FALSE"), offset = c(90, 90, 90, 90, 90, 90, 90, 90,
90, 90, 90, 90, 90, 90, 90, 90, 90, 90, 90, 90, 90, 90, 90, 90,
90, 90, 90, 90, 78, 90, 90, 90, 90, 90)), row.names = c(1L, 2L,
3L, 4L, 5L, 6L, 7L, 8L, 11L, 12L, 13L, 14L, 17L, 18L, 19L, 20L,
21L, 22L, 23L, 24L, 25L, 26L, 27L, 28L, 29L, 30L, 31L, 32L, 33L,
34L, 35L, 36L, 37L, 38L), class = "data.frame")
mod <- mixed_model(fixed = count ~ timepoint +
                     binary_factor1 +
                     binary_factor2 +
                     offset(log(offset)),
                   random = ~ 1 | subjectId, data = dat,
                   family = negative.binomial())
count is the number of events that occurred within our sampling window. timepoint is pre vs. post intervention. binary_factor1 and binary_factor2 are binary covariates. offset is the length of the window (in months) over which we measured: 90 months for all but one observation, for which it was 78.
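For context (purely descriptive, separate from the model), the crude events per month of follow-up within each timepoint can be computed like this:
# crude event rate per month of follow-up, averaged within timepoint
tapply(dat$count / dat$offset, dat$timepoint, mean)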
The model returns the following estimates and 95% CIs:
> summary(mod)
Call:
mixed_model(fixed = count ~ timepoint + binary_factor1 + binary_factor2 +
offset(log(offset)), random = ~1 | subjectId, data = dat,
family = negative.binomial())
Data Descriptives:
Number of Observations: 34
Number of Groups: 17
Model:
family: negative binomial
link: log
Fit statistics:
log.Lik AIC BIC
-69.81973 151.6395 156.6387
Random effects covariance matrix:
StdDev
(Intercept) 0.3208524
Fixed effects:
Estimate Std.Err z-value p-value
(Intercept) -2.8701 0.5320 -5.3953 < 1e-04
timepointpost -1.4944 0.6803 -2.1966 0.028048
binary_factor1TRUE -1.3824 0.5837 -2.3683 0.017872
binary_factor2TRUE 1.4035 0.5508 2.5480 0.010835
log(dispersion) parameter:
Estimate Std.Err
-0.3897 0.4709
Integration:
method: adaptive Gauss-Hermite quadrature rule
quadrature points: 11
Optimization:
method: hybrid EM and quasi-Newton
converged: TRUE
> exp(confint(mod))
2.5 % Estimate 97.5 %
(Intercept) 0.01998565 0.05669246 0.1608171
timepointpost 0.05914144 0.22438070 0.8512931
binary_factor1TRUE 0.07994693 0.25098662 0.7879513
binary_factor2TRUE 1.38248141 4.06924187 11.9775422
This all seems reasonable so far. As I read it, for a given subject the expected event rate at timepoint post is 0.224 times (22.4% of) the rate at timepoint pre, controlling for the other covariates.
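For reference, these rate ratios can also be read straight off the fitted object by exponentiating the fixed effects (assuming fixef() is the right accessor for a MixMod object, which is how I understand the GLMMadaptive methods):
# conditional (subject-specific) rate ratios; these should match the
# "Estimate" column of exp(confint(mod)) shown above
exp(fixef(mod))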
I then calculate the marginal coefficients and exponentiate, which gives the following:
mod_marg_coef <- marginal_coefs(mod, std_errors = TRUE, cores = 6)
> mod_marg_coef
Estimate Std.Err z-value p-value
(Intercept) 1.6819 12.2269 0.1376 0.890589
timepointpost -1.5028 0.6998 -2.1475 0.031752
binary_factor1TRUE -1.3769 0.8083 -1.7035 0.088477
binary_factor2TRUE 1.3967 0.8207 1.7019 0.088783
The 95% CIs of the marginal coefficients look unreasonable for the intercept, and I'm worried about the implications for the rest of the model. Is this a problem with my procedure, or with my model/data?
> exp(confint(mod_marg_coef))
2.5 % 97.5 %
(Intercept) 2.103310e-10 1.374028e+11
timepointpost 5.644968e-02 8.769968e-01
binary_factor1TRUE 5.176358e-02 1.230357e+00
binary_factor2TRUE 8.091053e-01 2.019021e+01
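To rule out a mistake in my own exponentiation step, I reproduced the intercept interval by hand; it looks like a plain Wald interval, estimate ± 1.96 × SE, pushed through exp():
# hand-built 95% Wald CI for the marginal intercept on the exp scale
exp(1.6819 + c(-1, 1) * qnorm(0.975) * 12.2269)
# roughly 2.1e-10 and 1.4e+11, matching the confint() output above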
I know there's an outlier in my data, so I hypothesized that it was the cause of the unreasonable standard error and recalculated using robust (sandwich) standard errors. That only made things worse.
mod_marg_coef <- marginal_coefs(mod, std_errors = TRUE, cores = 6, sandwich = TRUE)
> mod_marg_coef
Estimate Std.Err z-value p-value
(Intercept) 1.6819 31.9858 0.0526 0.95806
timepointpost -1.5028 0.9237 -1.6270 0.10374
binary_factor1TRUE -1.3769 2.2137 -0.6220 0.53395
binary_factor2TRUE 1.3967 1.7913 0.7797 0.43557
> exp(confint(mod_marg_coef))
2.5 % 97.5 %
(Intercept) 3.192320e-27 9.053001e+27
timepointpost 3.639881e-02 1.360105e+00
binary_factor1TRUE 3.294018e-03 1.933435e+01
binary_factor2TRUE 1.207237e-01 1.353174e+02
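One thing I haven't ruled out is Monte Carlo noise: as far as I can tell from ?marginal_coefs, both the marginalization and the standard errors are simulation-based. A stability check would look something like the following (the M and seed arguments are my reading of the help page and may be off):
# re-run with more Monte Carlo samples and a different seed to see whether
# the huge intercept SE is stable (argument names per my reading of
# ?marginal_coefs; they may not be exactly right)
marginal_coefs(mod, std_errors = TRUE, cores = 6, M = 10000, seed = 2024)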
GLMMadaptive v 0.9-7