I'm doing stats for medical chart review research. The binary outcomes vary in probability from less than 0.05 to greater than 0.5 depending on risk factors. For relatively more common outcomes like bronchopulmonary dysplasia (BPD), roughly >20% of very preterm neonates develop BPD. The continuous predictor is hours after birth at which first feed was given. The overall goal is to assess for explanatory associations between continuous explanatory variable "time (hours after birth) at which first enteral feed is given" and subsequent outcome variables i.e. necrotizing enterocolitis, sepsis, retinopathy of prematurity, and bronchopulmonary dysplasia (outcomes are composited with mortality to prevent survivorship bias). This assessment is made using multivariate logistic regression with covariate potential confounders. However, prior to the multivariate model, our goal is to visually describe the univariate relationship between time until first feed and outcomes.
Initially, I used a GAM to assess for linear relationship between continuous explanatory variable and adjusted log odds of outcome, because I read that this linearity is an assumption of logistic regression. The GAM confidence intervals were asymmetric about the curve, with more variance toward p of 0.5 and less toward extremes, which seemed consistent with my understanding of an equation I read: variance maximized when p = 0.5 as variance is proportional to p / (1-p). However, the GAM curve was not very descriptive of the empirical probability (see image below where GAM plot is bottom panel, noting how the non-linearity of univariate relationship was suppressed). My team then suggested I use LOESS to describe the relationship and variance.
So I made LOESS plots of empirical probability on y-axis by time until first feed on x-axis. In R, I'm using the geom_smooth(method = "loess") function to make a LOESS plot and display local/pointwise 95% CIs around the smoothed probability.
I then read that the geom_smooth built-in 95% confidence interval setting (se = TRUE) uses a global measure of variance (with local leverage), which I thought might be invalid given that the outcome is binary (and so variance would be dependent on probability, which varies by x). This initial LOESS plot example is Panel B of attached image. Because I was concerned it was invalid, I then plotted with bootstrapped LOESS CIs instead (Panel D of attached image). However, what I noticed is that the bootstrapped CIs and the analytical CIs (from se=TRUE) are very similar to each other. Are either of these approaches valid?
Edit: Per suggestion of EdM in comments, I made a plot with splines as well. I used univariate logistic regression with restricted cubic splines, boundaries at 5th and 95th and internal splines at 35th and 65th percentile. This RCS plot is also attached in the right panel of image below. This plot appears to preserve the non-linearity of the univariate relationship (so it's more descriptive) while also having confidence intervals that are asymmetric (greater toward 0.5 and lesser toward 0 and 1).
When I looked at stats paper (doi: 10.1214/aos/1015362189), the methods discussed were totally different (Wilson, Jeffreys, Agresti-Coull, Clopper-Pearson), which I don't currently understand. Not sure if these are applicable to finding pointwise CIs. My question is if using the LOESS built-in analytical or LOESS bootstrapped CIs are valid at all, or if I need to learn about other methods in order to get valid CIs for probability of binary outcome.
