$\begingroup$

I'm taking a statistics class. The question has been graded, but I question the validity of the official answer. We are asked whether, based solely on the confidence intervals, mean BMI differs between individuals with no money worries and individuals with a lot of money worries. There are 178 individuals with no money worries and 8 with a lot of money worries. The 95% CI for the "a lot of money worries" group completely encloses the 95% CI for the "no money worries" group. On this basis, the official answer is that since the 95% CIs overlap, p is assumed to be > .05.

I am not so convinced that we can say that. The numbers for the two samples are: No worries: N = 178, mean = 30.7264, SE = 0.36042, 95% CI = 30.010151 to 31.4376, VAR = 23.122, SD = 4.8057, skew and kurtosis both < 1, outliers: 2, both on the high end but not massively so.

Lot of worries: N = 8, mean = 35.7514, SE = 3.88778, 95% CI = 26.5583 to 44.9446, VAR = 120.919, SD = 10.99630, skew < 1, kurtosis = 1.426, outliers: 1 possible (a single data point much higher than the other 7). Is it actually an outlier? With only 8 data points, who knows.
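(In case it helps, both intervals look like ordinary t-based intervals of the form mean ± t × SE; a quick R check with the numbers above, assuming that construction, gives roughly the same half-widths:)

# Half-widths of t-based 95% CIs from the reported SEs (assuming mean +/- t*SE)
qt(0.975, df = 178 - 1) * 0.36042   # ~0.711, so roughly 30.73 +/- 0.71
qt(0.975, df = 8 - 1)   * 3.88778   # ~9.193, so roughly 35.75 +/- 9.19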

My general feeling is that the tiny size of the "lot of worries" sample, combined with the possible outlier among those 8 data points, makes it impossible to meaningfully compare the two sample means purely on the basis of whether the 95% confidence intervals overlap. Am I wrong?

Edit: Thanks to everyone. Dave, thanks for the simulation. Jginestet, thanks for clarifying the glitch in that simulation. This was my first post on Stack Exchange, and the system is not allowing me to change my initial acceptance of the simulation as the best answer. But I would like to clarify that I still accept it, only with the modifications suggested by Jginestet. In that case, the consensus does seem to be that I was indeed wrong, and it is highly unlikely that p < .05 when one CI is fully contained in the other. Interesting... I think I need to go update my math skills so I can better understand why. And I am still somewhat skeptical that this sort of finding on such a small sample should have public health policy implications, even if it is statistically sound. I would still like to see a more "normal"-sized data set for the "a lot of money worries" category, to have more confidence that this is an actual reflection of population characteristics.

$\endgroup$
  • $\begingroup$ You are correct to be sceptical: see stats.stackexchange.com/questions/18215 for some analyses and commentary about the relationship between overlap of CIs and hypothesis testing. However, under what circumstances do you suppose a legitimate test of the difference in mean BMIs would reject the hypothesis of zero difference? Bear in mind that the standard error of the difference is greater than the individual standard errors in the two groups. For additional intuition, remember that any CI already accounts for the sample size. $\endgroup$ Commented May 8 at 14:59
  • $\begingroup$ Welcome to Cross Validated! Is your concern about the math of overlapping confidence intervals and the p-value of a t-test, or is your concern about whether or not a good methodology is being followed? The question strikes me as examining you on your knowledge of the former, while your concerns seem to be about the latter. $\endgroup$ Commented May 8 at 15:02
  • $\begingroup$ My concern is more about whether it makes sense to apply the math of overlapping confidence intervals and p-value of a t-test given the nature of the samples. I could be wrong, but intuition says that the CI of the small sample is high because of the small N and the possible 1/8 ratio of outliers in the data, and that trying to compare the CIs of it and the large sample is kind of comparing apples to oranges. To me, step one in analyzing data is always check if the test/methodology makes sense, and only then, if it does, apply it. $\endgroup$ Commented May 8 at 15:17
  • $\begingroup$ WHuber, I think we need more data in the lot of worries group to say anything meaningful about it. I don't trust data that is tiny in quantity and also seems to have 12.5% outliers. Especially for something like BMI, where it is very variable and the association with other variables is usually pretty weak. I think you need quite a lot more data in such a case to be able to really do this kind of comparison in a meaningful way. But maybe I'm wrong. $\endgroup$ Commented May 8 at 15:20
  • $\begingroup$ WHuber, I looked at that question you linked to, but the discussion seemed to be about cases where the sample sizes and variances were about equal, which is not the case here. So I am still wondering about this sort of case. $\endgroup$ Commented May 8 at 15:37

2 Answers

$\begingroup$

As the simulation below shows, t-based confidence intervals for groups of these sizes can have one interval contained within the other despite the fact that a two-sample t-test rejects the null hypothesis of mean equality, using $p<0.05$ as the threshold for rejection and $95\%$ confidence intervals.

set.seed(2025)

N0 <- 178    # size of the "no worries" group
N1 <- 8      # size of the "lot of worries" group
R  <- 10000  # number of simulated data sets

contained_within <- rep(0, R)
reject_at_005    <- rep(0, R)

for (i in 1:R){

  both_conditions <- 0

  # Simulate the two groups (deliberately with different means and variances)
  x0 <- runif(N0, 2, 4)
  x1 <- runif(N1, 0, 4)

  # Does the two-sample t-test reject at alpha = 0.05?
  p <- t.test(x0, x1, var.equal = TRUE)$p.value
  if (p < 0.05){
    reject_at_005[i] <- 1
    both_conditions <- both_conditions + 1
  }

  # 95% confidence interval for each group mean
  ci0 <- t.test(x0)$conf.int
  ci1 <- t.test(x1)$conf.int

  # Is ci0 contained within ci1?
  if (ci1[1] < ci0[1] & ci1[2] > ci0[2]){
    contained_within[i] <- 1
    both_conditions <- both_conditions + 1
  }

  # Is ci1 contained within ci0?
  if (ci1[1] > ci0[1] & ci1[2] < ci0[2]){
    contained_within[i] <- 1
    both_conditions <- both_conditions + 1
  }

  # Flag runs where containment and rejection happen together
  if (both_conditions > 1){
    print("Containment yet rejection!")
  }
}

# Proportion of runs with rejection AND one interval contained in the other
mean(reject_at_005 * contained_within)
#
# In almost a third of the simulations, the p-value is below 0.05 despite one
# confidence interval being completely contained in the other

Consequently, the grader is mistaken to say that one $95\%$ confidence interval being contained in the other implies $p > 0.05$.$^{\dagger}$

Let's look at a sketch to get some intuition.

[Sketch: two 95% confidence intervals, a red one contained entirely within a black one]

The red confidence interval is totally contained within the black confidence interval. However, the black mean (center of the confidence interval) is outside of the red confidence interval, which suggests an incompatibility between the red mean and the black mean. A full two-sample t-test can formalize this incompatibility by explicitly testing if the difference in means is zero.
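To make this concrete, here is a small follow-up to the simulation above (same data-generating setup; the seed is arbitrary and only illustrative) that keeps drawing samples until one interval contains the other while the two-sample test rejects, then reports that single case:

# Search for one concrete "containment yet rejection" instance
# (same setup as the simulation above; arbitrary seed)
set.seed(1)
repeat {
  x0 <- runif(178, 2, 4)
  x1 <- runif(8, 0, 4)
  p   <- t.test(x0, x1, var.equal = TRUE)$p.value
  ci0 <- t.test(x0)$conf.int
  ci1 <- t.test(x1)$conf.int
  contained <- (ci1[1] < ci0[1] & ci1[2] > ci0[2]) |
               (ci1[1] > ci0[1] & ci1[2] < ci0[2])
  if (contained & p < 0.05) break
}
# The two intervals and the p-value for that particular draw
list(ci0 = round(ci0, 3), ci1 = round(ci1, 3), p = round(p, 4))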

$^{\dagger}$I realize this is not the exact phrasing. However, the phrasing in the question seems to imply a belief that one $(1-\alpha)\times100\%$ confidence interval being contained in the other rules out rejection at the $\alpha$-level, which the simulation above demonstrates, through thousands of counterexamples, need not hold.

$\endgroup$
  • $\begingroup$ Thank you very much! Out of curiosity, and to help me understand why it is so, is the hand-waving explanation for this that the enveloping CI is only so big because N is so small? Or what is the underlying structural reason for this happening? $\endgroup$ Commented May 8 at 15:54
  • $\begingroup$ @Dana I have resolved an issue I found in my simulation. I will make an edit later today (maybe tomorrow) to explain what is happening. $\endgroup$ Commented May 8 at 16:04
  • $\begingroup$ @Dave The result appears to rely on var.equal = TRUE in the t-test, which completely contradicts the logic in the separate CI calculation. As is, I'd say this is misleading. $\endgroup$ Commented May 8 at 19:39
  • $\begingroup$ @Dave, no we do not. There are probably exceptions, but they are rare. I would agree with whuber that, in case of full containment of one CI inside another, there is a high likelihood (hand waving here) of lack of rejection. $\endgroup$ Commented May 8 at 20:25
  • $\begingroup$ The illustration is good, but I find its interpretation exactly wrong because a proper comparison of the two groups requires considering the uncertainties in both groups. Because the red mean lies within the black interval, we ought to suspect there is little evidence (on the basis of the CIs alone) of a difference in group means. Indeed, that consideration ought to remind us that the correct method will compare the difference in means to the standard error of the difference, which can be no smaller than either of the group standard errors. $\endgroup$ Commented May 9 at 15:36
$\begingroup$

Because the answer might depend on how the confidence intervals are constructed or even (conceivably) on the confidence level (it turns out it depends on both), let's begin with a general description.

The situation concerns two independent samples, a smaller group $X$ of $n_x$ observations with a mean of $\bar x$ and standard error $\hat\sigma_x$ and a larger group $Y$ of $n_y \ge n_x$ observations with a mean of $\bar y$ and standard error $\hat\sigma_y.$ Let the confidence intervals be based on positive multipliers $t_x$ and $t_y,$ respectively, in the sense that they are constructed as $[\bar x - t_x\hat\sigma_x, \bar x + t_x\hat\sigma_x]$ and similarly for the second group. To circumvent ridiculous applications, I will suppose both standard errors are nonzero.

Inclusion of the second confidence interval (for $Y$) within the first (for $X$) is equivalent to

$$|\bar x - \bar y| \le t_x\hat\sigma_x - t_y\hat\sigma_y.\tag{1}$$

This difference differs significantly from $0$ when it exceeds some positive multiple $t_{x-y}$ of its standard error. Independence of the samples implies the standard error of the difference is

$$\hat\sigma_{x-y} = \sqrt{\hat\sigma_x^2 + \hat\sigma_y^2}.$$

Consequently, a paradoxical situation would arise only when

$$t_{x-y}\hat\sigma_{x-y} \lt |\bar x - \bar y|.\tag{2}$$

Transitivity of the inequalities $(1)$ and $(2)$ yields

$$t_{x-y}\hat\sigma_{x-y} \lt t_x\hat\sigma_x - t_y\hat\sigma_y$$

and squaring both sides tells us

$$t_{x-y}^2(\hat\sigma_x^2 + \hat\sigma_y^2) \lt (t_x\hat\sigma_x - t_y\hat\sigma_y)^2.\tag{3}$$

Let's examine some particular cases.

  1. With Normal-approximation confidence intervals, no matter what the (common) confidence level $1-\alpha$ is, $t_x = t_y = t_{x-y} = Z_{\alpha/2},$ a Normal percentage point that is nonzero provided only $\alpha \lt 100\%,$ as is always the case. Inequality $(3)$ then simplifies to $$0 \lt -\frac{2t_xt_y}{t_{x-y}^2}\,\hat\sigma_x\hat\sigma_y = -2\hat\sigma_x\hat\sigma_y,$$ an impossibility because both standard errors are nonzero, making the right-hand side strictly negative.

  2. With Student t confidence intervals, the multipliers $t_x$ etc. are percentage points of Student t distributions with $n_x-1,$ $n_y-1,$ and some other number $\nu_{x-y}$ of degrees of freedom. (The latter depends on the type of t-test used.) Because the smaller group has fewer degrees of freedom, $t_y \le t_x$ and $t_{x-y}\le t_x,$ making it at least mathematically possible for inequality $(3)$ to hold.

At this point it is clear that the answer must depend on assumptions about the specific test, on the values of the standard errors, and on detailed knowledge of how percentage points of Student t distributions vary, so we are in a position to appreciate the qualifier in the official answer: "$p$ is assumed to be greater than 0.05." We now know why this nuance was necessary. Absent such specific knowledge, yet forced to provide a decision, the wise analyst will agree with this conclusion.
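As an illustration only (an assumption on my part, since we are not told which test the course intends), plugging the summary statistics reported in the question into a Welch-type comparison puts these particular data well on the non-rejection side:

# Summary statistics from the question; Welch-type test assumed (illustrative)
se_x <- 3.88778; n_x <- 8      # "lot of worries" group (smaller)
se_y <- 0.36042; n_y <- 178    # "no worries" group (larger)
d    <- 35.7514 - 30.7264      # difference in means, about 5.03

se_d <- sqrt(se_x^2 + se_y^2)                                  # about 3.90
nu   <- se_d^4 / (se_x^4 / (n_x - 1) + se_y^4 / (n_y - 1))     # about 7.1 df
2 * pt(-abs(d / se_d), df = nu)                                # p about 0.24

So for these data the full containment and the failure to reject point in the same direction, consistent with the conclusion below.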

The bottom line is that you are right to be sceptical, but the official answer is correct as given.

$\endgroup$
  • $\begingroup$ My post isn’t a counterexample? $\endgroup$ Commented May 8 at 16:16
  • $\begingroup$ @Dave Assuming the simulation is correct, it is an example of situation (2). It is amenable to this analysis--no simulation is needed--because in using an equal-variance t-test, you know that $\nu_{x-y}=n_x+n_y-2$ and because you have Normal populations, you can compute the sampling distributions of the means and standard errors. As such I see no contradictions or counterexamples lurking. $\endgroup$ Commented May 8 at 16:21
  • $\begingroup$ Um...my variances are not equal, and the sample distributions aren't normal, either. $\endgroup$ Commented May 8 at 18:39
  • $\begingroup$ Right: but those are all the case in @Dave's simulation. $\endgroup$ Commented May 8 at 18:47
  • $\begingroup$ I don't quite understand how the answer can be correct as given if Dave's simulation is correct and the odds of p being < .05 are about 1 in 3 or 4. That's a pretty hefty chunk of cases in which the assumption would be wrong, so we should not be making that assumption. I mean, isn't it a fundamental statistical rule of thumb that p has to be .05 or less? In line with that sort of tolerance, shouldn't we refuse to assume that overlapping CIs show that p > .05 unless the simulation returned that result about 95% of the time or more? $\endgroup$ Commented May 8 at 18:54
