$\begingroup$

Suppose a population can be divided into 4 different groups (A, B, C, D), and you compute the sample variance of some variable within each group.

A proposed estimator for the population variance is $$s^2_A/n_A + s^2_B/n_B + s^2_C/n_C + s^2_D/n_D,$$

where $n_A, n_B, n_C$ and $n_D$ are the number of observations in each group.

In other words, the proposed estimator for the population variance is the average of the estimators of the conditional variance calculated before.

Is this population variance estimator correct?

$\endgroup$
  • $\begingroup$ What do you mean by a "correct" estimator? $\endgroup$ Commented Oct 24, 2022 at 5:49
  • $\begingroup$ Your "proposed estimator" is not "the average of the estimators of the conditional variance calculated before". Nor is it clear whether your four groups are a random partition (in which case you might assume they have the same expectations) or not (in which case that might be unreasonable). Why can you not find the variance of the population exactly? Or are these samples rather than the whole population? $\endgroup$ Commented Oct 24, 2022 at 7:34
  • $\begingroup$ @Henry the groups are not a random partition, they represent different demographics inside of the whole population. They are samples, not the population. $\endgroup$ Commented Oct 24, 2022 at 8:03
  • $\begingroup$ There’s no such notion of a “correct” estimator. Do you mean to ask how to calculate the variance of all four groups pooled together given just the group sample sizes and variances? I have done this calculation before, and my approach also required sample means, but it can be done! // If you have the entire population, there is no estimating. You know, with certainty, everything about the population. Do you mean to say that your sample can be partitioned into ABCD? Clarification on what exactly you want to do will be extremely helpful. $\endgroup$ Commented Oct 24, 2022 at 17:01
  • $\begingroup$ Hi @Dave I think that "correct" means unbiased. To show that, I would first need to know the formula for the variance of the population in terms of the variance of each group, and I'm not sure it is the weighted average of the sub-population variances. I do not have the populations; I just need to prove that the expectation of the proposed estimator is equal to the variance of the entire population. Each $s^2$ in the formula represents the sample variance of a sub-group. $\endgroup$ Commented Oct 24, 2022 at 17:08

1 Answer

$\begingroup$

If you do not have the sample means, you cannot recover the variance you would have calculated from all of the observations; the group sizes and group variances alone are not enough. Consider the following scenarios.

Scenario 1: $A=\{1,2,3\}$, $B=\{1,2,3\}$

Scenario 2: $A=\{1,2,3\}$, $B=\{7,8,9\}$

In both scenarios, the groups have the same variance. However, the pooling of all six numbers results in different variances. If all you have are $n_A$, $s^2_A$, $n_B$, and $s^2_B$, you have no way to distinguish between these two scenarios, even though $\{1,2,3,1,2,3\}$ and $\{1,2,3,7,8,9\}$ have different variances.
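A quick numerical check of the two scenarios, as a minimal sketch: it uses Python's standard-library `statistics.variance` (the sample variance with an $n-1$ denominator), and the names `scenarios` and `results` are just illustrative.

```python
from statistics import variance  # sample variance, n - 1 denominator

# The two scenarios from the text: group A is the same, group B is shifted.
scenarios = {
    "scenario_1": ([1, 2, 3], [1, 2, 3]),
    "scenario_2": ([1, 2, 3], [7, 8, 9]),
}

results = {}
for name, (a, b) in scenarios.items():
    results[name] = {
        # What you would know from the summary statistics alone.
        "summary": (len(a), variance(a), len(b), variance(b)),
        # What you would get from all six observations combined.
        "combined_variance": variance(a + b),
    }

for name, r in results.items():
    print(name, r)
```

Both scenarios report the same summary $(n_A, s^2_A, n_B, s^2_B) = (3, 1, 3, 1)$, yet the combined sample variance is $0.8$ in the first and $11.6$ in the second.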

This is not to be confused with estimating the common (pooled) within-group variance, as is done in ANOVA. That calculation does not need the sample means, but it answers a completely different question.
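To see where the sample means enter, here is a sketch of the standard sum-of-squares decomposition. The symbols $\bar{x}_g$ (sample mean of group $g$), $\bar{x}$ (mean of all observations), and $N$ (total sample size) are my notation, not the question's:

$$(N-1)\,s^2 = \sum_{g} (n_g - 1)\,s^2_g + \sum_{g} n_g\,(\bar{x}_g - \bar{x})^2, \qquad N = \sum_g n_g, \quad \bar{x} = \frac{1}{N}\sum_g n_g\,\bar{x}_g.$$

The first sum uses only the group sizes and variances; it is the within-group part that ANOVA pools. The second sum requires the group means. In Scenario 2 above, the first sum is $2 + 2 = 4$ and the second is $3(2-5)^2 + 3(8-5)^2 = 54$, giving $s^2 = 58/5 = 11.6$; in Scenario 1 the second sum is zero and $s^2 = 4/5 = 0.8$.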

$\endgroup$
