$\begingroup$

Suppose I measure 100 samples from a normal distribution and use them to compute a standard deviation.

Is there a way to compute ± error bounds on my computed standard deviation, so that I know with 95% confidence what the true standard deviation would be had I measured 1 million samples instead of the original 100?

Practical application: I characterize 100 units with the intent of creating max and min specifications for my product's standard deviation. A customer buys 1 million units and wants to know, with 95% confidence, what max and min values for standard deviation we guarantee. How can I create a specification for my product's datasheet that satisfies the customer's interest when I measure fewer units than the customer buys?

$\endgroup$
  • $\begingroup$ You may be asking this: we have i.i.d. $\{X_i\}$ that are $N(m,\sigma^2)$ with unknown $m$ and $\sigma^2$. Define the sample mean $M_n=\frac{1}{n}\sum_{i=1}^nX_i$ and the sample variance $V_n=\frac{1}{n-1}\sum_{i=1}^n(X_i-M_n)^2$. If so, you can find confidence intervals for the true $\sigma^2$ from $V_n$ using the fact that $\frac{(n-1)V_n}{\sigma^2}$ follows a chi-square distribution with $n-1$ degrees of freedom. This is a standard statistics calculation. $\endgroup$ Commented Jun 30 at 23:13

1 Answer

$\begingroup$

I'm assuming you're computing the sample standard deviation $s$ as the square root of the unbiased sample variance: $$ s^2 = \frac{1}{n-1} \sum_{i=1}^{n} (X_i - \bar{X})^2 $$

For i.i.d. normal data, the scaled sample variance follows a chi-squared distribution: $$ \frac{(n-1)s^2}{\sigma^2} \sim \chi^2(df = n-1) $$

So a 95% confidence interval for the true variance $\sigma^2$ is $$ \left(\frac{(n-1)s^2}{U}, \frac{(n-1)s^2}{L} \right) $$

where $U$ and $L$ are the upper and lower $2.5\%$ points of the $\chi^2$ distribution with $n-1$ degrees of freedom, i.e. each cuts off a tail probability of $0.025$. Using a calculator or stats package, with $n = 100$ (so $df = 99$), you would get $U \approx 128.42$ and $L \approx 73.36$, so $$ CI(\sigma^2) \approx \left(0.771s^2, 1.350s^2\right) $$
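
If no stats package is at hand, the two critical values can be approximated with the Python standard library alone. The sketch below is not part of the original calculation: it swaps in the Wilson-Hilferty cube-root approximation for the exact $\chi^2$ quantile, and the function names `chi2_quantile_wh` and `variance_ci_factors` are made up for illustration. For $df = 99$ the approximation agrees with the values quoted above to about two decimal places; a dedicated chi-squared quantile function from a stats package would be the exact route.

```python
from math import sqrt
from statistics import NormalDist


def chi2_quantile_wh(p, df):
    """Approximate chi-squared quantile (Wilson-Hilferty cube-root formula).

    This is an approximation, assumed good enough for moderately large df.
    """
    z = NormalDist().inv_cdf(p)   # standard normal quantile
    c = 2.0 / (9.0 * df)
    return df * (1.0 - c + z * sqrt(c)) ** 3


def variance_ci_factors(n, level=0.95):
    """Return (lo, hi) so that the CI for sigma^2 is (lo * s^2, hi * s^2)."""
    alpha = 1.0 - level
    df = n - 1
    lower_q = chi2_quantile_wh(alpha / 2, df)        # L in the text
    upper_q = chi2_quantile_wh(1.0 - alpha / 2, df)  # U in the text
    # Dividing by the larger quantile gives the lower end of the interval.
    return df / upper_q, df / lower_q


lo, hi = variance_ci_factors(100)
print(f"sigma^2 in ({lo:.3f} s^2, {hi:.3f} s^2)")
print(f"sigma   in ({sqrt(lo):.3f} s, {sqrt(hi):.3f} s)")
```

Multiplying the measured $s^2$ (or $s$, for the square-rooted factors) by these numbers gives the interval endpoints in the product's own units.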

Take square roots at the end to get a CI for $\sigma$ itself, roughly $(0.878s, 1.162s)$. Because the square root is monotone, the coverage is unchanged. As a point estimate, $s$ is technically slightly biased ($E(s^2) = \sigma^2$, but $E(s) < \sigma$), but that shouldn't be an issue for $n = 100$.
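
To spell out the square-root step, here is the inversion of the pivot written as one chain, using the same $s^2$, $L$, and $U$ as above and assuming only that the data are normal. Each event is a rearrangement of the previous one, so the probability does not change:

$$ 0.95 = P\left(L \le \frac{(n-1)s^2}{\sigma^2} \le U\right) = P\left(\frac{(n-1)s^2}{U} \le \sigma^2 \le \frac{(n-1)s^2}{L}\right) = P\left(s\sqrt{\frac{n-1}{U}} \le \sigma \le s\sqrt{\frac{n-1}{L}}\right) $$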

Note: $\chi^2$ is an asymmetric distribution, so the confidence interval is asymmetric about $s^2$ in turn.

$\endgroup$
  • $\begingroup$ If I'm interpreting this correctly, the CI you compute above corresponds to the n=100 sample population. What I'm trying to estimate is the corresponding CI for an n=1000000 sample population, when all I have measured is n=100 samples. Is this possible, if we assume the distribution is normal? Or does it not change with population size (n), so that the same CI applies? $\endgroup$ Commented Jul 2 at 4:38
  • $\begingroup$ I would look at the $1,\!000,\!000$ (plus infinite counterfactuals that were never produced) as the entire population. Out of that population, you are sampling $n=100$ to study. You can only extract a CI from the ones you measure; you cannot just set $n = 1,\!000,\!000$ unless you measure all of those too. $\endgroup$ Commented Jul 2 at 4:47
  • $\begingroup$ Got it. Thank you - very useful. $\endgroup$ Commented Jul 2 at 4:53
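
The point made in the comments, that the interval is driven by the $n = 100$ units actually measured and says something about the $\sigma$ of the whole population, can be checked with a small simulation. This is only a sketch under stated assumptions: the population is exactly normal, the true mean and $\sigma$ (10.0 and 2.0) are arbitrary illustrative values, the names `sigma_ci` and `coverage` are hypothetical, and the $\chi^2$ critical values come from the Wilson-Hilferty approximation rather than an exact quantile function. It repeatedly draws 100 values, builds the 95% interval for $\sigma$, and counts how often the true $\sigma$ falls inside.

```python
import random
from math import sqrt
from statistics import NormalDist


def sigma_ci(sample, level=0.95):
    """Approximate CI for sigma from one sample (Wilson-Hilferty quantiles)."""
    n = len(sample)
    df = n - 1
    mean = sum(sample) / n
    s2 = sum((x - mean) ** 2 for x in sample) / df   # unbiased sample variance
    z = NormalDist().inv_cdf(1.0 - (1.0 - level) / 2)
    c = 2.0 / (9.0 * df)
    upper_q = df * (1.0 - c + z * sqrt(c)) ** 3      # approx. U
    lower_q = df * (1.0 - c - z * sqrt(c)) ** 3      # approx. L
    return sqrt(df * s2 / upper_q), sqrt(df * s2 / lower_q)


def coverage(true_sigma=2.0, n=100, trials=2000, seed=0):
    """Fraction of simulated samples whose CI contains the true sigma."""
    rng = random.Random(seed)
    hits = 0
    for _ in range(trials):
        sample = [rng.gauss(10.0, true_sigma) for _ in range(n)]
        lo, hi = sigma_ci(sample)
        hits += lo <= true_sigma <= hi
    return hits / trials


print(f"empirical coverage: {coverage():.3f}")
```

The printed fraction should land near 0.95. Nothing in the calculation refers to how many units exist beyond the 100 sampled, which is why the same interval applies whether the customer buys a thousand units or a million.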
