Calculating how many samples are needed to get the desired confidence interval

Question

Below is a problem I made up and tried to solve. I am hoping somebody can help me finish it.

Problem:

A magical device generates a normally distributed random number with a standard deviation of $1$ and an unknown mean. We want to find the mean by generating numbers from the device. How many samples are needed so that the estimate of the mean is within $2\%$ of the actual mean with $95\%$ confidence? That is, we want $|\bar{Y} - \mu| \leq 0.02|\mu|$ to hold with probability $0.95$.

Note: $$ \bar{Y} = \left( \dfrac{1}{n} \right) \sum_{i=1}^n Y_i $$

Answer:

$\bar{Y}$ is a normally distributed random variable. Let $\sigma_s$ be the standard deviation of $\bar{Y}$, let $\sigma$ be the actual standard deviation, and let $n$ be the number of samples we take. \begin{align*} \sigma_s^2 &= \dfrac{ \sigma^2 }{ n } = \dfrac{1}{n} \\ \sigma_s &= n^{-\frac{1}{2}} \end{align*} Now we define $Z$ to be: $$ Z = \dfrac{ \bar{Y} - \mu } { \sigma_s} = ( \bar{Y} - \mu ) n^{\frac{1}{2}} $$ Now $Z$ is a normal random variable with mean $0$ and variance $1$. Since $|\bar{Y} - \mu| \leq 0.02|\mu|$ is the same as $|Z| \leq 0.02|\mu| n^{\frac{1}{2}}$, we want: $$ P(|Z| \leq 0.02|\mu| n^{\frac{1}{2}}) = 0.95 $$

Am I right so far? How do I finish the problem?

heropup · Accepted Answer · 2025-11-14 02:12:19Z

The problem with the way your question is posed is that your answer will be a function of the true mean $\mu$, because you stipulate that $\bar Y$ must be "within $2\%$ of the true mean with $95\%$ confidence." Consequently, if $\mu$ is small in magnitude, your tolerance for error is similarly small and your required sample size is huge; whereas if $\mu$ is sufficiently large, a single observation may suffice because the variance is fixed at $1$.

For instance, if $\mu = 10^{10^{10^{10}}}$, that is an absolutely huge number compared to the variance, so a single observation would almost certainly be within $2\%$ of the true value.

At the other extreme, what if $\mu = 0$? Then it does not even make sense to speak of the percentage error, because computing it requires division by $0$. The criterion $|\bar Y - \mu| \le 0.02|\mu|$ reduces to $\bar Y = 0$ exactly, which has probability zero for every $n$.

This dependence of the required sample size $n$ on an unknown parameter $\mu$ is highly undesirable in statistical practice, because it is effectively useless for the purpose of inference. If you don't know $\mu$, you cannot calculate $n$; if you knew $\mu$, you would have no need to infer its value through estimation.
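To make the dependence on $\mu$ concrete: with $\sigma = 1$, the relative criterion $\Pr[|\bar Y - \mu| \le 0.02|\mu|] \ge 0.95$ is equivalent to $n \ge \left( z_{0.975} / (0.02|\mu|) \right)^2$, where $z_{0.975} \approx 1.95996$ is the standard normal quantile. The Python sketch below tabulates that bound for a few values of $\mu$. The function name `required_n_relative` and its parameters are illustrative assumptions, not part of the original problem.

```python
from math import ceil
from statistics import NormalDist


def required_n_relative(mu, rel_tol=0.02, confidence=0.95, sigma=1.0):
    """Smallest n with P(|Ybar - mu| <= rel_tol * |mu|) >= confidence.

    Hypothetical helper: it needs the true mean mu as input, which is
    exactly the quantity we would not know in practice.
    """
    if mu == 0:
        # The relative tolerance collapses to zero, so no finite n works.
        raise ValueError("relative error is undefined when mu == 0")
    # Two-sided critical value, e.g. about 1.95996 for 95% confidence.
    z = NormalDist().inv_cdf(0.5 + confidence / 2)
    # At least one observation is always needed.
    return max(1, ceil((z * sigma / (rel_tol * abs(mu))) ** 2))


# The required sample size shrinks rapidly as |mu| grows.
for mu in (0.1, 1, 10, 100):
    print(mu, required_n_relative(mu))
```

For $\mu = 1$ this gives $9604$, for $\mu = 10$ it gives $97$, and for $\mu = 100$ a single observation is enough, which illustrates why the answer cannot be stated without already knowing $\mu$.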

A way to "fix" this question is to change the error criterion to an absolute value rather than a percentage; e.g., "what is the required sample size such that the point estimate is within $0.02$ of the true value with $95\%$ confidence?" In this way, the error is measured on the same scale as the observed data. So if $Y$ represents units of length, say centimeters, then an error of $0.02$ is equivalent to $0.2$ millimeters irrespective of whether $\mu$ is $10$ cm or $1000$ km.

I suppose it is worth discussing how to answer the question in the case where we want the estimate to be within $0.02$ of the true mean. This is equivalent to saying $$\Pr[|\bar Y - \mu| \le 0.02] \ge 0.95,$$ and with the assumption $\sigma^2 = 1$ along with the observation that $$Z = \frac{\bar Y - \mu}{\sigma/\sqrt{n}} \sim \operatorname{Normal}(0,1)$$ is a pivotal quantity, it follows that we require $$\Pr[|Z| \le 0.02 \sqrt{n}] \ge 0.95.$$ Equivalently, if $\Phi(z) = \Pr[Z \le z]$ is the cumulative distribution function for the standard normal distribution, $$\Phi(-0.02 \sqrt{n}) \le 0.025.$$ Thus $$\sqrt{n} \ge \frac{\Phi^{-1}(0.025)}{-0.02} = \frac{-1.95996}{-0.02} = 97.9982,$$ so we require $$n \ge \lceil (97.9982)^2 \rceil = \lceil 9603.65 \rceil = 9604$$ to achieve at least $95\%$ confidence.
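As a quick numerical check of the figure $9604$, here is a minimal Python sketch using only the standard library. The function name `required_n_absolute` and its parameters are illustrative assumptions; it simply evaluates $n \ge (z_{0.975}\,\sigma / 0.02)^2$ and then confirms that the coverage probability crosses $95\%$ at that $n$.

```python
from math import ceil, sqrt
from statistics import NormalDist


def required_n_absolute(tol=0.02, confidence=0.95, sigma=1.0):
    """Smallest n with P(|Ybar - mu| <= tol) >= confidence (hypothetical helper)."""
    # Two-sided critical value: inv_cdf(0.975) for 95% confidence.
    z = NormalDist().inv_cdf(0.5 + confidence / 2)
    return ceil((z * sigma / tol) ** 2)


def coverage(n, tol=0.02, sigma=1.0):
    """P(|Z| <= tol * sqrt(n) / sigma) for a standard normal Z."""
    return 2 * NormalDist().cdf(tol * sqrt(n) / sigma) - 1


n = required_n_absolute()
print(n)                # 9604
print(coverage(n))      # just above 0.95
print(coverage(n - 1))  # just below 0.95
```

Note that $\mu$ does not appear anywhere in the calculation, which is the point of switching to an absolute tolerance.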

What I missed is that if $\mu$ is very large and $\sigma = 1$, all the values of the distribution are approximately the same.
– Bob, Commented Nov 13 at 12:35
I am thinking that the real problem with the question is that $\sigma$ is a known fixed value. Please comment.
– Bob, Commented Nov 13 at 12:58
@Bob $\sigma$ being known is a different problem. You could change the question so your estimator of the mean (sample average) has a $95\%$ confidence of being in $(\mu - 0.02 \, \sigma, \mu + 0.02 \, \sigma)$, with $\sigma$ known or unknown, and it suddenly becomes much easier as the necessary sample size then does not depend on $\sigma$.
– Henry, Commented Nov 13 at 17:19
