$\begingroup$

Below is a problem I made up and tried to solve. I am hoping somebody can help me finish it.

Problem:

A magical device generates normally distributed random numbers with a standard deviation of $1$ and an unknown mean $\mu$. We want to estimate the mean by generating numbers from the device. How many samples are needed so that the estimate of the mean is within $2\%$ of the actual mean with $95\%$ confidence? That is, we want $|\bar{Y} - \mu| \leq 0.02|\mu|$ to hold with probability $0.95$.

Note: $$ \bar{Y} = \left( \dfrac{1}{n} \right) \sum_{i=1}^n Y_i $$

Answer:

$\bar{Y}$ is a normally distributed random variable. Let $\sigma_s$ be the standard deviation of $\bar{Y}$, let $\sigma$ be the actual standard deviation, and let $n$ be the number of samples we take. Then \begin{align*} \sigma_s^2 &= \dfrac{ \sigma^2 }{ n } = \dfrac{1}{n} \\ \sigma_s &= n^{-1/2} \end{align*} Now we define $Z$ to be: $$ Z = \dfrac{ \bar{Y} - \mu } { \sigma_s} = ( \bar{Y} - \mu ) n^{1/2} $$ Now $Z$ is a normal random variable with mean $0$ and variance $1$. Since $|\bar{Y} - \mu| \leq 0.02|\mu|$ is the same as $|Z| \leq 0.02|\mu|n^{1/2}$, we want: $$ P(|Z| \leq 0.02|\mu|n^{1/2}) = 0.95 $$

Am I right so far? How do I finish the problem?

$\endgroup$

1 Answer

$\begingroup$

The problem with the way your question is posed is that your answer will be a function of the true mean $\mu$, because you stipulate that $\bar Y$ must be "within $2\%$ of the true mean with $95\%$ confidence." Consequently, if $\mu$ is small in magnitude, your tolerance for error is similarly small and your required sample size is huge; whereas if $\mu$ is sufficiently large, a single observation may suffice because the variance is fixed at $1$.

For instance, if $\mu = 10^{10^{10^{10}}}$, that is an absolutely huge number compared to the standard deviation of $1$, meaning that a single observation would almost certainly be within $2\%$ of the true value.

At the other extreme, what if $\mu = 0$? Then it does not even make sense to speak of the percentage error, because computing $|\bar Y - \mu|/|\mu|$ requires division by $0$; any nonzero estimate has infinite relative error.

This dependence of the required sample size $n$ on an unknown parameter $\mu$ is highly undesirable in statistical practice, because it is effectively useless for the purpose of inference. If you don't know $\mu$, you cannot calculate $n$; if you knew $\mu$, you would have no need to infer its value through estimation.
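To make the dependence on $\mu$ concrete, here is a minimal Python sketch. It assumes the relative criterion from the question, $\Pr[|\bar Y - \mu| \le 0.02|\mu|] \ge 0.95$ with $\sigma = 1$, which rearranges to $\sqrt{n} \ge z\,\sigma/(0.02|\mu|)$ where $z$ is the upper $2.5\%$ point of the standard normal. The function name `required_n_relative` and its parameters are illustrative only and do not come from the question.

```python
from math import ceil
from statistics import NormalDist


def required_n_relative(mu, rel_tol=0.02, confidence=0.95, sigma=1.0):
    """Smallest n with P(|Ybar - mu| <= rel_tol * |mu|) >= confidence.

    Hypothetical helper: it needs the true mean mu as input, which is
    exactly the quantity we would not know in practice.
    """
    if mu == 0:
        raise ValueError("relative error is undefined when mu = 0")
    # Two-sided critical value, about 1.96 for 95% confidence.
    z = NormalDist().inv_cdf(0.5 + confidence / 2)
    # At least one observation is always needed.
    return max(1, ceil((z * sigma / (rel_tol * abs(mu))) ** 2))


for mu in (0.1, 1, 10, 100):
    print(mu, required_n_relative(mu))
```

Under these assumptions the required $n$ scales like $1/\mu^2$: it is in the hundreds of thousands for $\mu = 0.1$ and drops to a single observation for $\mu = 100$.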

A way to "fix" this question is to change the error criterion to a value that is not a percentage; e.g., "what is the required sample size such that the point estimate is within $0.02$ of the true value with $95\%$ confidence?" In this way, the error is measured on the same scale as the observed data. So if $Y$ represents units of length, say centimeters, then an error of $0.02$ is equivalent to $0.2$ millimeters irrespective of whether $\mu$ is $10$ cm or $1000$ km.


I suppose it is worth discussing how to answer the question in the case where we want the estimate to be within $0.02$ of the true mean. This is equivalent to saying $$\Pr[|\bar Y - \mu| \le 0.02] \ge 0.95,$$ and with the assumption $\sigma^2 = 1$ along with the observation that $$Z = \frac{\bar Y - \mu}{\sigma/\sqrt{n}} \sim \operatorname{Normal}(0,1)$$ is a pivotal quantity, it follows that we require $$\Pr[|Z| \le 0.02 \sqrt{n}] \ge 0.95.$$ Equivalently, if $\Phi(z) = \Pr[Z \le z]$ is the cumulative distribution function for the standard normal distribution, $$\Phi(-0.02 \sqrt{n}) \le 0.025.$$ Thus $$\sqrt{n} \ge \frac{\Phi^{-1}(0.025)}{-0.02} = \frac{-1.95996}{-0.02} = 97.9982,$$ so we require $$n \ge \lceil (97.9982)^2 \rceil = \lceil 9603.65 \rceil = 9604$$ to achieve at least $95\%$ confidence.
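As a numerical check of the calculation above, the following Python sketch recomputes the sample size and confirms that $n = 9604$ is the smallest value meeting the $95\%$ requirement. It uses only the standard library's `statistics.NormalDist`; the helper names `required_n_absolute` and `coverage` are illustrative and not part of the original derivation.

```python
from math import ceil, sqrt
from statistics import NormalDist


def required_n_absolute(tol=0.02, confidence=0.95, sigma=1.0):
    """Smallest n with P(|Ybar - mu| <= tol) >= confidence, sigma known."""
    z = NormalDist().inv_cdf(0.5 + confidence / 2)  # about 1.95996
    return ceil((z * sigma / tol) ** 2)


def coverage(n, tol=0.02, sigma=1.0):
    """P(|Z| <= tol * sqrt(n) / sigma) for a standard normal Z."""
    half_width = tol * sqrt(n) / sigma
    std_normal = NormalDist()
    return std_normal.cdf(half_width) - std_normal.cdf(-half_width)


n = required_n_absolute()
print(n)                        # 9604, matching the hand calculation
print(coverage(n) >= 0.95)      # True: n = 9604 is enough
print(coverage(n - 1) >= 0.95)  # False: n = 9603 falls just short
```

Note that none of these quantities depends on $\mu$, which is the point of switching to an absolute error criterion.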

$\endgroup$
  • $\begingroup$ What I missed is that if $\mu$ is very large and $\sigma = 1$, all the values of the distribution are approximately the same. $\endgroup$ Commented Nov 13 at 12:35
  • $\begingroup$ I am thinking that the real problem with the question is that $\sigma$ is a known fixed value. Please comment. $\endgroup$ Commented Nov 13 at 12:58
  • $\begingroup$ @Bob $\sigma$ being known is a different problem. You could change the question so your estimator of the mean (sample average) has a $95\%$ confidence of being in $(\mu - 0.02 \, \sigma, \mu + 0.02 \, \sigma)$, with $\sigma$ known or unknown, and it suddenly becomes much easier as the necessary sample size then does not depend on $\sigma$. $\endgroup$ Commented Nov 13 at 17:19
