
Unbiased Variance Estimator

Let $x_1, \ldots, x_N$ be iid samples from $X$, and let $Y(N)$ denote the $N$-mean estimator given by

$$ Y(N) = \frac{1}{N} \sum_{i=1}^N x_i $$

Let $v(N)$ denote the unbiased $N$-variance estimator

$$ v(N) = \frac{1}{N-1} \sum_{i=1}^N (x_i - Y(N))^2 $$
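In NumPy terms, $v(N)$ is the `ddof=1` sample variance; a quick simulation (using an Exp(1) population, a choice made here purely for illustration) confirms unbiasedness:

```python
import numpy as np

rng = np.random.default_rng(0)
N, trials = 10, 200_000
# Each row is an iid sample of size N from Exp(1), whose true variance is 1.
x = rng.exponential(scale=1.0, size=(trials, N))
v = x.var(axis=1, ddof=1)  # ddof=1 gives the unbiased estimator v(N)
print(v.mean())            # close to the true variance 1.0
```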

Unbiased Variance Estimator with reused samples

I draw $d$ additional samples $x_{N+1}, \ldots, x_{N+d}$, iid from $X$. Similarly, let $Y(N+d)$ denote the $(N+d)$-mean estimator (which reuses the first $N$ samples).

$$ Y(N+d) = \frac{1}{N+d} \sum_{i=1}^{(N+d)} x_i$$

Let $v(N+d)$ denote the unbiased $(N+d)$-variance estimator with reused samples,

$$v(N+d) = \frac{1}{N+d-1} \sum_{i=1}^{(N+d)} (x_i - Y(N+d))^2$$

Covariance of the Variance Estimators with shared samples

My question is, what is the Covariance of these two estimators? $$\mathrm{Cov}[ v(N), v(N+d) ] = ??? $$

From E. Benhamou, “A few properties of sample variance,” equation 10, I know that the variance of the estimator is given by $$\mathrm{Var}[ v(N) ] = \frac{1}{N}\left( m_4 - \frac{N-3}{N-1} m_2^2 \right)$$ where $m_2$ and $m_4$ are the second and fourth central moments of $X$. I tried to use the fact that $v$ is a U-statistic, but I ended up with a quadruple summation that I wasn't sure how to simplify. Is this a standard result? Can I pull a citation from somewhere, or is there something that I can draw from?
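Benhamou's formula is easy to spot-check numerically; here is a Monte Carlo sketch using a Uniform(0,1) population (for which $m_2 = 1/12$ and $m_4 = 1/80$; the distribution is an arbitrary choice):

```python
import numpy as np

rng = np.random.default_rng(1)
N, trials = 8, 400_000
x = rng.uniform(size=(trials, N))
v = x.var(axis=1, ddof=1)

m2, m4 = 1 / 12, 1 / 80  # central moments of Uniform(0,1)
predicted = (m4 - (N - 3) / (N - 1) * m2**2) / N
print(v.var(), predicted)  # the two values agree to Monte Carlo accuracy
```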

Comments:
  • it's not difficult but very calculational.
  • @NN2 Is there a standard reference that I can use for this particular result? I would rather not spend a massive amount of time deriving something which is well known.
  • I don't have one. I calculated something like this myself in my work several years ago. You need to break down $Y(N)$ and rewrite $v(N)$ in terms of $(x_i)_i$ only.
  • I think the approach with U-statistics can work: we can decompose the centered versions of $v(N)$ and $v(N+d)$ as sums of pairwise orthogonal random variables. I will try to find time to write it down in the coming days.
  • @NN2 thank you for the response. I took the time and did the full breakdown.

1 Answer

We start from the pooled sum-of-squares identity: \begin{equation} \begin{aligned} \left( N+d-1 \right)v(N+d) = & \left( N-1 \right)v(N) +\sum_{i=N+1}^{N+d} \left( x_i - \frac{1}{d}\sum_{j=N+1}^{N+d} x_{j} \right)^2 \\ &+\frac{d N}{N+d}\left( Y(N)-\frac{1}{d}\sum_{j=N+1}^{N+d} x_{j} \right)^2 . \end{aligned} \end{equation}
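Since this decomposition is a purely algebraic identity, it can be checked on arbitrary data; a minimal sketch (the sizes and distribution below are arbitrary choices):

```python
import numpy as np

rng = np.random.default_rng(2)
N, d = 6, 4
x = rng.normal(size=N + d)
old, new = x[:N], x[N:]

lhs = (N + d - 1) * x.var(ddof=1)
rhs = ((N - 1) * old.var(ddof=1)
       + ((new - new.mean()) ** 2).sum()
       + d * N / (N + d) * (old.mean() - new.mean()) ** 2)
print(np.isclose(lhs, rhs))  # True for any data: the decomposition is exact
```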

Taking the covariance with $v(N)$ and using bilinearity (the middle term involves only $x_{N+1},\ldots,x_{N+d}$, so it is independent of $v(N)$ and drops out): \begin{equation} \begin{aligned} \left( N+d-1 \right)\mathrm{Cov}[v(N+d),v(N)] = \left( N-1 \right)\mathrm{Var}[v(N)] +\frac{dN}{N+d}\mathrm{Cov}\left[ \left( Y(N)-\frac{1}{d}\sum_{j=N+1}^{N+d} x_{j} \right)^2, v(N) \right]. \end{aligned} \end{equation}

Expanding the square and using independence again (the new-sample mean $\frac{1}{d}\sum_{j=N+1}^{N+d} x_j$ is independent of $(Y(N), v(N))$ and has expectation $\mu$): \begin{equation} \begin{aligned} \left( N+d-1 \right)\mathrm{Cov}[v(N+d),v(N)] = \left( N-1 \right)\mathrm{Var}[v(N)] \\+\frac{dN}{N+d}\left( \mathrm{Cov}[Y^2(N),v(N)] - 2\mu\,\mathrm{Cov}[Y(N),v(N)] \right). \end{aligned} \end{equation}

We isolate the covariance terms: \begin{equation} \begin{aligned} \mathrm{Cov}[Y^2(N),v(N)] &= \mathbb{E}[Y^2(N)v(N)] - \mathbb{E}[Y^2(N)]\mathbb{E}[v(N)] \\ &= \mathbb{E}[Y^2(N)v(N)] - \left( \frac{\sigma^2}{N}+\mu^2 \right)\sigma^2, \\ \mathrm{Cov}[Y(N),v(N)] &= \mathbb{E}[Y(N)v(N)] - \mathbb{E}[Y(N)]\mathbb{E}[v(N)] \\ &= \mathbb{E}[Y(N)v(N)] - \mu\sigma^2 . \end{aligned} \end{equation}

Substitute into the covariance expression: \begin{equation} \begin{aligned} \left( N+d-1 \right)\mathrm{Cov}[v(N+d),v(N)] &= \left( N-1 \right)\mathrm{Var}[v(N)] \\ &\quad+\frac{dN}{N+d} \left( \mathbb{E}[Y^2(N)v(N)]-2\mu \mathbb{E}[Y(N)v(N)] -\frac{\sigma^4}{N}+\mu^2\sigma^2 \right). \end{aligned} \end{equation}

Observe that \begin{equation} \mathbb{E}[Y^2(N)v(N)]-2\mu \mathbb{E}[Y(N)v(N)] = \mathbb{E}[(Y(N)-\mu)^2 v(N)] - \mu^2\sigma^2. \end{equation}

Thus \begin{equation} \begin{aligned} \left( N+d-1 \right)\mathrm{Cov}[v(N+d),v(N)] = \left( N-1 \right)\mathrm{Var}[v(N)] +\frac{dN}{N+d}\left( \mathbb{E}[(Y(N)-\mu)^2v(N)] - \frac{\sigma^4}{N} \right). \end{aligned} \end{equation}

Using the variance–mean decomposition: \begin{equation} (N-1)v(N) = \sum_{i=1}^N (x_i-Y(N))^2 = \sum_{i=1}^N ((x_i-\mu)-(Y(N)-\mu))^2, \end{equation} so \begin{equation} v(N)=\frac{1}{N-1}\sum_{i=1}^N (x_i-\mu)^2 -\frac{N}{N-1}(Y(N)-\mu)^2. \end{equation}
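This decomposition is likewise an algebraic identity, valid for any constant $\mu$, which a quick numerical check confirms (the values below are arbitrary):

```python
import numpy as np

rng = np.random.default_rng(6)
N, mu = 7, 3.0
x = rng.normal(loc=mu, size=N)  # the identity holds for any data and any constant mu

v = x.var(ddof=1)
rhs = ((x - mu) ** 2).sum() / (N - 1) - N / (N - 1) * (x.mean() - mu) ** 2
print(np.isclose(v, rhs))  # True: the decomposition is exact
```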

Therefore, \begin{equation} \mathbb{E}[(Y(N)-\mu)^2 v(N)] = \frac{1}{N-1}\mathbb{E}\left[(Y(N)-\mu)^2\sum_{i=1}^N (x_i-\mu)^2\right] -\frac{N}{N-1}\mathbb{E}[(Y(N)-\mu)^4]. \end{equation}

Compute the first expectation term: \begin{equation} \begin{aligned} \frac{1}{N-1}\mathbb{E}\left[(Y(N)-\mu)^2\sum_{i=1}^N (x_i-\mu)^2\right] &= \frac{1}{N^2(N-1)} \mathbb{E}\left[ \left( \sum_{j=1}^N (x_j-\mu) \right)^2 \sum_{i=1}^N (x_i-\mu)^2 \right]. \end{aligned} \end{equation}

Expanding the quadratic (all index sums run over ordered tuples of distinct indices): \begin{equation} \begin{split} \left( \sum_{j=1}^N (x_j-\mu) \right)^2\sum_{i=1}^N (x_i-\mu)^2 &= \sum_{i=1}^N (x_i-\mu)^4 + \sum_{i\neq j}(x_i-\mu)^2(x_j-\mu)^2 \\ &\quad + 2\sum_{i\neq j}(x_i-\mu)^3(x_j-\mu) + \sum_{\substack{i\neq j \\ j\neq k \\ i\neq k}}(x_i-\mu)^2(x_j-\mu)(x_k-\mu). \end{split} \end{equation}

Since $\mathbb{E}[x_i-\mu]=0$ and the samples are independent, every term containing an isolated factor $(x_j-\mu)$ vanishes in expectation, so \begin{equation} \mathbb{E}\left[\left( \sum_{j=1}^N (x_j-\mu) \right)^2\sum_{i=1}^N (x_i-\mu)^2\right] = N\mu_4 + N(N-1)\sigma^4 . \end{equation}

Next, expand the quartic term; by the same argument only the terms without an isolated factor $(x_i-\mu)$ survive expectation: \begin{equation} (Y(N)-\mu)^4 = \frac{1}{N^4}\left(\sum_{i=1}^N(x_i-\mu)\right)^4 = \frac{1}{N^4}\sum_{i=1}^N(x_i-\mu)^4 +\frac{6}{N^4}\sum_{i<j}(x_i-\mu)^2(x_j-\mu)^2 + (\text{mean-zero terms}). \end{equation}

Hence, \begin{equation} \mathbb{E}[(Y(N)-\mu)^4] = \frac{1}{N^3}\left( \mu_4 +3(N-1)\sigma^4 \right). \end{equation}
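This moment formula can be spot-checked by simulation; a sketch with a Uniform(0,1) population ($\mu = 1/2$, $\sigma^2 = 1/12$, $\mu_4 = 1/80$, an arbitrary choice):

```python
import numpy as np

rng = np.random.default_rng(3)
N, trials = 5, 1_000_000
ybar = rng.uniform(size=(trials, N)).mean(axis=1)  # Y(N) for each trial

mu, s2, m4 = 0.5, 1 / 12, 1 / 80
predicted = (m4 + 3 * (N - 1) * s2**2) / N**3
print(((ybar - mu) ** 4).mean(), predicted)  # agree to Monte Carlo accuracy
```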

Thus, carrying along the factor $\frac{1}{N^2(N-1)}$ from the first expectation, \begin{equation} \begin{split} \mathbb{E}[(Y(N)-\mu)^2v(N)] &= \frac{1}{N^2(N-1)}\left( N\mu_4 + N(N-1)\sigma^4 \right) -\frac{N}{N-1}\cdot\frac{\mu_4 +3(N-1)\sigma^4}{N^3} \\ &= \frac{\mu_4}{N(N-1)} + \frac{\sigma^4}{N} -\frac{\mu_4}{N^2(N-1)} - \frac{3\sigma^4}{N^2} \\ &= \frac{\mu_4 + (N-3)\sigma^4}{N^2}. \end{split} \end{equation}

(Sanity check: for normal data $\mu_4 = 3\sigma^4$ and this reduces to $\sigma^4/N$, as it must, since $Y(N)$ and $v(N)$ are then independent.)

Hence \begin{equation} \mathbb{E}[(Y(N)-\mu)^2 v(N)] - \frac{\sigma^4}{N} = \frac{\mu_4 - 3\sigma^4}{N^2}, \end{equation} and therefore \begin{equation} \begin{aligned} \left( N+d-1 \right)\mathrm{Cov}[v(N+d),v(N)] &= \left( N-1 \right)\mathrm{Var}[v(N)] +\frac{d}{N(N+d)}\left( \mu_4 - 3\sigma^4 \right). \end{aligned} \end{equation}

Finally, using $\mathrm{Var}[v(N)] = \frac{1}{N}\left( \mu_4 - \frac{N-3}{N-1}\sigma^4 \right)$: \begin{equation} \begin{aligned} \left( N+d-1 \right)\mathrm{Cov}[v(N+d),v(N)] &= \frac{N-1}{N}\left( \mu_4 - \frac{N-3}{N-1}\sigma^4 \right) +\frac{d}{N(N+d)}\left( \mu_4 - 3\sigma^4 \right) \\ &= \frac{N+d-1}{N+d}\,\mu_4 - \frac{N+d-3}{N+d}\,\sigma^4 , \end{aligned} \end{equation} so \begin{equation} \mathrm{Cov}[v(N+d),v(N)] = \frac{1}{N+d}\left( \mu_4 - \frac{N+d-3}{N+d-1}\,\sigma^4 \right) = \mathrm{Var}[v(N+d)]. \end{equation}

That is, the covariance equals the variance of the larger-sample estimator. This is as expected: $v(N+d)$ is the conditional expectation of $v(N)$ given the order statistics of the full sample (both are U-statistics with kernel $h(x,y)=\tfrac{1}{2}(x-y)^2$), so the law of total covariance gives $\mathrm{Cov}[v(N),v(N+d)]=\mathrm{Var}[v(N+d)]$ directly.
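The nested-sample structure can also be verified by simulation: because the first $N$ draws enter both estimators, the empirical covariance of $v(N)$ and $v(N+d)$ should match $\mathrm{Var}[v(N+d)]$. A Monte Carlo sketch with a non-normal population (Uniform(0,1), so that $\mu_4 \neq 3\sigma^4$; the sizes are arbitrary choices):

```python
import numpy as np

rng = np.random.default_rng(5)
N, d, trials = 6, 4, 400_000
x = rng.uniform(size=(trials, N + d))   # Uniform(0,1): m2 = 1/12, m4 = 1/80
v_small = x[:, :N].var(axis=1, ddof=1)  # v(N), reusing the first N columns
v_big = x.var(axis=1, ddof=1)           # v(N+d) on all N+d columns

cov = np.cov(v_small, v_big)[0, 1]
m2, m4 = 1 / 12, 1 / 80
predicted = (m4 - (N + d - 3) / (N + d - 1) * m2**2) / (N + d)  # Var[v(N+d)]
print(cov, predicted)  # agree to Monte Carlo accuracy
```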
