
I want to know whether we can approximate the covariance matrix of a random vector by making use of a probability limit.

Define the linear regression model in matrix form as $$ \mathbf{Y} = \mathbf{X} \beta + \varepsilon, $$ where the errors have conditional variance $\operatorname{Var}(\varepsilon \mid \mathbf{X}) = \sigma^2 \mathbf{I}$.

I am interested in approximating $E[\text{Cov}[\hat \beta|\mathbf{X}]]$, the expected conditional covariance of the OLS estimator $\hat \beta$, defined by

$$ E[\text{Cov}[\hat \beta|\mathbf{X}]] = E\bigg[\frac{\sigma^2}{n} \bigg(\frac{\mathbf{X}^T\mathbf{X}}{n}\bigg)^{-1}\bigg] = \frac{\sigma^2}{n} E\bigg[\bigg(\frac{\mathbf{X}^T\mathbf{X}}{n}\bigg)^{-1}\bigg]. $$

The probability limit of $\mathbf{X}^T\mathbf{X}/n$ is $$ \text{plim}_{n\to \infty} \bigg(\frac{\mathbf{X}^T\mathbf{X}}{n}\bigg) = Q, $$ where $Q$ is a constant positive definite matrix (see Econometric Analysis by William Greene, eq. 4-19). So the probability limit of the inverse $(\mathbf{X}^T\mathbf{X}/n)^{-1}$ is $$ \text{plim}_{n\to \infty} \bigg(\frac{\mathbf{X}^T\mathbf{X}}{n}\bigg)^{-1} = Q^{-1}. $$

For large $n$, I am interested in approximating $E[\text{Cov}[\hat \beta|\mathbf{X}]]$ by using the probability limit, that is, saying something like $$ E[\text{Cov}[\hat \beta|\mathbf{X}]] \approx \frac{\sigma^2}{n} Q^{-1}, \quad \quad \text{or} \quad \quad E[\text{Cov}[\hat \beta|\mathbf{X}]] \sim \frac{\sigma^2}{n} Q^{-1}. $$ I have various questions regarding the validity of doing this.

What kind of error are we making if we do this? Is there a way to account for the error? Is this a situation where the approximation 'holds with high probability'? If we can indeed make this approximation, how do we state it rigorously (precisely what does $\approx$ or $\sim$ signify)?
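For concreteness, here is a minimal Monte Carlo sketch of the comparison I have in mind, assuming a hypothetical mean-zero Gaussian design with a known second-moment matrix $Q$ (the dimensions and values are made up purely for illustration):

```python
import numpy as np

rng = np.random.default_rng(0)
n, k, sigma2 = 200, 3, 1.0          # hypothetical sample size, regressors, error variance
Q = np.array([[1.0, 0.3, 0.0],      # assumed E[x x'] of the design (made up for illustration)
              [0.3, 1.0, 0.2],
              [0.0, 0.2, 1.0]])

# Monte Carlo estimate of E[Cov(beta_hat | X)] = (sigma^2 / n) * E[(X'X / n)^{-1}]
reps = 2000
acc = np.zeros((k, k))
for _ in range(reps):
    X = rng.multivariate_normal(np.zeros(k), Q, size=n)
    acc += np.linalg.inv(X.T @ X / n)
exact_side = (sigma2 / n) * acc / reps

# the plim-based approximation (sigma^2 / n) * Q^{-1}
approx_side = (sigma2 / n) * np.linalg.inv(Q)

print(np.abs(exact_side - approx_side).max())  # small relative to the entries of approx_side
```

Numerically the two sides are close, but I would like to know what the discrepancy is in general and how it scales with $n$.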


1 Answer

In "standard linear regression" with strict exogeneity, $E(\varepsilon \mid \mathbf X) = 0$, the OP wants to approximate (pursuing a theoretical result) the unconditional variance of $\hat \beta$ by using the probability limit of the the moment matrix.

By the Law of Total Variance and the fact that $E(\hat \beta \mid \mathbf X) = \beta$ (a constant, so the variance of the conditional mean contributes nothing), we have that the unconditional variance is

$${\rm V}(\hat \beta) = \sigma^2 \cdot E\Big[(\mathbf X' \mathbf X)^{-1}\Big] = \frac{\sigma^2 }{n}\cdot E\Big[(n^{-1}\mathbf X' \mathbf X)^{-1}\Big]$$

We approximate this by

$${\rm V}(\hat \beta) \approx \frac{\sigma^2}{n} \cdot Q^{-1},$$

where

$$Q = {\rm plim}\left(n^{-1}\mathbf X' \mathbf X\right) = E(\mathbf x \mathbf x')$$

and $\mathbf x$ is the column vector whose transpose is a typical row of $\mathbf X$. We use $\mathbf x$ because in the limit the matrix $\mathbf X$ has infinite row dimension, so it would be inappropriate for $\mathbf X$ itself to appear as the result of a limiting expression.

In words, instead of the expected value of the inverse, we use the inverse of the expected value.
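To see the size of this substitution in a case where everything is available in closed form: with a single standard normal regressor, $n^{-1}\mathbf X'\mathbf X \sim \chi^2_n/n$, so $E[(n^{-1}\mathbf X'\mathbf X)^{-1}] = n/(n-2)$ exactly, while the inverse of the expectation is exactly $1$. A minimal numerical check of this (a sketch under that scalar Gaussian assumption):

```python
import numpy as np

rng = np.random.default_rng(1)
n, reps = 50, 200_000

# scalar design x_i ~ N(0,1): n^{-1} X'X is distributed as chi2_n / n
m = rng.chisquare(n, size=reps) / n

print(np.mean(1.0 / m))   # expected value of the inverse: approx n/(n-2) = 1.0417
print(n / (n - 2))        # exact value of E[(n^{-1} X'X)^{-1}]
print(1.0 / np.mean(m))   # inverse of the expected value: approx 1
```

The two quantities differ by $2/(n-2)$, which already hints at the order of the approximation error derived next.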

The approximation error is

$$\delta(n) =(\sigma^2 /n) \cdot \Big[E[(n^{-1}\mathbf X'\mathbf X)^{-1}] - [E(\mathbf x\mathbf x')]^{-1}\Big].$$

We have that $(n^{-1}\mathbf X'\mathbf X)^{-1} \longrightarrow_p [E(\mathbf x\mathbf x')]^{-1}$, so, granting enough uniform integrability for convergence in probability to carry over to convergence of expectations,

$$E[(n^{-1}\mathbf X'\mathbf X)^{-1}] - [E(\mathbf x\mathbf x')]^{-1} \longrightarrow [E(\mathbf x\mathbf x')]^{-1} - [E(\mathbf x\mathbf x')]^{-1} = 0, $$

so this expression is $o(1)$. Also, $(\sigma^2/n) = O(1/n)$. Therefore,

$$\delta(n) = O(1/n)\cdot o(1) = o(1/n).$$

So the approximation error goes to zero faster than $1/n$ does.
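Continuing the scalar Gaussian example above, the gap $E[(n^{-1}\mathbf X'\mathbf X)^{-1}] - 1 = 2/(n-2)$ is available exactly, so there $\delta(n) = (\sigma^2/n)\cdot 2/(n-2) = O(1/n^2)$, consistent with (indeed stronger than) the general $o(1/n)$ bound. A quick tabulation, reusing that assumed design:

```python
sigma2 = 1.0
for n in (10, 100, 1_000, 10_000):
    gap = 2.0 / (n - 2)          # exact E[(n^{-1} X'X)^{-1}] - [E(n^{-1} X'X)]^{-1}, scalar case
    delta = (sigma2 / n) * gap   # the approximation error delta(n)
    print(n, delta, n * delta)   # n * delta(n) -> 0: the error vanishes faster than 1/n
```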

UPDATE
Can we improve on the $o_p(1)$ (in probability) / $o(1)$ (in expectation) rate of convergence of

$$E[(n^{-1}\mathbf X'\mathbf X)^{-1}] - [E(\mathbf x\mathbf x')]^{-1}\,?$$

Apparently, the OP needs that. Let's see.

The OP mentioned a remark in Bruce Hansen's Econometrics textbook about the OLS estimator having a faster convergence rate than $o_p(1)$. Hansen derives this after obtaining the scaling rate needed for the asymptotic distribution: since $\hat \beta_n - \beta$ is $O_p(n^{-1/2})$, multiplying it by something growing faster than unity ($n^0$) but slower than $n^{1/2}$ will not stop it from converging to zero in probability.

To fix ideas, we are examining the rate of convergence of

$$E(h_n) - c,\;\;\; c\; {\rm =\;constant}, \;\;\; h_n = O_p(1), \;h_n - c \to_p 0.$$

Now, to apply the Hansen approach, we would need to be able to say something about the distribution (if it exists) of $$n^{\delta} (h_n - c).$$

If we can prove that, for some $\delta > 0$, the above converges in distribution, then we can apply the logic of Hansen and argue that for any $\gamma$ with $0<\gamma < \delta$ we have

$$n^{\gamma}(h_n - c) \to_p 0$$

and so, again granting the uniform integrability needed to move from probability to expectation,

$$(h_n - c) = o_p(n^{-\gamma}) \implies E(h_n -c) = o(n^{-\gamma}).$$
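To illustrate this logic numerically, take again the scalar Gaussian case: $h_n = (n^{-1}\sum_i x_i^2)^{-1}$ with $c = 1$, where the delta method gives $\sqrt n\,(h_n - 1) \Rightarrow N(0,2)$, so $\delta = 1/2$ works and any $\gamma < 1/2$ should drive $n^{\gamma}(h_n - 1)$ to zero in probability. A sketch under these assumptions:

```python
import numpy as np

rng = np.random.default_rng(2)
reps = 20_000

for n in (100, 1_000, 10_000, 100_000):
    # h_n = (n^{-1} sum_i x_i^2)^{-1} with x_i ~ N(0,1), i.e. n / chi2_n
    h = n / rng.chisquare(n, size=reps)
    for gamma in (0.4, 0.5):
        spread = np.std(n**gamma * (h - 1.0))
        # shrinks toward 0 for gamma = 0.4 (slowly, at rate n^{-0.1});
        # stabilizes near sqrt(2) for gamma = 0.5, the scaling with a nondegenerate limit
        print(n, gamma, round(spread, 3))
```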

  • This doesn't correspond to what I asked, though. I am asking about the effect on $\text{Cov}[\hat \beta|\mathbf{X}]$ of replacing $(\mathbf{X}^T\mathbf{X}/n)^{-1}$ with its probability limit $Q^{-1}$. My question asks: is such an approximation valid? What level of error is incurred, and can we account for the error? Commented Dec 4, 2020 at 16:56
  • I am only interested in the standard regression case when $E[\varepsilon|\mathbf{X}] = 0$. Your post doesn't mention anything about the probability limit $Q^{-1}$, which is the key point of my question. Commented Dec 4, 2020 at 19:23
  • @sonicboom That's Bruce Hansen's econometrics textbook. Let me have a look. Commented Dec 10, 2020 at 17:21
  • I have updated my answer to show what we need in order to apply the Hansen approach in your case. I suggest you delete all these comments; the essence has by now been incorporated into my post. I am deleting my comments. Commented Dec 10, 2020 at 18:45
  • @sonicboom No, that definitely won't work. But if you write out the $\mathbf X'\mathbf X$ matrix explicitly, it is comprised of sample means. Moreover, if (as I would guess) the first column of $\mathbf X$ is a constant, then you can write $\mathbf X$ and $\mathbf X'\mathbf X$ in blocks and apply block-matrix inversion to obtain an explicit expression for the inverse. You will find that it includes sample means, and so, multiplied by $\sqrt{n}$, it should lead to a distribution. This means that here too you end up with the Hansen result: you have room to improve the rate of convergence from $o_p(1)$ up to $o_p(1/n^{\delta}),\; \delta <1/2$. Commented Dec 10, 2020 at 21:40
