
I was taught the autocorrelation in a time-series at lag $k$ is the correlation between all pairs of values separated by this lag.

Suppose I want to give it a go and calculate it manually for lag 1.

Simulate some white noise

set.seed(123)
TS <- ts(rnorm(1e3))

Then calculate autocorrelation manually and using the built-in function.

calc_autocorrelation_manually <- function(x) { cor(x[-1], x[-length(x)]) }

> calc_autocorrelation_manually(TS)
[1] -0.02741628
> 
> acf(TS, lag = 1, plot = F)

Autocorrelations of series ‘TS’, by lag

     0      1 
 1.000 -0.027

Both give the same or nearly the same result. (I am aware the calculation in acf() is not strictly the same, since it doesn't use a bias adjustment; still, I think this shouldn't make a substantial difference.)
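For reference, here is a sketch of the standard sample-autocorrelation formula that acf() is based on: one overall mean and one overall sum of squares, rather than separate means and variances for the two sub-series as in cor(). (The helper name acf_lag1 is just illustrative.)

```r
# Sample autocorrelation at lag 1: a single full-sample mean and a
# single overall sum of squares in the denominator.
acf_lag1 <- function(x) {
  n    <- length(x)
  xbar <- mean(x)
  sum((x[-1] - xbar) * (x[-n] - xbar)) / sum((x - xbar)^2)
}

set.seed(123)
TS <- rnorm(1e3)
acf_lag1(TS)                  # same value acf(TS, lag.max = 1, plot = FALSE) reports
cor(TS[-1], TS[-length(TS)])  # "manual" version: close, but not identical
```

For a stationary series the two versions agree closely, because the sub-series means and variances are both close to the full-sample ones.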

However, if I do this for a simulated AR(1) process, the results aren't the same! E.g. using an 'ideal' AR(1) with no noise term:

> TS <- ts(99999) # initial value
> for (i in 1:350) { TS <- c(TS, TS[i] * 0.3) }
> 
> calc_autocorrelation_manually(TS)
[1] 1
> 
> acf(TS, lag = 1, plot = F)

Autocorrelations of series ‘TS’, by lag

  0   1 
1.0 0.3 

I think the manual calculation of autocorrelation is incorrect, but then what is the correct one, and why does it nonetheless work so well in the first example?

  • That is a little hard to say, since you do not tell us how the manual calculation works. Another candidate explanation is that your second, deterministic series is effectively zero after a few steps, so that there may be numerical inaccuracies in your manual formula. Commented Dec 3, 2024 at 15:35
  • @ChristophHanck apologies, this was missed off by mistake; now added: cor(x[-1], x[-length(x)]) Commented Dec 3, 2024 at 16:19
  • I believe your question might be answered at stats.stackexchange.com/questions/81754/…: please take a look. Commented Dec 3, 2024 at 16:39
  • Indeed, the fact that your series drops so fast might then imply that the variances in the denominator differ by orders of magnitude (try var(TS[-1]); var(TS[-length(TS)])), so that the covariance divided by the product of standard deviations yields one. acf, in turn, uses the fact that under stationarity we may as well divide by a single overall variance. Commented Dec 3, 2024 at 17:50
  • The link correctly mentions stationarity as a requirement for equivalence. Now, your series TS — due to its huge starting value far away from the expected value of the series, as well as the lack of noise, which removes all random variation — behaves in anything but a stationary way. Commented Dec 4, 2024 at 4:51
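Following the suggestion in the comments, a quick check (the same decaying series as in the question, written in closed form) shows how far apart the two sub-series variances in the manual formula are:

```r
# The series from the question: y_i = 99999 * 0.3^(i-1), i = 1, ..., 351.
TS <- 99999 * 0.3^(0:350)

# The two variances cor() divides by. Since each element of TS[-1] is
# 0.3 times the corresponding element of TS[-length(TS)], the first
# variance is 0.3^2 = 0.09 times the second.
var(TS[-1])           # variance of the "lead" sub-series
var(TS[-length(TS)])  # variance of the "lag" sub-series, about 11 times larger
```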

1 Answer


Besides the points about starting values raised in the comments, there is another issue: for a deterministic difference sequence of the type you use, the "manual" formula will give an autocorrelation of 1 irrespective of the starting value, because of how the standard correlation formula cancels:

Denote by $\bar{y}_{lag}$ the mean of the lagged sub-series $y_1,\dots,y_{n-1}$, and by $\bar{y}_{lead}$ the mean of the lead sub-series $y_2,\dots,y_n$. The correlation formula is

$$
corr=\frac{\sum_{i=2}^n(y_{i-1}-\bar{y}_{lag})(y_i-\bar{y}_{lead})}{\sqrt{\sum_{i=1}^{n-1}(y_i-\bar{y}_{lag})^2\sum_{i=2}^{n}(y_i-\bar{y}_{lead})^2}}
$$

The difference sequence is such that $y_i=\rho y_{i-1}$ or, recursively, $y_i=\rho^{i-1}y_1$, where $y_1$ is the initial value of the series. Then,
$$
\bar{y}_{lag}=\frac{1}{n-1}\sum_{i=1}^{n-1}y_i=\frac{y_1}{n-1}\sum_{i=0}^{n-2}\rho^i=y_1\frac{1-\rho^{n-1}}{(n-1)(1-\rho)}\equiv y_1\tilde M
$$
Similarly, we obtain
$$
\bar{y}_{lead}=\rho y_1\tilde M
$$
and the variations
$$
\sum_{i=1}^{n-1}(y_i-\bar{y}_{lag})^2=y_1^2\sum_{i=1}^{n-1}(\rho^{i-1}-\tilde M)^2\equiv y_1^2D
$$
and
\begin{eqnarray*}
\sum_{i=2}^{n}(y_i-\bar{y}_{lead})^2&=&\sum_{i=2}^{n}(\rho^{i-1}y_1-\rho y_1\tilde M)^2\\
&=&\rho^2y_1^2\sum_{i=2}^{n}(\rho^{i-2}-\tilde M)^2\\
&=&\rho^2y_1^2\sum_{i=1}^{n-1}(\rho^{i-1}-\tilde M)^2\\
&=&\rho^2y_1^2D
\end{eqnarray*}
For the numerator of the correlation, a similar calculation shows
\begin{eqnarray*}
\sum_{i=2}^n(y_{i-1}-\bar{y}_{lag})(y_i-\bar{y}_{lead})&=&\sum_{i=2}^n(y_1\rho^{i-2}-y_1\tilde M)(y_1\rho^{i-1}-\rho y_1\tilde M)\\
&=&\rho y_1^2\sum_{i=2}^n(\rho^{i-2}-\tilde M)(\rho^{i-2}-\tilde M)\\
&=&\rho y_1^2\sum_{i=2}^n(\rho^{i-2}-\tilde M)^2\\
&=&\rho y_1^2\sum_{i=1}^{n-1}(\rho^{i-1}-\tilde M)^2=\rho y_1^2D
\end{eqnarray*}
Thus,
$$
corr=\frac{\rho y_1^2D}{\sqrt{\rho^2y_1^2D\cdot y_1^2D}}=1
$$
The ACF, in turn, does not divide by the product of the two standard deviations but, assuming stationarity, by a single overall variance estimate. An analogous exercise reveals that this cancellation then does not happen as it does with the standard correlation formula.
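The algebra can be checked numerically: for a geometric sequence, the lead sub-series is an exact multiple of the lag sub-series, so cor() returns 1 whatever $y_1$ and (positive) $\rho$ are. A small sketch:

```r
# cor() of a geometric sequence against its own lag is always 1, because
# y[-1] is rho times y[-length(y)]: an exact linear relation.
rho <- 0.3
for (y1 in c(1, 99999, -7)) {
  y <- y1 * rho^(0:50)
  print(cor(y[-1], y[-length(y)]))  # 1, up to floating-point error
}
```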

In particular, the full-sample mean is now
$$
\bar y=\frac{1}{n}\sum_{i=1}^ny_i=y_1\frac{1-\rho^n}{n(1-\rho)}\equiv y_1\breve M
$$
such that the full variance, computed around this full mean, and the numerator of the correlation formula are no longer proportional in the way the numerator and the product of square roots of the variations were. A bit more specifically (summing $n\breve M=(n-1)\tilde M+\rho^{n-1}$),
$$
\breve M=\frac{(n-1)\tilde M+\rho^{n-1}}{n}
$$
Also, the variance formula now has $n$ squared deviations, such that $var(y)=y_1^2\cdot\breve D/(n-1)$, where
$$
\breve D:=\sum_{i=1}^n(\rho^{i-1}-\breve M)^2
$$
Thus, the ratio of covariation and variance is
$$
acf=\rho\frac{D}{\breve D},
$$
which is not equal to one, but close to (though also not equal to) $\rho$. Note, however, that this expression is independent of the starting value $y_1$. (I have not been able to simplify the ratio further; see also here.)
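A numerical sketch of this final expression for the series in the question ($\rho=0.3$, $n=351$; since $y_1$ drops out, it is set to 1):

```r
# Evaluate rho * D / D_breve from the derivation above; it comes out
# close to, but not exactly equal to, rho itself.
rho  <- 0.3
n    <- 351
Mtil <- (1 - rho^(n - 1)) / ((n - 1) * (1 - rho))  # mean of the lag sub-series
Mbrv <- (1 - rho^n) / (n * (1 - rho))              # full-sample mean
D    <- sum((rho^(0:(n - 2)) - Mtil)^2)
Dbrv <- sum((rho^(0:(n - 1)) - Mbrv)^2)
rho * D / Dbrv  # close to rho = 0.3, matching what acf() reports
```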

