I was taught that the autocorrelation of a time series at lag $k$ is the correlation between all pairs of values separated by that lag.
Suppose I want to give it a go and calculate it manually for lag 1.
Simulate some white noise:
set.seed(123)
TS <- ts(rnorm(1e3))
Then calculate the lag-1 autocorrelation manually and with the built-in function:
# correlate the series against itself shifted by one position
calc_autocorrelation_manually <- function(x) { cor(x[-1], x[-length(x)]) }
> calc_autocorrelation_manually(TS)
[1] -0.02741628
>
> acf(TS, lag = 1, plot = F)
Autocorrelations of series ‘TS’, by lag

     0      1
 1.000 -0.027
Both give the same, or nearly the same, result. (I am aware the calculation in acf() is not strictly the same, since it uses no bias adjustment and divides by $n$ rather than $n - k$; still, I think this shouldn't make a substantial difference.)
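For reference, here is a minimal sketch of the estimator acf() reports at lag $k$ (the function name acf_style_autocorrelation is mine, for illustration): both factors are centered at the single overall mean, and the lag-$k$ sum is scaled by the overall sum of squares.

acf_style_autocorrelation <- function(x, k = 1) {
  n <- length(x)
  m <- mean(x)  # one overall mean is used for both factors
  sum((x[(k + 1):n] - m) * (x[1:(n - k)] - m)) / sum((x - m)^2)
}
acf_style_autocorrelation(TS)  # ~ -0.027 for the white-noise series above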
However, if I do this for a simulated AR(1) process, the results aren't the same! E.g., using an 'ideal' AR(1) with no noise:
> TS <- ts(99999) # initial value
> for (i in 1:350) { TS <- c(TS, TS[i] * 0.3) }  # each new value is exactly 0.3 times its predecessor
>
> calc_autocorrelation_manually(TS)
[1] 1
>
> acf(TS, lag = 1, plot = F)
Autocorrelations of series ‘TS’, by lag

   0    1
 1.0  0.3
I think the manual calculation of autocorrelation is incorrect, but then what is the correct one, and why does it nonetheless work so well in the first example?
Comments:

cor(x[-1], x[-length(x)]) standardizes the lag-1 covariance by the standard deviations of the two subseries (compare var(TS[-1]); var(TS[-length(TS)])), so that the covariance divided by the product of the s.d.s yields one. acf(), in turn, uses the fact that under stationarity we may as well divide by a single overall variance.

TS, due to its huge starting value far away from the expected value of the series, as well as the lack of noise canceling all variation, behaves in anything but a stationary way.
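To make the comments' point concrete, here is a small check (the names x1, x0 and m are mine, for illustration). cor() standardizes by each subseries' own mean and s.d.; since every value is exactly 0.3 times its predecessor, the two subseries are perfectly linearly related and the correlation is 1 whatever the coefficient. The acf()-style normalization, with one overall mean and variance, recovers roughly 0.3 instead.

x1 <- TS[-1]           # values 2..n
x0 <- TS[-length(TS)]  # values 1..(n-1)

# cor()'s normalization: each subseries gets its own mean and s.d.
cov(x1, x0) / (sd(x1) * sd(x0))             # exactly 1, since x1 = 0.3 * x0

# acf()-style normalization: one overall mean, one overall variance
m <- mean(TS)
sum((x1 - m) * (x0 - m)) / sum((TS - m)^2)  # ~ 0.3

Adding innovations (e.g., simulating a stationary AR(1) with stats::arima.sim(model = list(ar = 0.3), n = 1000)) restores stationarity, and the two calculations then nearly agree again, as in the white-noise example.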