$\begingroup$

I have a gridded dataset indexed by time and space, represented as an $m \times n$ array. I'm following along with Eq. 10 in this paper to partition the variance in this data over space and time. Specifically, they partition the total variance $\sigma^2_g$ into the average temporal variance over all regions ($\bar{\sigma^2_t}$) and the average spatial variance over all time points ($\bar{\sigma^2_s}$):

$$ \sigma^2_g=\frac{n(m-1)}{(m \times n) -1}\bar{\sigma^2_t}+\frac{m(n-1)}{(m \times n) - 1}\bar{\sigma^2_s} $$

My hangup is that I have a considerable amount of missing data, so I know I have to account for different sample sizes. $m = 194540$ and $n = 25$ for my data, so the coefficients in each term are near one. This implies that the sum of the spatial and temporal variances I calculate should be close to the global variance.
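As a quick check of the "near one" claim, here is a minimal sketch that plugs my $m$ and $n$ into the two coefficients of the equation above. The variable names `coef_t` and `coef_s` are mine, not the paper's.

```python
# Coefficients of the temporal and spatial terms in the partition equation
m, n = 194540, 25

denom = m * n - 1
coef_t = n * (m - 1) / denom  # multiplies the mean temporal variance
coef_s = m * (n - 1) / denom  # multiplies the mean spatial variance

print(f"temporal coefficient: {coef_t:.6f}")
print(f"spatial coefficient:  {coef_s:.6f}")
```

The temporal coefficient is within about $5 \times 10^{-6}$ of one, and the spatial coefficient is roughly $(n-1)/n = 0.96$, so neither can account for a large gap between the sum and the total.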

My current approach is to calculate the variance for each slice in time/space, and then calculate their weighted average based on the number of valid observations. As implemented in numpy:

import numpy as np

# data.shape == (25, 194540)
# first axis is time, second axis is spatial position
# note: np.nanvar uses ddof=0 (population variance) by default

total_var = np.nanvar(data)

var_over_time  = np.nanvar(data, axis=0)
samples_per_px = np.sum(~np.isnan(data), axis=0)
sigma_t        = np.average(var_over_time, weights=samples_per_px)

var_over_space = np.nanvar(data, axis=1)
samples_per_t  = np.sum(~np.isnan(data), axis=1)
sigma_s        = np.average(var_over_space, weights=samples_per_t)

print(f"Total variance: {total_var:.2f}")
print(f"Temporal variance: {sigma_t:.2f}")
print(f"Spatial variance: {sigma_s:.2f}")

This gives me

Total variance: 4.55
Temporal variance: 3.93
Spatial variance: 4.50

But this is inconsistent with my expectation that the sum of the spatial and temporal variances should be close to the total variance (3.93 + 4.50 = 8.43 versus 4.55). Next, I repeated the calculation with all missing values replaced by zero. This still gives a variance sum larger than the total variance, which makes me think I could be approaching this question incorrectly.
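To separate the effect of missing values from the decomposition itself, here is a sanity check on synthetic data. It does not use Eq. 10 from the paper. It uses the standard law of total variance, which splits the total into a weighted within-slice variance plus a weighted between-slice variance of the slice means. The synthetic array, the 30% missing rate, and the helper `within_between` are all hypothetical stand-ins, and the missingness here is completely at random, which is probably not true of my survey data.

```python
import numpy as np

rng = np.random.default_rng(0)

# Hypothetical stand-in for the real array: 25 time steps x 400 locations
n_time, n_space = 25, 400
data = (rng.normal(size=(n_time, 1))            # time effect
        + rng.normal(size=(1, n_space))         # space effect
        + rng.normal(size=(n_time, n_space)))   # noise
data[rng.random(data.shape) < 0.3] = np.nan     # ~30% missing at random

def within_between(x, axis):
    """Count-weighted within-slice variance and between-slice variance.

    Variances are taken along `axis`; each slice is weighted by its
    number of valid observations. Slices with no valid data are dropped.
    """
    counts = np.sum(~np.isnan(x), axis=axis)
    keep = counts > 0
    x = np.compress(keep, x, axis=1 - axis)     # 2-D input assumed
    w = counts[keep]
    slice_var = np.nanvar(x, axis=axis)         # ddof=0
    slice_mean = np.nanmean(x, axis=axis)
    grand_mean = np.nanmean(x)
    within = np.average(slice_var, weights=w)
    between = np.average((slice_mean - grand_mean) ** 2, weights=w)
    return within, between

total_var = np.nanvar(data)
within_t, between_loc = within_between(data, axis=0)   # variance over time
within_s, between_time = within_between(data, axis=1)  # variance over space

print(f"total:                 {total_var:.4f}")
print(f"within_t + between:    {within_t + between_loc:.4f}")
print(f"within_s + between:    {within_s + between_time:.4f}")
print(f"within_t + within_s:   {within_t + within_s:.4f}")
```

In this sketch each within-slice term plus its matching between-slice term reproduces the total even with missing values, as long as the weights are the valid-observation counts and `ddof=0` is used throughout. The two within-slice terms added together are a different quantity and need not equal the total.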

So, two questions:

  • Does it make sense to use this partitioning procedure for data with missing values?
  • Can variances "overlap" in the sense that we cannot attribute variance to time or space alone?

Thanks for your help!

$\endgroup$
  • $\begingroup$ Could you tell us why your data are missing? What mechanisms of missingness do you have in mind? $\endgroup$ Commented Oct 23 at 13:30
  • $\begingroup$ The data come from aerial surveys. A missing value means that the location was unsurveyed in that year's survey. The surveys (are supposed to) inventory new appearances of forest die off relative to previous surveys. However, these maps are made by human observers so there's considerable noise in the data. $\endgroup$ Commented Oct 23 at 23:53
  • $\begingroup$ But why would a location be unsurveyed? It sounds like the fact of a missing survey might be an indicator that surveyors suspected no die-off, which would raise flags concerning the representativeness of the data. In short, the data might have a built in bias that you need to account for. That's why this is a key issue to be considered before deciding how to analyze the data. $\endgroup$ Commented Oct 24 at 11:57
  • $\begingroup$ There's a few reasons why a survey might not happen. I'm not involved in doing them myself but I can assume that it comes down to logistical constraints and observers' judgement about where die-off is most likely (the goal of the program is to inventory die-off, not to capture a representative snapshot of forest condition in any one year). So, there definitely is built-in bias. I'm curious whether partitioning variance on this dataset would give me similar results as, say, a satellite-derived dataset that is less dependent on human judgement. $\endgroup$ Commented Oct 24 at 21:12
