$\begingroup$

I’m modeling body mass in relation to several variables, but I also need to account for spatial autocorrelation. My current approach uses spaMM with a Matérn correlation term based on coordinates.

fit <- fitme(
  log_bodymass ~ Sex +
    Latitude * Longitude * OrdinalDate +
    Matern(1 | X_km + Y_km),
  data = d,
  family = gaussian()
)

I want to interpret the effects of latitude, longitude, and season (OrdinalDate), but I’ve read that including Lat/Long both as fixed effects and in the spatial term might cause redundancy.

Question:

Is it valid to include Lat/Long as fixed effects while also using a Matérn spatial term? Do they model different scales of variation (broad trend vs local clustering)?

Thanks

$\endgroup$
  • $\begingroup$ Welcome to SO, Léa Veine-Tonizzo! I think this question is asking about how to analyze something, which means it is more appropriate to ask this on Cross Validated instead of here on SE. I'm voting to migrate there (not sure if/when it will be done). Another issue until then is that it would be incredibly useful to have representative data, either some of your d data or from a public dataset. See stackoverflow.com/q/5963269 , [mcve], and stackoverflow.com/tags/r/info for discussions on the use of dput(.), data.frame(.), or other methods to share sample data. Thanks! $\endgroup$ Commented Oct 29 at 21:08
  • $\begingroup$ Nice question! You will get more detailed answers once the migration has been completed. Although it is not required, it will help if you describe your data (d) in more detail. It would be useful to share the output of head(d, n=10) or similar as text; you can use dput for this. $\endgroup$ Commented Oct 29 at 21:22

1 Answer

$\begingroup$

Yes, in general this should be fine. You are doing something like kriging with a trend component (see Wikipedia), but with a more general (and more computationally intensive) approach. (There are a variety of slightly different approaches to this kind of kriging; see e.g. https://stackoverflow.com/questions/73326418/different-results-between-kriging-on-residuals-and-universal-kriging-r)
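For concreteness, the model under discussion can be sketched as a trend plus a spatially correlated random effect plus noise. The symbols λ, φ, ν and ρ follow the names printed in spaMM's output; the particular Matérn parametrisation shown is, as far as I can tell, the one documented in spaMM's `?MaternCorr`, so treat it as an assumption and check it against your installed version.

$$
\begin{aligned}
z_i &= \mathbf{x}_i^\top \boldsymbol\beta + u(s_i) + \varepsilon_i, \qquad \varepsilon_i \sim N(0, \phi), \\
\operatorname{Cov}\bigl(u(s_i), u(s_j)\bigr) &= \lambda \, \frac{(\rho\, d_{ij})^{\nu}}{2^{\nu-1}\,\Gamma(\nu)} \, K_\nu(\rho\, d_{ij}),
\end{aligned}
$$

Here $\mathbf{x}_i^\top \boldsymbol\beta$ is the fixed-effects part (sex and the lat/long/date trend), $u(s)$ is the Matérn random field at location $s$, $d_{ij}$ is the distance between locations $s_i$ and $s_j$, and $K_\nu$ is the modified Bessel function of the second kind. Under this parametrisation a small ρ means a long correlation range.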

You're correct that in general the fixed-effects (lat/long) part of the model captures large-scale variation, while the Matérn term captures smaller-scale spatial variation. In some limits (e.g. where the correlation term is nearly non-stationary, i.e. very long-range, or where you choose a complicated model for the spatial trend [higher-order polynomial or GAM]), you could start running into trouble with joint unidentifiability of the two parts of the model, because the trend and the random field are then both able to describe the same smooth pattern.
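The long-range caveat can be illustrated with a small base-R sketch (no spaMM required). It is not part of the fitted model: it simulates pure spatial random fields with an exponential covariance (the Matérn case with ν = 0.5) on a hypothetical 15 × 15 grid and asks how much of each field a planar lat/long trend can absorb. The grid size, the two ranges, and the helper name `trend_r2` are all illustrative assumptions.

```r
# Base-R sketch: how much of a pure spatial random field can a planar
# trend in x and y absorb? (Illustrative settings, not the asker's data.)
set.seed(1)
grid <- expand.grid(x = 1:15, y = 1:15)
D <- as.matrix(dist(grid))   # pairwise distances between grid points

# Average R^2 of a planar trend fitted to simulated zero-mean fields
trend_r2 <- function(range, nsim = 20) {
  Sigma <- exp(-D / range)   # exponential covariance = Matern with nu = 0.5
  L <- t(chol(Sigma))        # lower Cholesky factor: Sigma = L %*% t(L)
  mean(replicate(nsim, {
    u <- as.vector(L %*% rnorm(nrow(grid)))   # one realisation of the field
    summary(lm(u ~ x + y, data = grid))$r.squared
  }))
}

r2_short <- trend_r2(range = 1)     # correlation decays within about one grid step
r2_long  <- trend_r2(range = 100)   # correlation range far exceeds the grid
c(short = r2_short, long = r2_long)
```

When the correlation range is much larger than the study area, a realisation of the field looks like a smooth surface over that area, so a sizeable share of it can be mistaken for a lat/long trend. With a short range, the planar trend picks up very little of the field, and the two components are easier to separate.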

Here is a simulated example showing that, at least in a nice clean case, we can fit such a model and recover reasonable estimates (every fixed-effect coefficient is set to 1 in the simulation):

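# full grid: 20 x 20 locations, 2 sexes, 5 dates, 3 replicates = 12000 rows
dd <- expand.grid(x = 1:20, y = 1:20, sex = factor(c("M", "F")), date = 1:5, rep = 1:3)
dd$dummy <- 1  # single grouping level: one spatial field shared by all rows
library(glmmTMB)
dd$pos <- numFactor(dd$x, dd$y)  # encode coordinates for glmmTMB's mat() term
# fixed effects (9 coefficients) plus a Matérn-correlated random effect over positions
form <- ~ sex + x*y*date + mat(pos + 0 | dummy)
# simulate the response: all fixed-effect coefficients equal to 1;
# betadisp = -2 corresponds to the residual variance of about 0.018 recovered below
dd$z <- simulate_new(form,
                     newdata = dd,
                     newparam = list(beta = rep(1, 9), betadisp = -2,
                                     theta = rep(1,3)),
                     family = gaussian)[[1]]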

library(spaMM)
fit <- fitme(
  z ~ sex + x*y*date + Matern(1 | x + y),
  data = dd,
  family = gaussian()
)
 fit
formula: z ~ sex + x * y * date + Matern(1 | x + y)
ML: Estimation of corrPars, lambda and phi by ML.
    Estimation of fixed effects by ML.
Estimation of lambda and phi by 'outer' ML, maximizing logL.
family: gaussian( link = identity ) 
 ------------ Fixed effects (beta) ------------
            Estimate  Cond. SE    t-value
(Intercept)  -1.2585 1.565e+00    -0.8039
sexM          1.0037 2.476e-03   405.4226
x             1.2362 1.145e-01    10.7921
y             0.9694 1.145e-01     8.4628
date          1.0019 3.777e-03   265.2242
x:y           0.9917 8.447e-03   117.4061
x:date        1.0003 3.153e-04  3172.0961
y:date        0.9999 3.153e-04  3170.9821
x:y:date      1.0000 2.632e-05 37987.7808
 --------------- Random effects ---------------
Family: gaussian( link = identity ) 
                   --- Correlation parameters:
     1.nu     1.rho 
3.3501611 0.5751965 
           --- Variance parameters ('lambda'):
lambda = var(u) for u ~ Gaussian; 
   x + y  :  3.329  
# of obs: 12000; # of groups: x + y, 400 
 -------------- Residual variance  ------------
phi estimate was 0.0183861 
 ------------- Likelihood values  -------------
                       logLik
logL       (p_v(h)): 6439.292
$\endgroup$
  • $\begingroup$ Hello, thank you for your answer. This is very helpful, :) I am grateful for your time. $\endgroup$ Commented Oct 30 at 5:02
  • $\begingroup$ +1. But concerning "classically," I learned UK (c. late 1980's) in the context of higher-order spatial differences. One chose an order, attempted to fit the model, and checked the residuals with something like a chi-squared test (you wanted the standardized residuals to have a chi-squared value close to what would be expected). That was closer to a simultaneous estimation of the trend (in the form of a spatial difference) with the autocorrelation. Your description sounds more like what was called "residual kriging." $\endgroup$ Commented Oct 30 at 14:28
  • $\begingroup$ Edited. Better? $\endgroup$ Commented Oct 30 at 21:50
