Yes, in general this should be fine. You are doing something like kriging with a trend component (Wikipedia), but by a more general (but computationally intensive) approach. (There are a variety of slightly different approaches to this kind of kriging, see e.g. https://stackoverflow.com/questions/73326418/different-results-between-kriging-on-residuals-and-universal-kriging-r .)
You're correct that, in general, the fixed-effect (lat/long) part of the model captures large-scale variation while the Matérn term captures smaller-scale spatial variation. In some limits (e.g. where the correlation term is nearly non-stationary, i.e. very long-range, or where you choose a complicated model for the spatial trend [higher-order polynomial or GAM]), you could start running into trouble with joint unidentifiability of the two parts of the model: a smooth trend and a long-range correlated random field can explain much the same pattern in the data.
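To make the identifiability concern concrete, here is a minimal base-R sketch (not part of the model above). It simulates trend-free Gaussian fields and asks how much variance a purely linear trend in `x` and `y` appears to explain. Everything in it is an illustrative assumption: the 15 x 15 grid, the exponential correlation function (the Matérn family with smoothness 1/2, chosen because it is easy to write down), the two ranges, the seed, and the helper names `sim_field` and `trend_r2`.

```r
# Sketch: a long-range spatial random field can masquerade as a linear trend.
set.seed(101)  # arbitrary seed, for reproducibility only

# Small grid to keep the Cholesky factorization cheap
grid <- expand.grid(x = 1:15, y = 1:15)
D <- as.matrix(dist(grid))   # pairwise Euclidean distances between grid cells

# Simulate a zero-mean Gaussian field with exponential correlation
# and no trend at all
sim_field <- function(range) {
  Sigma <- exp(-D / range)
  # chol() returns upper-triangular R with t(R) %*% R = Sigma,
  # so t(R) %*% e has covariance Sigma
  drop(crossprod(chol(Sigma), rnorm(nrow(D))))
}

# Average share of variance that a linear trend "explains" in trend-free data
trend_r2 <- function(range, nsim = 50) {
  mean(replicate(nsim, {
    z <- sim_field(range)
    summary(lm(z ~ x + y, data = grid))$r.squared
  }))
}

r2_short <- trend_r2(range = 1)    # correlation decays within a few cells
r2_long  <- trend_r2(range = 50)   # correlation spans the whole grid
c(short = r2_short, long = r2_long)
```

With the long range, the linear trend should absorb a much larger share of the variance even though no trend was simulated, which is the sense in which the two model components start to compete.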
A simulated example showing that, at least in a nice clean case, we can fit such a model to simulated data and get reasonable results:
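library(glmmTMB)

# full grid: 20 x 20 locations, two sexes, five dates, three replicates
dd <- expand.grid(x = 1:20, y = 1:20, sex = factor(c("M", "F")), date = 1:5, rep = 1:3)
dd$dummy <- 1                     # single grouping level for the spatial term
dd$pos <- numFactor(dd$x, dd$y)   # coordinates in the form glmmTMB's mat() expects
form <- ~ sex + x*y*date + mat(pos + 0 | dummy)
# no seed is set, so a rerun will not reproduce the output below exactly
dd$z <- simulate_new(form,
                     newdata = dd,
                     newparams = list(beta = rep(1, 9),    # all 9 fixed-effect coefficients set to 1
                                      betadisp = -2,       # residual dispersion parameter
                                      theta = rep(1, 3)),  # Matérn covariance parameters
                     family = gaussian)[[1]]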
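library(spaMM)
# same fixed effects as in the simulation; Matern(1 | x + y) adds a
# Matérn-correlated random intercept over the (x, y) coordinates
fit <- fitme(
  z ~ sex + x*y*date + Matern(1 | x + y),
  data = dd,
  family = gaussian()
)
fit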
formula: z ~ sex + x * y * date + Matern(1 | x + y)
ML: Estimation of corrPars, lambda and phi by ML.
Estimation of fixed effects by ML.
Estimation of lambda and phi by 'outer' ML, maximizing logL.
family: gaussian( link = identity )
------------ Fixed effects (beta) ------------
            Estimate  Cond. SE    t-value
(Intercept)  -1.2585 1.565e+00    -0.8039
sexM          1.0037 2.476e-03   405.4226
x             1.2362 1.145e-01    10.7921
y             0.9694 1.145e-01     8.4628
date          1.0019 3.777e-03   265.2242
x:y           0.9917 8.447e-03   117.4061
x:date        1.0003 3.153e-04  3172.0961
y:date        0.9999 3.153e-04  3170.9821
x:y:date      1.0000 2.632e-05 37987.7808
--------------- Random effects ---------------
Family: gaussian( link = identity )
--- Correlation parameters:
     1.nu     1.rho
3.3501611 0.5751965
--- Variance parameters ('lambda'):
lambda = var(u) for u ~ Gaussian;
x + y : 3.329
# of obs: 12000; # of groups: x + y, 400
-------------- Residual variance ------------
phi estimate was 0.0183861
------------- Likelihood values -------------
logLik
logL (p_v(h)): 6439.292
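As a rough check of "reasonable results", the fixed-effect estimates can be compared with the value of 1 used for every coefficient in the simulation. The sketch below does not refit anything: it copies the estimates and conditional standard errors by hand from the printed output, and the object names (`est`, `se`, `true_beta`, `z`, `phi_true`, `phi_est`) are just for illustration. The residual-variance comparison additionally assumes that glmmTMB reads `betadisp` as the log of the residual standard deviation for Gaussian models; I believe this parameterization has differed between glmmTMB versions, so treat that part as a guess that happens to agree with the numbers here.

```r
# Estimates and conditional SEs copied by hand from the fitme() output above
est <- c(`(Intercept)` = -1.2585, sexM = 1.0037, x = 1.2362, y = 0.9694,
         date = 1.0019, `x:y` = 0.9917, `x:date` = 1.0003,
         `y:date` = 0.9999, `x:y:date` = 1.0000)
se <- c(1.565e+00, 2.476e-03, 1.145e-01, 1.145e-01, 3.777e-03,
        8.447e-03, 3.153e-04, 3.153e-04, 2.632e-05)
true_beta <- 1   # beta = rep(1, 9) in the simulation

# Distance of each estimate from the true value, in units of its standard error
z <- (est - true_beta) / se
round(z, 2)

# Residual variance: ASSUMES betadisp = -2 is log(SD), so variance = exp(2 * -2)
phi_true <- exp(2 * -2)
phi_est <- 0.0183861   # "phi estimate" from the output above
c(true = phi_true, estimated = phi_est)
```

All nine coefficients land within about two conditional standard errors of the simulated value, with `x` the furthest off.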