library(georob)
library(mapview)
data(meuse)
coordinates(meuse) <- ~x+y
proj4string(meuse) <- CRS("+init=epsg:28992")
mapview(meuse, zcol = "zinc")
Spatial Statistics
Session 4
Download slides PDF
The PDF contains what was shown during the lecture. The present website additionally includes references in the slide notes (text in light gray).
Overview this session
- 2 special cases to expand the geostatistical model
- lognormal kriging
- anisotropy
- Model assessment
- Strategies
- Metrics
- Outlook on other techniques
1 Geostatistical model: 2 special cases
pro memoria: Model for Gaussian spatial data
Model for data: \(Y_{i} = S(\mathbf{x}_{i}) + Z_{i} = \mu(\mathbf{x}_{i}) + E(\mathbf{x}_{i}) + Z_{i}\)
with \(Y_{i}\): \(i^{\text{th}}\) datum; \(S(\mathbf{x}_{i})\): “signal” (true quantity) at location \(\mathbf{x}_{i}\); \(\mu(\mathbf{x}_{i})\): trend
\(\{E(\mathbf{x}_{i})\}\): a zero-mean Gaussian process, parametrized by covariance function \(\gamma(\mathbf{h}; \mathbf{\theta})\) or variogram \(V(\mathbf{h}; \mathbf{\theta})\)
\(Z_{i}\): iid Gaussian measurement error with variance \(\tau^2\)
Trend \(\mu(\mathbf{x}_{i})\) modeled by linear regression model with spatial covariates \(d_{k}(\mathbf{x}_{i})\)
\[ \mu(\mathbf{x}_{i}) = \sum_{k} d_{k}(\mathbf{x}_{i}) \beta_{k} = \mathbf{d}(\mathbf{x}_{i})^{T} \mathbf{\beta} \]
1.1 Lognormal universal kriging
Gaussian model fitted to log-transformed response variable \(Y(\mathbf{x}) = \log\big(U(\mathbf{x})\big)\)
(common, e.g., for pollution datasets)
⇒ Computing UK predictions for log-transformed response
⇒ How should we back-transform to original scale of response?
Lognormal distribution \[ Y = \log(U) \sim {\cal N}(\mu_{Y},\sigma_{Y}^2) \]
Expectation and variance of \(U\) \[ \begin{eqnarray*} \mathbb{E}[U] & = & \mu_{U} = \exp(\mu_{Y} + 0.5 \, \sigma_{Y}^2)\\ Var[U] & = & \mu_{U}^2 (\exp(\sigma_{Y}^2) -1 ) \end{eqnarray*} \]
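These moment formulas are easy to verify numerically; a minimal sketch (the values for \(\mu_Y\) and \(\sigma_Y^2\) are arbitrary):
## numerical check of the lognormal moment formulas
mu.y <- 1.5; sigma2.y <- 0.4
u <- exp(rnorm(1e6, mean = mu.y, sd = sqrt(sigma2.y)))
mu.u <- exp(mu.y + 0.5 * sigma2.y)
c(theoretical = mu.u, simulated = mean(u))                         # E[U]
c(theoretical = mu.u^2 * (exp(sigma2.y) - 1), simulated = var(u))  # Var[U]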
Back-transformation for lognormal universal kriging
\(\exp\big(\hat{S}_{\mathrm{k}}(\mathbf{x}^\prime)\big)\) is a biased predictor of \(U(\mathbf{x}^\prime)\)
Unbiased back-transformation \[
\hat{U}_{\mathrm{lk}}(\mathbf{x}^\prime) = \exp
\left(\hat{S}_{\mathrm{k}}(\mathbf{x}^\prime)+0.5\left\{Var[S(\mathbf{x}^\prime)]
- Var[\hat{S}_{\mathrm{k}}(\mathbf{x}^\prime)]\right\}\right)
\]
Limits of prediction intervals can be back-transformed directly by \(\exp()\).
Back-transformation implemented in function lgnpp of R package georob.
Example: Lognormal UK Meuse zinc data
Example Meuse zn: Response distribution
par(mfrow=c(1,2))
hist(meuse@data$zinc, breaks = 20)
hist(log(meuse@data$zinc), breaks = 20)
Example Meuse zn: OLS residuals
par(mfrow = c(1,2))
plot(lm(zinc ~ dist, meuse), 2)
plot(lm(log(zinc) ~ dist, meuse), 2)
Example Meuse zn: REML fit
r.logzn <- georob(log(zinc)~sqrt(dist), meuse, locations=~x+y,
  variogram.model="RMexp",
  param=c(variance=0.15, nugget=0.05, scale=200),
  tuning.psi=1000, control=control.georob(
    cov.bhat=TRUE, cov.bhat.betahat=TRUE))
r.logzn
Tuning constant: 1000
Fixed effects coefficients:
(Intercept) sqrt(dist)
6.985 -2.567
Variogram: RMexp
variance snugget(fixed) nugget scale
0.14910 0.00000 0.04867 192.52854
Example Meuse zn: REML variogram
plot(r.logzn, lag.dist.def=60, max.lag=1500)
Example Meuse zn: Universal kriging prediction
data(meuse.grid)
coordinates(meuse.grid) <- ~x+y
gridded(meuse.grid) <- TRUE
r.luk <- predict(r.logzn, newdata=meuse.grid,
  control=control.predict.georob(extended.output=TRUE))
str(r.luk@data)
'data.frame': 3103 obs. of 8 variables:
$ pred : num 7.03 7.05 6.75 6.49 7.07 ...
$ se : num 0.362 0.335 0.342 0.349 0.288 ...
$ lower : num 6.32 6.39 6.08 5.8 6.5 ...
$ upper : num 7.73 7.7 7.42 7.17 7.63 ...
$ trend : num 6.99 6.99 6.7 6.45 6.99 ...
$ var.pred : num 0.0388 0.0539 0.0437 0.0357 0.0787 ...
$ cov.pred.target: num 0.0285 0.0453 0.0379 0.0315 0.0724 ...
$ var.target : num 0.149 0.149 0.149 0.149 0.149 ...
- attr(*, "variogram.object")=List of 1
..$ :List of 9
.. ..$ variogram.model: chr "RMexp"
.. ..$ param : Named num [1:4] 0.1491 0 0.0487 192.5285
.. .. ..- attr(*, "names")= chr [1:4] "variance" "snugget" "nugget" "scale"
.. ..$ fit.param : Named logi [1:4] TRUE FALSE TRUE TRUE
.. .. ..- attr(*, "names")= chr [1:4] "variance" "snugget" "nugget" "scale"
.. ..$ isotropic : logi TRUE
.. ..$ aniso : Named num [1:5] 1 1 90 90 0
.. .. ..- attr(*, "names")= chr [1:5] "f1" "f2" "omega" "phi" ...
.. ..$ fit.aniso : Named logi [1:5] FALSE FALSE FALSE FALSE FALSE
.. .. ..- attr(*, "names")= chr [1:5] "f1" "f2" "omega" "phi" ...
.. ..$ sincos :List of 6
.. .. ..$ co: num 6.12e-17
.. .. ..$ so: num 1
.. .. ..$ cp: num 6.12e-17
.. .. ..$ sp: num 1
.. .. ..$ cz: num 1
.. .. ..$ sz: num 0
.. ..$ rotmat : num [1:2, 1:2] 1.00 -6.12e-17 6.12e-17 1.00
.. ..$ sclmat : Named num [1:2] 1 1
.. .. ..- attr(*, "names")= chr [1:2] "" "f1"
- attr(*, "psi.func")= chr "logistic"
- attr(*, "tuning.psi")= num 1000
- attr(*, "type")= chr "signal"
Example Meuse zn: Unbiased back-transformation
r.luk <- lgnpp(r.luk)
str(r.luk@data)
'data.frame': 3103 obs. of 12 variables:
$ pred : num 7.03 7.05 6.75 6.49 7.07 ...
$ se : num 0.362 0.335 0.342 0.349 0.288 ...
$ lower : num 6.32 6.39 6.08 5.8 6.5 ...
$ upper : num 7.73 7.7 7.42 7.17 7.63 ...
$ trend : num 6.99 6.99 6.7 6.45 6.99 ...
$ var.pred : num 0.0388 0.0539 0.0437 0.0357 0.0787 ...
$ cov.pred.target: num 0.0285 0.0453 0.0379 0.0315 0.0724 ...
$ var.target : num 0.149 0.149 0.149 0.149 0.149 ...
$ lgn.pred : num 1189 1204 902 694 1215 ...
$ lgn.se : num 440 409 314 249 354 ...
$ lgn.lower : num 554 595 438 331 667 ...
$ lgn.upper : num 2286 2215 1673 1299 2064 ...
- attr(*, "variogram.object")=List of 1
..$ :List of 9
.. ..$ variogram.model: chr "RMexp"
.. ..$ param : Named num [1:4] 0.1491 0 0.0487 192.5285
.. .. ..- attr(*, "names")= chr [1:4] "variance" "snugget" "nugget" "scale"
.. ..$ fit.param : Named logi [1:4] TRUE FALSE TRUE TRUE
.. .. ..- attr(*, "names")= chr [1:4] "variance" "snugget" "nugget" "scale"
.. ..$ isotropic : logi TRUE
.. ..$ aniso : Named num [1:5] 1 1 90 90 0
.. .. ..- attr(*, "names")= chr [1:5] "f1" "f2" "omega" "phi" ...
.. ..$ fit.aniso : Named logi [1:5] FALSE FALSE FALSE FALSE FALSE
.. .. ..- attr(*, "names")= chr [1:5] "f1" "f2" "omega" "phi" ...
.. ..$ sincos :List of 6
.. .. ..$ co: num 6.12e-17
.. .. ..$ so: num 1
.. .. ..$ cp: num 6.12e-17
.. .. ..$ sp: num 1
.. .. ..$ cz: num 1
.. .. ..$ sz: num 0
.. ..$ rotmat : num [1:2, 1:2] 1.00 -6.12e-17 6.12e-17 1.00
.. ..$ sclmat : Named num [1:2] 1 1
.. .. ..- attr(*, "names")= chr [1:2] "" "f1"
- attr(*, "type")= chr "signal"
Example Meuse zn: Predicted maps
library(terra)
r.luk <- rast(r.luk)
par(mfrow=c(1,2))
plot(r.luk["^pred$"], main = "UK prediction log(zn)")
plot(r.luk["lgn.pred"], main = "LUK prediction zn")
Example Meuse zn: Predicted standard errors
par(mfrow=c(1,2))
plot(r.luk["^se$"], main = "UK standard error log(zn)")
plot(r.luk["lgn.se"], main = "LUK stanard error zn")
1.2 Anisotropic variograms
Spatial auto-correlation can vary from one direction to another, i.e. it can be anisotropic.
In many instances the anisotropy is such that it could be made isotropic by a simple linear transformation of the spatial coordinates. ⇒ geometrically anisotropic variogram function
Idea: rotate and stretch/shrink components of \(\mathbf{x}\) such that the stochastic process is isotropic in the transformed coordinate system
\[ V(h^*) = V\left( \sqrt{(\mathbf{A}\mathbf{h})^\mathrm{T}\mathbf{A}\mathbf{h}} \right) \]
Iso-semivariance contours are ellipsoids in the space of the non-transformed coordinates and are mapped to the isotropic case by the transformation:
\[ \mathbf{A} = \left[\begin{array}{cc} 1/\alpha & 0\\ 0 & 1/(f_{1}\alpha) \end{array}\right] \left[\begin{array}{cc} \cos(\omega) & \sin(\omega) \\ -\sin(\omega) & \cos(\omega) \end{array}\right] \]
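To make the transformation concrete, a small sketch (the values for \(\alpha\), \(f_1\) and \(\omega\) are arbitrary) that builds \(\mathbf{A}\) and computes the transformed lag distance \(h^* = \sqrt{(\mathbf{A}\mathbf{h})^\mathrm{T}\mathbf{A}\mathbf{h}}\):
## anisotropic lag distance h* = ||A h|| (arbitrary example values)
alpha <- 200                 # range parameter along the main axis
f1    <- 0.5                 # anisotropy ratio
omega <- 30 * pi / 180       # rotation angle in radians
A <- diag(c(1/alpha, 1/(f1 * alpha))) %*%
  rbind(c( cos(omega), sin(omega)),
        c(-sin(omega), cos(omega)))
h <- c(100, 50)              # lag vector in original coordinates
sqrt(sum((A %*% h)^2))       # transformed lag distance h*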
Example: elevation data
data(elevation, package="georob")
d.elevation <- elevation
d.elevation$res <- residuals(lm(height~y, d.elevation))
plot(y~x, d.elevation, cex=sqrt(abs(res)), asp=1,
col=c("blue", NA, "orange")[sign(res)+2])
Example: elevation data - directional sample variogram
r.sv <- sample.variogram(d.elevation$res,
  locations=as.matrix(d.elevation[, c("x","y")]),
lag.dist.def=0.5, xy.angle.def=c(0, 45, 135, 180))
summary(r.sv)
Sample variogram estimator: qn
Summary of lag distances
Min. 1st Qu. Median Mean 3rd Qu. Max.
0.3812 2.2244 4.2278 4.1295 6.1756 8.2759
Summary of number of pairs per lag and distance classes
Min. 1st Qu. Median Mean 3rd Qu. Max.
1.00 11.00 45.00 40.18 63.00 73.00
Angle classes in xy-plane: (-45,45] (45,135]
Angle classes in xz-plane: [0,180]
Example: elevation data - directional sample variogram
plot(r.sv)
Example: elevation data - fitted directional variogram
r.exp <- fit.variogram.model(r.sv,
  variogram.model="RMexp",
param=c(variance=1500, nugget=10, scale=4),
aniso = default.aniso(f1 = 0.4, omega = 0),
fit.aniso = default.fit.aniso(f1 = TRUE, omega = TRUE))
r.exp
Variogram: RMexp
variance snugget(fixed) nugget scale
1.820e+03 0.000e+00 1.289e-07 1.256e+02
f1 f2(fixed) omega phi(fixed)
0.006757 1.000000 17.415945 90.000000
zeta(fixed)
0.000000
Example: elevation data - fitted directional variogram
plot(r.sv)
lines(r.exp, xy.angle=c(0, 90))
2 Model complexity
2.1 Bias-Variance trade-off illustration
Bias-Variance trade-off
Expected validation MSE for a given value \(x_0\) can be decomposed into variance, squared bias and variance of the error \(\epsilon\) (unexplained variation):
\[ E\big[(y_0 - \hat{f}(x_0))^2\big] = Var(\hat{f}(x_0)) + [Bias(\hat{f}(x_0))]^2 + Var(\epsilon) \]
Variance \(Var(\hat{f}(x_0))\)
How much \(\hat{f}\) would change if we estimated it using a different data set. High variance: small changes in the data result in large changes in \(\hat{f}\).
Bias \(Bias(\hat{f}(x_0))\)
Error that is introduced by approximating a real-world problem, which may be extremely complicated, by a much simpler model. E.g., linear regression assumes linearity, which is unlikely to hold for the true model.
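A minimal simulation sketch of the trade-off (assumed setup: a sine as the true \(f\), polynomial fits of increasing degree; bias\(^2\) and variance of \(\hat{f}(x_0)\) estimated over repeated training sets):
## bias-variance trade-off at x0 for polynomials of increasing flexibility
set.seed(1)
f <- function(x) sin(2 * pi * x)
x0 <- 0.25; n <- 50; nsim <- 500
for (degree in c(1, 3, 9)) {
  pred <- replicate(nsim, {
    x <- runif(n); y <- f(x) + rnorm(n, sd = 0.3)
    predict(lm(y ~ poly(x, degree)), newdata = data.frame(x = x0))
  })
  cat("degree", degree,
      " bias^2:", round((mean(pred) - f(x0))^2, 4),
      " variance:", round(var(pred), 4), "\n")
}
The inflexible linear fit shows large bias but small variance; the degree-9 fit the opposite.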
3 Model assessment
3.1 Goal of model assessment
Data analysis often leads to a set of equally plausible candidate models that use different sets of covariates and/or different variograms.
Summary statistics (like RMSE, R2, kappa) are used to characterize:
How well does a model fit the data?
Compare models: What is the best model structure, what are relevant covariates? Model selection, variogram functions, tuning parameters.
Evaluate generalization properties: What is the average error to predict for a location without observed data?
Goodness-of-fit metrics often support model building, but:
Goodness-of-fit not necessarily good criterion for judging quality of predictions.
Covariate selection by step() does not lead to a unique set of covariates but depends on the search strategy and on the criterion (AIC/BIC) used for model selection.
Likelihood ratio tests cannot be used to compare the goodness-of-fit of models that have different variograms.
Therefore, independent validation or data splitting is required. Machine learning models often rely directly on such techniques, since goodness-of-fit criteria do not exist or are not meaningful.
3.2 Model assessment strategies
How to obtain pairs of observed and predicted values?
Overview strategies
Completely independent sampling
Splitting of available data
Cross-validation
Non-parametric bootstrap
Note: In the classical statistical context, an independent data set with which model performance is evaluated is often called a validation set; in the machine learning context it is called a test set. The validation set in machine learning is an additional subset of the data, often used with neural networks, that serves to find optimal tuning parameters.
Completely independent sampling
Desired option
Independent data:
Survey from another institution/database (but it must cover the same target population and use the same survey method)
New survey:
Independently surveyed from data used for model calibration
Approach to select locations (and depth if 3D): design-based sampling, e.g. stratified random sampling.
Problems
Often no funding for additional survey
Not all studies allow for independent sampling because it is not meaningful, e.g. landslide occurrence is observed whenever it happens.
Not needed if calibration locations were chosen by a design-based sampling approach that allows an unbiased estimate of model performance metrics; then we may use data splitting.
Data splitting
Rule of thumb: 20 % for validation, or a minimal number that allows computing reliable metrics/means, e.g. 30–50.
How to split:
simple random sample
weighted or stratified random sample, if the data do not represent the target population well (see the sketches below)
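A minimal sketch of a simple random 20 % split (using the meuse data from above):
## simple random 20 % split into calibration and validation set
set.seed(42)
n <- nrow(meuse@data)
idx.val <- sample(n, size = round(0.2 * n))
d.val <- meuse[idx.val, ]    # validation set
d.cal <- meuse[-idx.val, ]   # calibration set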
Example for weighted splitting
Weighted random sample
Use occurrence probabilities of the target population to obtain a validation set with the distribution of the target population.
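A sketch of such a weighted draw (the probability vector w is a made-up placeholder for real occurrence probabilities):
## weighted random draw of the validation set
set.seed(42)
w <- runif(nrow(meuse@data))  # placeholder for occurrence probabilities
idx.val <- sample(nrow(meuse@data), size = round(0.2 * nrow(meuse@data)),
  prob = w)
d.val <- meuse[idx.val, ]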
Example for weighted splitting by spatial distribution
Problem: Data from pooled surveys, some strongly spatially clustered. Goal: Prediction for complete forested area of Canton of Zurich
→ Data splitting with weights to obtain homogeneous cover of the total study area.
(Weights computed from Thiessen polygons intersected with the forested area.)
Data splitting – problems
No proper independent data
Sub-optimal variant of new survey
Results largely depend on how the splitting is done
Especially if splitting is done randomly: better to repeat
k-fold cross-validation
Implementation of k-fold cross-validation
Recommended \(k = 10\) or \(k = 5\) for time-consuming model fits
\(k = n\) is possible (leave-one-out cross-validation), but not recommended due to spatial auto-correlation
Splitting by spatial clusters/blocks is possible, e.g. to evaluate spatial extrapolation capacity (cv(.., method = "block") in R package georob offers this option). But this is not recommended in general because metrics become too pessimistic.
To allow for comparison of different models, cross-validation sets have to be kept fixed.
Repeated cross-validation with repeated splitting of the data allows assessing the variation of the metrics.
Combine with data-splitting: 20 % for validation, 80 % for model selection with cross-validation.
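A minimal k-fold cross-validation sketch (generic lm fit on the meuse data; keeping the vector sets fixed allows comparing several models on identical splits):
## 10-fold cross-validation with fixed sets (generic sketch with lm)
set.seed(13)
k <- 10
sets <- sample(rep(1:k, length.out = nrow(meuse@data)))  # keep fixed across models
pred <- numeric(nrow(meuse@data))
for (i in 1:k) {
  fit <- lm(log(zinc) ~ sqrt(dist), data = meuse@data[sets != i, ])
  pred[sets == i] <- predict(fit, newdata = meuse@data[sets == i, ])
}
sqrt(mean((pred - log(meuse@data$zinc))^2))  # cross-validation RMSE
For a fitted georob object, the cv() method of the package computes such a cross-validation directly.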
k-fold cross-validation – Problems
Random splitting: same problem as with data splitting. If the available data do not represent the target area/population well, validation results may be biased.
If cross-validation has been used for model selection, the final cross-validation results do not yield independent model performance metrics anymore.
Solution: nested cross-validation, inner loop for model selection, outer loop for model evaluation.
Example of wrong use of cross-validation:
Pre-selection of covariates based on correlation coefficients (using all data)
Create \(k\) subsets, fit models by leaving each of the \(k\) subsets out in turn
Compute predictions for the left-out subsets, compute error metrics
Because the pre-selection already used the observations that are later left out, the resulting error metrics are over-optimistic (see the sketch below).
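A corrected sketch: the covariate pre-selection is repeated inside each fold, so the left-out subset never influences the selection (illustrated with a simple correlation-based selection on the meuse data):
## nested selection: pre-select a covariate inside each fold
cand <- c("dist", "elev", "om")             # candidate covariates
d <- na.omit(meuse@data[, c("zinc", cand)])
set.seed(13)
sets <- sample(rep(1:10, length.out = nrow(d)))
pred <- numeric(nrow(d))
for (i in 1:10) {
  d.train <- d[sets != i, ]
  ## selection step uses the training data of this fold only
  best <- cand[which.max(abs(cor(log(d.train$zinc), d.train[, cand])))]
  fit <- lm(as.formula(paste("log(zinc) ~", best)), data = d.train)
  pred[sets == i] <- predict(fit, newdata = d[sets == i, ])
}
sqrt(mean((pred - log(d$zinc))^2))          # honest cross-validation RMSE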
Non-parametric bootstrap
Problems:
Random splitting of biased data: same problem as for data-splitting or cross-validation.
A larger number of model fits is needed.
Potential: simulate standard errors or prediction intervals for models that do not provide these.
Bootstrap corresponds to out-of-bag predictions in random forest.
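A minimal non-parametric bootstrap sketch for the standard error of a prediction (illustrative lm fit on the meuse data; this captures the model uncertainty only, not the full prediction uncertainty):
## non-parametric bootstrap of a prediction
set.seed(7)
newd <- data.frame(dist = 0.2)                 # illustrative prediction location
boot.pred <- replicate(500, {
  idx <- sample(nrow(meuse@data), replace = TRUE)   # resample with replacement
  fit <- lm(log(zinc) ~ sqrt(dist), data = meuse@data[idx, ])
  predict(fit, newdata = newd)
})
sd(boot.pred)                                  # bootstrap standard error
quantile(boot.pred, c(0.025, 0.975))           # percentile interval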
3.3 Model assessment metrics and displays
Interval scaled response – validation scatterplot
Interval scaled response – metrics
To summarize the validation scatterplot, it is recommended to report at least 3 metrics:
a bias metric, indicating marginal bias
an overall precision statistic, reporting the standard deviation of the errors in the original unit of the response (bias and random variation)
a standardized error statistic, comparable across datasets.
Often reported are mean error (ME) or bias, root mean squared error (RMSE) and R2.
Interval scaled response – metrics: Bias and RMSE
With \(y_{i}\) being the observed value at location \(\mathbf{x}_{i}\) and \(\hat{Y}(\mathbf{x}_{i})\) the mapped prediction at this location.
Bias (positive: over-prediction, negative: under-prediction):
\[ \mathrm{BIAS} = \frac{1}{n}\sum_{i=1}^n \left( \hat{Y}(\mathbf{x}_{i})-y_{i}\right) \]
Root mean square error:
\[ \mathrm{RMSE} = \sqrt{\frac{1}{n}\sum_{i=1}^n \left( \hat{Y}(\mathbf{x}_{i})-y_{i}\right)^2} \]
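Both metrics translate directly into R (obs and pred being vectors of observations and predictions at the validation locations):
## bias and RMSE from observed and predicted values
bias <- function(obs, pred) mean(pred - obs)
rmse <- function(obs, pred) sqrt(mean((pred - obs)^2))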
Interval scaled response – metrics: R2
R2 (also model efficiency coefficient or mean squared error skill score) with \(\overline{y}\) being the mean of the observed data
\[ \mathrm{R^2} = 1 - \frac{\sum_{i=1}^n \left( y_{i}-\hat{Y}(\mathbf{x}_{i})\right)^2}{\sum_{i=1}^n \left( y_i - \overline{y} \right)^2} \]
1: perfect prediction.
0: mean squared error is the same as variance of observed data, i.e. a map showing the mean of the data would have the same average prediction performance.
<0: mean squared error is larger than variance of observed data.
Interval scaled response – R2 vs. r2
\(R^2 \ne r^2\).
R2 does not correspond to the squared Pearson correlation coefficient in the presence of bias.
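A small sketch demonstrating the difference: a systematically biased prediction can have \(r^2 = 1\) but a very poor \(R^2\):
## R2 (skill score) vs squared Pearson correlation under bias
R2 <- function(obs, pred) 1 - sum((obs - pred)^2) / sum((obs - mean(obs))^2)
obs  <- c(1, 2, 3, 4, 5)
pred <- obs + 3              # perfectly correlated, but biased
cor(obs, pred)^2             # r^2 = 1
R2(obs, pred)                # R2 = -3.5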
Binary responses – confusion matrix
Binary responses – confusion matrix
Hedging / “gaming” the score: Uninformative predictions yield good scores. Alternative: Skill scores for confusion matrices, e.g.:
HSS: Heidke skill score (also kappa)
PSS: Peirce skill score
Binary responses – Peirce skill score
Peirce skill score (simplified notation):
\[ PSS = \frac{ad - bc}{(a+c)(b+d)} \]
Reference: random predictions that are constrained to be unbiased.
Score range \([-1,1]\):
1: perfect prediction
0: prediction equal to random class assignment
-1: prediction always opposite of observed class
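A small helper computing PSS from the four cells of a 2×2 confusion matrix (assuming the usual convention a = hits, b = false alarms, c = misses, d = correct negatives; the example counts are made up):
## Peirce skill score from a 2x2 confusion matrix
## a = hits, b = false alarms, c = misses, d = correct negatives
pss <- function(a, b, c, d) (a * d - b * c) / ((a + c) * (b + d))
pss(a = 30, b = 10, c = 5, d = 55)   # made-up counts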
Multinomial responses – extension of binary case
Use score to optimally discretize predictions
4 Outlook on more spatial statistics topics
4.1 Spatial sampling designs
Model-based sampling for calibration: Optimal selection of sampling location depends on modelling approach used.
Design-based sampling for validation: random sampling design.
From left to right: simple random sample (a), optimized sample for mapping with linear regression model (b), optimized sample for kriging with an external drift (c), and stratified sample using sixteen equal-sized covariate strata (d). All samples are plotted in a map of the covariate (Brus, 2018).
Brus, D. J. (2018). Sampling for digital soil mapping: A tutorial supported by R scripts. Geoderma.
Brus, D. J. (2022). Spatial Sampling with R. https://dickbrus.github.io/SpatialSamplingwithR
4.2 Geostatistical analysis of experimental data
Example: Field experiment to compare the yield of 56 varieties of wheat in a randomized block experiment.
Geostatistical analysis of experimental data
Field experiments in ecology, agriculture, forestry, often give rise to spatial data
Classical analysis of variance of spatial experimental data ignores spatial structure of the data
Blocking and randomisation sometimes not effective to account for natural heterogeneity within experimental site
Residuals often violate independence assumption of classical analysis of variance methods
Explicit consideration of auto-correlation by generalized least squares estimation.
More information on wheat example, p. 260ff in Pinheiro, J. C. and Bates, D. M. (2000). Mixed-Effects Models in S and S-PLUS. Springer.
4.3 Block kriging
Mean and standard error estimates for finite support targets, e.g. parcels, municipalities.
Example: Soil organic carbon stocks, mapped with robust external-drift kriging for Swiss forests.
Block kriging SOC stocks
Basis: Punctual kriging and fitted (residual) variogram.
Mean per unit: approximated by the mean of punctual kriging predictions.
Standard errors: “correct” for the spatial covariance structure modeled by the variogram, i.e. the larger the spatial auto-correlation, the smaller the standard errors for a finite target.
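A minimal sketch of the mean approximation, reusing the terra raster r.luk from the Meuse example above (units.rast with made-up unit codes stands in for a real parcel map; the “corrected” standard errors would additionally require the fitted variogram):
## block mean approximated by the mean of point predictions per unit
library(terra)
units.rast <- rast(r.luk, nlyrs = 1)   # same grid geometry as the predictions
values(units.rast) <- sample(1:2, ncell(units.rast), replace = TRUE)  # made-up units
zonal(r.luk["lgn.pred"], units.rast, fun = "mean", na.rm = TRUE)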
More information in: Oliver + Webster, 2015. / Nussbaum, M., Papritz, A., et al. (2014). Estimating soil organic carbon stocks of Swiss forest soils by robust external-drift kriging. GMD.
4.4 Space-time data
Temporal change of soil water storage in a forest. Measurement of soil water storage biweekly during growing season for 2 years (Jost et al., 2005)
Jost, G., Heuvelink, G. B. M., and Papritz, A. (2005). Analysing the space-time distribution of soil water storage of a forest ecosystem using spatio-temporal kriging. Geoderma, 128(3–4), 258–273.
Wikle, C. K., Zammit-Mangion, A., and Cressie, N. (2019), Spatio-Temporal Statistics with R, Boca Raton, FL: Chapman & Hall/CRC https://spacetimewithr.org.
Space-time sample variogram
Space-time sample variogram
Product-sum covariance model fitted to space-time sample variogram.