\documentclass[a4paper,11pt]{book}
\usepackage{SpatialExtremes}
\usepackage[utf8]{inputenc}
\begin{document}
% \VignetteIndexEntry{An R Package for Modelling Spatial Extremes}
% \VignetteDepends{SpatialExtremes}
% \VignetteKeyword{Extreme Value Theory}
% \VignetteKeyword{Spatial Extremes}
% \VignetteKeyword{Max-stable processes}
% \VignettePackage{SpatialExtremes}
\pagestyle{empty}
\frontmatter
\begin{titlepage}
  \vspace*{2cm}
  \begin{center}
    \LARGE A User's Guide to the SpatialExtremes Package\\
    \vspace{1em}
    \Large Mathieu Ribatet\\
    \vspace{1em}
    Copyright \copyright{2009}\\
    \vspace{2em}
    \large Chair of Statistics\\
    École Polytechnique Fédérale de Lausanne\\
    Switzerland\\
  \end{center}
  \vfill
  \begin{figure}[b]
    \centering
<>=
library(SpatialExtremes)
set.seed(12)
n.site <- 50
coord <- matrix(runif(2*n.site, 0, 10), ncol = 2)
##Simulate a max-stable process - with unit Frechet margins
data <- rmaxstab(40, coord, "whitmat", nugget = 0, range = 1, smooth = 2)
##Compute the lambda-madogram
lmadogram(data, coord, n.bins = 25, border = "grey", box = FALSE)
@
  \end{figure}
  \vfill
\end{titlepage}
\mainmatter
\pagenumbering{roman}
\normalsize
\tableofcontents
%%\listoftables
\listoffigures
<>=
options(SweaveHooks=list(fig=function() par(mar=c(5.1, 4.1, 1.1, 2.1))))
@
\pagestyle{fancy}
\renewcommand{\chaptermark}[1]{\markboth{\thechapter\ $\cdot$\ #1}{}}
\renewcommand{\sectionmark}[1]{\markright{\thesection\ $\cdot$\ #1}}
%% To have completely blank pages before the beginning of a new
%% chapter
\makeatletter
\def\cleardoublepage{\clearpage\if@twoside \ifodd\c@page\else
  \hbox{} \thispagestyle{empty} \newpage
  \if@twocolumn\hbox{}\newpage\fi\fi\fi}
\makeatother
\chapter*{Introduction}
\label{cha:introduction}
\addcontentsline{toc}{chapter}{Introduction}
\pagenumbering{arabic}

\section*{What is the SpatialExtremes package?}
\label{sec:what-spat-pack}
The \textbf{SpatialExtremes} package is an add-on package for the R \citep{Rsoft} statistical computing system. It provides functions for the analysis of spatial extremes using (currently) max-stable processes. All comments, criticisms and queries on the package or associated documentation are gratefully received.

\section*{Obtaining the package/guide}
\label{sec:obta-pack}
The package can be downloaded from CRAN (The Comprehensive R Archive Network) at \url{http://cran.r-project.org/}. This guide (in pdf) will be in the directory \verb+SpatialExtremes/doc/+ underneath wherever the package is installed. You can get it by invoking
<>=
vignette("SpatialExtremesGuide")
@
For a quick overview of what the package does, you might want to have a look at its web page \url{http://spatialextremes.r-forge.r-project.org/}.

\section*{Contents}
To help users make proper use of the \emph{SpatialExtremes} package, this report introduces the necessary theory and references. Practical examples are inserted directly within the text to show how things work in practice. Chapter~\ref{cha:an-introduction-max} is an introduction to max-stable processes and introduces several models that might be useful in concrete applications. Chapter~\ref{cha:uncond-simul-rand} describes simulation techniques for unconditional simulation of both Gaussian and max-stable random fields. Statistics and tools to analyze the spatial dependence of extremes are presented in Chapter~\ref{cha:spatial-depend-max}. Chapter~\ref{chat:fit-maxstab-frech} tackles the problem of fitting max-stable processes to data that are assumed to be unit Fréchet distributed.
Chapter~\ref{cha:model-selection} is devoted to model selection. Chapter~\ref{cha:fit-maxstab-gev} presents models and procedures on how to fit max-stable processes to data that do not have unit Fréchet margins. Chapter~\ref{cha:model-checking} is devoted to model checking while Chapter~\ref{cha:infer-proc-pred} deals with inferential procedures and predictions. Lastly, Chapter~\ref{cha:conclusion} draws conclusions on spatial extremes. Note that several computations are reported in the Annex.

\section*{Caveat}
I have checked these functions as best I can but they may contain bugs. If you find a bug or suspected bug in the code or the documentation please report it to me at \href{mailto:mathieu.ribatet@epfl.ch}{mathieu.ribatet@epfl.ch}. Please include an appropriate subject line.

\section*{Legalese}
This program is free software; you can redistribute it and/or modify it under the terms of the GNU General Public License as published by the Free Software Foundation; either version 3 of the License, or (at your option) any later version. This program is distributed in the hope that it will be useful, but without any warranty; without even the implied warranty of merchantability or fitness for a particular purpose. See the GNU General Public License for more details. A copy of the GNU General Public License can be obtained from \url{http://www.gnu.org/copyleft/gpl.html}.

\section*{Acknowledgements}
This work has been supported by the Competence Center Environment and Sustainability \url{http://www.cces.ethz.ch/index} within the EXTREMES project \url{http://www.cces.ethz.ch/projects/hazri/EXTREMES}. I would like to thank S. A. Padoan for his support when we were both calculating the tedious partial derivatives required for the estimation of the asymptotic sandwich covariance matrix.

\chapter{An Introduction to Max-Stable Processes}
\label{cha:an-introduction-max}
A max-stable process\index{max-stable!process} $Z(\cdot)$ is the limit process of maxima of independent identically distributed random fields $Y_i(x)$, $x \in \mathbb{R}^d$. Namely, for suitable $a_n(x) > 0$ and $b_n(x) \in \mathbb{R}$,
\begin{equation}
  \label{eq:maxstab-def}
  Z(x) = \lim_{n \rightarrow +\infty} \frac{\max_{i=1}^n Y_i(x) - b_n(x)}{a_n(x)}, \qquad x \in \mathbb{R}^d
\end{equation}
Note that \eqref{eq:maxstab-def} does not ensure that the limit exists. However, provided it does, \eqref{eq:maxstab-def} suggests that max-stable processes might be appropriate models for annual maxima of spatial data, for example. Theoretically speaking, there is no loss of generality in transforming the margins to have a unit Fréchet\index{Fr\'echet!unit} scale i.e.
\begin{equation}
  \label{eq:CDFFrechet}
  \Pr\left[Z(x) \leq z\right] = \exp\left(-\frac{1}{z} \right), \qquad \forall x \in \mathbb{R}^d, \qquad z > 0
\end{equation}
and we will first assume that the margins are unit Fréchet, in which case $a_n(x) = n$ and $b_n(x) = 0$. Currently, there are two different characterisations of a max-stable process. The first one, often referred to as the \emph{rainfall-storm}\index{rainfall-storm process} model, was introduced by \citet{Smith1991}. More recently, \citet{Schlather2002} introduced a new characterisation of a max-stable process allowing for a random shape. In this chapter, we will present several max-stable models that might be relevant in studying spatial extremes.
\section{The Smith Model}
\label{sec:smiths-char}
\citet{Haan1984} was the first to propose a characterisation of max-stable processes. Later, \citet{Smith1991} used this characterisation to provide a parametric model for spatial extremes. The construction is as follows. Let $\{(\xi_i, y_i), i \geq 1\}$ denote the points of a Poisson process\index{Poisson process} on $(0,+\infty) \times \mathbb{R}^d$ with intensity measure \index{intensity measure} $\xi^{-2} d\xi \nu(dy)$, where $\nu(dy)$ is a positive measure on $\mathbb{R}^d$. Then one characterisation of a max-stable process with unit Fréchet margins is
\begin{equation}
  \label{eq:smithChar}
  Z(x) = \max_i \left\{ \xi_i f(y_i,x) \right\}, \qquad x \in \mathbb{R}^d
\end{equation}
where $\{f(y,x), x,y \in \mathbb{R}^d\}$ is a non-negative function such that
\begin{equation*}
  \int_{\mathbb{R}^d} f(y,x) \nu(dy) = 1, \qquad \forall x \in \mathbb{R}^d
\end{equation*}
To see that equation~\eqref{eq:smithChar} defines a stationary max-stable process with unit Fréchet margins, we have to check that the margins are indeed unit Fréchet and that $Z(x)$ has the max-stable property\index{max-stable!property}. Following Smith, consider the set defined by:
\begin{equation*}
  E = \left\{(\xi, y) \in \mathbb{R}^+_* \times \mathbb{R}^d: \xi f(y,x) > z \right\}
\end{equation*}
for a fixed location $x \in \mathbb{R}^d$ and $z>0$. Then
\begin{eqnarray*}
  \Pr\left[Z(x) \leq z \right] &=& \Pr \left[\text{no points in } E \right] = \exp \left[ -\int_{\mathbb{R}^d} \int_{z/f(y,x)}^{+\infty} \xi^{-2} d\xi \nu(\mbox{dy}) \right]\\
  &=& \exp \left[ - \int_{\mathbb{R}^d} z^{-1} f(y,x) \nu(\mbox{dy}) \right] = \exp \left(- \frac{1}{z} \right)
\end{eqnarray*}
and the margins are unit Fréchet. The max-stable property of $Z(\cdot)$ follows because the superposition of $n$ independent, identical Poisson processes is a Poisson process with its intensity multiplied by $n$. More precisely, we have:
\begin{equation*}
  \left\{\max_{i=1}^n Z_i(x_1), \ldots, \max_{i=1}^n Z_i(x_k) \right\} \stackrel{\cdot}{\sim} n \left\{Z(x_1), \ldots, Z(x_k)\right\}, \qquad k \in \mathbb{N}.
\end{equation*}
The process defined by \eqref{eq:smithChar} is often referred to as the rainfall-storm process\index{rainfall-storm process}, as the above construction admits a physical interpretation. Think of $y_i$ as realisations of rainfall storm centres in $\mathbb{R}^d$ and $\nu(dy)$ as the spatial distribution of these storm centres over $\mathbb{R}^d$ - usually $d = 2$. Each $\xi_i$ represents the intensity of the $i$-th storm and therefore $\xi_i f(y_i, x)$ represents the amount of rainfall for this specific event at location $x$. In other words, $f(y_i, \cdot)$ drives how the $i$-th storm centred at $y_i$ diffuses in space. Definition \eqref{eq:smithChar} is rather general and Smith considered a particular setting where $\nu(dy)$ is the Lebesgue measure and $f(y,x) = f_0(y-x)$, where $f_0$ is a multivariate normal density with zero mean and covariance matrix $\Sigma$\footnote{Another form of Smith's model uses a Student distribution instead of the normal one; however, it is not currently implemented.}.
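Before moving to the bivariate distribution, the following lines give a naive, finite-truncation illustration of construction~\eqref{eq:smithChar} on a one-dimensional transect with $f_0$ a standard normal density. The enlarged window, the number of storms kept and all other settings are illustrative choices only, and the resulting simulation is approximate; the \verb|rmaxstab| function described in Chapter~\ref{cha:uncond-simul-rand} should be used for proper simulations.
<<eval = FALSE>>=
##Naive illustration of the rainfall-storm construction (approximate only).
##Storm centres are drawn uniformly on an enlarged interval B.r and the
##intensities are the points of a Poisson process with intensity |B.r| xi^-2.
set.seed(1)
x <- seq(0, 10, length = 200)            ##locations of interest
n.storm <- 5000                          ##truncation: number of storms kept
B.r <- c(-5, 15)                         ##enlarged simulation window
centres <- runif(n.storm, B.r[1], B.r[2])
xi <- diff(B.r) / cumsum(rexp(n.storm))  ##decreasing storm intensities
Z <- rep(0, length(x))
for (i in 1:n.storm)
  Z <- pmax(Z, xi[i] * dnorm(x - centres[i]))
plot(x, log(Z), type = "l", ylab = "log Z(x)")
@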
With these additional assumptions, it can be shown that the bivariate CDF is \begin{equation} \label{eq:smith} \Pr[Z(x_1) \leq z_1, Z(x_2) \leq z_2] = \exp\left[-\frac{1}{z_1} \Phi \left(\frac{a}{2} + \frac{1}{a} \log \frac{z_2}{z_1} \right) - \frac{1}{z_2} \Phi \left(\frac{a}{2} + \frac{1}{a} \log\frac{z_1}{z_2} \right) \right] \end{equation} where $\Phi$ is the standard normal cumulative distribution function and, for two given locations 1 and 2 \begin{equation*} a^2 = \Delta x^T \Sigma^{-1} \Delta x, \qquad \Sigma = \begin{bmatrix} \sigma_{11} & \sigma_{12}\\ \sigma_{12} & \sigma_{22} \end{bmatrix} \quad \text{or} \quad \Sigma = \begin{bmatrix} \sigma_{11} & \sigma_{12} & \sigma_{13}\\ \sigma_{12} & \sigma_{22} & \sigma_{23}\\ \sigma_{13} & \sigma_{23} & \sigma_{33} \end{bmatrix} \quad \text{and so forth} \end{equation*} where $\Delta x$ is the distance vector between location 1 and location 2. Figure~\ref{fig:Smith2sim} plots two simulations of Smith's model with different covariance matrices. \begin{proof} \begin{eqnarray*} -\log \Pr \left[Z(x_1) \leq z_1, Z(x_2) \leq z_2 \right] &=& \int_{\mathbb{R}^d}\int_{\min\{z_1/f_0(s - x_1), z_2 / f_0(s- x_2) \}}^{+\infty} \xi^{-2} \mbox{d$\xi$} \mbox{ds}\\ &=& \int \max\left\{\frac{f_0(s - x_1)}{z_1}, \frac{f_0(s- x_2)}{z_2}\right\} \mbox{ds}\\ &=& \int \frac{f_0(s - x_1)}{z_1} \mathbb{I}\left(\frac{f_0(s - x_1)}{z_1} > \frac{f_0(s - x_2)}{z_2} \right) \mbox{ds}\\ &+& \int \frac{f_0(s - x_2)}{z_2} \mathbb{I}\left(\frac{f_0(s - x_1)}{z_1} \leq \frac{f_0(s - x_2)}{z_2} \right) \mbox{ds}\\ &=& \int \frac{f_0(s)}{z_1} \mathbb{I}\left(\frac{f_0(s)}{z_1} > \frac{f_0(s - x_2 + x_1)}{z_2} \right) \mbox{ds}\\ &+& \int \frac{f_0(s)}{z_2} \mathbb{I}\left(\frac{f_0(s - x_1 + x_2)}{z_1} \leq \frac{f_0(s)}{z_2} \right) \mbox{ds}\\ &=& \frac{1}{z_1} \mathbb{E}\left[\mathbb{I}\left(\frac{f_0(X)}{z_1} > \frac{f_0(X - x_2 + x_1)}{z_2} \right) \right]\\ &+& \frac{1}{z_2} \mathbb{E}\left[ \mathbb{I}\left(\frac{f_0(X - x_1 + x_2)}{z_1} \leq \frac{f_0(X)}{z_2} \right)\right] \end{eqnarray*} where $X$ is a r.v.\ having $f_0$ as density. To get the closed form of the bivariate distribution, it remains to compute the probabilities of the event $\{f_0(X)/z_1 > f_0(X - x_2 + x_1)/z_2\}$. {\small \begin{eqnarray*} \frac{f_0(X)}{z_1} > \frac{f_0(X - x_2 + x_1)}{z_2} &\Longleftrightarrow& 2 \log z_1 + X^T \Sigma^{-1} X < 2 \log z_2 + \left(X - x_2 + x_1 \right)^T \Sigma^{-1} \left(X - x_2 + x_1 \right)\\ &\Longleftrightarrow& X^T \Sigma^{-1} (x_1 - x_2) > \log \frac{z_1}{z_2} - \frac{1}{2} (x_1 - x_2)^T \Sigma^{-1} (x_1 - x_2) \end{eqnarray*} } As $X$ has density $f_0$, $X^T \Sigma^{-1} (x_1 - x_2)$ is normal with mean $0$ and variance $a^2=(x_1 - x_2)^T \Sigma^{-1} (x_1 - x_2)$. And finally, we get \begin{eqnarray*} \frac{1}{z_1} \mathbb{E}\left[ \mathbb{I} \left( \frac{f_0(X)}{z_1} > \frac{f_0(X - x_2 + x_1)}{z_2} \right) \right] &=& \frac{1}{z_1} \left\{1 - \Phi\left(\frac{\log z_1/z_2}{a} - \frac{a}{2} \right) \right\}\\ &=& \frac{1}{z_1} \Phi\left(\frac{a}{2} + \frac{\log z_2/z_1}{a} \right)\\ \end{eqnarray*} and \begin{eqnarray*} \frac{1}{z_2} \mathbb{E}\left[ \mathbb{I} \left( \frac{ f_0(X - x_1 + x_2)}{z_1} \leq \frac{f_0(X)}{z_2} \right)\right] &=& \frac{1}{z_2} \Phi\left(\frac{a}{2} + \frac{\log z_1/z_2}{a} \right) \end{eqnarray*} This proves equation~\eqref{eq:smith}. 
\end{proof}
<>=
x <- seq(0, 10, length = 100)
seed <- 19
set.seed(seed)
data0 <- rmaxstab(1, cbind(x,x), "gauss", cov11 = 9/8, cov12 = 0, cov22 = 9/8, grid = TRUE)
set.seed(seed)
data1 <- rmaxstab(1, cbind(x,x), "gauss", cov11 = 9/8, cov12 = 1, cov22 = 9/8, grid = TRUE)
png("Figures/Smith2Sim.png", width = 1400, height = 700)
par(mfrow=c(1,2))
image(x, x, log(data0), col = terrain.colors(64))
image(x, x, log(data1), col = terrain.colors(64))
dev.off()
@
\begin{figure}
  \centering
  \includegraphics[width=0.75\textwidth]{Figures/Smith2Sim}
  \caption{Two simulations of the Smith model with different $\Sigma$ matrices. Left panel: $\sigma_{11} = \sigma_{22} = 9/8$ and $\sigma_{12} = 0$. Right panel: $\sigma_{11} = \sigma_{22} = 9/8$ and $\sigma_{12} = 1$. The max-stable processes are transformed to unit Gumbel margins for viewing purposes.}
  \label{fig:Smith2sim}
\end{figure}
In equation~\eqref{eq:smith}, $a$ is the Mahalanobis distance\index{distance!Mahalanobis} and is similar to the Euclidean distance except that it gives different weights to each component of $\Delta x$. It is positive and the limiting cases $a \rightarrow 0^+$ and $a \rightarrow +\infty$ correspond respectively to perfect dependence and independence. Therefore, for $\Sigma$ fixed, the dependence decreases monotonically and continuously as $||\Delta x ||$ increases. Conversely, if $\Delta x$ is fixed, the dependence decreases monotonically as $a$ increases.

The covariance matrix $\Sigma$ plays a major role in equation~\eqref{eq:smith} as it determines the shape of the storm events. Indeed, due to the use of a multivariate normal distribution, the storm events have an elliptical shape. Considering the eigen-decomposition\index{eigen-decomposition} of $\Sigma$, we can write
\begin{equation}
  \label{eq:svdSigma}
  \Sigma = U \Lambda U^T,
\end{equation}
where $U$ is a rotation matrix and $\Lambda$ is a diagonal matrix of the eigenvalues\index{eigenvalues}. Thus, $U$ controls the direction of the principal axes and $\Lambda$ controls their lengths. If $\Sigma$ is diagonal and all the diagonal terms are identical, then $\Sigma = \Lambda$, so that the ellipsoids change to circles and model~\eqref{eq:smith} becomes isotropic. Figure~\ref{fig:Smith2sim} is a nice illustration of this. The left panel corresponds to an isotropic random field while the right one depicts a clear anisotropy for which we have
\begin{equation*}
  \Sigma =
  \begin{bmatrix}
    9/8 & 1\\
    1 & 9/8
  \end{bmatrix}
  =
  \begin{bmatrix}
    \phantom{-}\cos (-3 \pi / 4) & \sin (-3 \pi / 4)\\
    -\sin (-3 \pi / 4) & \cos (-3 \pi / 4)
  \end{bmatrix}
  \begin{bmatrix}
    1/8 & 0\\
    0 & 17/8
  \end{bmatrix}
  \begin{bmatrix}
    \cos (-3 \pi / 4) & -\sin (-3 \pi / 4)\\
    \sin (-3 \pi / 4) & \phantom{-}\cos (-3 \pi / 4)
  \end{bmatrix},
\end{equation*}
so that the major principal axis has direction $\pi/4$ and a one unit move along the direction $-\pi / 4$ yields the same decrease in dependence as a $\sqrt{17}$ unit move along the direction $\pi / 4$.

\section{The Schlather Model}
\label{sec:schl-char}
More recently, \citet{Schlather2002} introduced a second characterisation of max-stable processes. Let $Y(\cdot)$ be a stationary process on $\mathbb{R}^d$ such that $\mathbb{E}[\max\{0, Y(x)\}] = 1$ and $\{\xi_i, i \geq 1\}$ be the points of a Poisson process on $\mathbb{R}^+_*$ with intensity measure \index{intensity measure} $\xi^{-2} d\xi$.
Then Schlather shows that a stationary max-stable process with unit Fréchet margins can be defined by:
\begin{equation}
  \label{eq:SchlatherChar}
  Z(x) = \max_i \xi_i \max \left\{0, Y_i(x) \right\}
\end{equation}
where the $Y_i(\cdot)$ are i.i.d copies of $Y(\cdot)$. As before, the max-stable property of $Z(\cdot)$ stems from the superposition of $n$ independent, identical Poisson processes, while the unit Fréchet margins follow from the same argument as for the Smith model. Indeed, consider the following set:
\begin{equation*}
  E = \left\{(\xi, y(x)) \in \mathbb{R}^+_* \times \mathbb{R}^d: \xi \max(0, y(x)) > z \right\}
\end{equation*}
for a fixed location $x \in \mathbb{R}^d$ and $z > 0$. Then
\begin{eqnarray*}
  \Pr\left[Z(x) \leq z \right] &=& \Pr\left[\text{no points in } E \right] = \exp \left[ - \int_{\mathbb{R}^d} \int_{z/\max(0, y(x))}^{+\infty} \xi^{-2} d\xi \nu(dy(x)) \right]\\
  &=& \exp \left[ - \int_{\mathbb{R}^d} z^{-1} \max \{0, y(x) \} \nu(dy(x)) \right] = \exp\left(-\frac{1}{z}\right)
\end{eqnarray*}
As with the Smith model, the process defined in equation~\eqref{eq:SchlatherChar} has a practical interpretation. Think of the $\xi_i Y_i(\cdot)$ as daily spatial rainfall events, so that all these events share the same spatial dependence structure and differ only in their magnitudes $\xi_i$. This model differs slightly from Smith's, as the storms no longer have a deterministic shape such as a multivariate normal density but a random shape driven by the process $Y(\cdot)$.

The Schlather and Smith characterisations have strong connections. To see this, consider the case where $Y_i(x) = f_0(x - X_i)$, $f_0$ being a probability density function and $\{X_i\}$ a homogeneous Poisson process, both in $\mathbb{R}^d$. With this particular setting, model~\eqref{eq:SchlatherChar} is identical to model~\eqref{eq:smithChar}.

Equation~\eqref{eq:SchlatherChar} is very general and we need additional assumptions to get practical models. Schlather proposed to take $Y_i(\cdot)$ to be a stationary standard Gaussian process with correlation function $\rho(h)$, scaled so that $\mathbb{E}[\max\{0, Y_i(x)\}] = 1$. With these new assumptions, it can be shown that the bivariate CDF of process~\eqref{eq:SchlatherChar} is
\begin{equation}
  \label{eq:schlather}
  \Pr[Z(x_1) \leq z_1, Z(x_2) \leq z_2] = \exp\left[-\frac{1}{2} \left(\frac{1}{z_1} + \frac{1}{z_2} \right) \left(1 + \sqrt{1 - 2 (\rho(h) + 1) \frac{z_1 z_2}{(z_1 + z_2)^2}} \right) \right]
\end{equation}
where $h \in \mathbb{R}^+$ is the Euclidean distance\index{distance!Euclidean} between location 1 and location 2.
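As an illustration, the bivariate distribution function~\eqref{eq:schlather} is easily evaluated by hand; the small helper below is not part of the package and simply takes the correlation value as an argument. It shows, for instance, that complete dependence, $\rho(h) = 1$, recovers the univariate unit Fréchet distribution, while $\rho(h) = -1$ gives independence.
<<eval = FALSE>>=
##Small helper evaluating the bivariate CDF of the Schlather model for a
##given correlation value rho.h = rho(h)
pschlather2 <- function(z1, z2, rho.h)
  exp(-0.5 * (1/z1 + 1/z2) *
      (1 + sqrt(1 - 2 * (rho.h + 1) * z1 * z2 / (z1 + z2)^2)))
pschlather2(2, 2, 1)    ##complete dependence: equals exp(-1/2)
pschlather2(2, 2, -1)   ##independence: equals exp(-1/2)^2
@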
Usually, $\rho(h)$, is chosen from one of the valid parametric families, such as <>= par(mfrow=c(1,4), mar = c(4, 4, 0.2, 0.2)) covariance(nugget = 0, sill = 1, range = 1, smooth = 4, cov.mod = "whitmat", xlim = c(0,9), ylim = c(0, 1)) covariance(nugget = 0, sill = 1, range = 1, smooth = 2, cov.mod = "whitmat", add = TRUE, col = 2, xlim = c(0,9)) covariance(nugget = 0, sill = 1, range = 1, smooth = 1, cov.mod = "whitmat", add = TRUE, col = 3, xlim = c(0,9)) covariance(nugget = 0, sill = 1, range = 1, smooth = 0.5, cov.mod = "whitmat", col = 4, add = TRUE, xlim = c(0,9)) legend("topright", c(expression(nu == 4), expression(nu == 2), expression(nu == 1), expression(nu == 0.5)), col = 1:4, lty = 1, inset = 0.05) covariance(nugget = 0, sill = 1, range = 1, smooth = 2, cov.mod = "powexp", xlim = c(0, 4), ylim = c(0, 1)) covariance(nugget = 0, sill = 1, range = 1, smooth = 1.5, cov.mod = "powexp", add = TRUE, col = 2, xlim = c(0, 4)) covariance(nugget = 0, sill = 1, range = 1, smooth = 1, cov.mod = "powexp", add = TRUE, col = 3, xlim = c(0, 4)) covariance(nugget = 0, sill = 1, range = 1, smooth = 0.75, cov.mod = "powexp", add = TRUE, col = 4, xlim = c(0, 4)) legend("topright", c(expression(nu == 2), expression(nu == 1.5), expression(nu == 1), expression(nu == 0.75)), col = 1:4, lty = 1, inset = 0.05) covariance(nugget = 0, sill = 1, range = 1, smooth = 4, cov.mod = "cauchy", xlim = c(0, 5), ylim = c(0, 1)) covariance(nugget = 0, sill = 1, range = 1, smooth = 2, cov.mod = "cauchy", add = TRUE, col = 2, xlim = c(0, 5)) covariance(nugget = 0, sill = 1, range = 1, smooth = 1, cov.mod = "cauchy", add = TRUE, col = 3, xlim = c(0, 5)) covariance(nugget = 0, sill = 1, range = 1, smooth = 0.75, cov.mod = "cauchy", add = TRUE, col = 4, xlim = c(0, 5)) legend("topright", c(expression(nu == 4), expression(nu == 2), expression(nu == 1), expression(nu == 0.75)), col = 1:4, lty = 1, inset = 0.05) covariance(nugget = 0, sill = 1, range = 1, smooth = 2, cov.mod = "bessel", xlim = c(0, 20), ylim = c(-0.3, 1)) covariance(nugget = 0, sill = 1, range = 1, smooth = 1, cov.mod = "bessel", add = TRUE, col = 2, xlim = c(0, 20)) covariance(nugget = 0, sill = 1, range = 1, smooth = .5, cov.mod = "bessel", add = TRUE, col = 3, xlim = c(0, 20)) covariance(nugget = 0, sill = 1, range = 1, smooth = 0.25, cov.mod = "bessel", add = TRUE, col = 4, xlim = c(0, 20)) legend("topright", c(expression(nu == 2), expression(nu == 1), expression(nu == .5), expression(nu == 0.25)), col = 1:4, lty = 1, inset = 0.05) @ \begin{figure} \centering <>= <> @ \caption{Plots of the Whittle--Matérn, the powered exponential, the Cauchy and the Bessel correlation functions - from left to right. 
The sill and the range parameters are $c_1 = c_2 = 1$ while the smooth parameters are given in the legends.}
  \label{fig:covariances}
\end{figure}

\bigskip
\begin{tabular}{lll}
  \textbf{Whittle--Matérn}\index{covariance function!Whittle--Mat\'ern} & $\rho(h) = \frac{2^{1-\nu}}{\Gamma(\nu)} \left(\frac{h}{c_2} \right)^{\nu} K_{\nu}\left(\frac{h}{c_2} \right)$, & $c_2>0$, \quad $\nu > 0$\\
  \textbf{Cauchy}\index{covariance function!cauchy} & $\rho(h) = \left[1 + \left(\frac{h}{c_2} \right)^2 \right]^{-\nu}$, & $c_2 >0$, \quad $\nu > 0$\\
  \textbf{Powered Exponential}\index{covariance function!powered exponential} & $\rho(h) = \exp\left[- \left(\frac{h}{c_2} \right)^{\nu} \right]$, & $c_2 > 0$, \quad $0 < \nu \leq 2$\\
  \textbf{Bessel}\index{covariance function!bessel} & $\rho(h) = \left(\frac{2 c_2}{h} \right)^{\nu} \Gamma(\nu + 1) J_{\nu}\left(\frac{h}{c_2} \right)$, & $c_2>0$, \quad $\nu \geq \frac{d - 2}{2}$
\end{tabular}
\bigskip

where $c_2$ and $\nu$ are the range \index{covariance function!range} and the smooth \index{covariance function!smooth} parameters of the correlation function, $\Gamma$ is the gamma function, $J_{\nu}$ and $K_{\nu}$ are the Bessel and modified Bessel functions of the third kind of order $\nu$, and $d$ is the dimension of the random fields. As for Gaussian processes, it is possible to add a sill $c_1$ and a nugget effect $\nu$ to these correlation functions i.e.\
\begin{equation*}
  \rho_*(h) =
  \begin{cases}
    \nu + c_1, & h = 0\\
    c_1 \rho(h), & h > 0
  \end{cases}
\end{equation*}
where $\rho$ is one of the correlation functions introduced above. However, as \citet{Schlather2002} consider stationary standard Gaussian processes, the sill and nugget parameters satisfy $\nu = 1 - c_1$ because the correlation function has to be equal to $1$ at the origin. Figure~\ref{fig:covariances} plots the correlation functions for the parametric families introduced above. The left panel was generated with the following lines
<>=
covariance(nugget = 0, sill = 1, range = 1, smooth = 4, cov.mod = "whitmat", xlim = c(0,9), ylim = c(0, 1))
covariance(nugget = 0, sill = 1, range = 1, smooth = 2, cov.mod = "whitmat", add = TRUE, col = 2, xlim = c(0,9))
covariance(nugget = 0, sill = 1, range = 1, smooth = 1, cov.mod = "whitmat", add = TRUE, col = 3, xlim = c(0,9))
covariance(nugget = 0, sill = 1, range = 1, smooth = 0.5, cov.mod = "whitmat", col = 4, add = TRUE, xlim = c(0,9))
legend("topright", c(expression(nu == 4), expression(nu == 2), expression(nu == 1), expression(nu == 0.5)), col = 1:4, lty = 1, inset = 0.05)
@
<>=
x <- y <- seq(0, 10, length = 100)
set.seed(12)
ms0 <- rmaxstab(1, cbind(x, y), "whitmat", nugget = 0, range = 1, smooth = 1, grid=TRUE)
set.seed(12)
ms1 <- rmaxstab(1, cbind(x, y), "powexp", nugget = 0, range = 1.5, smooth = 1, grid=TRUE)
png("Figures/Schlather2Sim.png", width = 1400, height = 700)
par(mfrow=c(1,2))
image(x, y, log(ms0), col = terrain.colors(64))
image(x, y, log(ms1), col = terrain.colors(64))
dev.off()
@
\begin{figure}
  \centering
  \includegraphics[width=0.75\textwidth]{Figures/Schlather2Sim}
  \caption{Two simulations of the Schlather model with different correlation functions having approximately the same practical range. Left panel: Whittle--Matérn with $c_1 = c_2 = \nu = 1$. Right panel: Powered exponential with $c_1 = \nu = 1$ and $c_2 = 1.5$.
The max-stable processes are transformed to unit Gumbel margins for viewing purposes.}
  \label{fig:Schlather2sim}
\end{figure}
Figure~\ref{fig:Schlather2sim} plots two realisations of the Schlather model with the powered exponential and Whittle--Matérn correlation functions. It can be seen that the powered exponential model leads to rougher random fields as, with this particular setting for the covariance parameters, the slope of the powered exponential correlation function near the origin is steeper than that of the Whittle--Matérn.
<>=
cov.fun <- covariance(nugget = 0, sill = 1, range = 1, smooth = 1, cov.mod = "powexp", plot = FALSE)
phi <- pi / 4
r <- sqrt(0.5)
A <- matrix(c(cos(phi), r^2 * sin(phi), r^2 * sin(phi), cos(phi)), 2)
cov.fun.aniso <- function(vec.pos) cov.fun(sqrt(vec.pos %*% A %*% vec.pos))
rho1 <- rho2 <- matrix(NA, 100, 100)
xs <- ys <- seq(-4, 4, length = 100)
for (i in 1:100){
  x <- xs[i]
  for (j in 1:100){
    rho1[i,j] <- cov.fun(sqrt(x^2+ys[j]^2))
    rho2[i,j] <- cov.fun.aniso(c(x, ys[j]))
  }
}
par(mfrow=c(1,2))
contour(xs, ys, rho1, xlab = expression(paste(Delta, x)), ylab = expression(paste(Delta, y)))
contour(xs, ys, rho2, xlab = expression(paste(Delta, x)), ylab = expression(paste(Delta, y)))
@
\begin{figure}
  \centering
<>=
<>
@
  \caption{Contour plots of an isotropic (left panel) and anisotropic (right panel) correlation function. Powered exponential family with $c_1 = c_2 = \nu = 1$. The anisotropy parameters are: $\varphi = \pi / 4$, $r = 0.5$.}
  \label{fig:anisoCovFun}
\end{figure}
The correlation functions introduced above are all isotropic, but model~\eqref{eq:schlather} doesn't require this assumption. From a valid correlation function $\rho$ it is always possible to get an elliptical correlation function\index{covariance function!elliptical} $\rho_e$ by using the following transformation:
\begin{equation}
  \label{eq:ellipticalCorrFun}
  \rho_e(\Delta x) = \rho\left(\sqrt{{\Delta x}^T A \Delta x} \right)
\end{equation}
where $\Delta x$ is the distance vector between two stations and $A$ is any positive semi-definite matrix that may involve additional parameters. For example, if the spatial domain belongs to $\mathbb{R}^2$, a convenient parametrization for $A$ is given by
\begin{equation*}
  A =
  \begin{bmatrix}
    \phantom{r^2}\cos \varphi & r^2 \sin \varphi\\
    r^2 \sin \varphi & \phantom{r^2} \cos \varphi
  \end{bmatrix}
\end{equation*}
where $\varphi \in [0, \pi)$ is the rotation angle and $0 < r < 1$ is the ratio between the minor and major principal axes of the ellipse. Figure~\ref{fig:anisoCovFun} plots the contours of an isotropic correlation function and an anisotropic one derived from equation~\eqref{eq:ellipticalCorrFun}.

The correlation coefficient $\rho(h)$ can take any value in $[-1,1]$. Complete dependence is reached when $\rho(h) = 1$ while independence occurs when $\rho(h) = -1$. However, most parametric correlation families don't allow negative values so that independence is never reached.

\chapter{Unconditional Simulation of Random Fields}
\label{cha:uncond-simul-rand}
In this chapter, we present some results useful for the simulation of stationary max-stable random fields. This chapter is organized as follows. As the max-stable process proposed by \citet{Schlather2002} involves the simulation of gaussian processes, techniques for simulating such processes are first introduced. Next, algorithms for simulating max-stable processes are presented.
\section{Simulation of Gaussian Random Fields}
\label{sec:simul-gauss-proc}
The simulation of gaussian random fields has been of great interest since the pioneering work of \citet{Matheron1973}. Various methods, differing in accuracy and computational cost, have been introduced. In this section we will focus especially on three different techniques: the direct approach, the circulant embedding method \citep{Chan1997} and the turning bands method \citep{Matheron1973}. These simulation techniques are introduced in turn.

\subsection{Direct Approach}
\label{sec:direct-approach}
The direct approach is certainly the simplest approach for simulating gaussian processes. Although this method is exact in principle and may be used in $\mathbb{R}^d$, $d\geq1$, its use for a large number of locations is prohibitive, as we will see later. The direct simulation relies on the following property \citep{Davis1987}.
\begin{prop}
  If $X$ is an $n$-vector of independent standard normal random variables, then
  \begin{equation}
    \label{eq:directDecomp}
    Y = D^{1/2} X, \qquad D = D^{1/2} (D^{1/2})^T
  \end{equation}
  is a multivariate gaussian random vector with zero mean and covariance matrix $D$.
\end{prop}
\begin{proof}
  $Y$ has zero mean as
  \begin{equation*}
    \mathbb{E}[Y] = \mathbb{E}[D^{1/2} X]= D^{1/2} \mathbb{E}[X] = 0.
  \end{equation*}
  The covariance of $Y$ is therefore
  \begin{equation*}
    \mathbb{E}[Y Y^T] = D^{1/2} \mathbb{E}[X X^T] (D^{1/2})^T = D^{1/2} I_n (D^{1/2})^T = D^{1/2} (D^{1/2})^T = D
  \end{equation*}
  where $I_n$ is the $n \times n$ identity matrix.
\end{proof}
The matrix $D^{1/2}$ can be obtained using either the Cholesky, the eigen or the singular value decompositions. These decompositions have in general a computational cost of order $O(n^3)$ while the product $D^{1/2}X$ has a cost of order $O(n^2)$ \citep{Dietrich1995}. This prevents the use of the direct simulation for large $n$, typically for $n$ larger than $1000$. To generate a gaussian random field with zero mean and covariance function $\gamma$ observed at locations $x_1$, \ldots, $x_n$, one can use the following scheme
\begin{enumerate}
\item Build the covariance matrix $D = [\gamma(x_i - x_j)]_{i,j}$, $1 \leq i, j \leq n$;
\item Compute $D^{1/2}$ by using the Cholesky or the eigen decompositions of $D$;
\item Generate an $n$-vector of independent standard normal random variables;
\item Use equation~(\ref{eq:directDecomp}) to compute $Y$.
\end{enumerate}

\subsection{Circulant Embedding Method}
\label{sec:circ-embedd-meth}

\subsection{Turning Bands Method}
\label{sec:turning-bands-method}
The turning bands method introduced by \citet{Matheron1973} simplifies the problem of simulating a $d$-dimensional random field to the simulation of one-dimensional random fields. Consequently, this technique allows one to perform a simulation of a large dimensional random field at the cost of one-dimensional simulations. Let $X(\cdot)$ be a zero mean isotropic random process in $\mathbb{R}$ with covariance function $\gamma_X$. The idea consists in considering a random field $Y(\cdot)$ in $\mathbb{R}^d$ such that $Y$ equals $X$ on a random line in $\mathbb{R}^d$ and is constant on all hyperplanes orthogonal to this line - see Figure~\ref{fig:tbm} left panel. As the random line has to be centred at the origin, it is uniquely defined by a random vector $U$ uniformly distributed over the half unit sphere $S_d$ of $\mathbb{R}^d$.
More formally, we have
\begin{equation}
  \label{eq:turningBandsdDimTo1dim}
  Y(x) = X\left(\langle x, U \rangle \right), \qquad \forall x \in \mathbb{R}^d
\end{equation}
where $\langle , \rangle$ is the standard inner product in $\mathbb{R}^d$. The covariance of the process $Y(\cdot)$ is therefore
\begin{eqnarray*}
  \gamma_Y(h) &=& \mathbb{E}\left[Y(x) Y(x+h) \right]\\
  &=& \mathbb{E}\left\{\mathbb{E}\left[X(\langle x, U \rangle ) X(\langle x+h, U \rangle) \mid U \right] \right\}\\
  &=& \mathbb{E}\left[\gamma_X\left(\langle h, U\rangle\right) \right]\\
  &=& \int_{S_d} \gamma_X\left(\langle h, u \rangle \right) \omega_d(\mbox{du})
\end{eqnarray*}
where $\omega_d(\mbox{du})$ is the uniform density over the half unit sphere in $\mathbb{R}^d$ and $h$ is a vector in $\mathbb{R}^d$. \citet{Matheron1973} shows that the previous equation is a one-to-one mapping between the set of continuous isotropic covariance functions in $\mathbb{R}^d$ and that of continuous covariances in $\mathbb{R}$. Therefore, it allows one to simulate random fields having covariance $\gamma_Y$ by the simulation of one-dimensional random fields with covariance $\gamma_X$ through the turning bands operator $T(\gamma_X)$
\begin{equation}
  \label{eq:tbmOperator}
  T\left\{\gamma_X\right\}(h) = \int_{S_d} \gamma_X\left(\langle h, u \rangle \right) \omega_d(\mbox{du}) = \gamma_Y(h)
\end{equation}
The turning bands operator is often complicated when $d=2$, where it is defined by
\begin{equation}
  \label{eq:tbmOperatordim2}
  \gamma_Y(h) = \frac{2}{\pi} \int_0^h \frac{\gamma_X(u)}{\sqrt{h^2 - u^2}} \mbox{du} \qquad \text{or equivalently} \qquad \gamma_X(h) = \frac{\mbox{d}}{\mbox{dh}} \int_0^h \frac{u \gamma_Y(u)}{\sqrt{h^2 - u^2}} \mbox{du}
\end{equation}
while the case $d=3$ leads to more tractable expressions
\begin{equation}
  \label{eq:tbmOperatordim3}
  \gamma_Y(h) = \frac{1}{h} \int_0^h \gamma_X(u) \mbox{du} \qquad \text{or equivalently} \qquad \gamma_X(h) = \frac{\mbox{d}}{\mbox{dh}} \left\{h \gamma_Y(h) \right\}
\end{equation}
As stated by \citet{Chiles1999}, equation~\eqref{eq:turningBandsdDimTo1dim} is not satisfactory as the random field $Y(\cdot)$ exhibits zonal anisotropy oriented along the random direction given by $U$ so that its empirical covariance does not match the theoretical one. To bypass this hurdle, $N$ i.i.d. random fields are superimposed i.e.\
\begin{equation}
  \label{eq:1}
  Y(x) = \frac{1}{\sqrt{N}} \sum_{i=1}^N X_i\left( \langle x, U_i \rangle \right)
\end{equation}
where the $U_i$ are i.i.d. random vectors uniformly distributed over $S_d$ and the $X_i(\cdot)$ are i.i.d. one-dimensional random fields having covariance $\gamma_X$. For practical simulations, the number of lines generated is of order $50$ and $500$ for the two and three dimensional cases respectively. However, it should be noted that simulating the process on the lines is often faster in the three-dimensional case, or may even be the only feasible option when the turning bands operator in the two-dimensional case leads to an intractable covariance function in $\mathbb{R}$. In such cases, it might be preferable to simulate a three-dimensional random field and take its values on an arbitrary plane. Although the directions $U_i$ have so far been assumed to be uniformly distributed over $S_d$, the same result holds if the random directions are generated from an equidistributed sequence of points in $S_d$. For the case $d=2$, several authors recommend the use of deterministic equal angles between the lines \citep{Chiles1997,Schlather1999}.
For the case $d=3$, \citet{Freulon1991} show that using a Van der Corput sequence instead of uniform directions leads to better ergodic properties for the simulated random fields. The Van der Corput sequence uses the binary and ternary decompositions of any integer $n \in \mathbb{N}^*$ i.e.\
\begin{eqnarray}
  \label{eq:vdcBin}
  n &=& a_0 + 2 a_1 + \ldots + 2^p a_p, \qquad a_i = 0, 1\\
  n &=& b_0 + 3 b_1 + \ldots + 3^q b_q, \qquad b_i = 0, 1, 2
\end{eqnarray}
and leads to two numbers between $0$ and $1$
\begin{eqnarray*}
  u_n &=& \frac{a_0}{2} + \frac{a_1}{2^2} + \ldots + \frac{a_p}{2^{p+1}}\\
  v_n &=& \frac{b_0}{3} + \frac{b_1}{3^2} + \ldots + \frac{b_q}{3^{q+1}}
\end{eqnarray*}
from which we get a point lying in $S_3$
\begin{equation}
  \label{eq:vdcSphere}
  \left\{\sqrt{1 - v_n^2} \cos(2 \pi u_n), \sqrt{1 - v_n^2} \sin(2 \pi u_n), v_n \right\}
\end{equation}
For practical purposes, this sequence is defined up to a random rotation to avoid simulations based on the same set of directions.

Until now, we have not described how to simulate a process on a line. The circulant embedding method could be used but this is not fully satisfactory. Even if the locations in $\mathbb{R}^d$ define a grid, their projections onto the lines will usually not be regularly spaced. Fortunately, it is possible to perform a continuous simulation along the lines. An accurate method for this is the continuous spectral method \citep{Mantoglou1987}. If $\gamma_X$ is continuous, Bochner's theorem states that it is the Fourier transform of a positive measure $\chi$, the spectral measure,
\begin{equation}
  \label{eq:bochner}
  \gamma_X(h) = \int_{\mathbb{R}^d} \exp \left(i \langle u, h \rangle \right) \mbox{d$\chi$}(u)
\end{equation}
Since $\gamma_X(0) = \sigma^2$, $\sigma^{-2} \chi$ is a probability distribution function. Hence, if $\Omega$ is a random vector with distribution $\sigma^{-2} \chi$ and $\Phi \sim U(0, 2 \pi)$, $\Omega$ and $\Phi$ being mutually independent, then the random function
\begin{equation}
  \label{eq:contSpectral}
  X(x) = \sqrt{2\sigma^2} \cos \left(\langle \Omega, x \rangle + \Phi \right)
\end{equation}
has $\gamma_X$ as its covariance function.
\begin{proof}
  \begin{eqnarray*}
    \mbox{Cov}\left\{X(o), X(h) \right\} &=& \mathbb{E} \left[ X(o) X(h) \right]\\
    &=& 2 \sigma^2 \mathbb{E} \left\{ \mathbb{E} \left[ \cos \left(\langle \Omega, h \rangle + \Phi \right) \cos \Phi \right] \right\}\\
    &=& \sigma^2 \mathbb{E} \left[ \cos \langle \Omega, h \rangle \right]\\
    &=& \int_{\mathbb{R}^d} \exp\left(i \langle u, h \rangle \right) \mbox{d$\chi$}(u)\\
    &=& \gamma_X(h)
  \end{eqnarray*}
\end{proof}
A comprehensive overview of methods for simulating random processes on a line from various covariance families is given by \citet{Emery2006}.
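To make the continuous spectral method concrete, the following lines sketch it for a one-dimensional exponential covariance $\gamma_X(h) = \sigma^2 \exp(-|h|/c)$, whose normalised spectral measure is a Cauchy distribution with scale $1/c$. The number of superposed cosines and all numerical values are illustrative choices only, and the \verb|rgp| function presented below should be preferred in practice.
<<eval = FALSE>>=
##Sketch of the continuous spectral method on a line for the exponential
##covariance sigma^2 * exp(-|h| / c): the normalised spectral measure is
##Cauchy with scale 1 / c. Superposing n.lines cosines and rescaling gives
##an approximately gaussian process with the required covariance.
sigma2 <- 1; c.range <- 1; n.lines <- 1000
x.line <- seq(0, 10, length = 500)
omega <- rcauchy(n.lines, scale = 1 / c.range)   ##random frequencies
phi <- runif(n.lines, 0, 2 * pi)                 ##random phases
X <- sqrt(2 * sigma2 / n.lines) *
  rowSums(cos(outer(x.line, omega) + rep(phi, each = length(x.line))))
plot(x.line, X, type = "l")
@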
<>= x <- y <- seq(0, 10, length = 100) coord <- cbind(x, y) seed <- 3 set.seed(seed) data1 <- rgp(1, coord, cov.mod = "whitmat", sill = 1, range = 1, smooth = 1, grid = TRUE, control = list(nlines = 1)) set.seed(seed) data2 <- rgp(1, coord, cov.mod = "whitmat", sill = 1, range = 1, smooth = 1, grid = TRUE, control = list(nlines = 25)) set.seed(seed) data3 <- rgp(1, coord, cov.mod = "whitmat", sill = 1, range = 1, smooth = 1, grid = TRUE, control = list(nlines = 625)) png("Figures/tbm.png", width = 1400, height = 700) par(mfrow=c(1,3)) image(x, y, data1, col = terrain.colors(64)) image(x, y, data2, col = terrain.colors(64)) image(x, y, data3, col = terrain.colors(64)) dev.off() @ \begin{figure} \centering \includegraphics[width = \textwidth]{Figures/tbm} \caption{One realisation of a gaussian random field having a Whittle-Matern covariance function with sill, range and smooth parameters all equal to one using the turning bands methods. The continuous spectral method was used for simulating processes on the lines. Left panel: $1$ line, middle panel: $25$ lines, right panel: $625$ lines.} \label{fig:tbm} \end{figure} The function \verb|rgp| allows simulations of stationary isotropic gaussian fields. Figure~\ref{fig:tbm} plots three realisations of a gaussian random field having a Whittle-Matern covariance function by using the turning bands method. The number of lines used for these simulations, from left to right, is $1$, $25$ and $625$ respectively. The code used for generating this figure is <>= <> @ \section{Simulation of Max-Stable Random Fields} \label{sec:simul-max-stable} The definition of a max-stable process involves the maximum over an infinite number of replication of a given random process. For simulation purposes, the number of replications is necessarily finite. However, \citet{Schlather2002} shows that it is possible to get exact simulations on a finite sampling region. \begin{thm} \label{thm:simMaxStab} Let $Y$ be a measurable random function such that $\mathbb{E}[\int_{\mathbb{R}^d} \max\{0, Y(x)\} \mbox{dx}] = 1$, $\Pi$ be a Poisson process on $\mathbb{R}^d \times (0, +\infty)$ with intensity measure $d\Lambda(y,\xi) = \xi^{-2} \mbox{dy} \mbox{d$\xi$}$ and $Z(\cdot)$ as defined by equation~(\ref{eq:SchlatherChar}). Assume that $Y$ is uniformly bounded by $C \in (0, +\infty)$ and has support in the ball $b(o, r)$ for some $r \in (0, +\infty)$. Let $B$ be a compact set, $Y_i$ be i.i.d. replications of $Y$, $U_i$ be i.i.d. uniformly distributed on $B_r = \cup_{x \in B} b(x, r)$, $\xi_i$ be i.i.d. standard exponential r.v. and $\Pi$, $Y_i$, $\xi_i$, $U_i$ be mutually independent. Then, on $B$, \begin{equation} \label{eq:simMaxStab} Z_*(x) = |B_r| \sup_i \frac{Y_i(x - U_i)}{\sum_{k=1}^i \xi_k}, \qquad x \in B, \qquad i = 1, 2, \ldots \end{equation} equals the max-stable random fields $Z$ in distribution, and \begin{equation} \label{eq:simMaxStab2} Z_*(x) = |B_r| \sup \left\{\frac{Y_i(x - U_i)}{\sum_{k=1}^i \xi_k}: i = 1, \ldots, m, m \text{ is such that } \frac{C}{\sum_{k=1}^m \xi_k} \leq \max_{1 \leq i \leq m} \frac{Y_i(x - U_i)}{\sum_{k=1}^i \xi_k} \right\} \end{equation} almost surely. \end{thm} \begin{proof} It is well known that a Poisson process on $\mathbb{R}^+$ with intensity $1$ can be thought as a sum of i.i.d standard exponential random variables i.e.\ $\Pi = \{\sum_{i=1}^n \xi_i : n = 1, 2, \ldots \}$. 
The application of the mapping $\xi \mapsto |B_r| \xi^{-1}$ to the points of $\Pi$ yields a new Poisson process on $\mathbb{R}^+$ with intensity measure $|B_r| \xi^{-2} \mbox{d$\xi$}$. As the $U_i$ are i.i.d. uniformly distributed on $B_r$, we have that the random set
\begin{equation}
  \label{eq:simPoissProc}
  \left\{ \left(U_n, \frac{|B_r|}{\sum_{i=1}^n \xi_i} \right) : n = 1, 2, \ldots \right\}
\end{equation}
is a simple Poisson process on $B_r \times (0, +\infty)$ with intensity measure $d\Lambda(y,\xi) = \xi^{-2} \mbox{dy}\mbox{d$\xi$}$ and the process $Z_*$ equals $Z$ in distribution. To show the second assertion, we first note that $m$ is necessarily finite as the sequence $\sum_{k=1}^m \xi_k$ is non-decreasing and $Y$ is uniformly bounded by $C$ so that
\begin{equation*}
  \lim_{m \to +\infty} \frac{C}{\sum_{k=1}^m \xi_k} = 0
\end{equation*}
Hence, there exists $m$ finite such that
\begin{equation*}
  \Pr \left[\max_{1 \leq i \leq m} \frac{Y_i(x - U_i)}{\sum_{k=1}^i \xi_k} \geq \frac{C}{\sum_{k=1}^m \xi_k} \right] = 1
\end{equation*}
and
\begin{equation*}
  \Pr \left[ \left| \max_{1 \leq i \leq m} \frac{Y_i(x - U_i)}{\sum_{k=1}^i \xi_k} - \sup_{i \geq 1} \frac{Y_i(x - U_i)}{\sum_{k=1}^i \xi_k} \right| \neq 0 \right] = 0
\end{equation*}
\end{proof}
If the conditions of Theorem~\ref{thm:simMaxStab} are not met, i.e.\ for random functions $Y$ whose support is not included in a ball $b(o,r)$ or which are not uniformly bounded by a constant $C$, approximate values for $r$ and $C$ have to be used. Although the simulations are then no longer exact, suitable choices of $r$ and $C$ appear to give accurate simulations.

\subsection{The Smith Model}
\label{subsec:sim-smith-model}
Recall that the Smith model is defined by
\begin{equation}
  \label{eq:SmithChar2}
  Z(x) = \max_i \xi_i Y_i(x), \qquad Y_i(x) = f(x - U_i)
\end{equation}
where $f$ is the zero mean multivariate normal density with covariance matrix $\Sigma$ and $\{(U_i, \xi_i)\}_{i \geq 1}$ are the points of a Poisson process with intensity measure $d\Lambda(y,\xi) = \xi^{-2} \mbox{dy} \mbox{d$\xi$}$. As we just mentioned, the use of Theorem~\ref{thm:simMaxStab} to generate realisations from the Smith model is not theoretically justified as the support of the multivariate normal distribution is not included in a finite ball $b(o,r)$, $r < +\infty$. However, if $r$ is large enough, the multivariate normal density should be close to 0 and points lying in $\mathbb{R}^d \setminus b(o,r)$ are unlikely to contribute to $Z(x)$. \citet{Schlather2002} in his seminal paper suggests using $r$ such that $\varphi(r) = 0.001$ i.e. $r \approx 3.46$, where $\varphi$ denotes the standard normal density. Taking into account that the variances in the covariance matrix $\Sigma$ might be large, a reasonable choice is to let
\begin{equation}
  \label{eq:defFiniteBallSmith}
  r = 3.46 \sqrt{\max \left\{\sigma_{ii}: i = 1, \ldots, d\right\}}
\end{equation}
where $\sigma_{ii}$ are the diagonal elements of the covariance matrix $\Sigma$.
<>=
x <- y <- seq(0, 10, length = 100)
coord <- cbind(x, y)
set.seed(8)
M0 <- rmaxstab(1, coord, "gauss", cov11 = 9/8, cov12 = 0, cov22 = 9/8, grid = TRUE)
set.seed(8)
M1 <- rmaxstab(1, coord, "gauss", cov11 = 9/8, cov12 = 3/4, cov22 = 9/8, grid = TRUE)
png("Figures/rmaxstabSmith.png", width = 1400, height = 700)
par(mfrow = c(1,2))
image(x, y, log(M0), col = terrain.colors(64))
image(x, y, log(M1), col = terrain.colors(64))
dev.off()
@
\begin{figure}
  \centering
  \includegraphics[width=0.75\textwidth]{Figures/rmaxstabSmith}
  \caption{Two simulations of the Smith model with different covariance matrices. Left panel: $\sigma_{11} = \sigma_{22} = 9/8$, $\sigma_{12} = 0$. Right panel: $\sigma_{11} = \sigma_{22} = 9/8$, $\sigma_{12} = 3/4$. The observations are transformed to unit Gumbel margins for viewing purposes.}
  \label{fig:rmaxstabSmith}
\end{figure}
The function \verb|rmaxstab| allows the simulation from the Smith model. Figure~\ref{fig:rmaxstabSmith} plots two realisations on a $100 \times 100$ grid with two different covariance matrices $\Sigma$. The generation of these two random fields takes around $2.5$ seconds each. The figure was generated by invoking the following lines
<>=
<>
@

\subsection{The Schlather Model}
\label{subsec:sim-schlather-model}
Recall that the Schlather model is defined by
\begin{equation}
  \label{eq:SchlatherChar2}
  Z(x) = \max_i \xi_i \max \left\{0, Y_i(x)\right\},
\end{equation}
where the $Y_i(\cdot)$ are i.i.d. stationary standard gaussian processes with correlation function $\rho(h)$, scaled so that $\mathbb{E}[\max \{0, Y_i(x) \}] = 1$, and $\{\xi_i \}_{i\geq 1}$ are the points of a Poisson process on $\mathbb{R}_*^+$ with intensity measure $\mbox{d$\Lambda$}(\xi) = \xi^{-2} \mbox{d$\xi$}$. Again, the conditions required by Theorem~\ref{thm:simMaxStab} are not satisfied as the gaussian processes are not uniformly bounded. However, \citet{Schlather2002} claims that by choosing a constant $C$ such that $\Pr[ \max \{0, Y_i(x) \} > C]$ is small, the simulation procedure is still accurate and recommends the use of $C=3$. The package allows for simulation of max-stable processes by using the \verb|rmaxstab| function. A typical use for scattered locations is
<>=
coord <- matrix(runif(100, 0, 10), ncol = 2)
data1 <- rmaxstab(100, coord, "whitmat", nugget = 0, range = 1, smooth = 1)
@
while for locations on a grid, users should invoke
<>=
x <- seq(0, 10, length = 100)
coord <- cbind(x, x)
data2 <- rmaxstab(1, coord, "powexp", nugget = 0, range = 1, smooth = 2, grid = TRUE)
@

\chapter{Spatial Dependence of Max-Stable Random Fields}
\label{cha:spatial-depend-max}
Sooner or later, statistical modellers will be interested in knowing how dependence evolves in space. When dealing with non-extremal data, a common tool is the (semi-)variogram\index{variogram} $\gamma$ \citep{Cressie1993}. Let $Y(\cdot)$ be a stationary gaussian process with correlation function $\rho$ and variance $\sigma^2$. It is well known that $Y(\cdot)$ is fully characterized by its mean and its covariance. Consequently, the variogram defined as
\begin{equation}
  \label{eq:variogram}
  \gamma(x_1 - x_2) = \frac{1}{2} \mbox{Var}\left[Y(x_1) - Y(x_2) \right] = \sigma^2 \left\{1 - \rho(x_1 - x_2) \right\}
\end{equation}
determines the degree of spatial dependence of $Y(\cdot)$. When extreme values are of interest, the variogram is no longer a useful tool, as it may not even exist. Therefore, there is a need to develop more suitable tools for analyzing the spatial dependence of max-stable fields.
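A quick numerical illustration of the problem, using only base R (the $10^6$ sample size is an arbitrary choice): since unit Fréchet variables have an infinite mean, the sample mean never stabilises, and the same is true of empirical (semi-)variograms computed from such data.
<<eval = FALSE>>=
##Unit Frechet variables have no finite mean: the running mean of a sample
##does not stabilise as the sample size grows, so empirical variograms of
##unit Frechet data are meaningless.
set.seed(1)
z <- -1 / log(runif(10^6))                         ##unit Frechet sample
plot(cumsum(z) / seq_along(z), type = "l", log = "x",
     xlab = "sample size", ylab = "running mean")
@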
In this chapter, we will present the extremal coefficient as a measure of the degree of dependence for extreme values, and variogram-based approaches that are especially well adapted to extremes.

\section{The Extremal Coefficient}
\label{sec:extremal-coefficient}
Let $Z(\cdot)$ be a stationary max-stable random field with unit Fréchet margins. The extremal dependence among $N$ fixed locations in $\mathbb{R}^d$ can be summarised by the extremal coefficient\index{extremal coefficient}, which is defined as:
\begin{equation}
  \label{eq:extcoeff}
  \Pr\left[Z(x_1) \leq z, \ldots, Z(x_N) \leq z\right] = \exp\left(- \frac{\theta_N}{z} \right)
\end{equation}
where $1 \leq \theta_N \leq N$, the lower and upper bounds corresponding to complete dependence and independence, so that $\theta_N$ provides a measure of the degree of spatial dependence between stations. Following this idea, $\theta_N$ can be thought of as the effective number of independent stations. Given the properties of the max-stable process with unit Fréchet margins, the finite-dimensional CDF belongs to the class of multivariate extreme value distributions
\begin{equation}
  \label{eq:MEVD}
  \Pr\left[Z(x_1) \leq z_1, \ldots, Z(x_N) \leq z_N\right] = \exp\left\{- V(z_1, \ldots, z_N) \right\}
\end{equation}
where $V$ is a homogeneous function of order $-1$ called the exponent measure \citep{Pickands1981,Coles2001}. As a consequence, the homogeneity property of $V$ implies a strong relationship between the exponent measure and the extremal coefficient
\begin{equation}
  \label{eq:extCoeffVfun}
  \theta_N = V(1, \ldots, 1)
\end{equation}
An important special case of equation~\eqref{eq:extcoeff} is to consider pairwise extremal coefficients, that is
\begin{equation}
  \label{eq:extcoeffPairwise}
  \Pr\left[Z(x_1) \leq z, Z(x_2) \leq z \right] = \exp\left\{-\frac{\theta(x_1 - x_2)}{z} \right\}
\end{equation}
Following \citet{Schlather2003}, $\theta(\cdot)$ is called the extremal coefficient function\index{extremal coefficient!function}; it provides sufficient information about extremal dependence for many problems although it does not characterise the full distribution. The extremal coefficient functions for the max-stable models presented in Chapter~\ref{cha:an-introduction-max} can be derived directly from their bivariate distribution by letting $z_1=z_2=z$. More precisely, we have:

\bigskip
\begin{tabular}{ll}
  \textbf{Smith} & $\theta(x_1 - x_2) = 2 \Phi\left(\frac{\sqrt{(x_1 - x_2)^T \Sigma^{-1} (x_1 - x_2)}}{2} \right)$\\
  \textbf{Schlather} & $\theta(x_1 - x_2) = 1 + \sqrt{\frac{1 - \rho(x_1 - x_2)}{2}}$\\
\end{tabular}
\bigskip

<>=
smith <- function(h) 2 * pnorm(h/2)
cov.fun <- covariance(nugget = 0, sill = 1, range = 1, smooth = 1, plot = FALSE)
schlather <- function(h) 1 + sqrt((1-cov.fun(h))/2)
alpha <- 0.1
plot(smith, from = 0, to = 5, xlab = "h", ylab = expression(theta(h)))
plot(schlather, add = TRUE, col = 2, from = 0, to = 5)
legend("bottomright", c("Smith", "Schlather"), col = 1:2, lty = 1, inset = 0.05)
@
\begin{figure}
  \centering
<>=
<>
@
  \caption{Extremal coefficient functions for different max-stable models. $\Sigma$ is the $2\times 2$ identity matrix. Correlation function: Whittle--Matérn with $c_1 = c_2 = \nu = 1$.}
  \label{fig:extCoeffModels}
\end{figure}
Figure~\ref{fig:extCoeffModels} plots the extremal coefficient function for the different max-stable models introduced. The Smith model covers the whole range of dependence while Schlather's model has an upper bound of $1 + \sqrt{1/2}$ if the covariance function only takes positive values.
More generally, the properties of isotropic positive definite functions \citep{Matern1986} imply that $\theta(\cdot) \leq 1.838$ in $\mathbb{R}^2$ and $\theta(\cdot) \leq 1.781$ in $\mathbb{R}^3$ for the Schlather model.

\citet{Schlather2003} prove several interesting properties for $\theta(\cdot)$:
\begin{enumerate}
\item $2 - \theta(\cdot)$ is a positive semi-definite function;
\item $\theta(\cdot)$ isn't differentiable at $0$ unless $\theta \equiv 1$;
\item if $d \geq 2$ and $Z(\cdot)$ is isotropic, $\theta(\cdot)$ has at most one jump at $0$ and is continuous elsewhere.
\end{enumerate}
These properties have strong consequences. The first indicates that the spatial dependence structure of a stationary max-stable process can be characterised by a correlation function. The second states that a valid correlation function does not always lead to a valid extremal coefficient function. For instance, the Gaussian correlation model, $\rho(h) = \exp(-h^2)$, $h\geq 0$, is not allowed since it is differentiable at $h = 0$.

Equation~\eqref{eq:extcoeff} forms the basis for a simple maximum likelihood estimator. Suppose we have $Z_1(\cdot)$, \ldots, $Z_n(\cdot)$, independent replications of $Z(\cdot)$ observed at a set $A = \{x_1, \ldots, x_{|A|} \}$ of locations. The log-likelihood based on equation~\eqref{eq:extcoeff} is given by:
% \begin{equation}
%   \label{eq:llikSTNaive}
%   \ell_A(\theta_A) = \#\left\{i: \max_{j \in A} \left(Z_i(x_j)
%     \overline{Z(x_j)} \right) > 0 \right\} \log \theta_A - \theta_A
%   \sum_{i=1}^{n} \frac{1}{\max_{j \in A} \left( Z_i(x_j)
%     \overline{Z(x_j)} \right)}
% \end{equation}
\begin{equation}
  \label{eq:llikSTNaive}
  \ell_A(\theta_A) = n \log \theta_A - \theta_A \sum_{i=1}^{n} \frac{1}{\max_{j \in A} \left\{ Z_i(x_j) \overline{Z(x_j)} \right\}}
\end{equation}
where the terms of the form $\log \max_{j\in A}\{Z_i(x_j)\}$ were omitted and $\overline{Z(x_j)} = n^{-1} \sum_{i=1}^n 1 / Z_i(x_j)$. The scalings by $\overline{Z(x_j)}$ are here to ensure that $\hat{\theta}_A = 1$ when $|A| = 1$. The problem with the MLE based on equation~\eqref{eq:llikSTNaive} is that the extremal coefficient estimates may not satisfy the properties of the extremal coefficient function stated above. To solve this, \citet{Schlather2003} propose self-consistent estimators for $\theta_A$ by either sequentially correcting the estimates obtained by equation~\eqref{eq:llikSTNaive} or by modifying the log-likelihood to ensure these properties.
% However, these adjustements seem of poor
% value toward their numerical complexity.

\citet{Smith1991} proposed another estimator for the pairwise extremal coefficients. As $Z(x)$ is unit Fréchet for all $x \in \mathbb{R}^d$, $1/Z(x)$ has a unit exponential distribution and, according to equation~\eqref{eq:extcoeffPairwise}, $1 / \max\{Z(x_1), Z(x_2)\}$ has an exponential distribution with rate $\theta(x_1 - x_2)$.
By the law of large numbers, $\sum_{i=1}^n 1/Z_i(x_1) \approx \sum_{i=1}^n 1/Z_i(x_2) \approx n$\footnote{In fact, these relations are exact if the margins were transformed to unit Fréchet by using the maximum likelihood estimates.}, which suggests a simple estimator for the extremal coefficient between locations $x_1$ and $x_2$:
\begin{equation}
  \label{eq:extCoeffSmith}
  \hat{\theta}(x_1 - x_2) = \frac{n}{\sum_{i=1}^n \min \{Z_i(x_1)^{-1}, Z_i(x_2)^{-1} \}}
\end{equation}
<>=
n.site <- 40
n.obs <- 100
coord <- matrix(runif(2 * n.site, 0, 10), ncol = 2)
data <- rmaxstab(n.obs, coord, "whitmat", nugget = 0, range = 1, smooth = 1)
cov.fun <- covariance(nugget = 0, sill = 1, range = 1, smooth = 1, cov.mod = "whitmat", plot = FALSE)
ext.coeff <- function(h) 1 + sqrt((1 - cov.fun(h))/2)
par(mfrow=c(1,2), ps=14)
ST <- fitextcoeff(data, coord, loess = FALSE, ylim = c(1, 2))
plot(ext.coeff, from = 0, to = 12, col = 2, add = TRUE, lwd = 1.5)
Smith <- fitextcoeff(data, coord, estim = "Smith", loess = FALSE, ylim = c(1, 2))
plot(ext.coeff, from = 0, to = 12, col = 2, add = TRUE, lwd = 1.5)
@
\begin{figure}
  \centering
<>=
<>
@
  \caption{Pairwise extremal coefficient estimates from the Schlather and Tawn (left panel) and Smith (right panel) estimators from a simulated max-stable random field having a Whittle--Matérn correlation function - $c_1 = c_2 = \nu = 1$. The red lines are the theoretical extremal coefficient function.}
  \label{fig:extCoeffST-Smith}
\end{figure}
Figure~\ref{fig:extCoeffST-Smith} plots the pairwise extremal coefficient estimates from a simulated Schlather model having a Whittle--Matérn correlation function using equations~\eqref{eq:llikSTNaive} and \eqref{eq:extCoeffSmith}. This figure was generated using the following code:
<>=
n.site <- 40
n.obs <- 100
coord <- matrix(runif(2 * n.site, 0, 10), ncol = 2)
data <- rmaxstab(n.obs, coord, "whitmat", nugget = 0, range = 1, smooth = 1)
par(mfrow=c(1,2))
fitextcoeff(data, coord, loess = FALSE)
fitextcoeff(data, coord, estim = "Smith", loess = FALSE)
@

\section{Madogram-based approaches}
\label{sec:madogram-based}
As we already stated, variograms are useful quantities to assess the degree of spatial dependence for Gaussian processes. However, their use for extreme observations is limited as variograms may not exist. To see this, consider a stationary max-stable process with unit Fréchet margins. For such processes, neither the mean nor the variance is finite, the variogram does not exist theoretically, and extra work is needed to get reliable variogram-based tools.

\subsection{Madogram}
\label{sec:madogram}
A standard tool in geostatistics, similar to the variogram, is the madogram\index{madogram} \citep{Matheron1987}. The madogram is
\begin{equation}
  \label{eq:madogram}
  \nu(x_1 - x_2) = \frac{1}{2} \mathbb{E}\left[ |Z(x_1) - Z(x_2)| \right],
\end{equation}
where $Z(\cdot)$ is a stationary random field with mean assumed finite. The problem with the madogram is almost identical to the one we emphasized for the variogram, as unit Fréchet random variables have no finite mean. This led \citet{Cooley2006} to consider identical GEV margins with shape parameter $\xi < 1$ so that the mean is finite (and, for $\xi < 1/2$, the variance as well). It is possible to use the same strategy to ensure that the variogram exists theoretically but, as we will see, working with the madogram is particularly well suited to extreme values and has strong links with the extremal coefficient.
By using simple arguments and some results obtained by \citet{Hosking1985} on probability weighted moments, \citet{Cooley2006} show that \begin{equation} \label{eq:mado2ExtCoeff} \theta(x_1 - x_2) = \begin{cases} u_\beta \left(\mu + \frac{\nu(x_1 - x_2)}{\Gamma(1 - \xi)}\right), & \xi < 1, \xi \neq 0,\\ \exp\left(\frac{\nu(x_1 - x_2)}{\sigma}\right), & \xi = 0, \end{cases} \end{equation} where $\mu$, $\sigma$ and $\xi$ are the location, scale and shape parameters for the GEV distribution, $\Gamma(\cdot)$ is the Gamma function and \begin{equation*} u_\beta(x) = \left(1 + \xi \frac{x - \mu}{\sigma} \right)_+^{1/\xi} \end{equation*} <>= cov.fun1 <- covariance(nugget = 0, sill = 1, range = 1, smooth = 1, cov.mod = "whitmat", plot = FALSE) ext.coeff1 <- function(h) 1 + sqrt((1 - cov.fun1(h))/2) mado1 <- function(h) log(ext.coeff1(h)) cov.fun2 <- covariance(nugget = 0, sill = 1, range = 1.5, smooth = 1, cov.mod = "powexp", plot = FALSE) ext.coeff2 <- function(h) 1 + sqrt((1 - cov.fun2(h))/2) mado2 <- function(h) log(ext.coeff2(h)) n.site <- 40 n.obs <- 100 coord <- matrix(runif(2 * n.site, 0, 10), ncol = 2) data <- rmaxstab(n.obs, coord, "whitmat", nugget = 0, range = 1, smooth = 1) par(mfrow=c(1,2)) madogram(data, coord, which = "mado", ylim = c(0, log(2))) plot(mado1, from = 0, to = 12, add = TRUE, col = 2, lwd = 1.5) madogram(data, coord, which = "mado", ylim = c(0, log(2)), n.bins = 100) plot(mado1, from = 0, to = 12, add = TRUE, col = 2, lwd = 1.5) @ \begin{figure} \centering <>= <> @ \caption{Madogram (left panel) and binned madogram (right panel) with unit Gumbel margins for the Schlather model with the Whittle--Matérn correlation functions. The red lines are the theoretical madograms. Points are pairwise estimates.} \label{fig:madogram} \end{figure} Equation~\eqref{eq:madogram} suggests a simple estimator \begin{equation} \label{eq:madogramEstim} \hat{\nu}(x_1-x_2) = \frac{1}{2n} \sum_{i=1}^n |z_i(x_1) - z_i(x_2)| \end{equation} where $z_i(x_1)$ and $z_i(x_2)$ are the $i$-th observations of the random field at location $x_1$ and $x_2$ and $n$ is the total number of observations. If isotropy is assumed, then it might be preferable to use a ``binned'' version of estimator~\eqref{eq:madogramEstim} \begin{equation} \label{eq:madogramEstimBinned} \hat{\nu}(h) = \frac{1}{2n |B_h|} \sum_{(x_1, x_2) \in B_h}\sum_{i=1}^n |z_i(x_1) - z_i(x_2)| \end{equation} where $B_h$ is the set of pair of locations whose pairwise distances belong to $[h -\epsilon, h + \epsilon[$, for $\epsilon > 0$. Figure~\ref{fig:madogram} plots the theoretical madograms for the Schlather's model having a Whittle--Matérn correlation function. Pairwise and binned pairwise estimates as defined by equations~\eqref{eq:madogramEstim} ans~\eqref{eq:madogramEstimBinned} are also reported. 
The code used to generate these madogram estimates was <>= n.site <- 50 n.obs <- 100 coord <- matrix(runif(2 * n.site, 0, 10), ncol = 2) data <- rmaxstab(n.obs, coord, "whitmat", nugget = 0, range = 1, smooth = 1) par(mfrow=c(1,2)) madogram(data, coord, which = "mado") madogram(data, coord, which = "mado", n.bins = 100) @ Using a plugin estimate in equation~\eqref{eq:mado2ExtCoeff} leads to a simple estimator for $\theta(\cdot)$: \begin{equation} \label{eq:extCoeffEstMado} \hat{\theta}(x_1 - x_2) = \begin{cases} u_\beta \left(\mu + \frac{\hat{\nu}(x_1 - x_2)}{\Gamma(1 - \xi)}\right), & \xi < 1, \xi \neq 0,\\ \exp\left(\frac{\hat{\nu}(x_1 - x_2)}{\sigma}\right), & \xi = 0, \end{cases} \end{equation} <>= n.site <- 40 n.obs <- 100 coord <- matrix(runif(2 * n.site, 0, 10), ncol = 2) data <- rmaxstab(n.obs, coord, "whitmat", nugget = 0, range = 1, smooth = 1) cov.fun <- covariance(nugget = 0, sill = 1, range = 1, smooth = 1, cov.mod = "whitmat", plot = FALSE) ext.coeff <- function(h) 1 + sqrt((1 - cov.fun(h))/2) mado <- function(h) log(ext.coeff(h)) par(mfrow=c(1,2)) madogram(data, coord, which = "mado", ylim = c(0, log(2))) plot(mado, from = 0, to = 12, col = 2, lwd = 1.5, add = TRUE) madogram(data, coord, which = "ext") plot(ext.coeff, from = 0, to = 12, col = 2, lwd = 1.5, add = TRUE) @ \begin{figure} \centering <>= <> @ \caption{Pairwise madograms (left panel) and extremal coefficients (right panel) estimates from a simulated max-stable random field having a Whittle--Matérn correlation function - $c_1 = c_2 = \nu = 1$. The red lines are the theoretical madogram and extremal coefficient function.} \label{fig:madoExtCoeff} \end{figure} Figure~\ref{fig:madoExtCoeff} plots the madogram and extremal coefficient functions estimated from a simulated max-stable process with unit Fréchet margins and having a Whittle--Matérn correlation function. These estimates were obtained by using equations~\eqref{eq:madogramEstim} and \eqref{eq:extCoeffEstMado} respectively. The figure was generated using the code below. <>= n.site <- 40 n.obs <- 100 coord <- matrix(runif(2 * n.site, 0, 10), ncol = 2) data <- rmaxstab(n.obs, coord, "whitmat", nugget = 0, range = 1, smooth = 1) madogram(data, coord) @ \subsection{$F$-Madogram} \label{sec:f-madogram} In the previous subsection, we introduced the madogram as a summary statistic for the spatial dependence structure. However, the choice of the GEV parameters to compute this madogram is somewhat arbritrary. Instead, \citet{Cooley2006} propose a modified madogram called the $F$-madogram\index{madogram!$F$-madogram} \begin{equation} \label{eq:F-madogram} \nu_F(x_1 - x_2) = \frac{1}{2} \mathbb{E} \left[|F\left(Z(x_1)\right) - F\left(Z(x_2)\right)| \right] \end{equation} where $Z(\cdot)$ is a stationary max-stable random field with unit Fréchet margins and $F(z) = \exp(-1/z)$. <>= n.site <- 40 n.obs <- 100 coord <- matrix(runif(2 * n.site, 0, 10), ncol = 2) data <- rmaxstab(n.obs, coord, "gauss", cov11 = 1, cov12 = 0, cov22 = 1) ext.coeff <- function(h) 2 * pnorm(h/2) Fmado <- function(h) 0.5 * (ext.coeff(h) - 1) / (ext.coeff(h) + 1) par(mfrow=c(1,2)) fmadogram(data, coord, which = "mado", ylim = c(0, 1/5)) plot(Fmado, from = 0, to = 13, add = TRUE, col = 2, lwd = 1.5) fmadogram(data, coord, which = "ext") plot(ext.coeff, from = 0, to = 13, add = TRUE, col = 2, lwd = 1.5) @ \begin{figure} \centering <>= <> @ \caption{Pairwise $F$-madogram (left panel) and extremal coefficient (right panel) estimates for the Smith model with $\Sigma$ equals to the identity matrix. 
The red lines are the theoretical $F$-madogram and extremal coefficient function.} \label{fig:F-madogram} \end{figure} The $F$-madogram is well defined even in the presence of unit Fréchet margins as we work with $F(Z(x_1))$ instead of $Z(x_1)$. Obviously, equation~\eqref{eq:F-madogram} suggests a simple estimator: \begin{equation} \label{eq:F-madogramEstim} \hat{\nu}_F(x_1 - x_2) = \frac{1}{2 n} \sum_{i=1}^{n} |\hat{F}(z_i(x_1)) - \hat{F}(z_i(x_2))| \end{equation} where $z_i(x_1)$ and $z_i(x_2)$ are the $i$-th observations of the random field at location $x_1$ and $x_2$ and $n$ is the total number of observations. The $F$-madogram has strong connections with the extremal coefficient introduced in Section~\ref{sec:extremal-coefficient}. Indeed, we have \begin{equation} \label{eq:F-mado2ExtCoeff} 2 \nu_F(x_1 - x_2) = \frac{\theta(x_1 - x_2) - 1}{\theta(x_1 - x_2) + 1} \end{equation} or conversely \begin{equation} \label{eq:extCoeff2F-mado} \theta(x_1 - x_2) = \frac{1 + 2 \nu_F(x_1 - x_2)}{1 - 2 \nu_F(x_1 - x_2)} \end{equation} \begin{proof} Let first note that \begin{equation} \label{eq:abs2max} |x - y| = 2 \max\left\{x, y \right\} - (x + y) \end{equation} Using equation~\eqref{eq:abs2max} in equation~\eqref{eq:F-madogram}, we have: \begin{eqnarray*} \nu_F(x_1 - x_2) &=& \frac{1}{2} \mathbb{E} \left[|F\left(Z(x_1)\right) - F\left(Z(x_2)\right)| \right]\\ &=& \mathbb{E}\left[\max\left\{F\left(Z(x_1)\right), F\left(Z(x_2)\right) \right\} \right] - \mathbb{E}\left[ F\left(Z(x_1) \right) \right]\\ &=& \mathbb{E}\left[F\left(\max\{Z(x_1), Z(x_2)\} \right)\right] - \frac{1}{2} \end{eqnarray*} where we used the fact that $F(Z(x_1))$ is uniformly distributed on $[0, 1]$ and $F$ is monotone increasing. Now, from Section~\ref{sec:extremal-coefficient}, we know that \begin{equation*} \Pr \left[ \max\left\{ Z(x_1), Z(x_2) \right\}\leq z \right] = \exp \left(- \frac{\theta(x_1 - x_2)}{z} \right) \end{equation*} so that \begin{eqnarray*} \mathbb{E}\left[F\left(\max\{Z(x_1), Z(x_2)\} \right) \right] &=& \theta(x_1 - x_2) \int_{0}^{+\infty} \exp\left(-\frac{1}{z} \right) \exp\left(-\frac{\theta(x_1 - x_2)}{z} \right) z^{-2}\mbox{dz}\\ &=& \frac{\theta(x_1 - x_2)}{\theta(x_1 - x_2) + 1} \end{eqnarray*} This proves equations~\eqref{eq:F-mado2ExtCoeff} and~\eqref{eq:extCoeff2F-mado}. \end{proof} As we can see from equation~\eqref{eq:extCoeff2F-mado}, there is a one-to-one relationship between the extremal coefficient and the $F$-madogram. This suggests a simple estimator for $\theta(x_1 - x_2)$ \begin{equation} \label{eq:extCoeffEstF-mado} \hat{\theta}(x_1 - x_2) = \frac{1 + 2 \hat{\nu}_F(x_1 - x_2)}{1 - 2 \hat{\nu}_F(x_1 - x_2)} \end{equation} Figure~\ref{fig:F-madogram} plots the pairwise $F$-madogram and extremal coefficient estimates from $100$ replications of the isotropic Smith model. The code used to produce this figure was: <>= n.site <- 40 n.obs <- 100 coord <- matrix(runif(2 * n.site, 0, 10), ncol = 2) data <- rmaxstab(n.obs, coord, "gauss", cov11 = 1, cov12 = 0, cov22 = 1) par(mfrow=c(1,2)) fmadogram(data, coord) @ As with the madogram presented in the previous Section, it is also possible to have binned estimates of the $F$-madogram by passing the argument \verb|n.bins| into the \verb|fmadogram| function. \subsection{$\lambda$-Madogram} \label{sec:lambda-madogram} The extremal coefficient, and thus the ($F$-)madogram, does not fully characterize the spatial dependence of a random field. 
Indeed, from equation~\eqref{eq:extcoeff}, it only considers $\Pr[Z(x_1) \leq z_1, Z(x_2) \leq z_2]$ where $z_1 = z_2 = z$. To solve this issue, \citet{Naveau2009} introduce the $\lambda$-madogram\index{madogram!$\lambda$-madogram} defined as \begin{equation} \label{eq:lambda-madogram} \nu_{\lambda}(x_1 - x_2) = \frac{1}{2} \mathbb{E} \left[ |F^{\lambda}\{Z(x_1)\} - F^{1-\lambda}\{Z(x_2)\}|\right] \end{equation} for any $\lambda \in [0, 1]$. The idea beyond this is that by varying $\lambda$, we will focus on $\Pr[Z(x_1) \leq z_1, Z(x_2) \leq z_2]$ where $z_1 = \lambda z$ and $z_2 = (1 - \lambda) z$ and thus explore the whole space. The $\lambda$-madogram is related to the exponent measure $V$, namely \begin{equation} \label{eq:lambda-madogramVfun} \nu_{\lambda}(x_1 - x_2) = \frac{V_{x_1,x_2}(\lambda, 1-\lambda)}{1+V_{x_1,x_2}(\lambda, 1-\lambda)} - c(\lambda) \end{equation} where $c(\lambda) = 3/\{2(1+\lambda)(2-\lambda)\}$. \begin{proof} Applying equation~\eqref{eq:abs2max} with $x = F^{\lambda}\{Z(x_1)\}$ and $y = F^{1-\lambda}\{Z(x_2)\}$, we have \begin{eqnarray*} \nu_{\lambda}(x_1 - x_2) &=& \mathbb{E}\left[\max\{F^{\lambda} \{Z(x_1)\}, F^{1-\lambda}\{Z(x_2)\} \} \right] - \frac{1}{2}\mathbb{E}\left[ F^{\lambda} \{Z(x_1)\} \right] - \frac{1}{2}\mathbb{E}\left[F^{1-\lambda} \{Z(x_2)\} \right]\\ &=& \mathbb{E}\left[\max\{F^{\lambda} \{Z(x_1)\}, F^{1-\lambda}\{Z(x_2)\} \} \right] - \frac{1}{2(1+\lambda)} - \frac{1}{2(2 - \lambda)} \end{eqnarray*} where we used the fact that $\mathbb{E}[X^k] = 1/(1+k)$, $X \sim U(0,1)$, $k>0$. From Section~\ref{sec:extremal-coefficient} we know that \begin{eqnarray*} \Pr\left[\max\{F^{\lambda} \{Z(x_1)\}, F^{1-\lambda}\{Z(x_2)\} \} \leq z \right] &=& \Pr \left[Z(x_1) \leq -\frac{\lambda}{\log z}, Z(x_2) \leq -\frac{1 - \lambda}{\log z} \right]\\ &=& \exp\left\{-\log (z) V_{x_1,x_2}\left(\lambda, 1 - \lambda \right) \right\} \end{eqnarray*} where $V_{x_1,x_2}$ is the homogeneous function of order $-1$ introduced in equation~\eqref{eq:MEVD}. Differentiating this distribution with respect to $z$ gives the probability density function of the random variable $\max\{F^{\lambda} \{Z(x_1)\}, F^{1-\lambda}\{Z(x_2)\} \}$, so that we have \begin{eqnarray*} \mathbb{E}\left[\max\{F^{\lambda} \{Z(x_1)\}, F^{1-\lambda}\{Z(x_2)\} \} \right] &=& \int_0^1 -V_{x_1,x_2}(\lambda, 1-\lambda) \exp\left\{-\log (z) V_{x_1,x_2}\left(\lambda, 1 - \lambda \right) \right\} \mbox{dz}\\ &=& \frac{V_{x_1,x_2}(\lambda, 1-\lambda)}{1+V_{x_1,x_2}(\lambda, 1-\lambda)} \end{eqnarray*} and finally \begin{equation*} \nu_{\lambda}(x_1 - x_2) = \frac{V_{x_1,x_2}(\lambda, 1-\lambda)}{1+V_{x_1,x_2}(\lambda, 1-\lambda)} - c(\lambda) \end{equation*} where $c(\lambda) = 3/\{2(1+\lambda)(2-\lambda)\}$. \end{proof} Again there is a one-to-one relationship between $\nu_\lambda$ and the dependence measure, so that we can express $V_{x_1,x_2}$ as a function of $\nu_\lambda$ \begin{equation} \label{eq:Vfun2lambda-mado} V_{x_1,x_2}(\lambda, 1-\lambda) = \frac{c(\lambda) + \nu_\lambda(x_1 - x_2)}{1 - c(\lambda) -\nu_\lambda(x_1 - x_2)} \end{equation} As $V_{x_1,x_2}(0.5, 0.5) = 2 \theta(x_1 - x_2)$, the previous equation induces that the madogram and the $F$-madogram are special cases of the $\lambda$-madogram when $\lambda = 0.5$. 
For instance, we have
\begin{equation*}
  \nu_{0.5}(x_1 - x_2) = \frac{8 \nu_F(x_1 - x_2)}{3\{3 + 2 \nu_F(x_1 - x_2)\}}
\end{equation*}
Equation~\eqref{eq:lambda-madogram} suggests a simple estimator
\begin{equation}
  \label{eq:lambda-madogramEst}
  \hat{\nu}_{\lambda}(x_1 - x_2) = \frac{1}{2n} \sum_{i=1}^n |\hat{F}^{\lambda}\{z_i(x_1)\} - \hat{F}^{1-\lambda}\{z_i(x_2)\}|
\end{equation}
where $z_i(x_1)$ and $z_i(x_2)$ are the $i$-th observations of the random field at locations $x_1$ and $x_2$, $n$ is the total number of observations and $\hat{F}$ is an estimate of the CDF at the specified location.

There is an issue with the previous estimator. Indeed, the boundary conditions for the $\lambda$-madogram when $\lambda = 0$ or $\lambda = 1$ are not always fulfilled. From equation~\eqref{eq:lambda-madogram}, if $\lambda = 0$ or $\lambda = 1$, $\nu_\lambda(x_1 - x_2) = 1/4$. If $F$ is estimated by maximum likelihood, this condition generally fails, whereas the empirical estimator $\hat{F}(x_{i:n}) = i / (n+1)$ fulfils it. To bypass this hurdle, \citet{Naveau2009} propose the following adjusted estimator
\begin{eqnarray*}
  \hat{\nu}_{\lambda}(x_1 - x_2) &=& \frac{1}{2n} \sum_{i=1}^n |\hat{F}^{\lambda}\{z_i(x_1)\} - \hat{F}^{1-\lambda}\{z_i(x_2)\}| - \frac{\lambda}{2n} \sum_{i=1}^n \left[1 - \hat{F}^\lambda\{z_i(x_1)\} \right]\\
  &-& \frac{1 - \lambda}{2n} \sum_{i=1}^n \left[1 - \hat{F}^{1-\lambda}\{z_i(x_2)\}\right] + \frac{1 - \lambda + \lambda^2}{2 (2 - \lambda) (1 + \lambda)}
\end{eqnarray*}
Plugging this estimator into equation~\eqref{eq:Vfun2lambda-mado}, we get an estimator for the dependence measure
\begin{equation}
  \label{eq:VfunEst}
  \hat{V}_{x_1,x_2}(\lambda, 1-\lambda) = \frac{c(\lambda) + \hat{\nu}_\lambda(x_1 - x_2)}{1 - c(\lambda) -\hat{\nu}_\lambda(x_1 - x_2)}
\end{equation}
<>=
n.site <- 40
n.obs <- 100
coord <- matrix(runif(2 * n.site, 0, 10), ncol = 2)
data1 <- rmaxstab(n.obs, coord, "powexp", nugget = 0, range = 1, smooth = 1)
data2 <- rmaxstab(n.obs, coord, "cauchy", nugget = 0, range = 1, smooth = 1)
par(mfrow=c(1,2), pty = "s")
lmadogram(data1, coord, n.bins = 60)
lmadogram(data2, coord, n.bins = 60)
@
\begin{figure}
  \centering
<>=
<>
@
\vspace{-1cm}
\caption{Binned $\lambda$-madogram estimates for two Schlather models having a powered exponential (left panel) and a Cauchy (right panel) covariance function. $c_1 = c_2 = \nu = 1$.}
\label{fig:lambda-mado}
\end{figure}
Figure~\ref{fig:lambda-mado} plots the binned $\lambda$-madogram estimates from $100$ replications of Schlather models having a powered exponential and a Cauchy correlation function, respectively. The code used to produce this figure was:
<>=
n.site <- 40
n.obs <- 100
coord <- matrix(runif(2 * n.site, 0, 10), ncol = 2)
data1 <- rmaxstab(n.obs, coord, "powexp", nugget = 0, range = 1, smooth = 1)
data2 <- rmaxstab(n.obs, coord, "cauchy", nugget = 0, range = 1, smooth = 1)
par(mfrow=c(1,2), pty = "s")
lmadogram(data1, coord, n.bins = 60)
lmadogram(data2, coord, n.bins = 60)
@
It might be useful to use the excellent \emph{persp3d} function provided by the contributed \emph{rgl} R package to explore the $\lambda$-madogram dynamically.

\chapter{Fitting a Unit Fréchet Max-Stable Process to Data}
\label{chat:fit-maxstab-frech}

In Chapter~\ref{cha:an-introduction-max}, we described max-stable processes and gave two different characterisations of them. However, we mentioned that only the bivariate distributions are analytically known so that the fitting of max-stable processes to data is not straightforward.
In this Chapter, we will present two different approaches to fitting max-stable processes to data. The first one is based on least squares and was first introduced by \citet{Smith1991}. The second one uses the maximum composite likelihood estimator \citep{Lindsay1988}, more precisely the maximum pairwise likelihood estimator. We will consider these two approaches in turn.

\section{Least Squares}
\label{sec:least-squares}

As stated by equations~\eqref{eq:smith} and~\eqref{eq:schlather}, the density of a max-stable process is analytically known only in the bivariate case, so that maximum likelihood estimators are not available. This observation led \citet{Smith1991} to propose an estimator based on least squares\index{least squares}. This fitting procedure consists in minimizing the objective function
\begin{equation}
  \label{eq:extCoeffLeastSquares}
  C(\psi) = \sum_{i<j} \left\{\hat{\theta}(x_i - x_j) - \theta_\psi(x_i - x_j) \right\}^2
\end{equation}
where $\hat{\theta}(x_i - x_j)$ are pairwise extremal coefficient estimates, for instance those given by equation~\eqref{eq:extCoeffSmith}, and $\theta_\psi(\cdot)$ is the extremal coefficient function of the max-stable model with parameter $\psi$. For the Smith model, this least squares fit is performed by the \verb|fitcovmat| function:
<<>>=
n.site <- 40
n.obs <- 80
coord <- matrix(runif(2*n.site, 0, 10), ncol = 2)
data <- rmaxstab(n.obs, coord, "gauss", cov11 = 9/8, cov12 = 1, cov22 = 9/8)
fitcovmat(data, coord, marge = "emp")
@
This approach suffers from two major drawbacks. First, unless we use Monte-Carlo techniques, standard errors are not available and, because the observations are far from being normal, the least squares estimator is likely to be far from efficient in the sense of the Cramér--Rao lower bound \citep{Cramer1946} and the Gauss--Markov theorem. Secondly, in concrete analyses the observed (spatial) annual maxima will rarely have unit Fréchet margins, so that we first need to transform the data to the unit Fréchet scale. This suggests the use of a more flexible estimator.

\section{Pairwise Likelihood}
\label{sec:pairwise-likelihood}

As already stated, the ``full'' likelihood for the max-stable models presented in Chapter~\ref{cha:an-introduction-max} is not analytically known if the number of stations under consideration is greater than or equal to three. However, as the bivariate density is analytically known, this suggests the use of the pairwise likelihood in place of the full likelihood. The log pairwise likelihood\index{pairwise-likelihood} is given by
\begin{equation}
  \label{eq:lplik}
  \ell_p(\mathbf{z};\psi) = \sum_{i<j} \sum_{k=1}^{n} \log f\left(z_k(x_i), z_k(x_j); \psi \right)
\end{equation}
where $f(\cdot, \cdot; \psi)$ is the bivariate density of the max-stable model under consideration, $z_k(x_i)$ is the $k$-th observation at location $x_i$ and the first summation is over all distinct pairs of locations. In practice, max-stable processes are fitted by maximum pairwise likelihood using the \verb|fitmaxstab| function:
<<>>=
n.obs <- 80
n.site <- 40
set.seed(12)
coord <- matrix(runif(2*n.site, 0, 10), ncol = 2)
data <- rmaxstab(n.obs, coord, "whitmat", nugget = 0.8, range = 3, smooth = 1.2)
fitmaxstab(data, coord, cov.mod = "whitmat")
@
From this output, we can see that we indeed use the Schlather representation with a Whittle--Matérn correlation function. The convergence was successful and the parameter estimates for the covariance function as well as the pairwise deviance are accessible. Large deviations from the theoretical values may be expected as the parameters of the Whittle--Matérn covariance function are far from orthogonal. Thus, the range and smooth estimates may be totally different while leading (approximately) to the same covariance function. The \verb|fitmaxstab| function provides a powerful option that allows any parameter of the model under consideration to be held fixed. For instance, this is especially useful with the Whittle--Matérn correlation function, as it is sometimes preferable to fix the smooth parameter using prior knowledge on the process smoothness \citep{Diggle2007}. Obviously, this feature is not restricted to this specific case and one can consider more exotic models.
Fixing parameters of the model is illustrated by the following lines
<>=
fitmaxstab(data, coord, cov.mod = "whitmat", smooth = 1.2)
fitmaxstab(data, coord, cov.mod = "whitmat", nugget = 0)
fitmaxstab(data, coord, cov.mod = "whitmat", range = 3)
@
Although the Whittle--Matérn model is flexible, one may want to consider other families of covariance functions. This is achieved by invoking:
<>=
fitmaxstab(data, coord, cov.mod = "cauchy")
fitmaxstab(data, coord, cov.mod = "powexp")
@
One may also consider the Smith model; this is done as follows
<>=
fitmaxstab(data, coord, cov.mod = "gauss")
@
It is also possible to use different optimization routines to fit the model to data by passing the \verb|method| argument. For instance, if one wants to use the \verb|BFGS| method:
<>=
fitmaxstab(data, coord, cov.mod = "gauss", cov12 = 0, method = "BFGS")
@
Instead of using the \verb|optim| function, one may want to use the \verb|nlm| or \verb|nlminb| functions. This is done as before using the \verb|method = "nlm"| or \verb|method = "nlminb"| option. Finally, it is important to note that maximizing the pairwise likelihood can be expensive and that the choice of the numerical optimizer matters. In particular, optimizers that use the gradient of the pairwise likelihood may perform poorly. Indeed, if the Whittle--Matérn covariance function is considered and the smooth parameter has to be estimated, then the pairwise likelihood is not differentiable with respect to this specific parameter. In general, the Nelder--Mead \citep{Nelder1965} approach seems to perform better even if the convergence is sometimes slow.

\subsection{Assessing Uncertainties}
\label{sec:assess-uncert}

Recall that the maximum pairwise likelihood estimator $\hat{\psi}_p$ satisfies
\begin{equation*}
  \hat{\psi}_p \stackrel{\cdot}{\sim} \mathcal{N}\left(\psi, H(\psi)^{-1} J(\psi) H(\psi)^{-1} \right), \qquad n \rightarrow +\infty,
\end{equation*}
where $H(\psi) = \mathbb{E}[\nabla^2 \ell_p(\psi;\mathbf{Y})]$ (the Hessian matrix) and $J(\psi) = \mbox{Var}[\nabla \ell_p(\psi;\mathbf{Y})]$, where the expectations are with respect to the ``full'' density. In practice, to get the standard errors\index{standard errors} we need estimates of $H(\psi)$ and $J(\psi)$. The estimation of $H(\psi)$ is straightforward and is $\hat{H}(\hat{\psi}_p) = \nabla^2 \ell_p(\hat{\psi}_p;\mathbf{y})$; that is, the Hessian matrix evaluated at $\hat{\psi}_p$. Usually, standard optimizers are able to get finite-difference based estimates for $H(\hat{\psi}_p)$ so that no extra work is needed to get $\hat{H}(\hat{\psi}_p)$. The estimation of $J(\hat{\psi}_p)$ is a little bit more difficult and can be done in two different ways \citep{Varin2005}. First note that $J(\hat{\psi}_p)$ could be estimated using the ``naive'' estimator $\hat{J}(\hat{\psi}_p) = \nabla \ell_p(\hat{\psi}_p;\mathbf{y}) {\nabla \ell_p(\hat{\psi}_p;\mathbf{y})}^T$. This is unsatisfactory as most often the gradient of the log composite likelihood vanishes. To bypass this hurdle, $J(\hat{\psi}_p)$ can be estimated by
\begin{equation}
  \label{eq:Jest}
  \hat{J}(\hat{\psi}_p) = \sum_{i=1}^n \nabla \ell_p(\hat{\psi}_p;\mathbf{y}_i) {\nabla \ell_p(\hat{\psi}_p;\mathbf{y}_i)}^T
\end{equation}
Another estimator is obtained by noticing that $J(\psi)$ corresponds to the variance of the pairwise score equation\index{score equation} $\nabla \ell_p(\psi;\mathbf{Y}) = 0$. The latter estimator is used to get standard errors.
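To fix ideas, the following schematic chunk shows how the sandwich variance $\hat{H}^{-1} \hat{J} \hat{H}^{-1}$ could be assembled from equation~\eqref{eq:Jest}. It is only an illustration and not the code used by \verb|fitmaxstab|: the function \verb|lplik.i| is a hypothetical user-supplied function returning the log pairwise likelihood contribution of a single observation (a row of the data matrix), and the contributed \textbf{numDeriv} package is assumed to be available for the numerical derivatives.
<<eval=FALSE>>=
library(numDeriv)
## Schematic sketch only: sandwich standard errors built from the
## per-observation score contributions, cf. equation (eq:Jest).
sandwich.se <- function(psi.hat, data, lplik.i){
  n <- nrow(data)
  ## one score vector per observation, evaluated at psi.hat
  scores <- t(sapply(1:n, function(i)
    grad(function(psi) lplik.i(psi, data[i, ]), psi.hat)))
  J.hat <- crossprod(scores)               # sum of outer products
  H.hat <- hessian(function(psi)
    sum(sapply(1:n, function(i) lplik.i(psi, data[i, ]))), psi.hat)
  sandwich <- solve(H.hat) %*% J.hat %*% solve(H.hat)
  sqrt(diag(sandwich))                     # standard errors
}
@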
These two estimators are only accessible if independent replications of $\mathbf{Y}$ are available\footnote{This will mostly be the case for spatial extremes.}.

\chapter{Model Selection}
\label{cha:model-selection}

Model selection plays an important role in statistical modelling. According to Ockham's razor, given several models that fit the data equally well, we should focus on simple models rather than more complex ones. Depending on the models to be compared, several approaches exist for model selection. In this chapter, we will present theory on information criteria, such as the Akaike Information Criterion (AIC) \citep{Akaike1974}, and on the likelihood ratio statistic \citep[Sec.~4.5]{Davison2003}. We present these two approaches in turn.

\section{Takeuchi Information Criterion}
\label{sec:take-inform-crit}

Having two different models, we want to know which one we should prefer for modelling our data. If two models have exactly the same maximized log-likelihoods, we should prefer the one which has fewer parameters because it will have a smaller variance. However, if these two maximized log-likelihoods only differ by a small amount, is this small increase worth the price of the additional parameters? To answer this question, we resort to the Kullback--Leibler discrepancy. Let us consider a random sample $Y_1$, \ldots, $Y_n$ drawn from an unknown density $g$. Without knowing $g$, we fit a statistical model $f(y;\psi)$ by maximizing the log-likelihood. The Kullback--Leibler discrepancy measures the discrepancy of our fitted model $f$ from the true one $g$
\begin{equation}
  \label{eq:KullbackLeibler}
  D\left(f_\psi, g\right) = \int \log \left( \frac{g(y)}{f(y;\psi)} \right) g(y) \mbox{dy}
\end{equation}
The Kullback--Leibler discrepancy is always non-negative. Indeed, as $x \mapsto - \log(x)$ is a convex function, Jensen's inequality gives
\begin{equation*}
  D\left(f_\psi, g\right) = \int \log \left( \frac{g(y)}{f(y;\psi)} \right) g(y) \mbox{dy} \geq -\log \left( \int \frac{f(y;\psi)}{g(y)} g(y) \mbox{dy} \right) = 0
\end{equation*}
where we used the fact that $f(y;\psi)$ is a probability density function. Furthermore, as $x \mapsto -\log(x)$ is strictly convex, the lower bound is reached if and only if $f(y;\psi) = g(y)$ almost everywhere. Consequently, for model selection, we aim to choose models that minimize $D(f_\psi, g)$. However, $D(f_\psi, g)$ is not discriminating enough as several models may satisfy $D(f_\psi, g) = 0$. To solve this issue, given an estimate $\hat{\psi}$, we average $D(f_{\hat{\psi}}, g)$ over the distribution of $\hat{\psi}$. Intuitively, because of their larger sampling variance, this averaging penalizes models having a larger number of parameters more heavily than those with fewer parameters. Let $\psi_g$ be the value that minimizes $D(f_\psi, g)$. A Taylor expansion of $\log f(y;\hat{\psi})$ around $\psi_g$ gives
\begin{equation*}
  \log f(y;\hat{\psi}) \approx \log f(y;\psi_g) + \frac{1}{2} \left(\hat{\psi} - \psi_g \right)^T \frac{\partial^2 \log f(y;\psi_g)}{\partial \psi \partial \psi^T} \left(\hat{\psi} - \psi_g \right)
\end{equation*}
where we used the fact that $\psi_g$ minimizes the Kullback--Leibler discrepancy, so that the first order term vanishes.
By putting this into equation~\eqref{eq:KullbackLeibler} we get
\begin{eqnarray*}
  D\left(f_{\hat{\psi}},g\right) &\stackrel{\cdot}{=}& D\left(f_{\psi_g}, g \right) - \frac{1}{2 n} \mbox{tr}\left\{(\hat{\psi} - \psi_g) (\hat{\psi} - \psi_g)^T H(\psi_g)\right\}
\end{eqnarray*}
where $H(\psi_g)$ is given by equation~\eqref{eq:Hessian}. Finally, we have
\begin{equation}
  \label{eq:KLAverageOverPsiHat}
  n \mathbb{E}_g \left[D\left(f_{\hat{\psi}},g\right) \right] \stackrel{\cdot}{=} n D\left(f_{\psi_g},g\right) - \frac{1}{2} \mbox{tr}\left\{J(\psi_g) H(\psi_g)^{-1} \right\}
\end{equation}
where $J(\psi_g)$ is given by equation~\eqref{eq:VarScore}. The AIC\index{information criterion!AIC} is, up to an additive constant, an estimator of equation~\eqref{eq:KLAverageOverPsiHat} and is defined as
\begin{equation}
  \label{eq:AIC}
  \mbox{AIC} = - 2 \ell(\hat{\psi}) + 2 p
\end{equation}
The simplification $\mbox{tr}\{J(\psi_g) H(\psi_g)^{-1} \} = -p$ arises as the AIC supposes that there is no misspecification. Because, as we see in Section~\ref{sec:misspecification}, the maximum pairwise likelihood estimator works under misspecification, the AIC is not appropriate. Another estimator of equation~\eqref{eq:KLAverageOverPsiHat}, which allows for misspecification, is the Takeuchi information criterion\index{information criterion!TIC} (TIC)
\begin{equation}
  \label{eq:TIC}
  \mbox{TIC} = - 2 \ell(\hat{\psi}) - 2 \mbox{tr}\left\{\hat{J} \hat{H}^{-1} \right\}
\end{equation}
As with the AIC, the best model corresponds to the one that minimizes equation~\eqref{eq:TIC}. Recently, \citet{Varin2005} rediscovered this information criterion and demonstrated its use for model selection when composite likelihood is involved. In practice, one can have a look at the output of the \verb|fitmaxstab| function or use the \verb|TIC| function.
<>=
set.seed(1)
@
<<>>=
n.obs <- 80
n.site <- 40
coord <- matrix(runif(2*n.site, 0, 10), ncol = 2)
data <- rmaxstab(n.obs, coord, "cauchy", nugget = 0.2, range = 3, smooth = 1.2)
M0 <- fitmaxstab(data, coord, cov.mod = "powexp")
M1 <- fitmaxstab(data, coord, cov.mod = "cauchy")
TIC(M0, M1)
@
The TIC for \verb|M1| is lower than the one for \verb|M0| so that we might prefer using \verb|M1|.

\section{Likelihood Ratio Statistic}
\label{sec:likel-ratio-stat}

TIC is useful when comparing different models. When dealing with nested models, one may prefer using the likelihood ratio statistic because of the Neyman--Pearson lemma \citep{Neyman1933}. Suppose we are interested in a statistical model $\{f(x;\psi)\}$ where $\psi^T = (\kappa^T, \phi^T)$ and, more specifically, whether a particular value $\kappa_0$ of $\kappa$ is consistent with our data. Let $(\hat{\kappa}^T, \hat{\phi}^T)$ be the maximum likelihood estimate for $\psi$ and $\hat{\phi}_{\kappa_0}$ the maximum likelihood estimate under the restriction $\kappa = \kappa_0$. A common statistic to check whether $\kappa_0$ is plausible is the likelihood ratio statistic\index{likelihood ratio statistic} $W(\kappa_0)$ which satisfies, under mild regularity conditions,
\begin{equation}
  \label{eq:likRatio}
  W(\kappa_0) = 2 \left\{\ell(\hat{\kappa},\hat{\phi}) - \ell(\kappa_0, \hat{\phi}_{\kappa_0}) \right\} \longrightarrow \chi_p^2, \qquad n \rightarrow +\infty
\end{equation}
where $p$ is the dimension of $\kappa_0$. Unfortunately, when working under misspecification, the usual asymptotic $\chi^2_p$ distribution does not hold anymore.
There are two ways to solve this issue: (a) adjusting the $\chi^2$ distribution \citep{Rotnitzki1990} or (b) adjusting the composite likelihood so that the usual $\chi^2_p$ distribution holds \citep{Chandler2007}. We will consider these two strategies in turn.

\subsection{Adjusting the $\chi^2$ distribution}
\label{sec:adjust-chi2-distr}

If the model is misspecified, equation~\eqref{eq:likRatio} has to be adjusted. More precisely, as stated by \citet{Kent1982}, we have
\begin{equation}
  \label{eq:likRatioMissp}
  W(\kappa_0) = 2 \left\{\ell(\hat{\kappa},\hat{\phi}) - \ell(\kappa_0, \hat{\phi}_{\kappa_0}) \right\} \longrightarrow \sum_{i=1}^p \lambda_i X_i, \qquad n \rightarrow +\infty
\end{equation}
where the $\lambda_i$ are the eigenvalues of $(H^{-1} J H^{-1})_\kappa \{-(H^{-1})_\kappa\}^{-1}$, the $X_i$ are independent $\chi_1^2$ random variables, $(H^{-1} J H^{-1})_\kappa$ and $(H^{-1})_\kappa$ are the submatrices of $H^{-1} J H^{-1}$ and $H^{-1}$ corresponding to the elements of $\kappa$, and the matrices $H$ and $J$ are given by equations~\eqref{eq:Hessian} and \eqref{eq:VarScore}. For practical purposes, the matrices $H$ and $J$ are replaced by their respective estimates as described in Section~\ref{sec:assess-uncert}. The problem with equation~\eqref{eq:likRatioMissp} is that generally the distribution of $\sum_{i=1}^p \lambda_i X_i$ is not known exactly. This led \citet{Rotnitzki1990} to treat $p W(\kappa_0) / \sum_{i=1}^p \lambda_i$ as a $\chi_p^2$ random variable. However, a better approximation uses results on quadratic forms in normal random variables. An application of this approach is given by the following lines:
<>=
set.seed(7)
@
<<>>=
n.obs <- 50
n.site <- 30
coord <- matrix(rnorm(2*n.site, sd = sqrt(.2)), ncol = 2)
data <- rmaxstab(n.obs, coord, "gauss", cov11 = 100, cov12 = 25, cov22 = 220)
M0 <- fitmaxstab(data, coord, "gauss", cov11 = 100)
M1 <- fitmaxstab(data, coord, "gauss")
anova(M0, M1)
@
From this output, we can see that the $p$-value is approximately $0.875$, which is in favour of $H_0$, i.e.\ $\sigma_{11} = 100$ in $\Sigma$. Note that the eigenvalue estimate was also reported.

\subsection{Adjusting the composite likelihood}
\label{sec:adjust-comp-likel}

Contrary to the approach of \citet{Rotnitzki1990}, \citet{Chandler2007} propose to adjust the composite likelihood instead of adjusting the asymptotic distribution of the likelihood ratio statistic. The starting point is that, under misspecification, the log-composite likelihood evaluated at its maximum likelihood estimate $\hat{\psi}$ has curvature $\hat{H}$ while it should be $\hat{H}^{-1} \hat{J} \hat{H}^{-1}$. This forms the basis for adjusting the log-composite likelihood in the following way,
\begin{equation}
  \label{eq:lclik}
  \ell_A(\psi) = \ell(\psi^*), \qquad \psi^* = \hat{\psi} + C (\psi - \hat{\psi})
\end{equation}
for some $p\times p$ matrix $C$. It is straightforward to see that $\hat{\psi}$ maximizes $\ell_A$ with zero derivative. The key point is that its curvature at $\hat{\psi}$ is $C^T \hat{H} C$. By choosing an appropriate $C$ matrix, it is possible to ensure that $\ell_A$ has the right curvature for applying~\eqref{eq:likRatio} directly. More precisely, by letting
\begin{equation}
  \label{eq:adjustMatrix}
  C = M^{-1} M_A
\end{equation}
where $M^T M = \hat{H}$ and $M_A^T M_A = \hat{H}^{-1} \hat{J} \hat{H}^{-1}$, we ensure that $\ell_A$ has curvature $\hat{H}^{-1} \hat{J} \hat{H}^{-1}$.
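As an illustration of equations~\eqref{eq:lclik} and~\eqref{eq:adjustMatrix}, the following schematic chunk builds the adjustment matrix $C$ using Cholesky square roots. It is not the code used by the \verb|anova| method: \verb|H.hat| and \verb|J.hat| are assumed to have been estimated as in Section~\ref{sec:assess-uncert}, \verb|psi.hat| is the maximum pairwise likelihood estimate, \verb|lplik| is a hypothetical function returning the log pairwise likelihood, and both matrices passed to \verb|chol| are assumed to be positive definite, i.e.\ $\hat{H}$ is taken here as the observed information.
<<eval=FALSE>>=
## Schematic sketch of the Chandler--Bate adjustment; chol() returns the
## upper triangular matrix R such that R^T R = x.
M <- chol(H.hat)                                    # M^T M = H.hat
M.A <- chol(solve(H.hat) %*% J.hat %*% solve(H.hat))
C <- solve(M) %*% M.A                               # equation (eq:adjustMatrix)

## adjusted log pairwise likelihood, equation (eq:lclik)
lplik.adj <- function(psi)
  lplik(psi.hat + drop(C %*% (psi - psi.hat)))
@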
If $p>1$, the matrix square roots $M$ and $M_A$ are not unique and one may use the Cholesky or the singular value decompositions. With this setting, we can apply~\eqref{eq:likRatio} directly, i.e.
\begin{equation*}
  W_A(\kappa_0) = 2 \left\{\ell_A(\hat{\kappa}, \hat{\phi}) - \ell_A(\kappa_0, \hat{\phi}_A) \right\} \longrightarrow \chi_p^2
\end{equation*}
where $\hat{\phi}_A$ is the maximum adjusted likelihood estimate for the restricted model. The problem with the above equation is that it requires the estimation of $\hat{\phi}_A$ which could be CPU prohibitive. Unfortunately, substituting $\hat{\phi}$ for $\hat{\phi}_A$ doesn't solve the problem as $\ell_A(\hat{\phi}) \leq \ell_A(\hat{\phi}_A)$, so that this substitution will inflate $W_A(\kappa_0)$ and thus $\Pr_{H_0}[W_A(\kappa_0) > w(\alpha)] > \alpha$, where $w(\alpha)$ is the $1-\alpha$ quantile of the $\chi_p^2$ distribution and $\alpha$ the significance level of the hypothesis test. To solve these problems, \citet{Chandler2007} propose to compensate for the use of $\hat{\phi}$ instead of $\hat{\phi}_A$ by considering a modified likelihood ratio statistic
\begin{equation}
  \label{eq:8}
  W_A^*(\kappa_0) = 2 c \left\{\ell_A(\hat{\kappa}, \hat{\phi}) - \ell_A(\kappa_0, \hat{\phi}) \right\} \longrightarrow \chi_p^2
\end{equation}
where
\begin{equation*}
  c = \frac{(\hat{\kappa} - \kappa_0)^T \{-(\hat{H}^{-1} \hat{J} \hat{H}^{-1})_{\kappa_0}\}^{-1} (\hat{\kappa} - \kappa_0)}{\{(\hat{\kappa}^T, \hat{\phi}^T)\}^T \{-(\hat{H}^{-1} \hat{J} \hat{H}^{-1})_{\kappa_0}\} \{(\hat{\kappa}^T, \hat{\phi}^T)\}}
\end{equation*}
The next lines perform the same hypothesis test as that in Section~\ref{sec:adjust-chi2-distr}.
<<>>=
anova(M0, M1, method = "CB")
@
By using the Chandler and Bate methodology, we draw the same conclusion as in the previous section, i.e.\ the observations are consistent with the null hypothesis $\sigma_{11} = 100$.

\chapter[Fitting a Max-Stable Process to Data]{Fitting a Max-Stable Process to Data With Unknown GEV Margins}
\label{cha:fit-maxstab-gev}

In practice, the observations will never be drawn from a unit Fréchet distribution so that Chapter~\ref{chat:fit-maxstab-frech} won't help much with concrete applications. One way to avoid this problem is to fit a GEV to each location and then transform all data to the unit Fréchet scale. Given a continuous random variable $Y$ whose cumulative distribution function is $F$, one can define a new random variable $Z$ such that $Z$ is unit Fréchet distributed
\begin{equation}
  \label{eq:Y2Frech}
  Z = - \frac{1}{\log F(Y)}
\end{equation}
More precisely, if $Y$ is a random variable distributed as a GEV with location, scale and shape parameters equal to $\mu$, $\sigma$ and $\xi$ respectively, it turns out that equation~\eqref{eq:Y2Frech} becomes
\begin{equation}
  \label{eq:GEV2Frech}
  Z = \left(1 + \xi \frac{Y - \mu}{\sigma}\right)^{1/\xi}
\end{equation}
The above transformation can be done by using the \verb|gev2frech| function
<<>>=
x <- c(2.2975896, 1.6448808, 1.3323833, -0.4464904, 2.2737603, -0.2581876,
       9.5184398, -0.5899699, 0.4974283, -0.8152157)
z <- gev2frech(x, 1, 2, .2)
@
or conversely if $Z$ is a unit Fréchet random variable, then the random variable $Y$ defined as
\begin{equation}
  \label{eq:frech2GEV}
  Y = \mu + \sigma \frac{Z^{\xi} - 1}{\xi}
\end{equation}
is GEV distributed with location, scale and shape parameters equal to $\mu \in \mathbb{R}$, $\sigma \in \mathbb{R}_*^+$ and $\xi \in \mathbb{R}$ respectively.
<<>>=
frech2gev(z, 1, 2, .2)
@
The drawback of this approach is that the standard errors are incorrect as the margins are fitted separately from the spatial dependence structure. Consequently, the standard errors related to the spatial dependence parameters are underestimated as we act as if the data were originally unit Fréchet. One can solve this problem by fitting in \emph{one step} both the GEV and the spatial dependence parameters \citep{Padoan2008a,Padoan2008,Gholam2009}.

As the bivariate distributions for the max-stable models introduced in Chapter~\ref{cha:an-introduction-max} assumed unit Fréchet margins, we need to rewrite them for unknown GEV margins. To this aim, let us define the transformation $t$ such that
\begin{equation}
  \label{eq:mapGEV2Frech}
  t: Y(x) \mapsto \left(1 + \xi(x) \frac{Y(x) - \mu(x)}{\sigma(x)}\right)^{1/\xi(x)}
\end{equation}
where $Y(\cdot)$ is supposed to be a max-stable random field having GEV margins such that $Y(x) \sim \mbox{GEV}(\mu(x), \sigma(x), \xi(x))$, $\sigma(x) >0$ for all $x \in \mathbb{R}^d$. Consequently, the bivariate distribution of $(Y(x_1), Y(x_2))$ is
\begin{equation*}
  \Pr\left[Y(x_1) \leq y_1, Y(x_2) \leq y_2 \right] = \Pr\left[Z(x_1) \leq z_1, Z(x_2) \leq z_2 \right]
\end{equation*}
where $z_1 = t(y_1)$ and $z_2 = t(y_2)$. Thus, one can relate the bivariate density for $(Y(x_1), Y(x_2))$ to the one for $(Z(x_1), Z(x_2))$ that we introduced in Chapter~\ref{cha:an-introduction-max} and the log pairwise likelihood becomes
\begin{equation}
  \label{eq:lplikGEV}
  \ell_p(\mathbf{y};\psi) = \sum_{i<j} \sum_{k=1}^{n} \log f\left(y_k(x_i), y_k(x_j); \psi \right)
\end{equation}
where $f(\cdot, \cdot; \psi)$ is the bivariate density of $(Y(x_i), Y(x_j))$, obtained from the unit Fréchet one through the mapping~\eqref{eq:mapGEV2Frech} and the corresponding Jacobian terms, and where $\psi$ now gathers both the marginal GEV parameters and the spatial dependence parameters. One could estimate the three GEV parameters of each location together with the spatial dependence parameters by passing \verb|fit.marge = TRUE| to \verb|fitmaxstab|:
<<>>=
fitmaxstab(data, coord, "gauss", fit.marge = TRUE)
@
However, this will be really time consuming as such models will have $3 \times \mbox{n.site} + p$ parameters to estimate, where $p$ is the number of parameters related to the extremal spatial dependence structure. Another drawback is that prediction at unobserved locations won't be possible. Indeed, if no model is assumed for the evolution of the GEV parameters in space, it is impossible to predict them where no data are available. Another way may be to fit \emph{response surfaces} for the GEV parameters. The next section aims to give an introduction to the use of response surfaces.

\section{Response Surfaces}
\label{sec:response-surfaces}

Response surface is a generic term used when the problem at hand is to describe how a \emph{response variable} $y$ depends on \emph{explanatory variables} $x_1$, \ldots, $x_k$. For instance, with our particular problem of spatial extremes, one may wonder how to predict the GEV parameters at a fixed location given extra covariables such as longitude, latitude, \ldots The goal of response surfaces is to get efficient predictions for the response variable while keeping the models as simple as possible. In this section, we will first introduce linear regression models. Next, we will increase in complexity and flexibility by introducing semiparametric regression models.

\subsection{Linear Regression Models}
\label{sec:line-regr-models}

Suppose we observe a response $y$ through the $y_1$, \ldots, $y_n$ values. For each observed $y_i$, we also have $p$ related explanatory variables denoted by $x_{1,i}$, \ldots, $x_{p,i}$. To predict $y$ given the $x_{\cdot, \cdot}$ values, one might consider the following model:
\begin{equation*}
  y_i = \beta_0 + \beta_1 x_{1,i} + \cdots + \beta_p x_{p,i} + \epsilon_i
\end{equation*}
where $\beta_0$, \ldots, $\beta_p$ are the regression parameters to be estimated and $\epsilon_i$ is an unobserved error term.
It is possible to write the above equation in a more compact way using matrix notation, i.e.
\begin{equation}
  \label{eq:LM}
  \mathbf{y} = \mathbf{X \beta} + \mathbf{\epsilon}
\end{equation}
where $\mathbf{y}$ is an $n \times 1$ vector, $\mathbf{X}$ is an $n \times p$ matrix called the \emph{design matrix}\index{design matrix} and $\mathbf{\epsilon}$ is an $n \times 1$ vector. Model~\eqref{eq:LM} is called a \emph{linear model}\index{linear model} as it is linear in $\beta$ but not necessarily in the covariates $x$. For example, the two following models are linear models
\begin{eqnarray*}
  y &=& \beta_0 + \beta_1 x_1 + \beta_2 x_1^2 + \epsilon\\
  y &=& \beta_0 + \beta_1 x_1 + \beta_2 \log x_2 + \epsilon
\end{eqnarray*}
or equivalently in matrix notation
\begin{eqnarray*}
  \begin{bmatrix} y_1\\ \vdots\\ y_n \end{bmatrix} =& \begin{bmatrix} 1 & x_{1,1} & x_{1,1}^2\\ \vdots & \vdots & \vdots\\ 1 & x_{1,n} & x_{1,n}^2 \end{bmatrix} & \begin{bmatrix} \beta_0\\ \beta_1\\ \beta_2 \end{bmatrix} + \begin{bmatrix} \epsilon_1\\ \vdots\\ \epsilon_n \end{bmatrix}\\
  \begin{bmatrix} y_1\\ \vdots\\ y_n \end{bmatrix} =& \begin{bmatrix} 1 & x_{1,1} & \log x_{2,1}\\ \vdots & \vdots & \vdots\\ 1 & x_{1,n} & \log x_{2,n} \end{bmatrix} & \begin{bmatrix} \beta_0\\ \beta_1\\ \beta_2 \end{bmatrix} + \begin{bmatrix} \epsilon_1\\ \vdots\\ \epsilon_n \end{bmatrix}
\end{eqnarray*}
Usually, $\mathbf{\beta}$ is estimated by least squares\index{least squares}, i.e.\ by minimizing
\begin{equation}
  \label{eq:least-square}
  SS(\beta) = \sum_{i=1}^n (y_i - x_i^T \mathbf{\beta})^2 = (\mathbf{y} - \mathbf{X\beta})^T (\mathbf{y} - \mathbf{X\beta})
\end{equation}
which amounts to solving the equations
\begin{equation*}
  \sum_{i=1}^n x_{j,i} (y_i - \mathbf{\beta}^T x_i) = 0, \qquad j = 1, \ldots, p
\end{equation*}
or equivalently in matrix notation
\begin{equation*}
  \mathbf{X}^T (\mathbf{y} -\mathbf{X\beta}) = 0
\end{equation*}
so that, provided that $\mathbf{X}^T \mathbf{X}$ is invertible, the least squares estimate for $\mathbf{\beta}$ is given by
\begin{equation}
  \label{eq:betahatLeastSquare}
  \hat{\mathbf{\beta}} = \left(\mathbf{X}^T \mathbf{X} \right)^{-1} \mathbf{X}^T \mathbf{y}
\end{equation}
and the fitted $\mathbf{y}$ values are given by
\begin{equation}
  \label{eq:hatMatrix}
  \hat{\mathbf{y}} = \mathbf{X} \hat{\mathbf{\beta}} = \mathbf{X} \left(\mathbf{X}^T \mathbf{X} \right)^{-1} \mathbf{X}^T \mathbf{y}
\end{equation}
The matrix $\mathbf{H} = \mathbf{X} \left(\mathbf{X}^T \mathbf{X} \right)^{-1} \mathbf{X}^T$ is called the \emph{hat matrix}\index{hat matrix} as it puts ``hats'' on $\mathbf{y}$. $\mathbf{H}$ is a projection matrix that orthogonally projects $\mathbf{y}$ onto the space spanned by the columns of the design matrix $\mathbf{X}$. The hat matrix plays an important role in parametric regression as it provides useful information on the influence of each observation on the fitted values. Indeed, from equation~\eqref{eq:hatMatrix}, we have
\begin{equation*}
  \hat{y}_i = \sum_{j=1}^n H_{i,j} y_j
\end{equation*}
so that $H_{i,i}$ is the contribution of $y_i$ to the estimate $\hat{y}_i$.
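The following toy chunk makes the hat matrix concrete: it computes $\mathbf{H}$ from equation~\eqref{eq:hatMatrix} for a small linear model with two arbitrary covariates, checks the leverages $H_{i,i}$ against the \verb|hatvalues| function from the \textbf{stats} package, and evaluates the trace identity that is derived formally just below.
<<eval=FALSE>>=
## Toy illustration of the hat matrix and the leverages H_{ii};
## the data are simulated and the covariate names are arbitrary.
set.seed(1)
x1 <- runif(20); x2 <- runif(20)
y <- 1 + 2 * x1 - x2 + rnorm(20, 0, 0.1)
X <- cbind(1, x1, x2)                      # design matrix
H <- X %*% solve(crossprod(X)) %*% t(X)    # hat matrix, equation (eq:hatMatrix)
diag(H)                                    # leverages H_{ii}
all.equal(diag(H), unname(hatvalues(lm(y ~ x1 + x2))))
sum(diag(H))                               # equals 3, the number of columns of X
@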
Furthermore, if we consider the total influence of all the observations, we have \begin{eqnarray*} \sum_{i=1}^n H_{i,i} &=& \mbox{tr}(\mathbf{H}) = \mbox{tr}\{\mathbf{X} (\mathbf{X}^T \mathbf{X})^{-1} \mathbf{X}^T \}\\ &=& \mbox{tr}\{\mathbf{X}^T \mathbf{X} (\mathbf{X^T} \mathbf{X})^{-1} \} = \mbox{tr} (\mathbf{I}_p) = p \end{eqnarray*} and the total influence of all observations is equal to the degrees of freedom\index{degrees of freedom} of the model. \subsection{Semiparametric Regression Models} \label{sec:semipar-regr} In the previous section, we talked about linear regression models for which the relationship between the explanatory variables and the response has a deterministic shape and is supposed to be known. However, it may happened applications for which the data have a complex behaviour. For such cases, we benefit from using semiparametric regression models defined as \begin{equation} \label{eq:semiparModel} y_i = f(x_i) + \epsilon_i \end{equation} where $f$ is a smooth function with unknown shape. The idea of semiparametric regression models is to decompose $f$ into an appropriate basis for which equation~\eqref{eq:semiparModel} simplifies to equation~\eqref{eq:LM} e.g. \begin{equation} \label{eq:decompBasis} f(x) = \sum_{j=1}^q b_j(x) \beta_j \end{equation} where $b_j(\cdot)$ is the $j$-th basis function and $\beta_j$ is the $j$-th element of the regression parameter $\mathbf{\beta}$. Several basis functions exist such as the polynomial basis, the cubic spline basis, B-splines, \ldots It is beyond the scope of this document to introduce all of them in details but the interested reader should have a look at \citet{Ruppert2003}. Some details about the basis implemented in the package are reported in Annex~\ref{cha:p-splines-with} Usually, the basis functions $b_j(\cdot)$ depends on \emph{knots} $\kappa$ so that equation~\eqref{eq:decompBasis} becomes \begin{equation} \label{eq:decompBasis2} f(x) = \beta_0 + \beta_1 x + \sum_{j=1}^q b_j(x - \kappa_j) \end{equation} <>= set.seed(12) x <- runif(100) fun <- function(x) sin(3 * pi * x) y <- fun(x) + rnorm(100, 0, 0.15) knots1 <- quantile(x, prob = 1:2 / 3) knots2 <- quantile(x, prob = 1:10 / 11) knots3 <- quantile(x, prob = 1:50 / 51) M0 <- rbpspline(y, x, knots = knots1, degree = 3, penalty = 0) M1 <- rbpspline(y, x, knots = knots2, degree = 3, penalty = 0) M2 <- rbpspline(y, x, knots = knots3, degree = 3, penalty = 0) par(mfrow=c(1,3)) plot(x, y, col = "lightgrey") rug(knots1) lines(M0) plot(x, y, col = "lightgrey") rug(knots2) lines(M1, col = 2) plot(x, y, col = "lightgrey") rug(knots3) lines(M2, col = 3) @ \begin{figure} \centering <>= <> @ \caption{Impact of the number of knots in the fitted $p$-spline. Left panel: $q$ = 2, middle panel: $q = 10$, right panel: $q=50$. The small vertical lines corresponds to the location of each knot.} \label{fig:knotsEffect} \end{figure} The problem with model~\eqref{eq:decompBasis2} is that it is strongly affected by the number of knots. Figure~\ref{fig:knotsEffect} depicts this problem by fitting the same dataset to model~\eqref{eq:decompBasis2} with $q=2, 10$ and $50$. Clearly, the first fit is not satisfactory and we need to increase the number of knots. The second one seems plausible while the last one clearly overfits. This figure was generated with the following lines <>= <> @ Consequently, there is a pressing need for a kind of ``automatic knot selection''. One common strategy to overcome this issue is to resort to \emph{penalized splines} or \emph{p-splines}\index{p-splines}. 
The idea beyond this is to consider a large number of knots but to constrain, in a sense to be defined, their influence. <>= M0 <- rbpspline(y, x, knots = knots3, degree = 3, penalty = 0) M1 <- rbpspline(y, x, knots = knots3, degree = 3, penalty = 0.1) M2 <- rbpspline(y, x, knots = knots3, degree = 3, penalty = 10) par(mfrow=c(1,3)) plot(x, y, col = "lightgrey") lines(M0) plot(x, y, col = "lightgrey") lines(M1, col = 2) plot(x, y, col = "lightgrey") lines(M2, col = 3) @ \begin{figure} \centering <>= <> @ \caption{Impact of the smoothing parameter $\lambda$ on the fit. Left panel: $\lambda = 0$, middle panel: $\lambda = 0.1$ and right panel: $\lambda = 10$.} \label{fig:smoothingParamEffect} \end{figure} To avoid overfitting, one wish to minimize the sum of square subject to some constraint on the $\mathbf{\beta}$ parameter i.e. \begin{equation*} \text{minimize } ||\mathbf{y} - \mathbf{X\beta}||^2 \quad \text{subject to } \mathbf{\beta}^T \mathbf{K \beta} \leq C \end{equation*} for a judicious choice of $C$ and a given matrix $\mathbf{K}$. Annex~\ref{sec:model-definition} gives further details for one possible choice of $\mathbf{K}$. Using a Lagrange multiplier argument, this constraint optimization problem is equivalent to choosing $\mathbf{\beta}$ to minimize \begin{equation} \label{eq:constraintLeastSquare} ||\mathbf{y} - \mathbf{X \beta}||^2 + \lambda \mathbf{\beta}^T \mathbf{K \beta} \end{equation} for some $\lambda\geq 0$ called the \emph{smoothing parameter}\index{smoothing parameter} as it controls the amount of smoothing. Indeed, if $\lambda =0$, then problem~\eqref{eq:constraintLeastSquare} is left unconstrained and leads to wiggly fits, while $\lambda$ being large implies smoother fits. Figure~\ref{fig:smoothingParamEffect} is a nice illustration of the impact of the smoothing parameter on the smoothness of the fitted curve. It was generated using the following code <>= <> @ It can be shown that problem~\eqref{eq:constraintLeastSquare} has the solution \begin{equation} \label{eq:beta.hatPspline} \hat{\mathbf{\beta}}_\lambda = (\mathbf{X}^T \mathbf{X} + \lambda \mathbf{K})^{-1} \mathbf{X}^T \mathbf{y} \end{equation} and the corresponding fitted values for a penalized spline are given by \begin{equation} \label{eq:fitted.valuesPspline} \hat{\mathbf{y}} = \mathbf{X} (\mathbf{X}^T \mathbf{X} + \lambda \mathbf{K})^{-1} \mathbf{X}^T \mathbf{y} \end{equation} In accordance with the hat matrix with linear models, one can define the \emph{smoother matrix}\index{smoother matrix} $\mathbf{S}_\lambda$ such that \begin{equation} \label{eq:smootherMatrix} \hat{\mathbf{y}} = \mathbf{S}_\lambda \mathbf{y} \end{equation} where $\mathbf{S}_\lambda = \mathbf{X} (\mathbf{X}^T \mathbf{X} + \lambda \mathbf{K})^{-1} \mathbf{X}^T$. Consequently, a kind of effective degrees of freedom\index{degrees of freedom} is given by $\mbox{tr}(\mathbf{S}_\lambda)$. If the problem of knot selection seems to be resolved by using these constrained least squares minimisation, there is still some open questions: given our data and knots, what is the best value for $\lambda$? Would it be possible to get an ``automatic selection'' for $\lambda$? One common tool for answering these two questions is known as \emph{cross-validation}\index{cross-validation} (CV) \begin{equation} \label{eq:crossValidation} CV(\lambda) = \sum_{i=1}^n \{y_i - \hat{f}_{-i}(x_i;\lambda)\}^2 \end{equation} where $\hat{f}_{-i}$ corresponds to the semiparametric estimator applied to the data but with $(x_i, y_i)$ omitted. 
Intuitively, large values of $CV(\lambda)$ corresponds to models that are wiggly and/or have a large variance in the parameter estimates so that minimising $CV(\lambda)$ is a nice option for an ``automatic selection'' of $\lambda$. Unfortunately, the computation of equation~\eqref{eq:crossValidation} directly is often too CPU demanding. However, it can be shown \citep{Ruppert2003} that \begin{equation} \label{eq:crossValidationInPractice} CV(\lambda) = \sum_{i=1}^n \left(\frac{y_i - \hat{y}_i}{1 - S_{\lambda,ii}}\right)^2 \end{equation} where $S_{\lambda,ii}$ is the $(i,i)$ element of $\mathbf{S}_\lambda$. Clearly, equation~\eqref{eq:crossValidationInPractice} does a better job than equation~\eqref{eq:crossValidation} as it only requires one fit to compute $CV(\lambda)$. Sometimes, the weights $1 - S_{\lambda,ii}$ are replaced by the mean weight, $\mbox{tr}(\mathbf{Id} - \mathbf{S}_\lambda) / n$, where $\mathbf{Id}$ is the identity matrix, leading to the \emph{generalized cross-validation}\index{cross-validation!generalized} (GCV) score \begin{equation} \label{eq:generalizedCrossValidationInPractice} GCV(\lambda) = n^2 \sum_{i=1}^n \left(\frac{y_i - \hat{y}_i}{\mbox{tr}(\mathbf{Id} - \mathbf{S}_\lambda)} \right)^2 \end{equation} GCV has computational advantages over CV, and it has also computational advantages in term of invariance \citep{Wood2006}. <>= par(mfrow=c(1,3)) lambda.cv <- cv(y, x, knots = knots3, degree = 3)$penalty abline(v = lambda.cv, lty = 2) lambda.gcv <- gcv(y, x, knots = knots3, degree = 3)$penalty abline(v = lambda.gcv, lty = 2) cv.fit <- rbpspline(y, x, knots3, degree = 3, penalty = "cv") gcv.fit <- rbpspline(y, x, knots3, degree = 3, penalty = "gcv") plot(x, y, col = "lightgrey") lines(cv.fit, col = 2) lines(gcv.fit, col = 3) @ \begin{figure} \centering <>= <> @ \caption{Cross-validation and generalized cross validation curves and corresponding fitted curves.} \label{fig:CVandGCV} \end{figure} Figure~\ref{fig:CVandGCV} plots the CV and GCV curves for the data plotted in Figure~\ref{fig:smoothingParamEffect} and the corresponding fitted p-spline. The selection of $\lambda$ using CV or GCV yield approximately to the same smoothing parameter value. These ``best'' $\lambda$ values are in accordance with the values we held fixed in Figure~\ref{fig:smoothingParamEffect}. The fitted curves using eigher CV or GCV lead to indistinguishable curves. The code used to generate Figure~\ref{fig:CVandGCV} was <>= <> @ \section{Building Response Surfaces for the GEV Parameters} \label{sec:build-resp-surf} In the previous section, we introduced the notion of response surfaces and we show that they should be used if one is interested in simultaneously fitting the GEV and the spatial dependence parameters of a max-stable process. However, one may wonder how to build accurate response surfaces for the GEV parameters. This is the aim of this section. A first attempt could be to fit several max-stable models and identify the most promising ones by using the techniques on model selection introduced in Chapter~\ref{cha:model-selection}. Although it is a legitimate approach, its use in practice is limited because the fitting procedure, due to the pairwise likelihood estimator, is CPU prohibitive. A more pragmatic strategy is to consider only these response surfaces while omitting temporally the spatial dependence parameters. 
Although this strategy doesn't take into account all the uncertainties on the max-stable parameters, it should lead to accurate model selection as one expects the spatial dependence parameters and the GEV response surface parameters to be nearly orthogonal. The main asset of the latter approach is that fitting a (kind of) spatial GEV model to data is less CPU consuming. This spatial GEV model is defined as follows:
\begin{equation}
  \label{eq:spatgev}
  Z(x) \sim GEV\left(\mu(x), \sigma(x), \xi(x)\right)
\end{equation}
where the GEV parameters are defined through the following equations
\begin{equation*}
  \mu = X_\mu \beta_\mu, \qquad \sigma = X_\sigma \beta_\sigma, \qquad \xi = X_\xi \beta_\xi
\end{equation*}
where $X_\cdot$ are design matrices and $\beta_\cdot$ are parameters to be estimated. The log-likelihood of the spatial GEV model is
\begin{equation}
  \label{eq:llikSpatGEV}
  \ell(\beta) = \sum_{i=1}^{n.site} \sum_{j=1}^{n.obs} \left\{ - \log \sigma_i - \left(1 + \xi_i \frac{z_{i,j} - \mu_i}{\sigma_i} \right)^{-1/\xi_i} - \left(1 + \frac{1}{\xi_i} \right) \log \left(1 + \xi_i \frac{z_{i,j} - \mu_i}{\sigma_i} \right) \right\}
\end{equation}
where $\beta = (\beta_\mu, \beta_\sigma, \beta_\xi)$, $\mu_i$, $\sigma_i$ and $\xi_i$ are the GEV parameters for the $i$-th site and $z_{i,j}$ is the $j$-th observation for the $i$-th site. From equation~\eqref{eq:llikSpatGEV}, we can see that independence between stations is assumed. For most applications, this assumption is clearly incorrect and we require the use of the MLE asymptotic distribution under misspecification to get standard error estimates:
\begin{equation}
  \label{eq:spatGEVStdErr}
  \hat{\beta} \stackrel{\cdot}{\sim} \mathcal{N}\left(\beta, H(\beta)^{-1} J(\beta) H(\beta)^{-1} \right), \qquad n \rightarrow +\infty
\end{equation}
where $H(\beta) = \mathbb{E}[\nabla^2 \ell(\beta;\mathbf{Y})]$ (the Hessian matrix) and $J(\beta) = \mbox{Var}[\nabla \ell(\beta;\mathbf{Y})]$. In practice, the spatial GEV model is fitted to data through the \verb|fitspatgev| function. The use of this function is similar to \verb|fitmaxstab|. Let us start by simulating a max-stable process with unit Fréchet margins and transforming it to have spatially structured GEV margins.
<>=
set.seed(15)
@
<<>>=
n.site <- 20
n.obs <- 50
coord <- matrix(runif(2*n.site, 0, 10), ncol = 2)
colnames(coord) <- c("lon", "lat")
data <- rmaxstab(n.obs, coord, "gauss", cov11 = 100, cov12 = 25, cov22 = 220)
param.loc <- -10 + 2 * coord[,2]
param.scale <- 5 + 2 * coord[,1] + coord[,2]^2
param.shape <- rep(0.2, n.site)
for (i in 1:n.site)
  data[,i] <- frech2gev(data[,i], param.loc[i], param.scale[i], param.shape[i])
@
Now we define appropriate response surfaces for our spatial GEV model and fit two different models.
<<>>=
loc.form <- y ~ lat
scale.form <- y ~ lon + I(lat^2)
shape.form <- y ~ 1
shape.form2 <- y ~ lon
M1 <- fitspatgev(data, coord, loc.form, scale.form, shape.form)
M2 <- fitspatgev(data, coord, loc.form, scale.form, shape.form2)
M1
@
The output of model \verb|M1| is very similar to the one of a fitted max-stable process except that the spatial dependence parameters are not present. As explained in Chapter~\ref{cha:model-selection}, it is easy to perform model selection by inspecting the following output:
<<>>=
anova(M1, M2)
TIC(M1, M2)
@
From these two outputs, we can see that the $p$-value for the likelihood ratio test is around $0.72$ which advocates the use of model \verb|M1|. The TIC corroborates this conclusion.
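To make the relation $\mu = X_\mu \beta_\mu$ concrete, the short chunk below builds the design matrix implied by \verb|loc.form| at two hypothetical new locations and evaluates the corresponding location parameters. It is only an illustration: the coefficient values are simply the true ones used in the simulation above, not values extracted from the fitted model \verb|M1|.
<<eval=FALSE>>=
## Illustrative only: evaluate the location response surface
## mu = X_mu beta_mu at new (hypothetical) locations.
new.coord <- data.frame(lon = c(2, 5), lat = c(3, 8))
X.loc <- model.matrix(~ lat, data = new.coord)   # rhs of loc.form: y ~ lat
beta.loc <- c(-10, 2)       # true values used when simulating the data
X.loc %*% beta.loc          # location parameters at the new locations
@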
\chapter{Conclusion}
\label{cha:conclusion}

\appendix

\chapter{P-splines with radial basis functions}
\label{cha:p-splines-with}

\section{Model definition}
\label{sec:model-definition}

Let us recall that a general definition of a p-spline is given by
\begin{equation}
  \label{eq:p-splineGeneralDef}
  y_i = \beta_0 + \beta_1 x_i + \sum_{j=1}^q b_j(x_i - \kappa_j)
\end{equation}
for some basis functions $b_j$ and knots $\kappa_j$. As the purpose of this document is the modelling of spatial extremes, we benefit from using \emph{radial basis} functions. Radial basis functions depend only on the distance $|x_i - \kappa_j|$, so that a generalisation to higher dimensions, i.e.\ $||\mathbf{x}_i - \mathbf{\kappa}_j||$, $\mathbf{x}_i, \mathbf{\kappa}_j \in \mathbb{R}^d$, is straightforward.

The model for a p-spline with radial basis functions of order $p$, $p$ being odd, is
\begin{equation}
  \label{eq:p-splineRadialBasis}
  f(x) = \beta_0 + \beta_1 x + \cdots + \beta_{m-1} x^{m-1} + \sum_{j=1}^q \beta_{m + j} |x - \kappa_j|^{2m - 1}
\end{equation}
where $p = 2 m - 1$. The fitting criterion is
\begin{equation}
  \label{eq:pLeastSquaresRadialBasis}
  \text{minimize } ||\mathbf{y} - \mathbf{X \beta}||^2 + \lambda^{2m - 1} \mathbf{\beta}^T \mathbf{K \beta}
\end{equation}
where
\begin{equation*}
  \mathbf{X} =
  \begin{bmatrix}
    1 & x_1 & \cdots & x_1^{m-1} & |x_1 - \kappa_1|^{2m - 1} & \cdots & |x_1 - \kappa_q|^{2m - 1}\\
    \vdots & \vdots & \ddots & \vdots & \vdots & \ddots & \vdots\\
    1 & x_n & \cdots & x_n^{m-1} & |x_n - \kappa_1|^{2m - 1} & \cdots & |x_n - \kappa_q|^{2m - 1}
  \end{bmatrix}
\end{equation*}
and $\mathbf{K} = \mathbf{K}_*^T \mathbf{K}_*$ with
\begin{equation*}
  \mathbf{K}_* =
  \begin{bmatrix}
    0 & \cdots & 0 & 0 & \cdots & 0\\
    \vdots & \ddots & \vdots & \vdots & \ddots & \vdots\\
    0 & \cdots & 0 & 0 & \cdots & 0\\
    0 & \cdots & 0 & |\kappa_1 - \kappa_1|^{m - 1/2} & \cdots & |\kappa_1 - \kappa_q|^{m - 1/2}\\
    \vdots & \ddots & \vdots & \vdots & \ddots & \vdots\\
    0 & \cdots & 0 & |\kappa_q - \kappa_1|^{m - 1/2} & \cdots & |\kappa_q - \kappa_q|^{m - 1/2}
  \end{bmatrix}
\end{equation*}
where the first $m$ rows and columns of $\mathbf{K}_*$ contain only zeros.

\section{Fast computation of p-splines}
\label{sec:fast-computation-p}

Although this section appears in the appendix devoted to p-splines with radial basis functions, the methodology introduced here can be applied to any other basis functions \citep{Ruppert2003}. As we stated in Section~\ref{sec:semipar-regr}, for a fixed smoothing parameter $\lambda$, the fitted values are given by
\begin{equation*}
  \hat{\mathbf{y}} = \mathbf{X} ( \mathbf{X}^T \mathbf{X} + \lambda \mathbf{K})^{-1} \mathbf{X}^T \mathbf{y}
\end{equation*}
for some symmetric matrix $\mathbf{K}$. Consequently, performing automatic selection of $\lambda$ by minimising the CV or GCV criterion might be computationally demanding and numerically unstable. Fortunately, the Demmler--Reinsch orthogonalisation often overcomes these issues. The following lines describe how it works in practice.
\begin{enumerate}
\item Obtain the Cholesky decomposition of $\mathbf{X}^T \mathbf{X}$, i.e.
  \begin{equation*}
    \mathbf{X}^T \mathbf{X} = \mathbf{R}^T \mathbf{R}
  \end{equation*}
  where $\mathbf{R}$ is a square and invertible matrix
\item Obtain the singular value decomposition of $\mathbf{R}^{-T} \mathbf{K} \mathbf{R}^{-1}$, i.e.
  \begin{equation*}
    \mathbf{R}^{-T} \mathbf{K} \mathbf{R}^{-1} = \mathbf{U} \mathbf{\Lambda} \mathbf{U}^T
  \end{equation*}
\item Define $\mathbf{A} \leftarrow \mathbf{X} \mathbf{R}^{-1} \mathbf{U}$ and $\mathbf{b} \leftarrow \mathbf{A}^T \mathbf{y}$
\item The fitted values are
  \begin{equation*}
    \hat{\mathbf{y}} = \mathbf{A} \frac{\mathbf{b}}{\mathbf{1} + \lambda \mathbf{\Lambda}}
  \end{equation*}
  with corresponding degrees of freedom
  \begin{equation*}
    df(\lambda) = \mathbf{1}^T \frac{\mathbf{1}}{\mathbf{1} + \lambda \mathbf{\Lambda}}
  \end{equation*}
\end{enumerate}
where the operations in the last two displays are to be understood elementwise, i.e.\ $\mathbf{b} / (\mathbf{1} + \lambda \mathbf{\Lambda})$ denotes the vector with $i$-th element $b_i / (1 + \lambda \Lambda_{ii})$. Once the matrices $\mathbf{A}$ and $\mathbf{\Lambda}$ and the vector $\mathbf{b}$ have been computed, the fitted values $\hat{\mathbf{y}}$ and the degrees of freedom $df(\lambda)$ are obtained through simple matrix multiplications. This is appealing because the automatic selection of $\lambda$ then becomes much cheaper.

\backmatter

\bibliography{./references}
\bibliographystyle{apalike}

\printindex
\end{document}