Set up a Gaussian process (GP) term in brms. The function does not evaluate its arguments – it exists purely to help set up a model with GP terms.

gp(
  ...,
  by = NA,
  k = NA,
  cov = "exp_quad",
  iso = TRUE,
  gr = TRUE,
  cmc = TRUE,
  scale = TRUE,
  c = 5/4
)

Arguments

...

One or more predictors for the GP.

by

A numeric or factor variable of the same length as each predictor. In the numeric vector case, the elements multiply the values returned by the GP. In the factor variable case, a separate GP is fitted for each factor level.

k

Optional number of basis functions for computing Hilbert-space approximate GPs. If NA (the default), exact GPs are computed.

cov

Name of the covariance kernel. Currently supported are "exp_quad" (exponentiated-quadratic kernel; default), "matern32" (Matern 3/2 kernel), "matern52" (Matern 5/2 kernel), and "exponential" (exponential kernel; alias: "matern12").

iso

A flag to indicate whether an isotropic (TRUE; the default) or a non-isotropic GP should be used. In the former case, the same amount of smoothing is applied to all predictors. In the latter case, predictors may have different smoothing. Ignored if only a single predictor is supplied.

gr

Logical; Indicates if auto-grouping should be used (defaults to TRUE). If enabled, observations sharing the same predictor values will be represented by the same latent variable in the GP. This will improve sampling efficiency drastically if the number of unique predictor combinations is small relative to the number of observations.

cmc

Logical; Only relevant if by is a factor. If TRUE (the default), cell-mean coding is used for the by-factor, that is one GP per level is estimated. If FALSE, contrast GPs are estimated according to the contrasts set for the by-factor.

scale

Logical; If TRUE (the default), predictors are scaled so that the maximum Euclidean distance between two points is 1. This often improves sampling speed and convergence. Scaling also affects the estimated length-scale parameters in that they resemble those of scaled predictors (not of the original predictors) if scale is TRUE.

c

Numeric value only used in approximate GPs. Defines the multiplicative constant of the predictors' range over which predictions should be computed. A good default could be c = 5/4 but we are still working on providing better recommendations.

Value

An object of class 'gp_term', which is a list of arguments to be interpreted by the formula parsing functions of brms.

Details

A GP is a stochastic process, which describes the relation between one or more predictors \(x = (x_1, ..., x_d)\) and a response \(f(x)\), where \(d\) is the number of predictors. A GP is the generalization of the multivariate normal distribution to an infinite number of dimensions. Thus, it can be interpreted as a prior over functions. The values of \(f( )\) at any finite set of locations are jointly multivariate normal, with a covariance matrix defined by the covariance kernel \(k_p(x_i, x_j)\), where \(p\) is the vector of parameters of the GP: $$(f(x_1), \ldots f(x_n) \sim MVN(0, (k_p(x_i, x_j))_{i,j=1}^n) .$$ The smoothness and general behavior of the function \(f\) depends only on the choice of covariance kernel. For a more detailed introduction to Gaussian processes, see https://en.wikipedia.org/wiki/Gaussian_process.

For mathematical details on the supported kernels, please see the Stan manual: https://mc-stan.org/docs/functions-reference/matrix_operations.html under "Gaussian Process Covariance Functions".

There are several parameters estimated for GPs, the most important of which are as follows:

ParameterNotationSupportMeaning
lscale\(\ell\)\(\mathbb{R}^+\)length-scale of the GP's covariance kernel
sdgp\(\sigma\)\(\mathbb{R}^+\)marginal standard deviation of the GP's covariance kernel
zgp\(z\)\(\mathbb{R}\)latent variable values of the training data

Assuming an exponentiated quadratic covariance structure, the parameters can be broadly interpreted as follows (see also https://mc-stan.org/docs/stan-users-guide/gaussian-processes.html#gaussian-process-regression). Note that in the above documentation the parameter \(\sigma\) is denoted as \(\alpha\) instead, and the parameter \(\ell\) as \(\rho\). The length-scale \(\ell\) controls the frequency of the functions represented by the GP prior i.e., values of \(\ell \gg 0\) lead to lower-frequency functions, while values of \(\ell \approx 0\) lead to higher-frequency functions. In slightly simpler terms, \(\ell\) sets the distance over which observations in the input space are strongly correlated. The marginal standard deviation \(\sigma\) controls the magnitude of the range of the function represented by the GP i.e., it represents how much the values of the function tend to deviate from the mean level. Lastly, the parameter \(z\) represents latent variable values per observation (or unique input point).

See also

Examples

if (FALSE) { # \dontrun{
# simulate data using the mgcv package
dat <- mgcv::gamSim(1, n = 30, scale = 2)

# fit a simple GP model
fit1 <- brm(y ~ gp(x2), dat, chains = 2)
summary(fit1)
me1 <- conditional_effects(fit1, ndraws = 200, spaghetti = TRUE)
plot(me1, ask = FALSE, points = TRUE)

# fit a more complicated GP model and use an approximate GP for x2
fit2 <- brm(y ~ gp(x0) + x1 + gp(x2, k = 10) + x3, dat, chains = 2)
summary(fit2)
me2 <- conditional_effects(fit2, ndraws = 200, spaghetti = TRUE)
plot(me2, ask = FALSE, points = TRUE)

# fit a multivariate GP model with Matern 3/2 kernel
fit3 <- brm(y ~ gp(x1, x2, cov = "matern32"), dat, chains = 2)
summary(fit3)
me3 <- conditional_effects(fit3, ndraws = 200, spaghetti = TRUE)
plot(me3, ask = FALSE, points = TRUE)

# compare model fit
loo(fit1, fit2, fit3)

# simulate data with a factor covariate
dat2 <- mgcv::gamSim(4, n = 90, scale = 2)

# fit separate gaussian processes for different levels of 'fac'
fit4 <- brm(y ~ gp(x2, by = fac), dat2, chains = 2)
summary(fit4)
plot(conditional_effects(fit4), points = TRUE)
} # }