The purpose of this vignette is to discuss the parameterizations of the families (i.e., response distributions) used in brms. For a more general overview of the package see vignette("brms_overview").

Notation

Throughout this vignette, we denote values of the response variable as yy, a density function as ff, and use μ\mu to refer to the main model parameter, which is usually the mean of the response distribution or some closely related quantity. In a regression framework, μ\mu is not estimated directly but computed as μ=g(η)\mu = g(\eta), where η\eta is a predictor term (see help(brmsformula) for details) and gg is the response function (i.e., inverse of the link function).

Location shift models

The density of the gaussian family is given by f(y)=12πσexp(12(yμσ)2) f(y) = \frac{1}{\sqrt{2\pi}\sigma} \exp\left(-\frac{1}{2}\left(\frac{y - \mu}{\sigma}\right)^2\right)

where σ\sigma is the residual standard deviation. The density of the student family is given by f(y)=Γ((ν+1)/2)Γ(ν/2)1νπσ(1+1ν(yμσ)2)(ν+1)/2 f(y) = \frac{\Gamma((\nu + 1)/2)}{\Gamma(\nu/2)} \frac{1}{\sqrt{\nu\pi}\sigma}\left(1 + \frac{1}{\nu} \left(\frac{y - \mu}{\sigma}\right)^2\right)^{-(\nu+1)/2}

Γ\Gamma denotes the gamma function and ν>1\nu > 1 are the degrees of freedom. As ν\nu \rightarrow \infty, the student distribution becomes the gaussian distribution. The density of the skew_normal family is given by f(y)=12πωexp(12(yξω)2)(1+erf(α(yξω2))) f(y) = \frac{1}{\sqrt{2\pi}\omega} \exp\left(-\frac{1}{2} \left(\frac{y - \xi}{\omega}\right)^2 \right) \left(1 + \text{erf} \left( \alpha \left(\frac{y - \xi}{\omega \sqrt{2}} \right) \right) \right)

where ξ\xi is the location parameter, ω\omega is the positive scale parameter, α\alpha the skewness parameter, and erf\text{erf} denotes the error function of the gaussian distribution. To parameterize the skew-normal distribution in terms of the mean μ\mu and standard deviation σ\sigma, ω\omega and ξ\xi are computed as ω=σ12πα21+α2 \omega = \frac{\sigma}{\sqrt{1 - \frac{2}{\pi} \frac{\alpha^2}{1 + \alpha^2}}}

ξ=μωα1+α22π \xi = \mu - \omega \frac{\alpha}{\sqrt{1 + \alpha^2}} \sqrt{\frac{2}{\pi}}

If α=0\alpha = 0, the skew-normal distribution becomes the gaussian distribution. For location shift models, yy can be any real value.

Binary and count data models

The density of the binomial family is given by f(y)=(Ny)μy(1μ)Ny f(y) = {N \choose y} \mu^{y} (1-\mu)^{N - y} where NN is the number of trials and y{0,...,N}y \in \{0, ... , N\}. When all NN are 11 (i.e., y{0,1}y \in \{0,1\}), the bernoulli distribution for binary data arises.

For y0y \in \mathbb{N}_0, the density of the poisson family is given by f(y)=μyy!exp(μ) f(y) = \frac{\mu^{y}}{y!} \exp(-\mu) The density of the negbinomial (negative binomial) family is f(y)=(y+ϕ1y)(μμ+ϕ)y(ϕμ+ϕ)ϕ f(y) = {y + \phi - 1 \choose y} \left(\frac{\mu}{\mu + \phi}\right)^{y} \left(\frac{\phi}{\mu + \phi}\right)^\phi where ϕ\phi is a positive precision parameter. For ϕ\phi \rightarrow \infty, the negative binomial distribution becomes the poisson distribution. The density of the geometric family arises if ϕ\phi is set to 11.

Time-to-event models

With time-to-event models we mean all models that are defined on the positive reals only, that is y+y \in \mathbb{R}^+. The density of the lognormal family is given by f(y)=12πσyexp(12(log(y)μσ)2) f(y) = \frac{1}{\sqrt{2\pi}\sigma y} \exp\left(-\frac{1}{2}\left(\frac{\log(y) - \mu}{\sigma}\right)^2\right) where σ\sigma is the residual standard deviation on the log-scale. The density of the Gamma family is given by f(y)=(α/μ)αΓ(α)yα1exp(αyμ) f(y) = \frac{(\alpha / \mu)^\alpha}{\Gamma(\alpha)} y^{\alpha-1} \exp\left(-\frac{\alpha y}{\mu}\right) where α\alpha is a positive shape parameter. The density of the weibull family is given by f(y)=αs(ys)α1exp((ys)α) f(y) = \frac{\alpha}{s} \left(\frac{y}{s}\right)^{\alpha-1} \exp\left(-\left(\frac{y}{s}\right)^\alpha\right) where α\alpha is again a positive shape parameter and s=μ/Γ(1+1/α)s = \mu / \Gamma(1 + 1 / \alpha) is the scale parameter to that μ\mu is the mean of the distribution. The exponential family arises if α\alpha is set to 11 for either the gamma or Weibull distribution. The density of the inverse.gaussian family is given by f(y)=(α2πy3)1/2exp(α(yμ)22μ2y) f(y) = \left(\frac{\alpha}{2 \pi y^3}\right)^{1/2} \exp \left(\frac{-\alpha (y - \mu)^2}{2 \mu^2 y} \right) where α\alpha is a positive shape parameter. The cox family implements Cox proportional hazards model which assumes a hazard function of the form h(y)=h0(y)μh(y) = h_0(y) \mu with baseline hazard h0(y)h_0(y) expressed via M-splines (which integrate to I-splines) in order to ensure monotonicity. The density of the cox model is then given by f(y)=h(y)S(y) f(y) = h(y) S(y) where S(y)S(y) is the survival function implied by h(y)h(y).

Extreme value models

Modeling extremes requires special distributions. One may use the weibull distribution (see above) or the frechet distribution with density f(y)=νs(ys)1νexp((ys)ν) f(y) = \frac{\nu}{s} \left(\frac{y}{s}\right)^{-1-\nu} \exp\left(-\left(\frac{y}{s}\right)^{-\nu}\right) where s=μ/Γ(11/ν)s = \mu / \Gamma(1 - 1 / \nu) is a positive scale parameter and ν>1\nu > 1 is a shape parameter so that μ\mu predicts the mean of the Frechet distribution. A generalization of both distributions is the generalized extreme value distribution (family gen_extreme_value) with density f(y)=1σt(y)ξ+1exp(t(y)) f(y) = \frac{1}{\sigma} t(y)^{\xi + 1} \exp(-t(y)) where t(y)=(1+ξ(yμσ))1/ξ t(y) = \left(1 + \xi \left(\frac{y - \mu}{\sigma} \right)\right)^{-1 / \xi} with positive scale parameter σ\sigma and shape parameter ξ\xi.

Response time models

One family that is especially suited to model reaction times is the exgaussian (‘exponentially modified Gaussian’) family. Its density is given by

f(y)=12βexp(12β(2ξ+σ2/β2y))erfc(ξ+σ2/βy2σ) f(y) = \frac{1}{2 \beta} \exp\left(\frac{1}{2 \beta} \left(2\xi + \sigma^2 / \beta - 2 y \right) \right) \text{erfc}\left(\frac{\xi + \sigma^2 / \beta - y}{\sqrt{2} \sigma} \right) where β\beta is the scale (inverse rate) of the exponential component, ξ\xi is the mean of the Gaussian component, σ\sigma is the standard deviation of the Gaussian component, and erfc\text{erfc} is the complementary error function. We parameterize μ=ξ+β\mu = \xi + \beta so that the main predictor term equals the mean of the distribution.

Another family well suited for modeling response times is the shifted_lognormal distribution. It’s density equals that of the lognormal distribution except that the whole distribution is shifted to the right by a positive parameter called ndt (for consistency with the wiener diffusion model explained below).

A family concerned with the combined modeling of reaction times and corresponding binary responses is the wiener diffusion model. It has four model parameters each with a natural interpretation. The parameter α>0\alpha > 0 describes the separation between two boundaries of the diffusion process, τ>0\tau > 0 describes the non-decision time (e.g., due to image or motor processing), β[0,1]\beta \in [0, 1] describes the initial bias in favor of the upper alternative, and δ\delta \in \mathbb{R} describes the drift rate to the boundaries (a positive value indicates a drift towards to upper boundary). The density for the reaction time at the upper boundary is given by

f(y)=α(yτ)3/2exp(δαβδ2(yτ)2)k=(2k+β)ϕ(2k+αβyτ) f(y) = \frac{\alpha}{(y-\tau)^3/2} \exp \! \left(- \delta \alpha \beta - \frac{\delta^2(y-\tau)}{2}\right) \sum_{k = - \infty}^{\infty} (2k + \beta) \phi \! \left(\frac{2k + \alpha \beta}{\sqrt{y - \tau}}\right)

where ϕ(x)\phi(x) denotes the standard normal density function. The density at the lower boundary can be obtained by substituting 1β1 - \beta for β\beta and δ-\delta for δ\delta in the above equation. In brms the parameters α\alpha, τ\tau, and β\beta are modeled as auxiliary parameters named bs (‘boundary separation’), ndt (‘non-decision time’), and bias respectively, whereas the drift rate δ\delta is modeled via the ordinary model formula that is as δ=μ\delta = \mu.

Quantile regression

Quantile regression is implemented via family asym_laplace (asymmetric Laplace distribution) with density

f(y)=p(1p)σexp(ρp(yμσ)) f(y) = \frac{p (1 - p)}{\sigma} \exp\left(-\rho_p\left(\frac{y - \mu}{\sigma}\right)\right) where ρp\rho_p is given by ρp(x)=x(pIx<0)\rho_p(x) = x (p - I_{x < 0}) and IAI_A is the indicator function of set AA. The parameter σ\sigma is a positive scale parameter and pp is the quantile parameter taking on values in (0,1)(0, 1). For this distribution, we have P(Y<g(η))=pP(Y < g(\eta)) = p. Thus, quantile regression can be performed by fixing pp to the quantile to interest.

Probability models

The density of the Beta family for y(0,1)y \in (0,1) is given by f(y)=yμϕ1(1y)(1μ)ϕ1B(μϕ,(1μ)ϕ) f(y) = \frac{y^{\mu \phi - 1} (1-y)^{(1-\mu) \phi-1}}{B(\mu \phi, (1-\mu) \phi)} where BB is the beta function and ϕ\phi is a positive precision parameter. A multivariate generalization of the Beta family is the dirichlet family with density f(y)=1B((μ1,,μK)ϕ)k=1Kykμkϕ1. f(y) = \frac{1}{B((\mu_{1}, \ldots, \mu_{K}) \phi)} \prod_{k=1}^K y_{k}^{\mu_{k} \phi - 1}. The dirichlet family is implemented with the multivariate logit link function so that μj=exp(ηj)k=1Kexp(ηk) \mu_{j} = \frac{\exp(\eta_{j})}{\sum_{k = 1}^{K} \exp(\eta_{k})} For reasons of identifiability, $\eta_{\rm ref}$ is set to 00, where ${\rm ref}$ is one of the response categories chosen as reference.

An alternative to the dirichlet family is the logistic_normal family with density f(y)=1k=1Kyk×multivariate_normal(ỹ|μ,σ,Ω) f(y) = \frac{1}{\prod_{k=1}^K y_k} \times \text{multivariate_normal}(\tilde{y} \, | \, \mu, \sigma, \Omega) where ỹ\tilde{y} is the multivariate logit transformed response $$ \tilde{y} = (\log(y_1 / y_{\rm ref}), \ldots, \log(y_{\rm ref-1} / y_{\rm ref}), \log(y_{\rm ref+1} / y_{\rm ref}), \ldots, \log(y_K / y_{\rm ref})) $$ of dimension K1K-1 (excluding the reference category), which is modeled as multivariate normally distributed with latent mean and standard deviation vectors μ\mu and σ\sigma, as well as correlation matrix Ω\Omega.

Circular models

The density of the von_mises family for y(π,π)y \in (-\pi,\pi) is given by f(y)=exp(κcos(yμ))2πI0(κ) f(y) = \frac{\exp(\kappa \cos(y - \mu))}{2\pi I_0(\kappa)} where I0I_0 is the modified Bessel function of order 0 and κ\kappa is a positive precision parameter.

Ordinal and categorical models

For ordinal and categorical models, yy is one of the categories 1,...,K1, ..., K. The intercepts of ordinal models are called thresholds and are denoted as τk\tau_k, with k{1,...,K1}k \in \{1, ..., K-1\}, whereas η\eta does not contain a fixed effects intercept. Note that the applied link functions hh are technically distribution functions [0,1]\mathbb{R} \rightarrow [0,1]. The density of the cumulative family (implementing the most basic ordinal model) is given by f(y)=g(τy+1η)g(τyη) f(y) = g(\tau_{y + 1} - \eta) - g(\tau_{y} - \eta)

The densities of the sratio (stopping ratio) and cratio (continuation ratio) families are given by f(y)=g(τy+1η)k=1y(1g(τkη)) f(y) = g(\tau_{y + 1} - \eta) \prod_{k = 1}^{y} (1 - g(\tau_{k} - \eta)) and f(y)=(1g(ητy+1))k=1yg(ητk) f(y) = (1 - g(\eta - \tau_{y + 1})) \prod_{k = 1}^{y} g(\eta - \tau_{k})

respectively. Note that both families are equivalent for symmetric link functions such as logit or probit. The density of the acat (adjacent category) family is given by f(y)=k=1yg(ητk)k=y+1K(1g(ητk))k=0Kj=1kg(ητj)j=k+1K(1g(ητj)) f(y) = \frac{\prod_{k=1}^{y} g(\eta - \tau_{k}) \prod_{k=y+1}^K(1-g(\eta - \tau_{k}))}{\sum_{k=0}^K\prod_{j=1}^k g(\eta-\tau_{j}) \prod_{j=k+1}^K(1-g(\eta - \tau_{j}))} For the logit link, this can be simplified to f(y)=exp(k=1y(ητk))k=0Kexp(j=1k(ητj)) f(y) = \frac{\exp \left(\sum_{k=1}^{y} (\eta - \tau_{k}) \right)} {\sum_{k=0}^K \exp\left(\sum_{j=1}^k (\eta - \tau_{j}) \right)} The linear predictor η\eta can be generalized to also depend on the category kk for a subset of predictors. This leads to category specific effects (for details on how to specify them see help(brm)). Note that cumulative and sratio models use τη\tau - \eta, whereas cratio and acat use ητ\eta - \tau. This is done to ensure that larger values of η\eta increase the probability of higher response categories.

The categorical family is currently only implemented with the multivariate logit link function and has density f(y)=μy=exp(ηy)k=1Kexp(ηk) f(y) = \mu_{y} = \frac{\exp(\eta_{y})}{\sum_{k = 1}^{K} \exp(\eta_{k})} Note that η\eta does also depend on the category kk. For reasons of identifiability, η1\eta_{1} is set to 00. A generalization of the categorical family to more than one trial is the multinomial family with density f(y)=(Ny1,y2,,yK)k=1Kμkyk f(y) = {N \choose y_{1}, y_{2}, \ldots, y_{K}} \prod_{k=1}^K \mu_{k}^{y_{k}} where, for each category, μk\mu_{k} is estimated via the multivariate logit link function shown above.

Zero-inflated and hurdle models

Zero-inflated and hurdle families extend existing families by adding special processes for responses that are zero. The density of a zero-inflated family is given by $$ f_z(y) = z + (1 - z) f(0) \quad \text{if } y = 0 \\ f_z(y) = (1 - z) f(y) \quad \text{if } y > 0 $$ where zz denotes the zero-inflation probability. Currently implemented families are zero_inflated_poisson, zero_inflated_binomial, zero_inflated_negbinomial, and zero_inflated_beta.

The density of a hurdle family is given by $$ f_z(y) = z \quad \text{if } y = 0 \\ f_z(y) = (1 - z) f(y) / (1 - f(0)) \quad \text{if } y > 0 $$ Currently implemented families are hurdle_poisson, hurdle_negbinomial, hurdle_gamma, and hurdle_lognormal.

The density of a zero-one-inflated family is given by $$ f_{\alpha, \gamma}(y) = \alpha (1 - \gamma) \quad \text{if } y = 0 \\ f_{\alpha, \gamma}(y) = \alpha \gamma \quad \text{if } y = 1 \\ f_{\alpha, \gamma}(y) = (1 - \alpha) f(y) \quad \text{if } y \notin \{0, 1\} $$ where α\alpha is the zero-one-inflation probability (i.e. the probability that zero or one occurs) and γ\gamma is the conditional one-inflation probability (i.e. the probability that one occurs rather than zero). Currently implemented families are zero_one_inflated_beta.