Compute exact cross-validation for problematic observations

Compute exact cross-validation for problematic observations, that is, observations for which approximate leave-one-out cross-validation may return incorrect results. The models for these observations can be refit in parallel using the future package.

# S3 method for class 'brmsfit'
reloo(
  x,
  loo = NULL,
  k_threshold = 0.7,
  newdata = NULL,
  resp = NULL,
  check = TRUE,
  recompile = NULL,
  future_args = list(),
  ...
)

# S3 method for class 'loo'
reloo(x, fit, ...)

reloo(x, ...)

Arguments

x: An R object of class brmsfit or loo depending on the method.
loo: An R object of class loo. If NULL, brms will try to extract a precomputed loo object from the fitted model, added there via add_criterion.
k_threshold: The threshold at which Pareto \(k\) estimates are treated as problematic. Defaults to 0.7. See pareto_k_ids for more details.
newdata: An optional data.frame for which to evaluate predictions. If NULL (default), the original data of the model is used. NA values within factors (excluding grouping variables) are interpreted as if all dummy variables of this factor are zero. This makes it possible, for instance, to predict the grand mean when using sum coding. NA values within grouping variables are treated as a new level.
resp: Optional names of response variables. If specified, predictions are performed only for the specified response variables.
check: Logical; if TRUE (the default), some checks are performed to verify that the loo object was generated from the brmsfit object passed to argument fit.
recompile: Logical, indicating whether the Stan model should be recompiled. This may be necessary if you are running reloo on a different machine than the one used to fit the model.
future_args: A list of further arguments passed to future for additional control over parallel execution if activated.
...: Further arguments passed to update.brmsfit and log_lik.brmsfit.
fit: An R object of class brmsfit.

Value

An object of the class loo.

Details

Warnings about Pareto \(k\) estimates indicate observations for which the approximation to LOO is problematic (this is described in detail in Vehtari, Gelman, and Gabry (2017) and the loo package documentation). If there are \(J\) observations with \(k\) estimates above k_threshold, then reloo will refit the original model \(J\) times, each time leaving out one of the \(J\) problematic observations. The pointwise contributions of these observations to the total ELPD are then computed directly and substituted for the previous estimates from these \(J\) observations that are stored in the original loo object.
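The substitution step described above can be sketched in base R. All numbers and the exact_elpd() helper are made up for illustration; in reloo the exact values come from the refitted models, not from a lookup.

```r
# Illustrative values only: pointwise ELPD and Pareto k for 6 observations
elpd_psis   <- c(-1.2, -0.8, -3.5, -1.0, -4.1, -0.9)
pareto_k    <- c(0.10, 0.35, 0.82, 0.20, 1.05, 0.55)
k_threshold <- 0.7

# The J problematic observations: k estimate above the threshold
bad <- which(pareto_k > k_threshold)

# Hypothetical stand-in for "refit without observation i and evaluate
# its log predictive density"; here just a lookup of made-up values
exact_elpd <- function(i) c(`3` = -3.9, `5` = -5.0)[[as.character(i)]]

# Replace the approximate contributions by the exact ones
elpd_fixed <- elpd_psis
elpd_fixed[bad] <- vapply(bad, exact_elpd, numeric(1))

# Total ELPD after substitution
sum(elpd_fixed)
```

Only the entries flagged in bad change; all other pointwise estimates are kept from the original loo object.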

By default, this method uses sample_new_levels = "gaussian" to sample parameter values for new grouping-factor levels (see also prepare_predictions). This default will fail for models with non-Gaussian group-level effects. In this case, we recommend setting sample_new_levels = "uncertainty".
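A minimal sketch of overriding this default, assuming that sample_new_levels is forwarded through ... to log_lik.brmsfit as the Arguments section indicates. The object name fit is hypothetical and stands for an already fitted model with group-level effects.

```r
library(brms)

# Assumes `fit` is an existing brmsfit object (hypothetical name)
loo1 <- loo(fit)

# Sample parameter values for left-out grouping-factor levels from the
# posterior draws of the existing levels instead of a Gaussian
reloo1 <- reloo(fit, loo = loo1, sample_new_levels = "uncertainty")
```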

Parallelization with multiple CPU cores

brms can use multiple CPU cores in parallel to speed up computations in various ways. For efficient use of the available resources, parallelism should only be used to an extent that does not oversubscribe the available physical CPUs. For example, with 8 CPU cores available locally, running 4 chains with 2 threads per chain may give the best performance when fitting a single model. In a simulation study that fits a given model many times, neither chain nor within-chain parallelization is advisable: the simulation study itself already exhausts the computational resources, and any further parallelization will in fact slow down the overall runtime. Please be aware that, for historical reasons, the naming of the arguments is possibly confusing. The cores argument refers to running different chains in parallel, while within-chain parallelization allocates to each chain as many threads as requested. The requested threads therefore increase the overall CPU use in a multiplicative way.
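The multiplicative effect can be checked with a few lines of base R before fitting. The variable names are illustrative and not brms arguments; they mirror a call that sets chains = 4, cores = 4, and threads = threading(2).

```r
# Hypothetical settings for a single model fit
chains            <- 4  # number of Markov chains
cores             <- 4  # chains run in parallel (the `cores` argument)
threads_per_chain <- 2  # within-chain threads, e.g. threading(2)

# Each chain running in parallel gets its own set of threads
cpus_in_use <- min(chains, cores) * threads_per_chain

# Physical cores on this machine; can be NA on some platforms
physical <- parallel::detectCores(logical = FALSE)

if (!is.na(physical) && cpus_in_use > physical) {
  message("Requested ", cpus_in_use, " CPUs, but only ", physical,
          " physical cores are available.")
}
```

With these settings the fit occupies 4 x 2 = 8 CPUs, which matches the 8-core example in the text.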

For more advanced parallelization (including beyond single model fits), brms also integrates with the future package. Importantly, this enables seamless integration with the mirai parallelization framework through the future.mirai adapter. With mirai, local and remote machines can be used in a manner that is fully transparent to the user. This includes the possibility of using a large number of remote machines in a computer cluster managed by a queuing system. Please refer to the section on distributed computing in the documentation of mirai::daemons.
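A minimal sketch of running the refits in parallel with future, assuming an already fitted model stored in fit (a hypothetical name) and a machine with at least 4 free cores.

```r
library(brms)
library(future)

# Assumes `fit` is an existing brmsfit object (hypothetical name)
loo1 <- loo(fit)

# Evaluate up to 4 refits at once in background R sessions
plan(multisession, workers = 4)

# chains = 1 is passed on to update(), so each refit uses a single
# core and the refits themselves are the only level of parallelism
reloo1 <- reloo(fit, loo = loo1, chains = 1)

# Return to sequential evaluation afterwards
plan(sequential)
```

With the future.mirai package installed, a mirai-based plan such as future.mirai::mirai_multisession should be usable in place of multisession; this has not been verified here and depends on the installed versions.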

Examples
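
A minimal sketch of typical usage, assuming the epilepsy data set shipped with brms and a working Stan installation. Model fitting and refitting can take a while, and whether the Pareto k warning appears depends on the posterior draws.

```r
library(brms)

# Fit a multilevel Poisson model to the epilepsy data
fit1 <- brm(count ~ zAge + zBase * Trt + (1 | patient),
            data = epilepsy, family = poisson())

# May warn that some Pareto k estimates are too high
loo1 <- loo(fit1)

# brmsfit method: refit once per problematic observation
reloo1 <- reloo(fit1, loo = loo1, chains = 1)

# loo method: same computation, starting from the loo object
reloo2 <- reloo(loo1, fit = fit1, chains = 1)
```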