Compute exact cross-validation for problematic observations for which approximate leave-one-out cross-validation may return incorrect results. Models for problematic observations can be run in parallel using the future package.
# S3 method for class 'brmsfit'
reloo(
x,
loo = NULL,
k_threshold = 0.7,
newdata = NULL,
resp = NULL,
check = TRUE,
recompile = NULL,
future_args = list(),
...
)
# S3 method for class 'loo'
reloo(x, fit, ...)
reloo(x, ...)
x: An R object of class brmsfit or loo, depending
on the method.
loo: An R object of class loo. If NULL,
brms will try to extract a precomputed loo object
from the fitted model, added there via add_criterion.
k_threshold: The threshold at which Pareto \(k\)
estimates are treated as problematic. Defaults to 0.7.
See pareto_k_ids
for more details.
newdata: An optional data.frame for which to evaluate predictions. If
NULL (default), the original data of the model is used. NA
values within factors (excluding grouping variables) are interpreted as if
all dummy variables of this factor are zero. This makes it possible, for
instance, to predict the grand mean when using sum coding. NA values
within grouping variables are treated as a new level.
resp: Optional names of response variables. If specified, predictions are performed only for the specified response variables.
check: Logical; if TRUE (the default), some checks
are performed to verify that the loo object was generated
from the brmsfit object passed to argument fit.
recompile: Logical, indicating whether the Stan model should be
recompiled. This may be necessary if you are running reloo on
a different machine from the one used to fit the model.
future_args: A list of further arguments passed to
future for additional control over parallel
execution if activated.
...: Further arguments passed to
update.brmsfit and log_lik.brmsfit.
fit: An R object of class brmsfit.
Returns an object of class loo.
Warnings about Pareto \(k\) estimates indicate observations
for which the approximation to LOO is problematic (this is described in
detail in Vehtari, Gelman, and Gabry (2017) and the
loo package documentation).
If there are \(J\) observations with \(k\) estimates above
k_threshold, then reloo will refit the original model
\(J\) times, each time leaving out one of the \(J\)
problematic observations. The pointwise contributions of these observations
to the total ELPD are then computed directly and substituted for the
previous estimates from these \(J\) observations that are stored in the
original loo object.
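The refitting workflow described above can be sketched as follows. This is a minimal illustration, assuming brms is installed with a working Stan toolchain; it uses the epilepsy data shipped with brms, and the object names fit1, loo1, reloo1, and reloo2 are chosen for illustration only.

```r
library(brms)

# Fit a Poisson model to the epilepsy data that ships with brms.
fit1 <- brm(count ~ zAge + zBase * Trt + (1 | patient),
            data = epilepsy, family = poisson())

# Approximate LOO; this may warn about Pareto k estimates that are too high.
loo1 <- loo(fit1)

# Refit the model once per problematic observation (k above k_threshold).
# `chains = 1` is forwarded through `...` to keep each refit cheap.
reloo1 <- reloo(fit1, loo = loo1, chains = 1)

# The same workflow through the method for class 'loo'.
reloo2 <- reloo(loo1, fit = fit1, chains = 1)
```

Each call refits the model \(J\) times, so running both calls doubles the cost; in practice one of the two is enough.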
By default, this method uses sample_new_levels = "gaussian"
to sample parameter values for new grouping-factor levels (see also
prepare_predictions). This default will fail for models with
non-Gaussian group-level effects. In this case, we recommend setting
sample_new_levels = "uncertainty".
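As a hedged sketch of this recommendation, the example below fits a model with Student-t group-level effects and passes sample_new_levels through `...`. The model formula and the object names fit_t, loo_t, and reloo_t are assumptions made for illustration, as is the premise that the argument reaches the prediction step through `...`.

```r
library(brms)

# Hypothetical model with non-Gaussian (Student-t) group-level effects.
fit_t <- brm(count ~ zAge + (1 | gr(patient, dist = "student")),
             data = epilepsy, family = poisson())

loo_t <- loo(fit_t)

# Override the default "gaussian" sampling of new grouping-factor levels.
reloo_t <- reloo(fit_t, loo = loo_t, sample_new_levels = "uncertainty")
```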
brms can make use of multiple CPU cores in parallel to speed
up computations in various ways. For efficient use of the
available resources, it is recommended to use parallelism only to
an extent such that the available physical CPUs are not
oversubscribed. For example, when you have 8 CPU cores locally
available, you may consider running 4 chains with 2 threads
per chain for best performance if you are fitting just a single
model. If you run a simulation study that requires fitting
a given model many times, then neither chain nor within-chain
parallelization is advisable, because the computational resources are
already exhausted by the simulation study and any further
parallelization beyond the simulation study itself will in fact
increase the overall runtime. Please be aware that, for
historical reasons, the nomenclature of the arguments is possibly
confusing. The cores argument refers to running different
chains in parallel, and the within-chain parallelization will
allocate to each chain as many threads as requested. The
requested threads therefore increase the overall CPU use in a
multiplicative way.
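The 8-core example above can be sketched as follows: 4 chains run in parallel (cores = 4), each with 2 threads, for 4 x 2 = 8 CPUs in total. The sketch assumes that the cmdstanr backend is installed, since within-chain threading may not be available with every backend, and the model itself is only a placeholder.

```r
library(brms)

# 4 chains in parallel, each allocated 2 threads: 4 * 2 = 8 CPU cores in use.
fit <- brm(count ~ zAge + zBase * Trt + (1 | patient),
           data = epilepsy, family = poisson(),
           chains = 4, cores = 4,
           threads = threading(2),
           backend = "cmdstanr")  # assumption: cmdstanr is installed
```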
For more advanced parallelization (including beyond single model
fits), brms also integrates with the future
package. Importantly, this enables seamless integration with the
mirai parallelization framework through the use of the
future.mirai adapter. With mirai, local and remote
machines can be used in a manner that is fully transparent to the
user. This includes the possibility of using a large number of remote
machines running in the context of a computer cluster, which are
managed with queuing systems. Please refer to the section on
distributed computing of
mirai::daemons.
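A minimal sketch of running the reloo refits in parallel through future is shown below. It assumes the future and future.mirai packages are installed, uses 4 local workers as an arbitrary choice, and reuses the illustrative model and object names from the epilepsy example; remote or cluster setups would need the daemon configuration described in the mirai documentation.

```r
library(brms)
library(future)

fit1 <- brm(count ~ zAge + zBase * Trt + (1 | patient),
            data = epilepsy, family = poisson())
loo1 <- loo(fit1)

# Option 1: local background R sessions provided by the future package.
plan(multisession, workers = 4)
reloo1 <- reloo(fit1, loo = loo1, chains = 1)

# Option 2: the same call using mirai workers via the future.mirai adapter.
plan(future.mirai::mirai_multisession, workers = 4)
reloo2 <- reloo(fit1, loo = loo1, chains = 1)

# Return to sequential execution when done.
plan(sequential)
```

Using chains = 1 per refit keeps the total CPU use close to the number of workers, in line with the advice above about not oversubscribing the available cores.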