Bootstrap confidence intervals for performance metrics

Using out-of-sample predictions, the bootstrap is used to create percentile confidence intervals.

# S3 method for class 'tune_results'
int_pctl(
  .data,
  metrics = NULL,
  eval_time = NULL,
  times = 1001,
  parameters = NULL,
  alpha = 0.05,
  allow_par = TRUE,
  event_level = "first",
  keep_replicates = FALSE,
  ...
)

Arguments

.data: A object with class tune_results where the save_pred = TRUE option was used in the control function.
metrics: A yardstick::metric_set(). By default, it uses the same metrics as the original object.
eval_time: A vector of evaluation times for censored regression models. NULL is appropriate otherwise. If NULL is used with censored models, a evaluation time is selected, and a warning is issued.
times: The number of bootstrap samples.
parameters: An optional tibble of tuning parameter values that can be used to filter the predicted values before processing. This tibble should only have columns for each tuning parameter identifier (e.g. "my_param" if tune("my_param") was used).
alpha: Level of significance.
allow_par: A logical to allow parallel processing (if a parallel backend is registered).
event_level: A single string. Either "first" or "second" to specify which level of truth to consider as the "event".
keep_replicates: A logic for saving the individual estimates from each bootstrap sample (as a list column called .values).
...: Not currently used.

Value

A tibble of metrics with additional columns for .lower and .upper (and potentially, .values).

Details

For each model configuration (if any), this function takes bootstrap samples of the out-of-sample predicted values. For each bootstrap sample, the metrics are computed and these are used to compute confidence intervals. See rsample::int_pctl() and the references therein for more details.

Note that the .estimate column is likely to be different from the results given by collect_metrics() since a different estimator is used. Since random numbers are used in sampling, set the random number seed prior to running this function.

The number of bootstrap samples should be large to have reliable intervals. The defaults reflect the fewest samples that should be used.

The computations for each configuration can be extensive. To increase computational efficiency parallel processing can be used. The future package is used here. To execute the resampling iterations in parallel, specify a plan with future first. The allow_par argument can be used to avoid parallelism.

Also, if a censored regression model used numerous evaluation times, the computations can take a long time unless the times are filtered with the eval_time argument.

References

Davison, A., & Hinkley, D. (1997). Bootstrap Methods and their Application. Cambridge: Cambridge University Press. doi:10.1017/CBO9780511802843

Examples