Deforesting a random forest — deforest • ranger

The main purpose of this function is to allow for post-processing of ensembles via L2 regularized regression (i.e., the LASSO), as described in Friedman and Popescu (2003). The basic idea is to use the LASSO to post-process the predictions from the individual base learners in an ensemble (i.e., decision trees) in the hopes of producing a much smaller model without sacrificing much in the way of accuracy, and in some cases, improving it. Friedman and Popescu (2003) describe conditions under which tree-based ensembles, like random forest, can potentially benefit from such post-processing (e.g., using shallower trees trained on much smaller samples of the training data without replacement). However, the computational benefits of such post-processing can only be realized if the base learners "zeroed out" by the LASSO can actually be removed from the original ensemble, hence the purpose of this function. A complete example using ranger can be found at https://github.com/imbs-hl/ranger/issues/568.

deforest(object, which.trees = NULL, ...)

# S3 method for class 'ranger'
deforest(object, which.trees = NULL, warn = TRUE, ...)

Arguments

object: A fitted random forest (e.g., a ranger object).
which.trees: Vector giving the indices of the trees to remove.
...: Additional (optional) arguments. (Currently ignored.)
warn: Logical indicating whether or not to warn users that some of the standard output of a typical ranger object or no longer available after deforestation. Default is TRUE.

Value

An object of class "deforest.ranger"; essentially, a ranger object with certain components replaced with NAs (e.g., out-of-bag (OOB) predictions, variable importance scores (if requested), and OOB-based error metrics).

Note

This function is a generic and can be extended by other packages.

References

Friedman, J. and Popescu, B. (2003). Importance sampled learning ensembles, Technical report, Stanford University, Department of Statistics. https://jerryfriedman.su.domains/ftp/isle.pdf.

Author

Brandon M. Greenwell

Examples

## Example of deforesting a random forest
rfo <- ranger(Species ~ ., data = iris, probability = TRUE, num.trees = 100)
dfo <- deforest(rfo, which.trees = c(1, 3, 5))
#> Warning: Many of the components of a typical "ranger" object are not available after deforestation and are instead replaced with `NA` (e.g., out-of-bag (OOB) predictions, variable importance scores (if requested), and OOB-based error metrics).
dfo  # same as `rfo` but with trees 1, 3, and 5 removed
#> Ranger (deforested) result
#> 
#> Note that many of the components of a typical "ranger" object are not available after deforestation and are instead replaced with `NA` (e.g., out-of-bag (OOB) predictions, variable importance scores (if requested), and OOB-based error metrics) 
#> 
#> Type:                             Probability estimation 
#> Number of trees:                  97 
#> Sample size:                      150 
#> Number of independent variables:  4 
#> Mtry:                             2 
#> Target node size:                 10 
#> Variable importance mode:         none 
#> Splitrule:                        gini 
#> OOB prediction error (Brier s.):  NA 

## Sanity check
preds.rfo <- predict(rfo, data = iris, predict.all = TRUE)$predictions
preds.dfo <- predict(dfo, data = iris, predict.all = TRUE)$predictions
identical(preds.rfo[, , -c(1, 3, 5)], y = preds.dfo)
#> [1] TRUE