Stacking model weights

Compute model weights based on a cross-validation-like procedure.

stackingWeights(object, ..., data, R, p = 0.5)

Arguments

object, ...: two or more fitted glm objects, or a list of such, or an "averaging" object (as returned by model.avg).
data: a data frame containing the variables in the model, used for fitting and prediction.
R: the number of replicates.
p: the proportion of the data to be used as training set. Defaults to 0.5.

Value

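A matrix with two rows, named "mean" and "median", and one column per model. Each row contains the model weights calculated using, respectively, the mean or the median of the replicate weights.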

Details

Each model in the set is fitted to the training data: a subset of p * N observations of data, where N is the number of rows in data. Each fitted model then produces a prediction for the remaining part of data (the test or hold-out data). The weights by which these hold-out predictions are combined are optimised so that the combined prediction fits the hold-out observations. This process is repeated R times, yielding a distribution of weights for each model (which Smyth & Wolpert (1998) referred to as an ‘empirical Bayesian estimate of posterior model probability’). Finally, the mean and the median of each model's weights are taken, and each of these two sets of weights is re-scaled to sum to one.
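The procedure described above can be sketched in base R as follows. This is a minimal illustration under stated assumptions, not the stackingWeights implementation: the helper name stack_weights_sketch is hypothetical, models are given as formulas for Gaussian glm fits sharing one response, the training subset is drawn at random, and the weights of each replicate are assumed to minimise the squared error of the weighted hold-out prediction using optim with method "L-BFGS-B". The loss and the optimiser actually used by stackingWeights may differ.

```r
# Hypothetical helper, NOT part of MuMIn: a sketch of the stacking procedure.
# Assumptions: Gaussian glm models given as formulas with a common response,
# random training split, squared-error loss on the weighted hold-out prediction.
stack_weights_sketch <- function(formulas, data, R, p = 0.5) {
  M <- length(formulas)
  N <- nrow(data)
  n_train <- round(p * N)
  # Assumed guard: the hold-out set needs at least as many rows as models
  if (N - n_train < M) stop("hold-out set has fewer rows than models")
  response <- all.vars(formulas[[1]])[1]  # name of the (shared) response
  w <- matrix(NA_real_, nrow = R, ncol = M)
  for (r in seq_len(R)) {
    idx <- sample.int(N, n_train)           # random training subset of p * N rows
    train <- data[idx, , drop = FALSE]
    test <- data[-idx, , drop = FALSE]      # hold-out data
    # Hold-out predictions of each model fitted to the training data,
    # one column per model
    pred <- sapply(formulas, function(f) {
      predict(glm(f, data = train), newdata = test)
    })
    y <- test[[response]]
    # Assumed loss: squared error of the normalised weighted combination
    loss <- function(v) sum((y - pred %*% (v / sum(v)))^2)
    opt <- optim(rep(1 / M, M), loss, method = "L-BFGS-B",
                 lower = 1e-8, upper = 1)
    w[r, ] <- opt$par / sum(opt$par)        # weights of this replicate sum to one
  }
  wmean <- colMeans(w)
  wmedian <- apply(w, 2, median)
  # Summarise over the R replicates and re-scale each summary to sum to one
  rbind(mean = wmean / sum(wmean), median = wmedian / sum(wmedian))
}

# Usage on simulated data in which only x1 affects y
set.seed(1)
d <- data.frame(x1 = rnorm(60), x2 = rnorm(60))
d$y <- 1 + 2 * d$x1 + rnorm(60)
wts <- stack_weights_sketch(list(y ~ x1, y ~ x2, y ~ x1 + x2), d, R = 20)
wts
```

Optimising unconstrained positive values and normalising them inside the loss keeps each replicate's weights non-negative and summing to one without a dedicated constrained optimiser.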

Note

This approach requires a sample size of at least twice the number of models.

References

Wolpert, D. H. 1992 Stacked generalization. Neural Networks 5, 241–259.

Smyth, P. and Wolpert, D. 1998 An Evaluation of Linearly Combining Density Estimators via Stacking. Technical Report No. 98–25. Information and Computer Science Department, University of California, Irvine, CA.

Dormann, C. et al. 2018 Model averaging in ecology: a review of Bayesian, information-theoretic, and tactical approaches for predictive inference. Ecological Monographs 88, 485–504.

Author

Carsten Dormann, Kamil Bartoń

Examples

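library(MuMIn)
set.seed(1)

# Simulate a larger version of the Cement dataset to increase the sample size
# for the training data:
fm0 <- glm(y ~ X1 + X2 + X3 + X4, data = Cement, na.action = na.fail)
dat <- as.data.frame(apply(Cement[, c("X1", "X2", "X3", "X4")], 2, sample, 50,
    replace = TRUE))
# draw the response from fm0, using its predictions for the simulated covariates
dat$y <- rnorm(nrow(dat), predict(fm0, newdata = dat), sigma(fm0))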

# global model fitted to the simulated data:
fm <- glm(y ~ X1 + X2 + X3 + X4, data = dat, na.action = na.fail)

# generate a list of *some* subsets of the global model
# evaluate the model calls in the environment where 'dat' is defined
models <- lapply(dredge(fm, evaluate = FALSE, fixed = "X1", m.lim = c(1, 3)),
    eval, envir = environment())
#> Fixed terms are "X1" and "(Intercept)"

wts <- stackingWeights(models, data = dat, R = 10)

ma <- model.avg(models)
Weights(ma) <- wts["mean", ]

predict(ma)