model.avg.Rd
Model averaging based on an information criterion.
model.avg(object, ..., revised.var = TRUE)
# Default S3 method
model.avg(object, ..., beta = c("none", "sd", "partial.sd"),
rank = NULL, rank.args = NULL, revised.var = TRUE,
dispersion = NULL, ct.args = NULL)
# S3 method for class 'model.selection'
model.avg(object, subset, fit = FALSE, ..., revised.var = TRUE)
a fitted model object or a list of such objects, or a
"model.selection"
object. See ‘Details’.
for default method, more fitted model objects. Otherwise, arguments that are passed to the default method.
indicates whether and how the component models' coefficients should be standardized. See the argument's description in dredge.
optionally, a rank function (returning an information criterion) to
use instead of AICc
, e.g. BIC
or QAIC
, may be
omitted if object
is a model list returned by get.models
or a "model.selection"
object. See ‘Details’.
optional list
of arguments for the rank
function. If one is an expression, an x
within it is substituted
with a current model.
logical, indicating whether to use the revised formula for standard errors. See par.avg.
the dispersion parameter for the family used. See
summary.glm. This is used currently only with glm
,
is silently ignored otherwise.
optional list of arguments to be passed to
coefTable (besides dispersion
).
see subset=subset.model.selection method for
"model.selection"
object.
if TRUE
, the component models are fitted using
get.models
. See ‘Details’.
An object of class "averaging"
is a list with components:
a data.frame
with log-likelihood, IC,
Δ_IC and
‘Akaike weights’ for the component models.
Its attribute "term.codes"
is a named vector with numerical
representation of the terms in the row names of msTable
.
a matrix
of model-averaged coefficients.
“full” coefficients in the first row, “subset” coefficients in
the second row. See ‘Note’
a 3-dimensional array
of component models' coefficients,
their standard errors and degrees of freedom.
object of class sw
containing per-model term
sum of model weights over all of the models in which the term appears.
a formula corresponding to the one that would be used in a single model. The formula contains only the averaged (fixed) coefficients.
the matched call.
The object has the following attributes:
the rank function used.
optionally, a list of all component model objects. Only if the object was created with model objects (and not model selection table).
Corresponds to the function argument.
number of observations.
Corresponds to the function argument.
model.avg
may be used either with a list of models or directly with a
model.selection
object (e.g. returned by dredge
). In the
latter case, the models from the model selection table are not evaluated
unless the argument fit
is set to TRUE
or some additional
arguments are present (such as rank
or dispersion
). This
results in a much faster calculation, but has certain drawbacks, because the
fitted component model objects are not stored, and some methods (e.g.
predict
, fitted
, model.matrix
or vcov
) would not
be available with the returned object. Otherwise, get.models
is
called prior to averaging, and ... are passed to it.
For a list of model types that are accepted see list of supported models.
rank
is found by a call to match.funbase and typically is
specified as a function or a symbol or a character string specifying a
function to be searched for from the environment of the call to lapply.
rank
must be a function able to accept model as a first argument and
must always return a numeric scalar.
Several standard methods for fitted model objects exist for class
averaging
, including summary
,
predict=predict.averaging, coef
, confint
,
formula, and vcov.
coef
, vcov
, confint
and coefTable
accept argument
full
that if set to TRUE
, the full model-averaged coefficients
are returned, rather than subset-averaged ones (when full = FALSE
,
being the default).
logLik
returns a list of logLikstats objects
for the component models.
The ‘subset’ (or ‘conditional’) average only averages over the models where the parameter appears. An alternative, the ‘full’ average assumes that a variable is included in every model, but in some models the corresponding coefficient (and its respective variance) is set to zero. Unlike the ‘subset average’, it does not have a tendency of biasing the value away from zero. The ‘full’ average is a type of shrinkage estimator, and for variables with a weak relationship to the response it is smaller than ‘subset’ estimators.
Averaging models with different contrasts for the same factor would yield nonsense results. Currently, no checking for contrast consistency is done.
print
method provides a concise output (similarly as for lm
).
To print more details use summary
function, and confint
to get confidence intervals.
Burnham, K. P. and Anderson, D. R. 2002 Model selection and multimodel inference: a practical information-theoretic approach. 2nd ed. New York, Springer-Verlag.
Lukacs, P. M., Burnham K. P. and Anderson, D. R. 2009 Model selection bias and Freedman’s paradox. Annals of the Institute of Statistical Mathematics 62, 117–125.
See par.avg for more details of model-averaged parameter calculation.
dredge, get.models
AICc has examples of averaging models fitted by REML.
modavg
in package AICcmodavg, and
coef.glmulti
in package glmulti also perform model
averaging.
# Example from Burnham and Anderson (2002), page 100:
fm1 <- lm(y ~ ., data = Cement, na.action = na.fail)
(ms1 <- dredge(fm1))
#> Fixed term is "(Intercept)"
#> Global model call: lm(formula = y ~ ., data = Cement, na.action = na.fail)
#> ---
#> Model selection table
#> (Intrc) X1 X2 X3 X4 df logLik AICc delta weight
#> 4 52.58 1.468 0.6623 4 -28.156 69.3 0.00 0.566
#> 12 71.65 1.452 0.4161 -0.2365 5 -26.933 72.4 3.13 0.119
#> 8 48.19 1.696 0.6569 0.2500 5 -26.952 72.5 3.16 0.116
#> 10 103.10 1.440 -0.6140 4 -29.817 72.6 3.32 0.107
#> 14 111.70 1.052 -0.4100 -0.6428 5 -27.310 73.2 3.88 0.081
#> 15 203.60 -0.9234 -1.4480 -1.5570 5 -29.734 78.0 8.73 0.007
#> 16 62.41 1.551 0.5102 0.1019 -0.1441 6 -26.918 79.8 10.52 0.003
#> 13 131.30 -1.2000 -0.7246 4 -35.372 83.7 14.43 0.000
#> 7 72.07 0.7313 -1.0080 4 -40.965 94.9 25.62 0.000
#> 9 117.60 -0.7382 3 -45.872 100.4 31.10 0.000
#> 3 57.42 0.7891 3 -46.035 100.7 31.42 0.000
#> 11 94.16 0.3109 -0.4569 4 -45.761 104.5 35.21 0.000
#> 2 81.48 1.869 3 -48.206 105.1 35.77 0.000
#> 6 72.35 2.312 0.4945 4 -48.005 109.0 39.70 0.000
#> 5 110.20 -1.2560 3 -50.980 110.6 41.31 0.000
#> 1 95.42 2 -53.168 111.5 42.22 0.000
#> Models ranked by AICc(x)
# Use models with Delta AICc < 4
summary(model.avg(ms1, subset = delta < 4))
#>
#> Call:
#> model.avg(object = ms1, subset = delta < 4)
#>
#> Component model call:
#> lm(formula = y ~ <5 unique rhs>, data = Cement, na.action = na.fail)
#>
#> Component models:
#> df logLik AICc delta weight
#> 12 4 -28.16 69.31 0.00 0.57
#> 124 5 -26.93 72.44 3.13 0.12
#> 123 5 -26.95 72.48 3.16 0.12
#> 14 4 -29.82 72.63 3.32 0.11
#> 134 5 -27.31 73.19 3.88 0.08
#>
#> Term codes:
#> X1 X2 X3 X4
#> 1 2 3 4
#>
#> Model-averaged coefficients:
#> (full average)
#> Estimate Std. Error Adjusted SE z value Pr(>|z|)
#> (Intercept) 64.693128 22.235479 22.462414 2.880 0.00398 **
#> X1 1.455798 0.203668 0.219304 6.638 < 2e-16 ***
#> X2 0.505758 0.268371 0.271702 1.861 0.06268 .
#> X4 -0.147870 0.252517 0.255126 0.580 0.56219
#> X3 -0.004302 0.168612 0.175632 0.024 0.98046
#>
#> (conditional average)
#> Estimate Std. Error Adjusted SE z value Pr(>|z|)
#> (Intercept) 64.69313 22.23548 22.46241 2.880 0.00398 **
#> X1 1.45580 0.20367 0.21930 6.638 < 2e-16 ***
#> X2 0.62503 0.12026 0.12917 4.839 1.3e-06 ***
#> X4 -0.47601 0.22152 0.23094 2.061 0.03929 *
#> X3 -0.02153 0.37671 0.39244 0.055 0.95624
#> ---
#> Signif. codes: 0 ‘***’ 0.001 ‘**’ 0.01 ‘*’ 0.05 ‘.’ 0.1 ‘ ’ 1
#>
#or as a 95% confidence set:
avgmod.95p <- model.avg(ms1, cumsum(weight) <= .95)
confint(avgmod.95p)
#> 2.5 % 97.5 %
#> (Intercept) 24.7841330 96.18447223
#> X1 1.1496363 1.83431889
#> X2 0.3718565 0.87819546
#> X4 -0.8878522 0.05587375
#> X3 -0.1678276 0.66786280
if (FALSE) { # \dontrun{
# The same result, but re-fitting the models via 'get.models'
confset.95p <- get.models(ms1, cumsum(weight) <= .95)
model.avg(confset.95p)
# Force re-fitting the component models
model.avg(ms1, cumsum(weight) <= .95, fit = TRUE)
# Models are also fitted if additional arguments are given
model.avg(ms1, cumsum(weight) <= .95, rank = "AIC")
} # }
if (FALSE) { # \dontrun{
# using BIC (Schwarz's Bayesian criterion) to rank the models
BIC <- function(x) AIC(x, k = log(length(residuals(x))))
model.avg(confset.95p, rank = BIC)
# the same result, using AIC directly, with argument k
# 'x' in a quoted 'rank' argument is substituted with a model object
# (in this case it does not make much sense as the number of observations is
# common to all models)
model.avg(confset.95p, rank = AIC, rank.args = alist(k = log(length(residuals(x)))))
} # }