vignettes/model_parameters_robust.Rmd
model_parameters_robust.RmdThe model_parameters()
function also allows the computation of standard errors, confidence
intervals, and p-values based on various covariance matrices:
heteroskedasticity-consistent, cluster-robust, bootstrap, etc. This
functionality relies on the sandwich
and clubSandwich
packages. This means that all models supported by either of these
packages should work with model_parameters().
The computation of robust standard errors is controlled by two arguments:
vcov: accepts 3 types of arguments
stats::vcov())"vcovHC",
"HC", "HC0", "HC1",
"HC2", "HC3", "HC4",
"HC4m", "HC5"
"vcovCR", "CR0",
"CR1", "CR1p", "CR1S",
"CR2", "CR3"
"vcovBS", "xy",
"residual", "wild", "mammen",
"webb"
sandwich functions: "vcovHAC",
"vcovPC", "vcovCL", "vcovPL"
vcov_args: list of arguments passed to the
sandwich or clubSandwich function used to compute the
covariance matrix. See for example ?sandwich::vcovHAC
Let us start with a simple example, which uses a
heteroskedasticity-consistent covariance matrix estimation with
estimation-type “HC3” (i.e.
sandwich::vcovHC(type = "HC3")).
First let’s create a simple linear regression model, which we know violates homoscedasticity assumption, and thus robust estimation methods are to be considered.
data(cars)
model <- lm(dist ~ speed, data = cars)
library(performance)
check_heteroscedasticity(model)
#> Warning: Heteroscedasticity (non-constant error variance) detected (p = 0.031).We would extract model parameters both with and without robust estimation to highlight the difference it makes to standard errors, confidence intervals, t-statistic, and p-values. Also, note that the coefficient estimate remains unchanged.
# model parameters, where SE, CI, and p-values are *not* based on robust estimation
model_parameters(model)
#> Parameter | Coefficient | SE | 95% CI | t(48) | p
#> -------------------------------------------------------------------
#> (Intercept) | -17.58 | 6.76 | [-31.17, -3.99] | -2.60 | 0.012
#> speed | 3.93 | 0.42 | [ 3.10, 4.77] | 9.46 | < .001
# model parameters, where SE, CI, and p-values are based on robust estimation
mp <- model_parameters(model, vcov = "HC3")
mp
#> Parameter | Coefficient | SE | 95% CI | t(48) | p
#> -------------------------------------------------------------------
#> (Intercept) | -17.58 | 5.93 | [-29.51, -5.65] | -2.96 | 0.005
#> speed | 3.93 | 0.43 | [ 3.07, 4.79] | 9.20 | < .001
# compare standard errors to result from sandwich-package
mp$SE
#> [1] 5.93 0.43
unname(sqrt(diag(sandwich::vcovHC(model))))
#> [1] 5.93 0.43If a different type of covariance matrix estimation is required, use
the vcov-argument. This argument accepts the name of a
function from the sandwich or clubSandwich packages as
a string, such as "vcovCL" (or just its suffix
"CL"). parameters will then call the corresponding
function with the content of vcov_args as arguments.
The specific estimation type can be controlled by passing a
type argument via vcov_args. See
?sandwich::vcovCL for information about the different types
of covariance matrices that this function can produce (HC0
to HC3). In the next example, we use a clustered covariance
matrix estimation with HC1-estimation type.
# let's create a more complicated model
data(iris)
model <- lm(Petal.Length ~ Sepal.Length * Species + Sepal.Width, data = iris)
# change estimation-type
mp <- model_parameters(
model,
vcov = "CL", # type of covariance matrix
vcov_args = list(type = "HC1") # type of robust estimation
)
mp
#> Parameter | Coefficient | SE | 95% CI
#> ------------------------------------------------------------------------
#> (Intercept) | 0.87 | 0.42 | [ 0.03, 1.70]
#> Sepal Length | 0.04 | 0.11 | [-0.18, 0.26]
#> Species [versicolor] | -0.78 | 0.65 | [-2.07, 0.51]
#> Species [virginica] | -0.41 | 0.59 | [-1.57, 0.75]
#> Sepal Width | 0.11 | 0.08 | [-0.05, 0.27]
#> Sepal Length * Species [versicolor] | 0.61 | 0.12 | [ 0.37, 0.85]
#> Sepal Length * Species [virginica] | 0.68 | 0.11 | [ 0.46, 0.90]
#>
#> Parameter | t(143) | p
#> -----------------------------------------------------
#> (Intercept) | 2.05 | 0.042
#> Sepal Length | 0.40 | 0.692
#> Species [versicolor] | -1.19 | 0.237
#> Species [virginica] | -0.70 | 0.483
#> Sepal Width | 1.38 | 0.170
#> Sepal Length * Species [versicolor] | 4.96 | < .001
#> Sepal Length * Species [virginica] | 6.15 | < .001
# compare standard errors to result from sandwich-package
mp$SE
#> [1] 0.422 0.111 0.653 0.587 0.079 0.123 0.111
unname(sqrt(diag(sandwich::vcovCL(model))))
#> [1] 0.422 0.111 0.653 0.587 0.079 0.123 0.111Usually, clustered covariance matrix estimation is used when there is
a cluster-structure in the data. The variable indicating the
cluster-structure can be defined in sandwich::vcovCL() with
the cluster-argument. In model_parameters(),
additional arguments that should be passed down to functions from the
sandwich package can be specified in
vcov_args:
iris$cluster <- factor(rep(LETTERS[1:8], length.out = nrow(iris)))
# change estimation-type, defining additional arguments
mp <- model_parameters(
model,
vcov = "vcovCL",
vcov_args = list(type = "HC1", cluster = iris$cluster)
)
mp
#> Parameter | Coefficient | SE | 95% CI
#> ------------------------------------------------------------------------
#> (Intercept) | 0.87 | 0.34 | [ 0.20, 1.53]
#> Sepal Length | 0.04 | 0.07 | [-0.10, 0.19]
#> Species [versicolor] | -0.78 | 0.52 | [-1.80, 0.25]
#> Species [virginica] | -0.41 | 0.26 | [-0.94, 0.11]
#> Sepal Width | 0.11 | 0.07 | [-0.03, 0.25]
#> Sepal Length * Species [versicolor] | 0.61 | 0.10 | [ 0.42, 0.80]
#> Sepal Length * Species [virginica] | 0.68 | 0.05 | [ 0.58, 0.78]
#>
#> Parameter | t(143) | p
#> -----------------------------------------------------
#> (Intercept) | 2.57 | 0.011
#> Sepal Length | 0.61 | 0.540
#> Species [versicolor] | -1.49 | 0.137
#> Species [virginica] | -1.56 | 0.120
#> Sepal Width | 1.52 | 0.131
#> Sepal Length * Species [versicolor] | 6.29 | < .001
#> Sepal Length * Species [virginica] | 13.28 | < .001
# compare standard errors to result from sandwich-package
mp$SE
#> [1] 0.337 0.072 0.519 0.264 0.072 0.097 0.051
unname(sqrt(diag(sandwich::vcovCL(model, cluster = iris$cluster))))
#> [1] 0.337 0.072 0.519 0.264 0.072 0.097 0.051using clubSandwich::vcovCR(). Thus, when
vcov = "CR", the related function from the
clubSandwich package is called. Note that this function
requires the specification of the
cluster-argument.
# create fake-cluster-variable, to demonstrate cluster robust standard errors
iris$cluster <- factor(rep(LETTERS[1:8], length.out = nrow(iris)))
# cluster-robust estimation
mp <- model_parameters(
model,
vcov = "vcovCR",
vcov_args = list(type = "CR1", cluster = iris$cluster)
)
mp
#> Parameter | Coefficient | SE | 95% CI
#> ------------------------------------------------------------------------
#> (Intercept) | 0.87 | 0.33 | [ 0.21, 1.52]
#> Sepal Length | 0.04 | 0.07 | [-0.10, 0.18]
#> Species [versicolor] | -0.78 | 0.51 | [-1.78, 0.23]
#> Species [virginica] | -0.41 | 0.26 | [-0.92, 0.10]
#> Sepal Width | 0.11 | 0.07 | [-0.03, 0.25]
#> Sepal Length * Species [versicolor] | 0.61 | 0.09 | [ 0.42, 0.79]
#> Sepal Length * Species [virginica] | 0.68 | 0.05 | [ 0.58, 0.78]
#>
#> Parameter | t(143) | p
#> -----------------------------------------------------
#> (Intercept) | 2.62 | 0.010
#> Sepal Length | 0.63 | 0.531
#> Species [versicolor] | -1.53 | 0.129
#> Species [virginica] | -1.60 | 0.112
#> Sepal Width | 1.55 | 0.123
#> Sepal Length * Species [versicolor] | 6.42 | < .001
#> Sepal Length * Species [virginica] | 13.56 | < .001
# compare standard errors to result from clubSsandwich-package
mp$SE
#> [1] 0.330 0.070 0.508 0.259 0.071 0.095 0.050
unname(sqrt(diag(clubSandwich::vcovCR(model, type = "CR1", cluster = iris$cluster))))
#> [1] 0.330 0.070 0.508 0.259 0.071 0.095 0.050Finally, robust estimation can be combined with standardization.
However, robust covariance matrix estimation only works for
standardize = "refit".
# model parameters, robust estimation on standardized model
model_parameters(model, standardize = "refit", vcov = "HC3")
#> Parameter | Coefficient | SE | 95% CI
#> -------------------------------------------------------------------------
#> (Intercept) | -1.30 | 0.07 | [-1.44, -1.16]
#> Sepal Length | 0.02 | 0.06 | [-0.09, 0.13]
#> Species [versicolor] | 1.57 | 0.09 | [ 1.40, 1.74]
#> Species [virginica] | 2.02 | 0.09 | [ 1.84, 2.20]
#> Sepal Width | 0.03 | 0.02 | [-0.01, 0.07]
#> Sepal Length * Species [versicolor] | 0.28 | 0.06 | [ 0.16, 0.41]
#> Sepal Length * Species [virginica] | 0.32 | 0.06 | [ 0.21, 0.43]
#>
#> Parameter | t(143) | p
#> -----------------------------------------------------
#> (Intercept) | -18.70 | < .001
#> Sepal Length | 0.37 | 0.711
#> Species [versicolor] | 17.84 | < .001
#> Species [virginica] | 22.49 | < .001
#> Sepal Width | 1.32 | 0.190
#> Sepal Length * Species [versicolor] | 4.65 | < .001
#> Sepal Length * Species [virginica] | 5.75 | < .001For linear mixed-effects models, that by definition have a clustered (hierarchical or multilevel) structure in the data, it is also possible to estimate a cluster-robust covariance matrix. This is possible due to the clubSandwich package, thus we need to define the same arguments as in the above example.
library(lme4)
data(iris)
set.seed(1234)
iris$grp <- as.factor(sample(1:3, nrow(iris), replace = TRUE))
# fit example model
model <- lme4::lmer(
Sepal.Length ~ Species * Sepal.Width + Petal.Length + (1 | grp),
data = iris
)
# model parameters without robust estimation
model_parameters(model)
#> # Fixed Effects
#>
#> Parameter | Coefficient | SE | 95% CI
#> ------------------------------------------------------------------------
#> (Intercept) | 1.55 | 0.40 | [ 0.76, 2.35]
#> Species [versicolor] | 0.41 | 0.55 | [-0.67, 1.50]
#> Species [virginica] | -0.41 | 0.58 | [-1.56, 0.74]
#> Sepal Width | 0.66 | 0.11 | [ 0.44, 0.89]
#> Petal Length | 0.82 | 0.07 | [ 0.69, 0.95]
#> Species [versicolor] * Sepal Width | -0.48 | 0.19 | [-0.85, -0.12]
#> Species [virginica] * Sepal Width | -0.36 | 0.18 | [-0.71, 0.00]
#>
#> Parameter | t(141) | p
#> ----------------------------------------------------
#> (Intercept) | 3.87 | < .001
#> Species [versicolor] | 0.75 | 0.454
#> Species [virginica] | -0.70 | 0.483
#> Sepal Width | 5.83 | < .001
#> Petal Length | 12.52 | < .001
#> Species [versicolor] * Sepal Width | -2.60 | 0.010
#> Species [virginica] * Sepal Width | -1.99 | 0.048
#>
#> # Random Effects
#>
#> Parameter | Coefficient | SE | 95% CI
#> -------------------------------------------------------
#> SD (Intercept: grp) | 0.08 | 0.05 | [0.02, 0.29]
#> SD (Residual) | 0.30 | 0.02 | [0.27, 0.33]
# model parameters with cluster robust estimation
model_parameters(
model,
vcov = "vcovCR",
vcov_args = list(type = "CR1", cluster = iris$grp)
)
#> # Fixed Effects
#>
#> Parameter | Coefficient | SE | 95% CI
#> ------------------------------------------------------------------------
#> (Intercept) | 1.55 | 0.40 | [ 0.76, 2.35]
#> Species [versicolor] | 0.41 | 0.80 | [-1.17, 1.99]
#> Species [virginica] | -0.41 | 0.19 | [-0.78, -0.03]
#> Sepal Width | 0.66 | 0.10 | [ 0.46, 0.86]
#> Petal Length | 0.82 | 0.05 | [ 0.72, 0.91]
#> Species [versicolor] * Sepal Width | -0.48 | 0.35 | [-1.18, 0.21]
#> Species [virginica] * Sepal Width | -0.36 | 0.11 | [-0.57, -0.15]
#>
#> Parameter | t(141) | p
#> ----------------------------------------------------
#> (Intercept) | 3.87 | < .001
#> Species [versicolor] | 0.51 | 0.608
#> Species [virginica] | -2.15 | 0.033
#> Sepal Width | 6.64 | < .001
#> Petal Length | 17.27 | < .001
#> Species [versicolor] * Sepal Width | -1.37 | 0.172
#> Species [virginica] * Sepal Width | -3.39 | < .001
#>
#> # Random Effects
#>
#> Parameter | Coefficient | SE | 95% CI
#> -------------------------------------------------------
#> SD (Intercept: grp) | 0.08 | 0.05 | [0.02, 0.29]
#> SD (Residual) | 0.30 | 0.02 | [0.27, 0.33]Notice that robust estimation returns different standard errors, confidence intervals, test statistic and p-values compared to the standard estimation. Also, note that the coefficient estimate remains unchanged.
Once again, robust estimation can be combined with standardization
for linear mixed-effects models as well and works only with
standardize = "refit".
# model parameters, cluster robust estimation of standardized mixed model
model_parameters(
model,
standardize = "refit",
vcov = "vcovCR",
vcov_args = list(type = "CR1", cluster = iris$grp)
)
#> # Fixed Effects
#>
#> Parameter | Coefficient | SE | 95% CI
#> ------------------------------------------------------------------------
#> (Intercept) | 0.97 | 0.08 | [ 0.82, 1.12]
#> Species [versicolor] | -1.29 | 0.33 | [-1.95, -0.63]
#> Species [virginica] | -1.81 | 0.23 | [-2.26, -1.37]
#> Sepal Width | 0.35 | 0.05 | [ 0.24, 0.45]
#> Petal Length | 1.74 | 0.10 | [ 1.54, 1.94]
#> Species [versicolor] * Sepal Width | -0.25 | 0.19 | [-0.62, 0.11]
#> Species [virginica] * Sepal Width | -0.19 | 0.06 | [-0.30, -0.08]
#>
#> Parameter | t(141) | p
#> ----------------------------------------------------
#> (Intercept) | 12.55 | < .001
#> Species [versicolor] | -3.85 | < .001
#> Species [virginica] | -8.01 | < .001
#> Sepal Width | 6.64 | < .001
#> Petal Length | 17.27 | < .001
#> Species [versicolor] * Sepal Width | -1.37 | 0.172
#> Species [virginica] * Sepal Width | -3.39 | < .001
#>
#> # Random Effects
#>
#> Parameter | Coefficient | SE | 95% CI
#> -------------------------------------------------------
#> SD (Intercept: grp) | 0.10 | 0.06 | [0.03, 0.35]
#> SD (Residual) | 0.36 | 0.02 | [0.32, 0.40]Notice how drastically some of the p-values change between robust-unstandardized model and robust-standardized model.