Run a contrast analysis by estimating the differences between each level of a
factor. See also other related functions such as estimate_means()
and estimate_slopes().
estimate_contrasts(model, ...)
# Default S3 method
estimate_contrasts(
model,
contrast = NULL,
by = NULL,
predict = NULL,
ci = 0.95,
comparison = "pairwise",
estimate = NULL,
p_adjust = "none",
transform = NULL,
keep_iterations = FALSE,
effectsize = NULL,
iterations = 200,
es_type = "cohens.d",
backend = NULL,
verbose = TRUE,
...
)

model: A statistical model.
...: Other arguments passed, for instance, to insight::get_datagrid(),
to functions from the emmeans or marginaleffects package, or to process
Bayesian models via bayestestR::describe_posterior(). Examples:
insight::get_datagrid(): Arguments such as length, digits or range
can be used to control the (number of) representative values. For integer
variables, protect_integers modulates whether these should also be
treated as numerics, i.e. values can have fractions or not.
marginaleffects: Internally used functions are avg_predictions() for
means and contrasts, and avg_slopes() for slopes. Therefore, arguments for
instance like vcov, equivalence, df, slope, hypothesis or even
newdata can be passed to those functions. A weights argument is passed
to the wts argument in avg_predictions() or avg_slopes(), however,
weights can only be applied when estimate is "average" or
"population" (i.e. for those marginalization options that do not use data
grids). Other arguments, such as re.form or allow.new.levels, may be
passed to predict() (which is internally used by marginaleffects) if
supported by that model class.
emmeans: Internally used functions are emmeans() and emtrends().
Additional arguments can be passed to these functions.
Bayesian models: For Bayesian models, parameters are cleaned using
describe_posterior(), thus, arguments like, for example, centrality,
rope_range, or test are passed to that function.
Especially for estimate_contrasts() with integer focal predictors, for
which contrasts should be calculated, use the argument integer_as_continuous
to set the maximum number of unique values an integer predictor may have to
still be treated as a "discrete integer" rather than as numeric. In the first
case, contrasts are calculated between values of the predictor; in the
latter, contrasts of slopes are calculated. If the integer has more than
integer_as_continuous unique values, it is treated as numeric. Defaults
to 5. Set to TRUE to always treat integer predictors as continuous.
For count regression models that use an offset term, use offset = <value>
to fix the offset at a specific value, or use estimate = "average" to
average predictions over the distribution of the offset (if appropriate).
contrast: A character vector indicating the name of the variable(s) for
which to compute the contrasts, optionally including representative values or
levels at which contrasts are evaluated (e.g., contrast="x=c('a','b')").
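A minimal sketch of supplying representative levels inside the contrast string (using the iris data, as in the Examples below; shown for illustration, not run here):

```r
library(modelbased)

model <- lm(Sepal.Width ~ Species, data = iris)
# Only compare "setosa" and "versicolor", ignoring the third level
estimate_contrasts(model, contrast = "Species = c('setosa', 'versicolor')")
```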
by: The (focal) predictor variable(s) at which to evaluate the desired
effect / mean / contrasts. Other predictors of the model that are not
included here will be collapsed and "averaged" over (the effect will be
estimated across them). by can be a character (vector) naming the focal
predictors, optionally including representative values or levels at which
focal predictors are evaluated (e.g., by = "x = c(1, 2)"). When estimate
is not "average", the by argument is used to create a "reference grid"
or "data grid" with representative values for the focal predictors. In this
case, by can also be a list of named elements. See details in
insight::get_datagrid() to learn more about how to create data grids for
predictors of interest.
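As a sketch, the string and named-list forms of by are interchangeable (again using iris, as in the Examples):

```r
library(modelbased)

model <- lm(Sepal.Width ~ Species * Petal.Width, data = iris)
# Named-list form of `by`, equivalent to by = "Petal.Width = c(1, 2)"
estimate_contrasts(model, contrast = "Species", by = list(Petal.Width = c(1, 2)))
```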
predict: Is passed to the type argument in emmeans::emmeans() (when
backend = "emmeans") or in marginaleffects::avg_predictions() (when
backend = "marginaleffects"). Valid options for predict are:
backend = "marginaleffects": predict can be "response", "link",
"inverse_link" or any valid type option supported by the model class's
predict() method (e.g., for zero-inflation models from package
glmmTMB, you can choose predict = "zprob" or predict = "conditional"
etc., see glmmTMB::predict.glmmTMB). By default, when predict = NULL,
the most appropriate transformation is selected, which usually returns
predictions or contrasts on the response-scale. The "inverse_link" is a
special option, comparable to marginaleffects' invlink(link) option. It
will calculate predictions on the link scale and then back-transform to the
response scale.
backend = "emmeans": predict can be "response", "link", "mu",
"unlink", or "log". If predict = NULL (default), the most appropriate
transformation is selected (which usually is "response"). See also
this vignette.
See also section Predictions on different scales.
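A short sketch contrasting the two most common scales for a logistic model (a hypothetical example based on mtcars, not taken from the Examples):

```r
library(modelbased)

d <- mtcars
d$cyl <- as.factor(d$cyl)
model <- glm(am ~ cyl, data = d, family = binomial())

estimate_contrasts(model, contrast = "cyl")                    # probabilities (default)
estimate_contrasts(model, contrast = "cyl", predict = "link")  # log-odds
```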
ci: Confidence Interval (CI) level. Defaults to 0.95 (95%).
comparison: Specify the type of contrasts or tests that should be carried out.
When backend = "emmeans", can be one of "pairwise", "poly",
"consec", "eff", "del.eff", "mean_chg", "trt.vs.ctrl",
"dunnett", "wtcon", and others. To test multiple hypotheses jointly
(usually used for factorial designs), comparison can also be "joint".
See also method argument in emmeans::contrast and the
?emmeans::emmc-functions.
For backend = "marginaleffects", can be a numeric value, vector, or
matrix, a string equation specifying the hypothesis to test, a string
naming the comparison method, a formula, or a function. For options not
described below, see documentation of marginaleffects::comparisons,
this website and
section Comparison options below.
String: One of "pairwise", "reference", "sequential", "meandev",
"meanotherdev", "poly", "helmert", or "trt_vs_ctrl". To test
multiple hypotheses jointly (usually used for factorial designs),
comparison can also be "joint". In this case, use the test argument
to specify which test should be conducted: "F" (default) or "Chi2".
String: Special string options are "inequality", "inequality_ratio",
"inequality_pairwise", and "inequality_ratio_pairwise".
comparison = "inequality" computes the
marginal effect inequality summary of a categorical predictor's overall
effect, i.e., the comprehensive effect of an independent
variable across all outcome categories of a nominal or ordinal dependent
variable (also called absolute inequality, or total marginal effect,
see Mize and Han, 2025). "inequality_ratio" computes the ratio of
marginal effect inequality measures, also known as relative inequality.
This is useful to compare the relative effects of different predictors on
the dependent variable. It provides a measure of how much more or less
inequality one predictor has compared to another.
comparison = "inequality_pairwise" computes pairwise differences of
absolute inequality measures, while "inequality_ratio_pairwise"
computes pairwise differences of relative inequality measures (ratios).
See an overview of applications in the related case study in the
vignettes.
String equation: To identify parameters from the output, either specify
the term name, or "b1", "b2" etc. to indicate rows, e.g., "hp = drat",
"b1 = b2", or "b1 + b2 + b3 = 0".
Formula: A formula like comparison ~ pairs | group, where the left-hand
side indicates the type of comparison (difference or ratio), the
right-hand side determines the pairs of estimates to compare (reference,
sequential, meandev, etc., see string-options). Optionally, comparisons
can be carried out within subsets by indicating the grouping variable
after a vertical bar (|).
A custom function, e.g. comparison = myfun, or
comparison ~ I(my_fun(x)) | groups.
If contrasts should be calculated (or grouped by) factors, comparison
can also be a matrix that specifies factor contrasts (see 'Examples').
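A brief sketch of a few of these comparison specifications applied to the same model (iris data, as in the Examples; the formula and string-equation forms are assumed to be forwarded to marginaleffects as described above):

```r
library(modelbased)

model <- lm(Sepal.Width ~ Species, data = iris)
estimate_contrasts(model, comparison = "reference")   # each level vs. the first
estimate_contrasts(model, comparison = "b1 = b2")     # string equation on output rows
estimate_contrasts(model, comparison = ~sequential)   # formula interface
```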
estimate: The estimate argument determines how predictions are
averaged ("marginalized") over variables not specified in by or contrast
(non-focal predictors). It controls whether predictions represent a "typical"
individual, an "average" individual from the sample, or an "average"
individual from a broader population.
"typical" (Default): Calculates predictions for a balanced data grid
representing all combinations of focal predictor levels (specified in by).
For non-focal numeric predictors, it uses the mean; for non-focal
categorical predictors, it marginalizes (averages) over the levels. This
represents a "typical" observation based on the data grid and is useful for
comparing groups. It answers: "What would the average outcome be for a
'typical' observation?". This is the default approach when estimating
marginal means using the emmeans package.
"average": Calculates predictions for each observation in the sample and
then averages these predictions within each group defined by the focal
predictors. This reflects the sample's actual distribution of non-focal
predictors, not a balanced grid. It answers: "What is the predicted value
for an average observation in my data?"
"population": "Clones" each observation, creating copies with all
possible combinations of focal predictor levels. It then averages the
predictions across these "counterfactual" observations (non-observed
permutations) within each group. This extrapolates to a hypothetical
broader population, considering "what if" scenarios. It answers: "What is
the predicted response for the 'average' observation in a broader possible
target population?" This approach entails more assumptions about the
likelihood of different combinations, but can be more apt to generalize.
This is also the option that should be used for G-computation
(causal inference, see Chatton and Rohrer 2024). "counterfactual" is
an alias for "population".
You can set a default option for the estimate argument via options(),
e.g. options(modelbased_estimate = "average").
Note the following limitations:
When you set estimate to "average", it calculates the average based
only on the data points that actually exist. This is in particular
important for two or more focal predictors, because it doesn't generate a
complete grid of all theoretical combinations of predictor values.
Consequently, the output may not include all combinations of focal predictor values.
Filtering the output at values of continuous predictors, e.g.
by = "x=1:5", in combination with estimate = "average" may result in
returning an empty data frame because of what was described above. In such
case, you can use estimate = "typical" or use the newdata argument to
provide a data grid of predictor values at which to evaluate predictions.
estimate = "population" is not available for estimate_slopes().
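The three marginalization options side by side, as a sketch (iris data; the comments paraphrase the descriptions above):

```r
library(modelbased)

model <- lm(Sepal.Width ~ Species + Petal.Width, data = iris)
estimate_contrasts(model, estimate = "typical")     # balanced data grid (default)
estimate_contrasts(model, estimate = "average")     # sample's own distribution
estimate_contrasts(model, estimate = "population")  # counterfactual "cloned" grid
```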
p_adjust: The p-values adjustment method for frequentist multiple
comparisons. Can be one of "none" (default), "hochberg", "hommel",
"bonferroni", "BH", "BY", "fdr", "tukey", "sidak", "sup-t",
"esarey" or "holm". The "esarey" option is specifically for the case of
Johnson-Neyman intervals, i.e. when calling estimate_slopes() with two
numeric predictors in an interaction term. "sup-t" computes simultaneous
confidence bands, also called sup-t confidence band (Montiel Olea &
Plagborg-Møller, 2019). Details for the other options can be found in the
p-value adjustment section of the emmeans::test documentation or
?stats::p.adjust. Note that certain options provided by the emmeans
package are only available if you set backend = "emmeans".
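A sketch of adjusting pairwise p-values (iris data; the second call assumes "tukey" is among the options that require the emmeans backend, per the note above):

```r
library(modelbased)

model <- lm(Sepal.Width ~ Species, data = iris)
estimate_contrasts(model, p_adjust = "holm")
# assumed to need backend = "emmeans", see note above
estimate_contrasts(model, p_adjust = "tukey", backend = "emmeans")
```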
transform: A function applied to predictions and confidence intervals
to (back-) transform results, which can be useful in case the regression
model has a transformed response variable (e.g., lm(log(y) ~ x)). For
Bayesian models, this function is applied to individual draws from the
posterior distribution, before computing summaries. Can also be TRUE, in
which case insight::get_transformation() is called to determine the
appropriate transformation function. Note that no standard errors are returned
when transformations are applied.
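A sketch for a model with a log-transformed response (iris data; as noted above, no standard errors are returned when a transformation is applied):

```r
library(modelbased)

model <- lm(log(Sepal.Width) ~ Species, data = iris)
# transform = TRUE lets insight::get_transformation() detect log()
# and back-transform the results accordingly
estimate_contrasts(model, transform = TRUE)
```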
keep_iterations: If TRUE, will keep all iterations (draws) of
bootstrapped or Bayesian models. They will be added as additional columns
named iter_1, iter_2, and so on. If keep_iterations is a positive
number, only as many columns as indicated in keep_iterations will be added
to the output. You can reshape them to a long format by running
bayestestR::reshape_iterations().
effectsize: Desired measure of standardized effect size, one of
"emmeans", "marginal", or "boot". Default is NULL, i.e. no effect
size will be computed.
iterations: The number of bootstrap resamples to perform.
es_type: Specifies the type of effect-size measure to estimate when
using effectsize = "boot". One of "unstandardized", "cohens.d",
"hedges.g", "cohens.d.sigma", "r", or "akp.robust.d". See the
effect.type argument of bootES::bootES for details.
backend: Whether to use "marginaleffects" (default) or "emmeans" as
a backend. Results are usually very similar. The major difference will be
found for mixed models, where backend = "marginaleffects" will also average
across random effects levels, producing "marginal predictions" (instead of
"conditional predictions", see Heiss 2022).
Another difference is that backend = "marginaleffects" can be slower than
backend = "emmeans". For most models, this difference is negligible; however,
particularly complex models or large data sets fitted with glmmTMB can be
significantly slower.
You can set a default backend via options(), e.g. use
options(modelbased_backend = "emmeans") to use the emmeans package or
options(modelbased_backend = "marginaleffects") to set marginaleffects as
default backend.
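A sketch of setting the package-wide default described above:

```r
library(modelbased)

# All subsequent calls in this session use the emmeans backend
options(modelbased_backend = "emmeans")

model <- lm(Sepal.Width ~ Species, data = iris)
estimate_contrasts(model)  # uses emmeans without an explicit backend argument
```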
verbose: Use FALSE to silence messages and warnings.
Value: A data frame of estimated contrasts.
The estimate_slopes(), estimate_means() and estimate_contrasts()
functions form a group, as they are all based on marginal
estimations (estimations based on a model). All three are built on the
emmeans or marginaleffects package (depending on the backend
argument), so reading their documentation (for instance emmeans::emmeans(),
emmeans::emtrends() or this website) is
recommended to understand the ideas behind these types of procedures.
Model-based predictions are the basis for all that follows. Indeed, the
first thing to understand is how models can be used to make predictions
(see estimate_relation()). This corresponds to the predicted response (or
"outcome variable") given specific values of the predictors
(i.e., given a specific data configuration). This is why the concept of
the reference grid is so important for direct
predictions.
Marginal "means", obtained via estimate_means(), are an extension of
such predictions, allowing one to "average" (collapse) some of the
predictors, to obtain the average response value at a specific predictor
configuration. This is typically used when some of the predictors of
interest are factors. Indeed, the parameters of the model will usually give
you the intercept value and then the "effect" of each factor level (how
different it is from the intercept). Marginal means can be used to directly
give you the mean value of the response variable at all the levels of a
factor. Moreover, it can also be used to control, or average over
predictors, which is useful in the case of multiple predictors with or
without interactions.
Marginal contrasts, obtained via estimate_contrasts(), are themselves
an extension of marginal means, in that they allow one to investigate the
difference (i.e., the contrast) between the marginal means. This is, again,
often used to get all pairwise differences between all levels of a factor.
It also works for continuous predictors; for instance, one could be
interested in whether the difference at two extremes of a continuous
predictor is significant.
Finally, marginal effects, obtained via estimate_slopes(), are
different in that their focus is not values on the response variable, but
the model's parameters. The idea is to assess the effect of a predictor at
a specific configuration of the other predictors. This is relevant in the
case of interactions or non-linear relationships, when the effect of a
predictor variable changes depending on the other predictors. Moreover,
these effects can also be "averaged" over other predictors, to get for
instance the "general trend" of a predictor over different factor levels.
Example: Let's imagine the following model lm(y ~ condition * x) where
condition is a factor with 3 levels A, B and C and x a continuous
variable (like age for example). One idea is to see how this model performs,
and compare the actual response y to the one predicted by the model (using
estimate_expectation()). Another idea is to evaluate the mean at each of
the condition's levels (using estimate_means()), which can be useful to
visualize them. Another possibility is to evaluate the difference between
these levels (using estimate_contrasts()). Finally, one could also estimate
the effect of x averaged over all conditions, or instead within each
condition (using estimate_slopes()).
comparison = "pairwise": This method computes all possible unique
differences between pairs of levels of the focal predictor. For example, if
a factor has levels A, B, and C, it would compute A-B, A-C, and B-C.
comparison = "reference": This compares each level of the focal predictor
to a specified reference level (by default, the first level). For example,
if levels are A, B, C, and A is the reference, it computes B-A and C-A.
comparison = "sequential": This compares each level to the one
immediately following it in the factor's order. For levels A, B, C, it
would compute B-A and C-B.
comparison = "meandev": This contrasts each level's estimate against the
grand mean of all estimates for the focal predictor.
comparison = "meanotherdev": Similar to meandev, but each level's
estimate is compared against the mean of all other levels, excluding
itself.
comparison = "poly": These are used for ordered categorical variables to
test for linear, quadratic, cubic, etc., trends across the levels. They
assume equal spacing between levels.
comparison = "helmert": Contrast 2nd level to the first, 3rd to the
average of the first two, and so on. Each level (except the first) is
compared to the mean of the preceding levels. For levels A, B, C, it would
compute B-A and C-(A+B)/2.
comparison = "trt_vs_ctrl": This compares all levels (excluding the
first, which is typically the control) against the first level. It's often
used when comparing multiple treatment groups to a single control group.
To test multiple hypotheses jointly (usually used for factorial designs),
comparison can also be "joint". In this case, use the test argument
to specify which test should be conducted: "F" (default) or "Chi2".
comparison = "inequality" computes the absolute inequality of groups,
or in other words, the marginal effect inequality summary of a categorical
predictor's overall effect, i.e., the comprehensive effect of an
independent variable across all outcome categories of a nominal or ordinal
dependent variable (total marginal effect, see Mize and Han, 2025). The
marginal effect inequality focuses on the heterogeneity of the effects of a
categorical independent variable. It helps understand how the effect of
the variable differs across its categories or levels. When the dependent
variable is categorical (e.g., logistic, ordinal or multinomial
regression), marginal effect inequality provides a holistic view of how an
independent variable affects a nominal or ordinal dependent variable. It
summarizes the overall impact (absolute inequality, or total marginal
effects) across all possible outcome categories.
comparison = "inequality_ratio" is comparable to
comparison = "inequality", but instead of calculating the absolute
inequality, it computes the relative inequality of groups. This is useful
to compare the relative effects of different predictors on the dependent
variable. It provides a measure of how much more or less inequality one
predictor has compared to another.
comparison = "inequality_pairwise" computes pairwise differences of
absolute inequality measures, while "inequality_ratio_pairwise" computes
pairwise differences of relative inequality measures (ratios). Depending on
the sign, this measure indicates which of the predictors has a stronger
impact on the dependent variable in terms of inequalities.
Examples for analysing inequalities are shown in the related vignette.
By default, estimate_contrasts() reports no standardized effect size on
purpose. Should one request one, some things are to keep in mind. As the
authors of emmeans write, "There is substantial disagreement among
practitioners on what is the appropriate sigma to use in computing effect
sizes; or, indeed, whether any effect-size measure is appropriate for some
situations. The user is completely responsible for specifying appropriate
parameters (or for failing to do so)."
In particular, effect size method "boot" does not correct for covariates
in the model, so should probably only be used when there is just one
categorical predictor (with however many levels). Some believe that if there
are multiple predictors or any covariates, it is important to re-compute
sigma adding back in the response variance associated with the variables that
aren't part of the contrast.
effectsize = "emmeans" uses emmeans::eff_size with
sigma = stats::sigma(model), edf = stats::df.residual(model) and
method = "identity". This standardizes using the MSE (sigma). Some believe
this works when the contrasts are the only predictors in the model, but not
when there are covariates. The response variance accounted for by the
covariates should not be removed from the SD used to standardize. Otherwise,
d will be overestimated.
effectsize = "marginal" uses the following formula to compute effect
size: d_adj <- difference * (1 - R2) / sigma. This standardizes
using the response SD with only the between-groups variance on the focal
factor/contrast removed. This allows for groups to be equated on their
covariates, but creates an appropriate scale for standardizing the response.
effectsize = "boot" uses bootstrapping (defaults to a low value of
200) through bootES::bootES. Adjusts for contrasts, but not for covariates.
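With those caveats in mind, a sketch of requesting effect sizes (iris data; a single categorical predictor, the case where "boot" is least problematic per the note above):

```r
library(modelbased)

model <- lm(Sepal.Width ~ Species, data = iris)
estimate_contrasts(model, effectsize = "marginal")
# bootstrapped effect sizes via bootES; es_type picks the measure
estimate_contrasts(model, effectsize = "boot", iterations = 200, es_type = "hedges.g")
```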
To define representative values for focal predictors (specified in by,
contrast, and trend), you can use several methods. These values are
internally generated by insight::get_datagrid(), so consult its
documentation for more details.
You can directly specify values as strings or lists for by, contrast,
and trend.
For numeric focal predictors, use examples like by = "gear = c(4, 8)",
by = list(gear = c(4, 8)) or by = "gear = 5:10"
For factor or character predictors, use by = "Species = c('setosa', 'virginica')"
or by = list(Species = c('setosa', 'virginica'))
You can use "shortcuts" within square brackets, such as by = "Sepal.Width = [sd]"
or by = "Sepal.Width = [fivenum]"
For numeric focal predictors, if no representative values are specified
(i.e., by = "gear" and not by = "gear = c(4, 8)"), length and
range control the number and type of representative values for the focal
predictors:
length determines how many equally spaced values are generated.
range specifies the type of values, like "range" or "sd".
length and range apply to all numeric focal predictors.
If you have multiple numeric predictors, length and range can accept
multiple elements, one for each predictor (see 'Examples').
For integer variables, only values that appear in the data will be included
in the data grid, independent from the length argument. This behaviour
can be changed by setting protect_integers = FALSE, which will then treat
integer variables as numerics (and possibly produce fractions).
See also this vignette for some examples.
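The length and range controls and the bracket shortcuts can be sketched as follows (iris data, as in the Examples):

```r
library(modelbased)

model <- lm(Sepal.Width ~ Species * Petal.Width, data = iris)
# Three equally spaced values across the range of Petal.Width
estimate_contrasts(model, contrast = "Species", by = "Petal.Width", length = 3)
# Shortcut: representative values based on the standard deviation
estimate_contrasts(model, contrast = "Species", by = "Petal.Width = [sd]")
```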
The predict argument allows you to generate predictions on different scales of
the response variable. The "link" option does not apply to all models, and
usually not to Gaussian models. "link" will leave the values on the scale of
the linear predictors. "response" (or NULL) will transform them to the scale
of the response variable. Thus, for a logistic model, "link" will give
estimations expressed in log-odds (probabilities on logit scale) and
"response" in terms of probabilities.
To predict distributional parameters (called "dpar" in other packages), for
instance when using complex formulae in brms models, the predict argument
can take the value of the parameter you want to estimate, for instance
"sigma", "kappa", etc.
"response" and "inverse_link" both return predictions on the response
scale, however, "response" first calculates predictions on the response
scale for each observation and then aggregates them by groups or levels
defined in by. "inverse_link" first calculates predictions on the link
scale for each observation, then aggregates them by groups or levels defined
in by, and finally back-transforms the predictions to the response scale.
Both approaches have advantages and disadvantages. "response" usually
produces less biased predictions, but confidence intervals might be outside
reasonable bounds (i.e., for instance can be negative for count data). The
"inverse_link" approach is more robust in terms of confidence intervals,
but might produce biased predictions. However, you can try to set
bias_correction = TRUE, to adjust for this bias.
In particular for mixed models, using "response" is recommended, because
averaging across random effects groups is then more accurate.
Mize, T., & Han, B. (2025). Inequality and Total Effect Summary Measures for Nominal and Ordinal Variables. Sociological Science, 12, 115–157. doi:10.15195/v12.a7
Montiel Olea, J. L., and Plagborg-Møller, M. (2019). Simultaneous confidence bands: Theory, implementation, and an application to SVARs. Journal of Applied Econometrics, 34(1), 1–17. doi:10.1002/jae.2656
if (FALSE) { # \dontrun{
# Basic usage --------------------------------
# --------------------------------------------
model <- lm(Sepal.Width ~ Species, data = iris)
estimate_contrasts(model)
# Dealing with interactions
model <- lm(Sepal.Width ~ Species * Petal.Width, data = iris)
# By default: selects first factor
estimate_contrasts(model)
# Can also run contrasts between points of numeric, stratified by "Species"
estimate_contrasts(model, contrast = "Petal.Width", by = "Species")
# Or both
estimate_contrasts(model, contrast = c("Species", "Petal.Width"), length = 2)
# Or with custom specifications
estimate_contrasts(model, contrast = c("Species", "Petal.Width = c(1, 2)"))
# Or modulate it
estimate_contrasts(model, by = "Petal.Width", length = 4)
# Standardized differences
estimated <- estimate_contrasts(lm(Sepal.Width ~ Species, data = iris))
standardize(estimated)
# contrasts of slopes ------------------------
# --------------------------------------------
data(qol_cancer, package = "parameters")
qol_cancer$ID <- as.numeric(qol_cancer$ID)
qol_cancer$grp <- as.factor(ifelse(qol_cancer$ID < 100, "Group 1", "Group 2"))
model <- lm(QoL ~ time * education * grp, data = qol_cancer)
# "time" has only a few unique integer values, so it's treated like a factor
estimate_contrasts(model, "time", by = "education")
# we set `integer_as_continuous = 1` so "time" is treated as continuous
estimate_contrasts(model, "time", by = "education", integer_as_continuous = 1)
# pairwise comparisons for multiple groups
estimate_contrasts(
model,
"time",
by = c("education", "grp"),
integer_as_continuous = TRUE
)
# if we want pairwise comparisons only for one factor, but group by another,
# we need the formula specification and define the grouping variable after
# the vertical bar
estimate_contrasts(
model,
"time",
by = c("education", "grp"),
comparison = ~pairwise | grp,
integer_as_continuous = TRUE
)
# custom factor contrasts - contrasts the average effects of two levels
# against the remaining third level
# ---------------------------------------------------------------------
data(puppy_love, package = "modelbased")
cond_tx <- cbind("no treatment" = c(1, 0, 0), "treatment" = c(0, 0.5, 0.5))
model <- lm(happiness ~ puppy_love * dose, data = puppy_love)
estimate_slopes(model, "puppy_love", by = "dose", comparison = cond_tx)
# Other models (mixed, Bayesian, ...) --------
# --------------------------------------------
data <- iris
data$Petal.Length_factor <- ifelse(data$Petal.Length < 4.2, "A", "B")
model <- lme4::lmer(Sepal.Width ~ Species + (1 | Petal.Length_factor), data = data)
estimate_contrasts(model)
data <- mtcars
data$cyl <- as.factor(data$cyl)
data$am <- as.factor(data$am)
model <- rstanarm::stan_glm(mpg ~ cyl * wt, data = data, refresh = 0)
estimate_contrasts(model)
estimate_contrasts(model, by = "wt", length = 4)
model <- rstanarm::stan_glm(
Sepal.Width ~ Species + Petal.Width + Petal.Length,
data = iris,
refresh = 0
)
estimate_contrasts(model, by = "Petal.Length = [sd]", test = "bf")
} # }