Create nicely-spaced sets of nested interval widths for use with (e.g.)
the .width parameter of point_interval(), stat_slabinterval(), or
stat_lineribbon():
interval_widths(n) creates a sequence of n interval widths
\(p_1 \ldots p_n\), where \(0 < p_i \le \textrm{max} < 1\), corresponding
to the masses of nested intervals that are evenly-spaced on a reference
distribution (by default a Normal distribution). This generalizes the idea
behind the default ~66% and 95% intervals in stat_slabinterval() and the
50%, 80%, and 95% intervals in stat_lineribbon(): when applied to a Normal
distribution, those intervals are roughly evenly-spaced, so deviations from
the reference distribution (such as excess kurtosis) become visible when
the resulting intervals are not evenly spaced.
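As a quick check of the claim above, the following base-R sketch converts the default widths into upper interval limits on a standard Normal distribution and looks at the gaps between successive limits. The helper name `upper_limits` is hypothetical and not part of the package.

```r
# Hypothetical helper (not part of the package): upper limit of a centered
# interval of mass `width` on a standard Normal reference distribution.
upper_limits <- function(width) qnorm(0.5 + width / 2)

# Default widths mentioned above
slab_limits   <- upper_limits(c(0.66, 0.95))
ribbon_limits <- upper_limits(c(0.50, 0.80, 0.95))

# Gaps between successive limits, starting from the median (0)
slab_gaps   <- diff(c(0, slab_limits))
ribbon_gaps <- diff(c(0, ribbon_limits))

print(round(slab_gaps, 2))
print(round(ribbon_gaps, 2))
```

If the gaps within each vector are close to one another, the corresponding intervals are roughly evenly spaced on the Normal scale.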
pretty_widths(n) is a variant of interval_widths() with defaults for
max and precision that make the resulting intervals more human-readable,
for labeling purposes.
Intervals should be evenly-spaced on any symmetric reference distribution
when applied to data from distributions with the same shape. If dist
is not symmetric, intervals may only be approximately evenly-spaced above the
median.
interval_widths(n, dist = dist_normal(), max = 1 - 0.1/n, precision = NULL)
pretty_widths(
  n,
  dist = dist_normal(),
  max = if (n <= 4) 0.95 else 1 - 0.1/n,
  precision = if (n <= 4) 0.05 else 0.01
)
n
    <numeric> in \([0, \infty)\): Number of intervals to generate.
dist
    <distribution>: Reference distribution.
max
    <numeric> in \((0, 1)\): Maximum interval width.
precision
    <numeric | NULL>: If not NULL, a value in \((0, 1)\) giving the
    precision to round resulting widths to. In order to guarantee that n
    unique intervals are returned, widths will only be rounded if the
    result does not create duplicate values.
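The rounding rule for precision can be illustrated with a small base-R sketch. The helper `round_if_unique` is hypothetical and is not the package's code. It assumes that widths are rounded to the nearest multiple of precision, and that rounding is skipped for the whole vector if any duplicate would result; the actual implementation may differ in both respects.

```r
# Hypothetical helper (not the package's code): round widths to the nearest
# multiple of `precision`, but keep the unrounded widths if rounding would
# produce duplicates. Treating this as all-or-nothing is an assumption.
round_if_unique <- function(widths, precision = NULL) {
  if (is.null(precision)) return(widths)
  rounded <- round(widths / precision) * precision
  if (anyDuplicated(rounded) > 0) widths else rounded
}

# Rounding keeps two distinct values, so the rounded widths are returned
round_if_unique(c(0.672905, 0.95), precision = 0.05)

# Rounding would collapse 0.93 and 0.95 together, so the input is unchanged
round_if_unique(c(0.91, 0.93, 0.95), precision = 0.05)
```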
A length-n numeric vector of interval widths (masses) between 0 and 1
(exclusive) in increasing order.
Given the cumulative distribution function \(F_\textrm{dist}(q)\)
and the quantile function \(F^{-1}_\textrm{dist}(p)\) of dist, the
following is a sequence of \(n + 1\) evenly-spaced quantiles of dist
that could represent upper limits of nested intervals, where
\(q_i = q_0 + i\frac{q_n - q_0}{n}\):
$$\begin{array}{rcl} q_0, \ldots, q_n &=& F^{-1}_\textrm{dist}(0.5), \ldots, F^{-1}_\textrm{dist}(0.5 + \frac{\textrm{max}}{2}) \end{array}$$
interval_widths(n) returns the n interval widths corresponding to the
upper interval limits \(q_1, \ldots, q_n\):
$$ 2\cdot\left[F_\textrm{dist}(q_1) - 0.5\right], \ldots, 2\cdot\left[F_\textrm{dist}(q_n) - 0.5\right] $$
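The two formulas above can be traced in base R for the default Normal reference distribution. This is a minimal sketch, not the package's implementation: the function name `widths_sketch` is hypothetical, it hard-codes a standard Normal via qnorm() and pnorm(), and it ignores the precision argument.

```r
# Hypothetical base-R sketch of the computation above for a standard Normal
# reference distribution; it is not the package's implementation.
widths_sketch <- function(n, max = 1 - 0.1 / n) {
  q_0 <- qnorm(0.5)                 # median of the reference distribution
  q_n <- qnorm(0.5 + max / 2)       # upper limit of the widest interval
  q <- q_0 + seq_len(n) * (q_n - q_0) / n   # evenly spaced q_1, ..., q_n
  2 * (pnorm(q) - 0.5)              # mass of each centered interval
}

widths_sketch(2)
widths_sketch(3)
```

For n = 2 and n = 3 this should agree with the values printed for interval_widths(2) and interval_widths(3) in the examples, up to floating-point error.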
The .width argument to point_interval(), stat_slabinterval(),
stat_lineribbon(), etc.
library(ggplot2)
library(distributional)
interval_widths(1) # 0.9
#> [1] 0.9
# this is roughly +/- 1 SD and +/- 2 SD
interval_widths(2) # 0.672..., 0.95
#> [1] 0.672905 0.950000
interval_widths(3) # 0.521..., 0.844..., 0.966...
#> [1] 0.5218917 0.8440126 0.9666667
# "pretty" widths may be useful for legends with a small number of widths
pretty_widths(1) # 0.95
#> [1] 0.95
pretty_widths(2) # 0.65, 0.95
#> [1] 0.65 0.95
pretty_widths(3) # 0.50, 0.80, 0.95
#> [1] 0.50 0.80 0.95
# larger numbers of intervals can be useful for plots
ggplot(data.frame(x = 1:20/20)) +
  aes(x, ydist = dist_normal((x * 5)^2, 1 + x * 5)) +
  stat_lineribbon(.width = pretty_widths(10))
# large numbers of intervals can be used to create gradients -- particularly
# useful if you shade ribbons according to density (not interval width)
# (this is currently experimental)
withr::with_options(list(ggdist.experimental.slab_data_in_intervals = TRUE), print(
  ggplot(data.frame(x = 1:20/20)) +
    aes(x, ydist = dist_normal((x * 5)^2, 1 + x * 5)) +
    stat_lineribbon(
      aes(fill_ramp = after_stat(ave(pdf_min, level))),
      .width = interval_widths(40),
      fill = "gray50"
    ) +
    theme_ggdist()
))