adjboxStats
Computes the “statistics” for producing boxplots adjusted for
skewed distributions as proposed in Hubert and Vandervieren (2008),
see adjbox.
adjboxStats(x, coef = 1.5, a = -4, b = 3, do.conf = TRUE, do.out = TRUE,
            ...)
x: a numeric vector for which adjusted boxplot statistics are computed.
coef: a number determining how far the ‘whiskers’ extend out
from the box; see boxplot.stats.
a, b: scaling factors multiplied by the medcouple
mc() to determine the outlier boundaries; see the references.
do.conf, do.out: logicals; if FALSE, the conf or
out component, respectively, will be empty in the result.
...: further optional arguments to be passed to
mc(), such as doReflect.
Given the quartiles \(Q_1\), \(Q_3\), the interquartile
range \(\Delta Q := Q_3 - Q_1\), and the medcouple
\(M :=\) mc(x), and \(c =\) coef,
the “fence” is defined,
for \(M \ge 0\) as
$$[Q_1 - c\, e^{a \cdot M}\Delta Q,\; Q_3 + c\, e^{b \cdot M}\Delta Q],$$
and for \(M < 0\) as
$$[Q_1 - c\, e^{-b \cdot M}\Delta Q,\; Q_3 + c\, e^{-a \cdot M}\Delta Q],$$
and all observations x outside the fence, the “potential
outliers”, are returned in out.
Note that a typo in robustbase versions up to 0.7-8, for the (rare, left-skewed) case where mc(x) < 0, led to a “fence” that was not wide enough in the upper part, and hence to fewer outliers there.
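The fence definition above can be sketched in a few lines of base R. The helper adj_fence() is hypothetical (it is not part of robustbase): it takes the hinges Q1, Q3 and a medcouple value M as plain numbers instead of calling mc(), and uses the same defaults coef = 1.5, a = -4, b = 3 as adjboxStats(). It also illustrates the reflection property behind the note on the old typo: replacing x by -x maps (Q1, Q3, M) to (-Q3, -Q1, -M), so the fence should simply be negated and reversed.

```r
## Hypothetical helper, NOT robustbase code: the adjusted-boxplot fence
## for given hinges Q1, Q3 and a medcouple value M supplied by the caller.
adj_fence <- function(Q1, Q3, M, coef = 1.5, a = -4, b = 3) {
  dQ <- Q3 - Q1                                  # interquartile range Delta Q
  ## exponents: (a*M, b*M) if M >= 0, else (-b*M, -a*M)
  ex <- if (M >= 0) c(a, b) * M else c(-b, -a) * M
  c(lower = Q1 - coef * exp(ex[1]) * dQ,
    upper = Q3 + coef * exp(ex[2]) * dQ)
}

## M = 0 (no skewness) gives Tukey's fence Q1 - 1.5*dQ, Q3 + 1.5*dQ
adj_fence(302, 438.5, M = 0)        # lower = 97.25, upper = 643.25

## Reflection x -> -x: (Q1, Q3, M) becomes (-Q3, -Q1, -M)
f1 <- adj_fence(302, 438.5, M = 0.3)
f2 <- adj_fence(-438.5, -302, M = -0.3)
all.equal(unname(f2), unname(-rev(f1)))   # TRUE
```

For M > 0 the lower fence moves towards Q1 and the upper fence moves away from Q3, which is what makes the box plot "adjusted" for right skewness.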
A list with the components
stats: a vector of length 5, containing the extreme of the lower whisker, the lower hinge, the median, the upper hinge and the extreme of the upper whisker.
n: the number of observations.
conf: the lower and upper extremes of the ‘notch’
(if(do.conf)); see boxplot.stats.
fence: a length-2 vector of interval boundaries which define the non-outliers, and hence the whiskers of the plot.
out: the values of any data points which lie beyond the fence, and hence beyond the extremes of the whiskers.
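A rough sketch of how the components listed above fit together, written in base R. adj_stats_sketch() is hypothetical and is not robustbase's implementation: it assumes the hinges come from fivenum() and the notch from median +/- 1.58 * IQR / sqrt(n), as in boxplot.stats(), takes the medcouple M as an argument instead of calling mc(), and simply drops NA values.

```r
## Hypothetical sketch, NOT robustbase code: assemble stats, n, conf,
## fence and out for a numeric vector x, given a medcouple value M
## computed elsewhere (for instance by robustbase::mc(x)).
adj_stats_sketch <- function(x, M, coef = 1.5, a = -4, b = 3) {
  x <- x[!is.na(x)]                    # simplified NA handling
  n <- length(x)                       # number of observations
  stats <- fivenum(x)                  # Tukey's five-number summary
  iqr <- stats[4] - stats[2]           # hinge spread, Delta Q
  ## skewness-dependent multipliers for the lower / upper fence
  e <- if (M >= 0) exp(c(a, b) * M) else exp(c(-b, -a) * M)
  fence <- stats[c(2, 4)] + c(-coef, coef) * e * iqr
  out <- x < fence[1] | x > fence[2]   # potential outliers
  ## whisker ends: extremes of the points inside the fence
  if (any(out)) stats[c(1, 5)] <- range(x[!out])
  conf <- stats[3] + c(-1.58, 1.58) * iqr / sqrt(n)   # notch
  list(stats = stats, n = n, conf = conf, fence = fence, out = x[out])
}

x <- c(1:20, 50)
str(adj_stats_sketch(x, M = 0))
```

With M = 0 the fence reduces to Tukey's, so the stats, n, conf and out components should agree with those of boxplot.stats(x).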
The code only slightly modifies the code of R's
boxplot.stats.
adjbox(), the function which mainly uses this one (see it also for the references);
further, boxplot.stats.
library(robustbase)
data(condroz)
adjboxStats(ccA <- condroz[,"Ca"])
#> $stats
#> [1] 204.0 302.0 364.5 438.5 753.8
#>
#> $n
#> [1] 428
#>
#> $conf
#> [1] 354.0752 374.9248
#>
#> $fence
#> [1] 195.3720 772.4937
#>
#> $out
#> [1] 780.0 988.4 118.4 824.0 119.3 2251.1 3045.1 2383.1 3880.1 1423.5
#> [11] 100.7 2851.1 969.5 859.9 920.9
#>
adjboxStats(ccA, doReflect = TRUE) # small difference in fence
#> $stats
#> [1] 204.0 302.0 364.5 438.5 753.8
#>
#> $n
#> [1] 428
#>
#> $conf
#> [1] 354.0752 374.9248
#>
#> $fence
#> [1] 195.3898 772.5356
#>
#> $out
#> [1] 780.0 988.4 118.4 824.0 119.3 2251.1 3045.1 2383.1 3880.1 1423.5
#> [11] 100.7 2851.1 969.5 859.9 920.9
#>
## Test reflection invariance [was not ok, up to and including robustbase_0.7-8]
a1 <- adjboxStats( ccA, doReflect = TRUE)
a2 <- adjboxStats(-ccA, doReflect = TRUE)
nm1 <- c("stats", "conf", "fence")
stopifnot(all.equal(a1[nm1],
                    lapply(a2[nm1], function(u) rev(-u))),
          all.equal(a1[["out"]], -a2[["out"]]))
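As a worked check of the formulas against the first example output, the printed conf can be recomputed from the median, the hinges and n, and the medcouple can be back-solved from each end of the printed fence (case M >= 0, default coef, a and b). Only numbers printed above are used; the value of M obtained this way is inferred from the output, not taken from a call to mc().

```r
## Median, hinges, n and fence copied from the first adjboxStats() output
Q1 <- 302.0; Q3 <- 438.5; med <- 364.5; n <- 428
fence <- c(195.3720, 772.4937)
dQ <- Q3 - Q1

## Notch: median +/- 1.58 * dQ / sqrt(n), as in boxplot.stats()
conf <- med + c(-1.58, 1.58) * dQ / sqrt(n)
print(round(conf, 4))   # matches the printed $conf: 354.0752 374.9248

## Back-solve M (M >= 0 case, coef = 1.5, a = -4, b = 3):
##   lower = Q1 - 1.5 * exp(-4 * M) * dQ,  upper = Q3 + 1.5 * exp(3 * M) * dQ
M_lower <- log((Q1 - fence[1]) / (1.5 * dQ)) / -4
M_upper <- log((fence[2] - Q3) / (1.5 * dQ)) / 3
print(c(M_lower, M_upper))   # both approximately 0.1631
```

That both ends of the fence give the same M is consistent with the M >= 0 branch of the definition being the one used for these data.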