An adaptation of base R's by function, designed to optimize the display of results.
stby(data, INDICES, FUN, ..., useNA = FALSE)
data: An R object, normally a data frame, possibly a matrix.
INDICES: A grouping variable or a list of grouping variables, each of length nrow(data).
FUN: A function to be applied to (usually data-frame) subsets of data.
...: Further arguments to FUN.
useNA: Make NA a valid grouping value in INDICES variable(s). Set to FALSE explicitly to eliminate the message.
An object of classes “list” and “summarytools”, giving results for each subset.
When the grouping variable(s) contain NA values, the base::by function (as well as summarytools versions prior to 1.1.0) ignores the corresponding groups. Version 1.1.0 allows setting useNA = TRUE to form groups from the NA values in the grouping variable(s), just as dplyr::group_by does.
When NA values are detected and useNA = FALSE, a message is displayed; to disable this message, set check.nas = FALSE.
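The base R behaviour described above can be illustrated with a small sketch. It uses a made-up data frame (`df`, with hypothetical columns `g` and `x`) and only base R functions. The `addNA()` workaround shown here is one common way to keep NA as a group in `base::by`; it is not necessarily how stby implements `useNA = TRUE`.

```r
# Toy data (hypothetical, not part of summarytools):
# two rows have a missing grouping value
df <- data.frame(
  g = c("a", "a", "b", NA, NA),
  x = c(1, 3, 10, 100, 200)
)

# base::by() drops the rows whose grouping value is NA
res_drop <- by(df, df$g, function(d) mean(d$x))
length(res_drop)   # 2 groups: "a" and "b"

# addNA() makes NA an explicit factor level, so those rows form their own group
res_keep <- by(df, addNA(df$g), function(d) mean(d$x))
length(res_keep)   # 3 groups: "a", "b" and NA
res_keep[[3]]      # mean of x in the NA group
```

With stby, the same effect is requested through `useNA = TRUE`, as in the examples that follow.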
data("tobacco")
with(tobacco, stby(data = BMI, INDICES = gender, FUN = descr,
                   check.nas = FALSE))
#> NA detected in grouping variable(s); consider using useNA = TRUE
#> Descriptive Statistics
#> BMI by gender
#> Data Frame: tobacco
#> N: 978
#>
#>                          F        M
#> ----------------- -------- --------
#>              Mean    26.10    25.31
#>           Std.Dev     4.95     3.98
#>               Min     9.01     8.83
#>                Q1    22.98    22.52
#>            Median    25.87    25.14
#>                Q3    29.48    27.96
#>               Max    39.44    36.76
#>               MAD     4.75     4.02
#>               IQR     6.49     5.44
#>                CV     0.19     0.16
#>          Skewness    -0.02    -0.04
#>       SE.Skewness     0.11     0.11
#>          Kurtosis     0.09     0.17
#>           N.Valid   475.00   477.00
#>                 N   489.00   489.00
#>         Pct.Valid    97.14    97.55
with(tobacco, stby(data = smoker, INDICES = gender, freq, useNA = TRUE))
#> Frequencies
#> tobacco$smoker
#> Type: Factor
#> Group: gender = F
#>
#>                Freq   % Valid   % Valid Cum.   % Total   % Total Cum.
#> ----------- ------ --------- -------------- --------- --------------
#>         Yes    147     30.06          30.06     30.06          30.06
#>          No    342     69.94         100.00     69.94         100.00
#>        <NA>      0                               0.00         100.00
#>       Total    489    100.00         100.00    100.00         100.00
#>
#> Group: gender = M
#>
#>                Freq   % Valid   % Valid Cum.   % Total   % Total Cum.
#> ----------- ------ --------- -------------- --------- --------------
#>         Yes    143     29.24          29.24     29.24          29.24
#>          No    346     70.76         100.00     70.76         100.00
#>        <NA>      0                               0.00         100.00
#>       Total    489    100.00         100.00    100.00         100.00
#>
#> Group: gender = NA
#>
#>                Freq   % Valid   % Valid Cum.   % Total   % Total Cum.
#> ----------- ------ --------- -------------- --------- --------------
#>         Yes      8     36.36          36.36     36.36          36.36
#>          No     14     63.64         100.00     63.64         100.00
#>        <NA>      0                               0.00         100.00
#>       Total     22    100.00         100.00    100.00         100.00
with(tobacco, stby(data = list(x = smoker, y = diseased),
                   INDICES = gender, FUN = ctable, useNA = TRUE))
#> Cross-Tabulation, Row Proportions
#> smoker * diseased
#> Data Frame: tobacco
#> Group: gender = F
#>
#> -------- ---------- ------------- ------------- --------------
#>            diseased           Yes            No          Total
#>   smoker
#>      Yes               62 (42.2%)    85 (57.8%)   147 (100.0%)
#>       No               49 (14.3%)   293 (85.7%)   342 (100.0%)
#>    Total              111 (22.7%)   378 (77.3%)   489 (100.0%)
#> -------- ---------- ------------- ------------- --------------
#>
#> Group: gender = M
#>
#> -------- ---------- ------------- ------------- --------------
#>            diseased           Yes            No          Total
#>   smoker
#>      Yes               63 (44.1%)    80 (55.9%)   143 (100.0%)
#>       No               47 (13.6%)   299 (86.4%)   346 (100.0%)
#>    Total              110 (22.5%)   379 (77.5%)   489 (100.0%)
#> -------- ---------- ------------- ------------- --------------
#>
#> Group: gender = NA
#>
#> -------- ---------- ----------- ------------- -------------
#>            diseased         Yes            No         Total
#>   smoker
#>      Yes              0 ( 0.0%)    8 (100.0%)    8 (100.0%)
#>       No              3 (21.4%)   11 ( 78.6%)   14 (100.0%)
#>    Total              3 (13.6%)   19 ( 86.4%)   22 (100.0%)
#> -------- ---------- ----------- ------------- -------------