Obtain Grouped Statistics With summarytools

An adaptation of base R's by function, designed to optimize the results' display.

stby(data, INDICES, FUN, ..., useNA = FALSE)

Arguments

data: an R object, normally a data frame, possibly a matrix.
INDICES: a grouping variable or a list of grouping variables, each of length nrow(data).
FUN: a function to be applied to (usually data-frame) subsets of data.
...: Further arguments to FUN.
useNA: Treat NA as a valid grouping value in the INDICES variable(s). Set to FALSE explicitly to suppress the message about NA values.

Value

An object of classes “list” and “summarytools”, giving results for each subset.

Details

When the grouping variable(s) contain NA values, the base::by function (as well as summarytools versions prior to 1.1.0) ignores the corresponding groups. As of version 1.1.0, setting useNA = TRUE creates additional groups for the NA values of the grouping variable(s), just as dplyr::group_by does.

When NA values are detected and useNA = FALSE, a message is displayed; to disable this message, set check.nas = FALSE.
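
The difference in NA handling described above can be sketched with base R alone. The toy data frame df and its columns g and x are made up for illustration. The addNA() call is only an approximation of the grouping that useNA = TRUE is described as producing, not the way summarytools implements it.

```r
# Hypothetical toy data: the last observation has a missing group label
df <- data.frame(
  g = c("a", "a", "b", NA),
  x = c(1, 2, 3, 4)
)

# base::by() leaves out the observation whose grouping value is NA,
# so only groups "a" and "b" are summarized
res_drop <- by(df, df$g, function(d) sum(d$x))

# addNA() turns NA into an explicit factor level, so the same call
# now yields a third group holding the NA observation
res_keep <- by(df, addNA(df$g), function(d) sum(d$x))

length(res_drop)  # number of groups without the NA level
length(res_keep)  # number of groups with NA kept as its own level
```

With stby, the same choice is made through the useNA argument rather than by modifying the grouping variable beforehand.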

Examples

data("tobacco")
with(tobacco, stby(data = BMI, INDICES = gender, FUN = descr,
                   check.nas = FALSE))
#> NA detected in grouping variable(s); consider using useNA = TRUE
#> Descriptive Statistics  
#> BMI by gender  
#> Data Frame: tobacco  
#> N: 978  
#> 
#>                          F        M
#> ----------------- -------- --------
#>              Mean    26.10    25.31
#>           Std.Dev     4.95     3.98
#>               Min     9.01     8.83
#>                Q1    22.98    22.52
#>            Median    25.87    25.14
#>                Q3    29.48    27.96
#>               Max    39.44    36.76
#>               MAD     4.75     4.02
#>               IQR     6.49     5.44
#>                CV     0.19     0.16
#>          Skewness    -0.02    -0.04
#>       SE.Skewness     0.11     0.11
#>          Kurtosis     0.09     0.17
#>           N.Valid   475.00   477.00
#>                 N   489.00   489.00
#>         Pct.Valid    97.14    97.55
with(tobacco, stby(data = smoker, INDICES = gender, freq, useNA = TRUE))
#> Frequencies  
#> tobacco$smoker  
#> Type: Factor  
#> Group: gender = F  
#> 
#>               Freq   % Valid   % Valid Cum.   % Total   % Total Cum.
#> ----------- ------ --------- -------------- --------- --------------
#>         Yes    147     30.06          30.06     30.06          30.06
#>          No    342     69.94         100.00     69.94         100.00
#>        <NA>      0                               0.00         100.00
#>       Total    489    100.00         100.00    100.00         100.00
#> 
#> Group: gender = M  
#> 
#>               Freq   % Valid   % Valid Cum.   % Total   % Total Cum.
#> ----------- ------ --------- -------------- --------- --------------
#>         Yes    143     29.24          29.24     29.24          29.24
#>          No    346     70.76         100.00     70.76         100.00
#>        <NA>      0                               0.00         100.00
#>       Total    489    100.00         100.00    100.00         100.00
#> 
#> Group: gender = NA  
#> 
#>               Freq   % Valid   % Valid Cum.   % Total   % Total Cum.
#> ----------- ------ --------- -------------- --------- --------------
#>         Yes      8     36.36          36.36     36.36          36.36
#>          No     14     63.64         100.00     63.64         100.00
#>        <NA>      0                               0.00         100.00
#>       Total     22    100.00         100.00    100.00         100.00
with(tobacco, stby(data = list(x = smoker, y = diseased),
                   INDICES = gender, FUN = ctable, useNA = TRUE))
#> Cross-Tabulation, Row Proportions  
#> smoker * diseased  
#> Data Frame: tobacco  
#> Group: gender = F  
#> 
#> -------- ---------- ------------- ------------- --------------
#>            diseased           Yes            No          Total
#>   smoker                                                      
#>      Yes               62 (42.2%)    85 (57.8%)   147 (100.0%)
#>       No               49 (14.3%)   293 (85.7%)   342 (100.0%)
#>    Total              111 (22.7%)   378 (77.3%)   489 (100.0%)
#> -------- ---------- ------------- ------------- --------------
#> 
#> Group: gender = M  
#> 
#> -------- ---------- ------------- ------------- --------------
#>            diseased           Yes            No          Total
#>   smoker                                                      
#>      Yes               63 (44.1%)    80 (55.9%)   143 (100.0%)
#>       No               47 (13.6%)   299 (86.4%)   346 (100.0%)
#>    Total              110 (22.5%)   379 (77.5%)   489 (100.0%)
#> -------- ---------- ------------- ------------- --------------
#> 
#> Group: gender = NA  
#> 
#> -------- ---------- ----------- ------------- -------------
#>            diseased         Yes            No         Total
#>   smoker                                                   
#>      Yes              0 ( 0.0%)    8 (100.0%)    8 (100.0%)
#>       No              3 (21.4%)   11 ( 78.6%)   14 (100.0%)
#>    Total              3 (13.6%)   19 ( 86.4%)   22 (100.0%)
#> -------- ---------- ----------- ------------- -------------
