Univariate Statistics for Numerical Data

Calculates mean, sd, min, Q1\*, median, Q3\*, max, MAD, IQR\*, CV, skewness\*, SE.skewness\*, and kurtosis\* on numerical vectors. (\*) Not available when using sampling weights.

descr(
  x,
  var = NULL,
  stats = st_options("descr.stats"),
  na.rm = TRUE,
  round.digits = st_options("round.digits"),
  transpose = st_options("descr.transpose"),
  order = "sort",
  style = st_options("style"),
  plain.ascii = st_options("plain.ascii"),
  justify = "r",
  headings = st_options("headings"),
  display.labels = st_options("display.labels"),
  split.tables = 100,
  weights = NULL,
  rescale.weights = FALSE,
  ...
)

Arguments

x: A numerical vector or a data frame.
var: Unquoted expression referring to a specific column in x. Provides support for piped function calls (e.g. my_df |> descr(my_var)).
stats: Character. Which stats to produce. Either “all” (default), “fivenum”, “common” (see Details), or a selection of: “mean”, “sd”, “min”, “q1”, “med”, “q3”, “max”, “mad”, “iqr”, “cv”, “skewness”, “se.skewness”, “kurtosis”, “n.valid”, “n”, and “pct.valid”. Can be set globally via st_options, option “descr.stats”. See Details.
na.rm: Logical. Argument to be passed to statistical functions. Defaults to TRUE.
round.digits: Numeric. Number of digits to display after the decimal point. Defaults to 2. Can be set globally with st_options.
transpose: Logical. When TRUE, variables appear as rows and stats as columns; when FALSE (default), stats appear as rows and variables as columns. Can be set globally with st_options, option “descr.transpose”.
order: Character. When analyzing more than one variable, this parameter determines how to order variables. Valid values are “sort” (or simply “s”), “preserve” (or “p”), or a vector containing all variable names in the desired order. Defaults to “sort”.
style: Character. Style to be used by pander. One of “simple” (default), “grid”, “rmarkdown”, or “jira”. Can be set globally with st_options.
plain.ascii: Logical. pander argument; when TRUE (default), no markup characters will be used (useful when printing to console). If style = 'rmarkdown' is specified, value is set to FALSE automatically. Can be set globally using st_options.
justify: Character. Alignment of numbers in cells; “l” for left, “c” for center, or “r” for right (default). Has no effect on HTML tables.
headings: Logical. Set to FALSE to omit heading section. Can be set globally via st_options. TRUE by default.
display.labels: Logical. Show variable / data frame labels in heading section. Defaults to TRUE. Can be set globally with st_options.
split.tables: Numeric. pander argument that specifies how many characters wide a table can be. 100 by default.
weights: Numeric. Vector of weights having same length as x. NULL (default) indicates that no weights are used.
rescale.weights: Logical. When set to TRUE, a global constant is applied to the weights so that the total count equals nrow(x). FALSE by default.
...: Additional arguments passed to pander or format.
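
The var, order, and weights arguments described above can be sketched as follows. This is an illustration only: it assumes the exams and tobacco datasets used in the Examples section, and that tobacco contains a samp.wgts column of sampling weights. The stats selection in the weighted call is deliberately limited to statistics not marked (\*), since those are unavailable with weights.

```r
library(summarytools)
data("exams")
data("tobacco")

# Piped call: var names the column to analyze (the native pipe needs R >= 4.1)
res_pipe <- exams |> descr(math)

# Keep variables in their original data frame order instead of sorting them
res_ord <- descr(exams, order = "preserve")

# Weighted statistics; samp.wgts is assumed to be the weights column in tobacco
res_wt <- descr(tobacco$BMI,
                stats   = c("mean", "sd", "min", "med", "max"),
                weights = tobacco$samp.wgts)
res_wt
```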

Value

An object having classes “matrix” and “summarytools” containing the statistics, with extra attributes useful to other functions/methods.
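
As a rough sketch of working with the returned object, the snippet below stores the result and inspects it. It assumes the row and column names of the object match the labels shown in the printed table (for example “Mean” and “math”), which may vary between package versions.

```r
library(summarytools)
data("exams")

res <- descr(exams, stats = c("mean", "sd"))

# The class vector should include "summarytools"
class(res)

# Extract a single statistic by row (stat) and column (variable) name;
# the names used here are assumed to match the printed labels
res["Mean", "math"]

# List the extra attributes carried for use by other functions/methods
names(attributes(res))
```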

Details

Since version 1.1, the stats argument can be set in a more flexible way; keywords (all, common, fivenum) can be combined with single statistics, or their “negation”. For instance, using stats = c("all", "-q1", "-q3") would show all except q1 and q3.
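
The combinations described above could look like this in practice; the exams dataset is the one used in the Examples section, and the object names are only illustrative.

```r
library(summarytools)
data("exams")

# Reference call with every statistic
res_all <- descr(exams, stats = "all")

# Keyword plus negations: everything except the two quartiles
res_no_q <- descr(exams, stats = c("all", "-q1", "-q3"))

# Keyword plus single statistics: the five-number summary with mean and sd
res_mix <- descr(exams, stats = c("fivenum", "mean", "sd"))
res_mix
```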

For further customization, you could redefine any preset in the following manner: .st_env$descr.stats$common <- c("mean", "sd", "n"). Use caution when modifying .st_env, and reload the package if errors ensue. Changes are temporary and will not persist across R sessions.
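
A cautious way to try this is to save the current preset first and restore it afterwards. The sketch below assumes .st_env is reachable from the session once the package is attached, as the assignment above implies; old_common is a hypothetical variable name.

```r
library(summarytools)
data("exams")

# Keep a copy of the current preset so it can be restored afterwards
old_common <- .st_env$descr.stats$common

# Redefine what "common" means for the remainder of this session
.st_env$descr.stats$common <- c("mean", "sd", "n")
descr(exams, stats = "common")

# Put the original preset back
.st_env$descr.stats$common <- old_common
```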

Author

Dominic Comtois, dominic.comtois@gmail.com

Examples

data("exams")

# All stats (default behavior) for all numerical variables
descr(exams)
#> Non-numerical variable(s) ignored: student, gender
#> Descriptive Statistics  
#> exams  
#> N: 30  
#> 
#>                     economics   english   french   geography   history    math
#> ----------------- ----------- --------- -------- ----------- --------- -------
#>              Mean       73.91     75.96    73.94       70.04     72.77   73.54
#>           Std.Dev        8.62      7.92    10.79       10.65     10.20    9.19
#>               Min       60.50     58.30    44.80       47.20     53.90   55.60
#>                Q1       68.80     70.90    68.20       65.90     68.20   66.95
#>            Median       71.60     74.10    73.60       68.50     72.75   73.75
#>                Q3       77.00     80.60    76.70       77.80     76.50   80.35
#>               Max       94.20     93.10    94.70       96.30     93.50   93.20
#>               MAD        5.49      6.52     7.56       12.31      6.45    9.93
#>               IQR        8.20      9.70     8.50       11.90      8.15   13.35
#>                CV        0.12      0.10     0.15        0.15      0.14    0.12
#>          Skewness        0.75      0.28     0.03        0.10      0.01    0.12
#>       SE.Skewness        0.43      0.43     0.43        0.43      0.43    0.44
#>          Kurtosis       -0.42     -0.25     0.45       -0.03     -0.60   -0.58
#>           N.Valid       29.00     29.00    29.00       29.00     30.00   28.00
#>                 N       30.00     30.00    30.00       30.00     30.00   30.00
#>         Pct.Valid       96.67     96.67    96.67       96.67    100.00   93.33

# Show only "common" statistics, plus "n"
descr(exams, stats = c("common", "n"))
#> Non-numerical variable(s) ignored: student, gender
#> Descriptive Statistics  
#> exams  
#> N: 30  
#> 
#>                   economics   english   french   geography   history    math
#> --------------- ----------- --------- -------- ----------- --------- -------
#>            Mean       73.91     75.96    73.94       70.04     72.77   73.54
#>         Std.Dev        8.62      7.92    10.79       10.65     10.20    9.19
#>             Min       60.50     58.30    44.80       47.20     53.90   55.60
#>          Median       71.60     74.10    73.60       68.50     72.75   73.75
#>             Max       94.20     93.10    94.70       96.30     93.50   93.20
#>         N.Valid       29.00     29.00    29.00       29.00     30.00   28.00
#>               N       30.00     30.00    30.00       30.00     30.00   30.00
#>       Pct.Valid       96.67     96.67    96.67       96.67    100.00   93.33

# Selection of statistics, transposing the results
descr(exams, stats = c("mean", "sd", "min", "max"), transpose = TRUE)
#> Non-numerical variable(s) ignored: student, gender
#> Descriptive Statistics  
#> exams  
#> N: 30  
#> 
#>                    Mean   Std.Dev     Min     Max
#> --------------- ------- --------- ------- -------
#>       economics   73.91      8.62   60.50   94.20
#>         english   75.96      7.92   58.30   93.10
#>          french   73.94     10.79   44.80   94.70
#>       geography   70.04     10.65   47.20   96.30
#>         history   72.77     10.20   53.90   93.50
#>            math   73.54      9.19   55.60   93.20

# Rmarkdown-ready
descr(exams, plain.ascii = FALSE, style = "rmarkdown")
#> Non-numerical variable(s) ignored: student, gender
#> ### Descriptive Statistics  
#> #### exams  
#> **N:** 30  
#> 
#> |          &nbsp; | economics | english | french | geography | history |  math |
#> |----------------:|----------:|--------:|-------:|----------:|--------:|------:|
#> |        **Mean** |     73.91 |   75.96 |  73.94 |     70.04 |   72.77 | 73.54 |
#> |     **Std.Dev** |      8.62 |    7.92 |  10.79 |     10.65 |   10.20 |  9.19 |
#> |         **Min** |     60.50 |   58.30 |  44.80 |     47.20 |   53.90 | 55.60 |
#> |          **Q1** |     68.80 |   70.90 |  68.20 |     65.90 |   68.20 | 66.95 |
#> |      **Median** |     71.60 |   74.10 |  73.60 |     68.50 |   72.75 | 73.75 |
#> |          **Q3** |     77.00 |   80.60 |  76.70 |     77.80 |   76.50 | 80.35 |
#> |         **Max** |     94.20 |   93.10 |  94.70 |     96.30 |   93.50 | 93.20 |
#> |         **MAD** |      5.49 |    6.52 |   7.56 |     12.31 |    6.45 |  9.93 |
#> |         **IQR** |      8.20 |    9.70 |   8.50 |     11.90 |    8.15 | 13.35 |
#> |          **CV** |      0.12 |    0.10 |   0.15 |      0.15 |    0.14 |  0.12 |
#> |    **Skewness** |      0.75 |    0.28 |   0.03 |      0.10 |    0.01 |  0.12 |
#> | **SE.Skewness** |      0.43 |    0.43 |   0.43 |      0.43 |    0.43 |  0.44 |
#> |    **Kurtosis** |     -0.42 |   -0.25 |   0.45 |     -0.03 |   -0.60 | -0.58 |
#> |     **N.Valid** |     29.00 |   29.00 |  29.00 |     29.00 |   30.00 | 28.00 |
#> |           **N** |     30.00 |   30.00 |  30.00 |     30.00 |   30.00 | 30.00 |
#> |   **Pct.Valid** |     96.67 |   96.67 |  96.67 |     96.67 |  100.00 | 93.33 |

# Grouped statistics
data("tobacco")
with(tobacco, stby(BMI, gender, descr, check.nas = FALSE))
#> NA detected in grouping variable(s); consider using useNA = TRUE
#> Descriptive Statistics  
#> BMI by gender  
#> Data Frame: tobacco  
#> N: 978  
#> 
#>                          F        M
#> ----------------- -------- --------
#>              Mean    26.10    25.31
#>           Std.Dev     4.95     3.98
#>               Min     9.01     8.83
#>                Q1    22.98    22.52
#>            Median    25.87    25.14
#>                Q3    29.48    27.96
#>               Max    39.44    36.76
#>               MAD     4.75     4.02
#>               IQR     6.49     5.44
#>                CV     0.19     0.16
#>          Skewness    -0.02    -0.04
#>       SE.Skewness     0.11     0.11
#>          Kurtosis     0.09     0.17
#>           N.Valid   475.00   477.00
#>                 N   489.00   489.00
#>         Pct.Valid    97.14    97.55

# Grouped statistics in tidy table:
tb(with(tobacco, stby(BMI, age.gr, descr, stats = "common")))
#> NA detected in grouping variable(s); consider using useNA = TRUE
#> # A tibble: 4 × 10
#>   age.gr variable  mean    sd   min   med   max n.valid     n pct.valid
#>   <fct>  <chr>    <dbl> <dbl> <dbl> <dbl> <dbl>   <dbl> <dbl>     <dbl>
#> 1 18-34  BMI       23.8  4.23  8.83  24.0  34.8     252   258      97.7
#> 2 35-50  BMI       25.1  4.34 10.3   25.1  39.4     232   241      96.3
#> 3 51-70  BMI       26.9  4.26  9.01  26.8  39.2     312   317      98.4
#> 4 71 +   BMI       27.4  4.37 16.4   27.5  38.4     155   159      97.5

if (FALSE) { # \dontrun{
# Show in Viewer (or browser if not in RStudio)
view(descr(exams))

# Save to html file with title
print(descr(exams),
      file = "descr_exams.html",
      report.title = "Exam Descriptive Statistics",
      footnote = "<b>Schoolyear:</b> 2018-2019<br/><b>Semester:</b> Fall")
} # }