Calculates mean, sd, min, Q1*, median, Q3*, max, MAD, IQR*, CV, skewness*, SE.skewness*, and kurtosis* on numerical vectors. (*) Not available when using sampling weights.
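As a rough illustration of what the unweighted statistics measure, the base-R sketch below computes several of them for a single vector. `descr_sketch()` is a hypothetical helper, not part of summarytools. It assumes quartiles from `quantile()`'s default (type 7), MAD from `mad()` with its default scaling constant (1.4826), and CV as sd / mean. summarytools may use a different quantile definition, so small discrepancies are possible; skewness and kurtosis are omitted here because their exact formulas are not documented on this page.

```r
# Hypothetical helper (not part of summarytools): unweighted descriptive
# statistics for one numeric vector, using only base R's stats package.
# Missing values are always dropped, mirroring descr()'s default na.rm = TRUE.
descr_sketch <- function(x) {
  n       <- length(x)
  n_valid <- sum(!is.na(x))
  xv      <- x[!is.na(x)]
  q       <- quantile(xv, probs = c(0.25, 0.75), names = FALSE)  # type 7
  m       <- mean(xv)
  s       <- sd(xv)
  c(mean      = m,
    sd        = s,
    min       = min(xv),
    q1        = q[1],
    med       = median(xv),
    q3        = q[2],
    max       = max(xv),
    mad       = mad(xv),      # median absolute deviation, scaled by 1.4826
    iqr       = IQR(xv),      # interquartile range, also type 7
    cv        = s / m,        # coefficient of variation
    n.valid   = n_valid,
    n         = n,
    pct.valid = 100 * n_valid / n)
}

res <- descr_sketch(c(1, 2, 3, 4, NA))
res
```

For the vector `c(1, 2, 3, 4, NA)`, the sketch reports n.valid = 4, n = 5, and pct.valid = 80.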

descr(
  x,
  var = NULL,
  stats = st_options("descr.stats"),
  na.rm = TRUE,
  round.digits = st_options("round.digits"),
  transpose = st_options("descr.transpose"),
  order = "sort",
  style = st_options("style"),
  plain.ascii = st_options("plain.ascii"),
  justify = "r",
  headings = st_options("headings"),
  display.labels = st_options("display.labels"),
  split.tables = 100,
  weights = NULL,
  rescale.weights = FALSE,
  ...
)

Arguments

x

A numerical vector or a data frame.

var

Unquoted expression referring to a specific column in x. Provides support for piped function calls (e.g., my_df |> descr(my_var)).

stats

Character. Which statistics to produce. Either “all” (default), “fivenum”, “common” (see Details), or a selection of: “mean”, “sd”, “min”, “q1”, “med”, “q3”, “max”, “mad”, “iqr”, “cv”, “skewness”, “se.skewness”, “kurtosis”, “n.valid”, “n”, and “pct.valid”. Can be set globally via st_options, option “descr.stats”. See Details.

na.rm

Logical. Argument to be passed to statistical functions. Defaults to TRUE.

round.digits

Numeric. Number of decimal places to display. Defaults to 2. Can be set globally with st_options.

transpose

Logical. When TRUE, variables appear as rows and statistics as columns. Defaults to FALSE (variables as columns, statistics as rows). Can be set globally with st_options, option “descr.transpose”.

order

Character. When analyzing more than one variable, this parameter determines how to order variables. Valid values are “sort” (or simply “s”), “preserve” (or “p”), or a vector containing all variable names in the desired order. Defaults to “sort”.
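The three accepted forms of `order` could be resolved along the lines of the sketch below. `order_vars()` is a hypothetical helper written for illustration, not summarytools' internal code; how the package reacts to an incomplete or misspelled vector of names is not covered here.

```r
# Hypothetical helper: turn an `order` specification into a vector of
# variable names, following the three documented forms.
order_vars <- function(var_names, order = "sort") {
  if (identical(order, "sort") || identical(order, "s")) {
    sort(var_names)                   # alphabetical order
  } else if (identical(order, "preserve") || identical(order, "p")) {
    var_names                         # original column order
  } else {
    # An explicit vector must list every variable exactly once
    stopifnot(setequal(order, var_names),
              length(order) == length(var_names))
    order
  }
}

order_vars(c("math", "english", "french"))       # alphabetical
order_vars(c("math", "english", "french"), "p")  # as given
```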

style

Character. Style to be used by pander. One of “simple” (default), “grid”, “rmarkdown”, or “jira”. Can be set globally with st_options.

plain.ascii

Logical. pander argument; when TRUE (default), no markup characters will be used (useful when printing to the console). If style = 'rmarkdown' is specified, the value is set to FALSE automatically. Can be set globally using st_options.

justify

Character. Alignment of numbers in cells; “l” for left, “c” for center, or “r” for right (default). Has no effect on html tables.

headings

Logical. Set to FALSE to omit the heading section. TRUE by default. Can be set globally via st_options.

display.labels

Logical. Show variable / data frame labels in heading section. Defaults to TRUE. Can be set globally with st_options.

split.tables

Numeric. pander argument that specifies the maximum width of a table, in characters. 100 by default.

weights

Numeric. Vector of weights with the same length as x (or nrow(x), when x is a data frame). NULL (default) indicates that no weights are used.

rescale.weights

Logical. When set to TRUE, all weights are multiplied by a single constant so that the total weighted count equals nrow(x). FALSE by default.
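Multiplying every weight by the same constant changes the total weighted count but not weighted statistics such as the mean, because the constant cancels out. The base-R sketch below assumes the constant is n / sum(weights); `rescale_weights()` is a hypothetical helper, and `weighted.mean()` from the stats package stands in for the weighted mean that descr() reports.

```r
# Hypothetical helper: scale all weights by one constant so they sum to the
# number of observations (length(w), i.e. nrow(x) for a data frame).
rescale_weights <- function(w) {
  w * length(w) / sum(w)
}

x    <- c(60, 70, 80, 90)
w    <- c(2, 2, 4, 8)          # raw weights sum to 16
w_rs <- rescale_weights(w)     # 0.5 0.5 1.0 2.0

sum(w_rs)                      # 4, the number of observations
weighted.mean(x, w)            # 81.25
weighted.mean(x, w_rs)         # 81.25, unchanged by rescaling
```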

...

Additional arguments passed to pander or format.

Value

An object having classes “matrix” and “summarytools” containing the statistics, with extra attributes useful to other functions/methods.

Details

Since version 1.1, the stats argument can be set in a more flexible way: keywords (all, common, fivenum) can be combined with individual statistics or with their “negation” (the statistic name prefixed with a minus sign). For instance, stats = c("all", "-q1", "-q3") shows all statistics except q1 and q3.
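The keyword-plus-negation rule can be pictured as set arithmetic: expand keywords, add individual names, then drop anything prefixed with "-". The sketch below is hypothetical (`resolve_stats()` is not a summarytools function). The `all` list comes from the stats argument description; the contents of `fivenum` and `common` are assumptions, the latter inferred from the "common" example output. Returning survivors in the canonical order reproduces the row order of that example.

```r
# Hypothetical re-implementation of the documented keyword/negation rule.
all_stats <- c("mean", "sd", "min", "q1", "med", "q3", "max", "mad", "iqr",
               "cv", "skewness", "se.skewness", "kurtosis", "n.valid", "n",
               "pct.valid")
presets <- list(
  all     = all_stats,
  fivenum = c("min", "q1", "med", "q3", "max"),                         # assumed
  common  = c("mean", "sd", "min", "med", "max", "n.valid", "pct.valid")  # assumed
)

resolve_stats <- function(stats) {
  # Replace each keyword by its preset; leave individual names as they are
  expand <- function(s) {
    unlist(lapply(s, function(k) if (k %in% names(presets)) presets[[k]] else k))
  }
  negated <- startsWith(stats, "-")
  keep    <- expand(stats[!negated])
  drop    <- expand(sub("^-", "", stats[negated]))
  # Survivors, in the canonical order used by the output tables
  intersect(all_stats, setdiff(keep, drop))
}

resolve_stats(c("all", "-q1", "-q3"))
resolve_stats(c("common", "n"))
```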

For further customization, you could redefine any preset in the following manner: .st_env$descr.stats$common <- c("mean", "sd", "n"). Use caution when modifying .st_env, and reload the package if errors ensue. Changes are temporary and will not persist across R sessions.

Author

Dominic Comtois, dominic.comtois@gmail.com

Examples

data("exams")

# All stats (default behavior) for all numerical variables
descr(exams)
#> Non-numerical variable(s) ignored: student, gender
#> Descriptive Statistics  
#> exams  
#> N: 30  
#> 
#>                     economics   english   french   geography   history    math
#> ----------------- ----------- --------- -------- ----------- --------- -------
#>              Mean       73.91     75.96    73.94       70.04     72.77   73.54
#>           Std.Dev        8.62      7.92    10.79       10.65     10.20    9.19
#>               Min       60.50     58.30    44.80       47.20     53.90   55.60
#>                Q1       68.80     70.90    68.20       65.90     68.20   66.95
#>            Median       71.60     74.10    73.60       68.50     72.75   73.75
#>                Q3       77.00     80.60    76.70       77.80     76.50   80.35
#>               Max       94.20     93.10    94.70       96.30     93.50   93.20
#>               MAD        5.49      6.52     7.56       12.31      6.45    9.93
#>               IQR        8.20      9.70     8.50       11.90      8.15   13.35
#>                CV        0.12      0.10     0.15        0.15      0.14    0.12
#>          Skewness        0.75      0.28     0.03        0.10      0.01    0.12
#>       SE.Skewness        0.43      0.43     0.43        0.43      0.43    0.44
#>          Kurtosis       -0.42     -0.25     0.45       -0.03     -0.60   -0.58
#>           N.Valid       29.00     29.00    29.00       29.00     30.00   28.00
#>                 N       30.00     30.00    30.00       30.00     30.00   30.00
#>         Pct.Valid       96.67     96.67    96.67       96.67    100.00   93.33
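Several rows in this output are simple functions of other rows, which allows a quick sanity check. The base-R snippet below recomputes CV as Std.Dev / Mean and Pct.Valid as 100 * N.Valid / N from the printed (already rounded) values, and SE.Skewness from N.Valid alone using the conventional formula sqrt(6n(n - 1) / ((n - 2)(n + 1)(n + 3))). That formula reproduces the printed values, but that summarytools uses exactly this formula is an assumption.

```r
# Values copied from the printed output above (economics ... math)
avg     <- c(73.91, 75.96, 73.94, 70.04, 72.77, 73.54)
std_dev <- c(8.62, 7.92, 10.79, 10.65, 10.20, 9.19)
n_valid <- c(29, 29, 29, 29, 30, 28)
n       <- 30

# Coefficient of variation: standard deviation relative to the mean
round(std_dev / avg, 2)         # 0.12 0.10 0.15 0.15 0.14 0.12

# Share of non-missing values, in percent
round(100 * n_valid / n, 2)     # 96.67 96.67 96.67 96.67 100.00 93.33

# Standard error of skewness depends only on the number of valid values
se_skewness <- function(n) sqrt(6 * n * (n - 1) / ((n - 2) * (n + 1) * (n + 3)))
round(se_skewness(n_valid), 2)  # 0.43 0.43 0.43 0.43 0.43 0.44
```

IQR is deliberately not recomputed as Q3 - Q1: for history, 76.50 - 68.20 = 8.30 while the table shows 8.15, which suggests the quartile rows and the IQR row rely on different quantile definitions.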

# Show only "common" statistics, plus "n"
descr(exams, stats = c("common", "n"))
#> Non-numerical variable(s) ignored: student, gender
#> Descriptive Statistics  
#> exams  
#> N: 30  
#> 
#>                   economics   english   french   geography   history    math
#> --------------- ----------- --------- -------- ----------- --------- -------
#>            Mean       73.91     75.96    73.94       70.04     72.77   73.54
#>         Std.Dev        8.62      7.92    10.79       10.65     10.20    9.19
#>             Min       60.50     58.30    44.80       47.20     53.90   55.60
#>          Median       71.60     74.10    73.60       68.50     72.75   73.75
#>             Max       94.20     93.10    94.70       96.30     93.50   93.20
#>         N.Valid       29.00     29.00    29.00       29.00     30.00   28.00
#>               N       30.00     30.00    30.00       30.00     30.00   30.00
#>       Pct.Valid       96.67     96.67    96.67       96.67    100.00   93.33

# Selection of statistics, transposing the results
descr(exams, stats = c("mean", "sd", "min", "max"), transpose = TRUE)
#> Non-numerical variable(s) ignored: student, gender
#> Descriptive Statistics  
#> exams  
#> N: 30  
#> 
#>                    Mean   Std.Dev     Min     Max
#> --------------- ------- --------- ------- -------
#>       economics   73.91      8.62   60.50   94.20
#>         english   75.96      7.92   58.30   93.10
#>          french   73.94     10.79   44.80   94.70
#>       geography   70.04     10.65   47.20   96.30
#>         history   72.77     10.20   53.90   93.50
#>            math   73.54      9.19   55.60   93.20

# Rmarkdown-ready
descr(exams, plain.ascii = FALSE, style = "rmarkdown")
#> Non-numerical variable(s) ignored: student, gender
#> ### Descriptive Statistics  
#> #### exams  
#> **N:** 30  
#> 
#> |          &nbsp; | economics | english | french | geography | history |  math |
#> |----------------:|----------:|--------:|-------:|----------:|--------:|------:|
#> |        **Mean** |     73.91 |   75.96 |  73.94 |     70.04 |   72.77 | 73.54 |
#> |     **Std.Dev** |      8.62 |    7.92 |  10.79 |     10.65 |   10.20 |  9.19 |
#> |         **Min** |     60.50 |   58.30 |  44.80 |     47.20 |   53.90 | 55.60 |
#> |          **Q1** |     68.80 |   70.90 |  68.20 |     65.90 |   68.20 | 66.95 |
#> |      **Median** |     71.60 |   74.10 |  73.60 |     68.50 |   72.75 | 73.75 |
#> |          **Q3** |     77.00 |   80.60 |  76.70 |     77.80 |   76.50 | 80.35 |
#> |         **Max** |     94.20 |   93.10 |  94.70 |     96.30 |   93.50 | 93.20 |
#> |         **MAD** |      5.49 |    6.52 |   7.56 |     12.31 |    6.45 |  9.93 |
#> |         **IQR** |      8.20 |    9.70 |   8.50 |     11.90 |    8.15 | 13.35 |
#> |          **CV** |      0.12 |    0.10 |   0.15 |      0.15 |    0.14 |  0.12 |
#> |    **Skewness** |      0.75 |    0.28 |   0.03 |      0.10 |    0.01 |  0.12 |
#> | **SE.Skewness** |      0.43 |    0.43 |   0.43 |      0.43 |    0.43 |  0.44 |
#> |    **Kurtosis** |     -0.42 |   -0.25 |   0.45 |     -0.03 |   -0.60 | -0.58 |
#> |     **N.Valid** |     29.00 |   29.00 |  29.00 |     29.00 |   30.00 | 28.00 |
#> |           **N** |     30.00 |   30.00 |  30.00 |     30.00 |   30.00 | 30.00 |
#> |   **Pct.Valid** |     96.67 |   96.67 |  96.67 |     96.67 |  100.00 | 93.33 |

# Grouped statistics
data("tobacco")
with(tobacco, stby(BMI, gender, descr, check.nas = FALSE))
#> NA detected in grouping variable(s); consider using useNA = TRUE
#> Descriptive Statistics  
#> BMI by gender  
#> Data Frame: tobacco  
#> N: 978  
#> 
#>                          F        M
#> ----------------- -------- --------
#>              Mean    26.10    25.31
#>           Std.Dev     4.95     3.98
#>               Min     9.01     8.83
#>                Q1    22.98    22.52
#>            Median    25.87    25.14
#>                Q3    29.48    27.96
#>               Max    39.44    36.76
#>               MAD     4.75     4.02
#>               IQR     6.49     5.44
#>                CV     0.19     0.16
#>          Skewness    -0.02    -0.04
#>       SE.Skewness     0.11     0.11
#>          Kurtosis     0.09     0.17
#>           N.Valid   475.00   477.00
#>                 N   489.00   489.00
#>         Pct.Valid    97.14    97.55
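Conceptually, stby() applies descr() to each level of the grouping variable, much like base R's split() followed by a per-group summary. The sketch below shows that idea on small made-up data; it is an analogy, not how stby() is implemented. split() silently drops observations whose group is NA, which is consistent with the combined N of 978 (489 + 489) above despite the NA warning.

```r
# Made-up data: one numeric variable and a grouping factor containing an NA
bmi    <- c(22.1, 27.4, NA, 24.9, 30.2, 26.0)
gender <- factor(c("F", "M", "F", NA, "M", "F"))

# split() drops the row whose group is NA; each group is then summarised
groups <- split(bmi, gender)
res <- sapply(groups, function(g) c(mean    = mean(g, na.rm = TRUE),
                                    n.valid = sum(!is.na(g)),
                                    n       = length(g)))
res
```

Here group F has n = 3 but n.valid = 2 (one missing BMI), and the fourth observation is excluded entirely because its gender is NA.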

# Grouped statistics in tidy table:
tb(with(tobacco, stby(BMI, age.gr, descr, stats = "common")))
#> NA detected in grouping variable(s); consider using useNA = TRUE
#> # A tibble: 4 × 10
#>   age.gr variable  mean    sd   min   med   max n.valid     n pct.valid
#>   <fct>  <chr>    <dbl> <dbl> <dbl> <dbl> <dbl>   <dbl> <dbl>     <dbl>
#> 1 18-34  BMI       23.8  4.23  8.83  24.0  34.8     252   258      97.7
#> 2 35-50  BMI       25.1  4.34 10.3   25.1  39.4     232   241      96.3
#> 3 51-70  BMI       26.9  4.26  9.01  26.8  39.2     312   317      98.4
#> 4 71 +   BMI       27.4  4.37 16.4   27.5  38.4     155   159      97.5

if (FALSE) { # \dontrun{
# Show in Viewer (or browser if not in RStudio)
view(descr(exams))

# Save to html file with title
print(descr(exams),
      file = "descr_exams.html", 
      report.title = "Descriptive Statistics: exams",
      footnote = "<b>Schoolyear:</b> 2018-2019<br/><b>Semester:</b> Fall")
} # }