Calculates mean, sd, min, Q1\*, median, Q3\*, max, MAD, IQR\*, CV, skewness\*, SE.skewness\*, and kurtosis\* on numerical vectors. (\*) Not available when using sampling weights.
descr(
  x,
  var = NULL,
  stats = st_options("descr.stats"),
  na.rm = TRUE,
  round.digits = st_options("round.digits"),
  transpose = st_options("descr.transpose"),
  order = "sort",
  style = st_options("style"),
  plain.ascii = st_options("plain.ascii"),
  justify = "r",
  headings = st_options("headings"),
  display.labels = st_options("display.labels"),
  split.tables = 100,
  weights = NULL,
  rescale.weights = FALSE,
  ...
)
x: A numerical vector or a data frame.
var: Unquoted expression referring to a specific column in x. Provides support for piped function calls (e.g. my_df |> descr(my_var)).
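As a minimal sketch of the piped form, assuming the summarytools package is installed, that R is version 4.1 or later (needed for the native pipe), and using the exams data frame that ships with the package:

```r
library(summarytools)

data("exams")

# Pass the column as an unquoted name; `var` is the second argument,
# so it can directly follow the piped data frame.
res_piped <- exams |> descr(math)

# The same request without the pipe, naming the argument explicitly
res_named <- descr(exams, var = math)

print(res_piped)
```

Both calls should describe only the math column rather than every numerical column in exams.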
stats: Character. Which stats to produce. Either “all” (default), “fivenum”, “common” (see Details), or a selection of: “mean”, “sd”, “min”, “q1”, “med”, “q3”, “max”, “mad”, “iqr”, “cv”, “skewness”, “se.skewness”, “kurtosis”, “n.valid”, “n”, and “pct.valid”. Can be set globally via st_options, option “descr.stats”.
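The global setting mentioned above can be sketched as follows. This assumes summarytools is installed; the variable name old_stats is only illustrative. The current value is saved first so it can be restored afterwards.

```r
library(summarytools)

data("exams")

# Remember the current default so it can be restored later
old_stats <- st_options("descr.stats")

# Make "common" the default set of statistics for subsequent descr() calls
st_options(descr.stats = "common")
res_common <- descr(exams)
print(res_common)

# Restore the previous default
st_options(descr.stats = old_stats)
```

Setting the option once avoids repeating stats = "common" in every call within a session.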
na.rm: Logical. Argument to be passed to statistical functions. Defaults to TRUE.
round.digits: Numeric. Number of digits to display after the decimal point. Defaults to 2. Can be set globally with st_options.
transpose: Logical. When TRUE, variables are shown as rows and statistics as columns. Defaults to FALSE. Can be set globally with st_options, option “descr.transpose”.
order: Character. When analyzing more than one variable, this parameter determines how to order variables. Valid values are “sort” (or simply “s”), “preserve” (or “p”), or a vector containing all variable names in the desired order. Defaults to “sort”.
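A short sketch of the two keyword values, assuming summarytools is installed and using the bundled exams data frame. The object names are illustrative only.

```r
library(summarytools)

data("exams")

# Default: variables are sorted by name
res_sorted <- descr(exams, stats = c("mean", "sd"))

# Keep the variables in the order in which they appear in the data frame
res_preserved <- descr(exams, stats = c("mean", "sd"), order = "preserve")

print(res_sorted)
print(res_preserved)
```

Comparing the two printed tables shows whether the column order follows the alphabet or the data frame.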
style: Character. Style to be used by pander. One of “simple” (default), “grid”, “rmarkdown”, or “jira”. Can be set globally with st_options.
plain.ascii: Logical. pander argument; when TRUE (default), no markup characters are used (useful when printing to the console). If style = 'rmarkdown' is specified, the value is set to FALSE automatically. Can be set globally using st_options.
justify: Character. Alignment of numbers in cells; “l” for left, “c” for center, or “r” for right (default). Has no effect on html tables.
headings: Logical. Set to FALSE to omit the heading section. TRUE by default. Can be set globally via st_options.
display.labels: Logical. Show variable / data frame labels in the heading section. Defaults to TRUE. Can be set globally with st_options.
split.tables: Numeric. pander argument that specifies how many characters wide a table can be. 100 by default.
weights: Numeric. Vector of weights having the same length as x. NULL (default) indicates that no weights are used.
rescale.weights: Logical. When set to TRUE, a global constant is applied to make the total count equal nrow(x). FALSE by default.
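A hedged sketch of a weighted call follows. It assumes summarytools is installed and that the bundled tobacco data frame has a sampling-weights column named samp.wgts; check names(tobacco) if that assumption does not hold. Only statistics that are not marked with (\*) in the description are requested, since the starred ones are not available with weights.

```r
library(summarytools)

data("tobacco")

# Assumption: tobacco$samp.wgts holds the sampling weights
weighted_stats <- c("mean", "sd", "min", "med", "max")

# Weighted statistics for BMI
res_w <- descr(tobacco$BMI,
               stats = weighted_stats,
               weights = tobacco$samp.wgts)

# Same call, with the weights rescaled by a global constant
res_w_rescaled <- descr(tobacco$BMI,
                        stats = weighted_stats,
                        weights = tobacco$samp.wgts,
                        rescale.weights = TRUE)

print(res_w)
print(res_w_rescaled)
```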
An object having classes “matrix” and “summarytools” containing the statistics, with extra attributes useful to other functions/methods.
Since version 1.1, the stats argument can be set in a more flexible way: keywords (all, common, fivenum) can be combined with single statistics, or with their “negation”. For instance, using stats = c("all", "-q1", "-q3") would show all statistics except q1 and q3.
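The combinations described above can be sketched like this, assuming summarytools (version 1.1 or later) is installed and using the bundled exams data frame. The object names are illustrative.

```r
library(summarytools)

data("exams")

# A keyword with two statistics "negated": everything except q1 and q3
res_no_quartiles <- descr(exams, stats = c("all", "-q1", "-q3"))

# A keyword combined with a single extra statistic
res_fivenum_mean <- descr(exams, stats = c("fivenum", "mean"))

print(res_no_quartiles)
print(res_fivenum_mean)
```

The first table should omit the Q1 and Q3 rows that appear in the default output.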
For further customization, you could redefine any preset in the following manner: .st_env$descr.stats$common <- c("mean", "sd", "n"). Use caution when modifying .st_env, and reload the package if errors ensue. Changes are temporary and will not persist across R sessions.
data("exams")
# All stats (default behavior) for all numerical variables
descr(exams)
#> Non-numerical variable(s) ignored: student, gender
#> Descriptive Statistics
#> exams
#> N: 30
#>
#>                     economics   english   french   geography   history    math
#> ----------------- ----------- --------- -------- ----------- --------- -------
#>              Mean       73.91     75.96    73.94       70.04     72.77   73.54
#>           Std.Dev        8.62      7.92    10.79       10.65     10.20    9.19
#>               Min       60.50     58.30    44.80       47.20     53.90   55.60
#>                Q1       68.80     70.90    68.20       65.90     68.20   66.95
#>            Median       71.60     74.10    73.60       68.50     72.75   73.75
#>                Q3       77.00     80.60    76.70       77.80     76.50   80.35
#>               Max       94.20     93.10    94.70       96.30     93.50   93.20
#>               MAD        5.49      6.52     7.56       12.31      6.45    9.93
#>               IQR        8.20      9.70     8.50       11.90      8.15   13.35
#>                CV        0.12      0.10     0.15        0.15      0.14    0.12
#>          Skewness        0.75      0.28     0.03        0.10      0.01    0.12
#>       SE.Skewness        0.43      0.43     0.43        0.43      0.43    0.44
#>          Kurtosis       -0.42     -0.25     0.45       -0.03     -0.60   -0.58
#>           N.Valid       29.00     29.00    29.00       29.00     30.00   28.00
#>                 N       30.00     30.00    30.00       30.00     30.00   30.00
#>         Pct.Valid       96.67     96.67    96.67       96.67    100.00   93.33
# Show only "common" statistics, plus "n"
descr(exams, stats = c("common", "n"))
#> Non-numerical variable(s) ignored: student, gender
#> Descriptive Statistics
#> exams
#> N: 30
#>
#>                   economics   english   french   geography   history    math
#> --------------- ----------- --------- -------- ----------- --------- -------
#>            Mean       73.91     75.96    73.94       70.04     72.77   73.54
#>         Std.Dev        8.62      7.92    10.79       10.65     10.20    9.19
#>             Min       60.50     58.30    44.80       47.20     53.90   55.60
#>          Median       71.60     74.10    73.60       68.50     72.75   73.75
#>             Max       94.20     93.10    94.70       96.30     93.50   93.20
#>         N.Valid       29.00     29.00    29.00       29.00     30.00   28.00
#>               N       30.00     30.00    30.00       30.00     30.00   30.00
#>       Pct.Valid       96.67     96.67    96.67       96.67    100.00   93.33
# Selection of statistics, transposing the results
descr(exams, stats = c("mean", "sd", "min", "max"), transpose = TRUE)
#> Non-numerical variable(s) ignored: student, gender
#> Descriptive Statistics
#> exams
#> N: 30
#>
#>                    Mean   Std.Dev     Min     Max
#> --------------- ------- --------- ------- -------
#>       economics   73.91      8.62   60.50   94.20
#>         english   75.96      7.92   58.30   93.10
#>          french   73.94     10.79   44.80   94.70
#>       geography   70.04     10.65   47.20   96.30
#>         history   72.77     10.20   53.90   93.50
#>            math   73.54      9.19   55.60   93.20
# Rmarkdown-ready
descr(exams, plain.ascii = FALSE, style = "rmarkdown")
#> Non-numerical variable(s) ignored: student, gender
#> ### Descriptive Statistics
#> #### exams
#> **N:** 30
#>
#> | | economics | english | french | geography | history | math |
#> |----------------:|----------:|--------:|-------:|----------:|--------:|------:|
#> | **Mean** | 73.91 | 75.96 | 73.94 | 70.04 | 72.77 | 73.54 |
#> | **Std.Dev** | 8.62 | 7.92 | 10.79 | 10.65 | 10.20 | 9.19 |
#> | **Min** | 60.50 | 58.30 | 44.80 | 47.20 | 53.90 | 55.60 |
#> | **Q1** | 68.80 | 70.90 | 68.20 | 65.90 | 68.20 | 66.95 |
#> | **Median** | 71.60 | 74.10 | 73.60 | 68.50 | 72.75 | 73.75 |
#> | **Q3** | 77.00 | 80.60 | 76.70 | 77.80 | 76.50 | 80.35 |
#> | **Max** | 94.20 | 93.10 | 94.70 | 96.30 | 93.50 | 93.20 |
#> | **MAD** | 5.49 | 6.52 | 7.56 | 12.31 | 6.45 | 9.93 |
#> | **IQR** | 8.20 | 9.70 | 8.50 | 11.90 | 8.15 | 13.35 |
#> | **CV** | 0.12 | 0.10 | 0.15 | 0.15 | 0.14 | 0.12 |
#> | **Skewness** | 0.75 | 0.28 | 0.03 | 0.10 | 0.01 | 0.12 |
#> | **SE.Skewness** | 0.43 | 0.43 | 0.43 | 0.43 | 0.43 | 0.44 |
#> | **Kurtosis** | -0.42 | -0.25 | 0.45 | -0.03 | -0.60 | -0.58 |
#> | **N.Valid** | 29.00 | 29.00 | 29.00 | 29.00 | 30.00 | 28.00 |
#> | **N** | 30.00 | 30.00 | 30.00 | 30.00 | 30.00 | 30.00 |
#> | **Pct.Valid** | 96.67 | 96.67 | 96.67 | 96.67 | 100.00 | 93.33 |
# Grouped statistics
data("tobacco")
with(tobacco, stby(BMI, gender, descr, check.nas = FALSE))
#> NA detected in grouping variable(s); consider using useNA = TRUE
#> Descriptive Statistics
#> BMI by gender
#> Data Frame: tobacco
#> N: 978
#>
#>                          F        M
#> ----------------- -------- --------
#>              Mean    26.10    25.31
#>           Std.Dev     4.95     3.98
#>               Min     9.01     8.83
#>                Q1    22.98    22.52
#>            Median    25.87    25.14
#>                Q3    29.48    27.96
#>               Max    39.44    36.76
#>               MAD     4.75     4.02
#>               IQR     6.49     5.44
#>                CV     0.19     0.16
#>          Skewness    -0.02    -0.04
#>       SE.Skewness     0.11     0.11
#>          Kurtosis     0.09     0.17
#>           N.Valid   475.00   477.00
#>                 N   489.00   489.00
#>         Pct.Valid    97.14    97.55
# Grouped statistics in tidy table:
tb(with(tobacco, stby(BMI, age.gr, descr, stats = "common")))
#> NA detected in grouping variable(s); consider using useNA = TRUE
#> # A tibble: 4 × 10
#>   age.gr variable  mean    sd   min   med   max n.valid     n pct.valid
#>   <fct>  <chr>    <dbl> <dbl> <dbl> <dbl> <dbl>   <dbl> <dbl>     <dbl>
#> 1 18-34  BMI       23.8  4.23  8.83  24.0  34.8     252   258      97.7
#> 2 35-50  BMI       25.1  4.34 10.3   25.1  39.4     232   241      96.3
#> 3 51-70  BMI       26.9  4.26  9.01  26.8  39.2     312   317      98.4
#> 4 71 +   BMI       27.4  4.37 16.4   27.5  38.4     155   159      97.5
if (FALSE) { # \dontrun{
# Show in Viewer (or browser if not in RStudio)
view(descr(exams))
# Save to html file with title
print(descr(exams),
      file = "descr_exams.html",
      report.title = "Descriptive Statistics: exams",
      footnote = "<b>Schoolyear:</b> 2018-2019<br/><b>Semester:</b> Fall")
} # }