Report basic summary statistics by a grouping variable. This is useful when the grouping variable is an experimental variable and the data are to be aggregated for plotting. Partly a wrapper for by and describe.

describeBy(x, group=NULL,mat=FALSE,type=3,digits=15,data,...)
describe.by(x, group=NULL,mat=FALSE,type=3,...)  # deprecated

Arguments

x

a data.frame or matrix. See note for statsBy.

group

a grouping variable or a list of grouping variables. (This may be omitted when the grouping is specified in formula mode.)

mat

If TRUE, provide matrix (data.frame) output rather than a list.

type

Which type of skew and kurtosis should be found

digits

When giving matrix output, how many digits should be reported?

data

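The data set containing the variables. Needed if using formula input that refers to variables by name (e.g., SATV + SATQ ~ gender).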

...

parameters to be passed to describe

Details

To get descriptive statistics for several different grouping variables, make sure that group is a list. In the case of matrix output with multiple grouping variables, the grouping variable values are added to the output.
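
As a sketch of the list form described above (assuming the psych package and its sat.act data set are installed; the object name des is only illustrative), two grouping variables can be passed as a list, with matrix output requested so that the group values appear alongside the statistics:

```r
# Guarded so that the snippet does nothing if psych is not installed.
if (requireNamespace("psych", quietly = TRUE)) {
  data(sat.act, package = "psych")
  # Two grouping variables supplied as a list; mat = TRUE asks for a single
  # data.frame in which the grouping variable values are added to the output.
  des <- psych::describeBy(sat.act[c("age", "ACT")],
                           group = list(sat.act$gender, sat.act$education),
                           mat = TRUE, digits = 2)
  print(head(des))
}
```

Each row of des should correspond to one variable within one combination of gender and education.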

As of July, 2020, the grouping variable(s) may be specified in formula mode (see the examples).

The type parameter specifies which version of skew and kurtosis should be found. See describe for more details.
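
To make the distinction between the types concrete, here is a minimal base R sketch of three common sample skewness estimators, assuming that type follows the usual numbering of 1 to 3 (as in Joanes and Gill, 1998). The helper skew_by_type is hypothetical and is not the code used by describe; the same numbering applies to kurtosis.

```r
# Hypothetical helper: sample skewness under the three usual definitions.
skew_by_type <- function(x, type = 3) {
  x <- x[!is.na(x)]            # drop missing values
  n <- length(x)
  d <- x - mean(x)
  m2 <- sum(d^2) / n           # second central moment
  m3 <- sum(d^3) / n           # third central moment
  g1 <- m3 / m2^1.5            # type 1: moment-based estimate
  switch(as.character(type),
         "1" = g1,
         "2" = g1 * sqrt(n * (n - 1)) / (n - 2),  # type 2: small-sample adjustment
         "3" = g1 * ((n - 1) / n)^1.5,            # type 3: uses the usual sd (n - 1)
         stop("type must be 1, 2, or 3"))
}

x <- c(2, 3, 5, 8, 13, 21)     # toy, right-skewed data
skews <- sapply(1:3, function(t) skew_by_type(x, t))
skews
```

The three values differ only by factors that depend on n, so the choice matters mostly in small samples.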

An alternative function (statsBy) returns a list of means, n, and standard deviations for each group. This is particularly useful if finding weighted correlations of group means using cor.wt. More importantly, it does a proper within and between group decomposition of the correlation.
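
A brief sketch of the statsBy alternative, assuming the psych package and its sat.act data set are installed. The element names used here (mean, n, rwg, rbg) are the ones recalled from the statsBy documentation and should be checked against the installed version.

```r
# Guarded so that the snippet does nothing if psych is not installed.
if (requireNamespace("psych", quietly = TRUE)) {
  data(sat.act, package = "psych")
  sb <- psych::statsBy(sat.act, group = "education")
  print(round(sb$mean, 2))  # group means, one row per education level
  print(sb$n)               # number of valid cases per group and variable
  print(round(sb$rwg, 2))   # pooled within-group correlations
  print(round(sb$rbg, 2))   # between-group correlations (of the group means)
}
```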

cohen.d will work for two groups. It converts the data into mean differences and pools the within-group standard deviations. It returns the Cohen's d statistic as well as its multivariate generalization (Mahalanobis D).
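
The idea of a mean difference scaled by a pooled within-group standard deviation can be sketched in base R as follows. This is an illustration of the textbook formula under stated assumptions, not the cohen.d implementation, which may differ in its pooling and sign conventions; cohen_d_sketch and the toy data are hypothetical.

```r
# Hypothetical helper: standardized mean difference for exactly two groups.
cohen_d_sketch <- function(x, g) {
  ok <- !is.na(x) & !is.na(g)            # keep complete cases only
  parts <- split(x[ok], g[ok])           # one numeric vector per group
  stopifnot(length(parts) == 2)
  n <- lengths(parts)
  m <- vapply(parts, mean, numeric(1))
  v <- vapply(parts, var, numeric(1))
  # Pool the within-group variances, weighting each by its degrees of freedom.
  pooled_sd <- sqrt(sum((n - 1) * v) / (sum(n) - 2))
  unname((m[2] - m[1]) / pooled_sd)      # second group minus first group
}

x <- c(1, 2, 3, 3, 4, 5)                 # toy scores
g <- c(1, 1, 1, 2, 2, 2)                 # toy group membership
d <- cohen_d_sketch(x, g)
d
```

Here the group means are 2 and 4 and both within-group variances are 1, so the mean difference of 2 is divided by a pooled standard deviation of 1.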

Value

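By default, a list with one table of statistics per group; with mat=TRUE, a single data.frame. The relevant statistics, broken down by group, are:
item name
item number (vars)
number of valid cases (n)
mean
standard deviation (sd)
median
trimmed: trimmed mean
mad: median absolute deviation (from the median)
minimum
maximum
range
skew
kurtosis
standard error (se)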
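
As a rough illustration of how a table like this can be built from by and a per-column summary, here is a base R sketch. It is not the describeBy source; basic_stats and the toy data frame are hypothetical, and only some of the statistics are computed.

```r
# Hypothetical helper: a few summary statistics for one numeric vector.
basic_stats <- function(v) {
  v <- v[!is.na(v)]                     # keep valid cases only
  c(n      = length(v),
    mean   = mean(v),
    sd     = sd(v),
    median = median(v),
    mad    = mad(v),                    # scaled median absolute deviation
    min    = min(v),
    max    = max(v),
    se     = sd(v) / sqrt(length(v)))   # standard error of the mean
}

# Toy data: one score measured under two conditions.
toy <- data.frame(score = c(10, 12, 14, 20, 22, 24),
                  cond  = rep(c("a", "b"), each = 3))

# by() splits the rows by group; each column of the subset is then summarised.
res <- by(toy["score"], toy$cond, function(d) t(sapply(d, basic_stats)))
res[["a"]]
```

Each element of res is a small matrix with one row per variable and one column per statistic, which mirrors the per-group layout of the list output.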

Author

William Revelle

See also

describe, statsBy, densityBy and violinBy, cohen.d, cohen.d.by, and cohen.d.ci as well as error.bars and error.bars.by for other graphical displays.

Examples


data(sat.act)
describeBy(sat.act,sat.act$gender) #just one grouping variable
#> 
#>  Descriptive statistics by group 
#> group: 1
#>           vars   n   mean     sd median trimmed    mad min max range  skew
#> gender       1 247   1.00   0.00      1    1.00   0.00   1   1     0   NaN
#> education    2 247   3.00   1.54      3    3.12   1.48   0   5     5 -0.54
#> age          3 247  25.86   9.74     22   24.23   5.93  14  58    44  1.43
#> ACT          4 247  28.79   5.06     30   29.23   4.45   3  36    33 -1.06
#> SATV         5 247 615.11 114.16    630  622.07 118.61 200 800   600 -0.63
#> SATQ         6 245 635.87 116.02    660  645.53  94.89 300 800   500 -0.72
#>           kurtosis   se
#> gender         NaN 0.00
#> education    -0.60 0.10
#> age           1.43 0.62
#> ACT           1.89 0.32
#> SATV          0.13 7.26
#> SATQ         -0.12 7.41
#> ------------------------------------------------------------ 
#> group: 2
#>           vars   n   mean     sd median trimmed    mad min max range  skew
#> gender       1 453   2.00   0.00      2    2.00   0.00   2   2     0   NaN
#> education    2 453   3.26   1.35      3    3.40   1.48   0   5     5 -0.74
#> age          3 453  25.45   9.37     22   23.70   5.93  13  65    52  1.77
#> ACT          4 453  28.42   4.69     29   28.63   4.45  15  36    21 -0.39
#> SATV         5 453 610.66 112.31    620  617.91 103.78 200 800   600 -0.65
#> SATQ         6 442 596.00 113.07    600  602.21 133.43 200 800   600 -0.58
#>           kurtosis   se
#> gender         NaN 0.00
#> education     0.27 0.06
#> age           3.03 0.44
#> ACT          -0.42 0.22
#> SATV          0.42 5.28
#> SATQ          0.13 5.38
describeBy(sat.act ~ gender)   #describe the entire data set using formula input
#> 
#>  Descriptive statistics by group 
#> gender: 1
#>           vars   n   mean     sd median trimmed    mad min max range  skew
#> gender       1 247   1.00   0.00      1    1.00   0.00   1   1     0   NaN
#> education    2 247   3.00   1.54      3    3.12   1.48   0   5     5 -0.54
#> age          3 247  25.86   9.74     22   24.23   5.93  14  58    44  1.43
#> ACT          4 247  28.79   5.06     30   29.23   4.45   3  36    33 -1.06
#> SATV         5 247 615.11 114.16    630  622.07 118.61 200 800   600 -0.63
#> SATQ         6 245 635.87 116.02    660  645.53  94.89 300 800   500 -0.72
#>           kurtosis   se
#> gender         NaN 0.00
#> education    -0.60 0.10
#> age           1.43 0.62
#> ACT           1.89 0.32
#> SATV          0.13 7.26
#> SATQ         -0.12 7.41
#> ------------------------------------------------------------ 
#> gender: 2
#>           vars   n   mean     sd median trimmed    mad min max range  skew
#> gender       1 453   2.00   0.00      2    2.00   0.00   2   2     0   NaN
#> education    2 453   3.26   1.35      3    3.40   1.48   0   5     5 -0.74
#> age          3 453  25.45   9.37     22   23.70   5.93  13  65    52  1.77
#> ACT          4 453  28.42   4.69     29   28.63   4.45  15  36    21 -0.39
#> SATV         5 453 610.66 112.31    620  617.91 103.78 200 800   600 -0.65
#> SATQ         6 442 596.00 113.07    600  602.21 133.43 200 800   600 -0.58
#>           kurtosis   se
#> gender         NaN 0.00
#> education     0.27 0.06
#> age           3.03 0.44
#> ACT          -0.42 0.22
#> SATV          0.42 5.28
#> SATQ          0.13 5.38
describeBy(SATV + SATQ ~ gender,data =sat.act)  #specify the data set if using formula
#> 
#>  Descriptive statistics by group 
#> gender: 1
#>      vars   n   mean     sd median trimmed    mad min max range  skew kurtosis
#> SATV    1 247 615.11 114.16    630  622.07 118.61 200 800   600 -0.63     0.13
#> SATQ    2 245 635.87 116.02    660  645.53  94.89 300 800   500 -0.72    -0.12
#>        se
#> SATV 7.26
#> SATQ 7.41
#> ------------------------------------------------------------ 
#> gender: 2
#>      vars   n   mean     sd median trimmed    mad min max range  skew kurtosis
#> SATV    1 453 610.66 112.31    620  617.91 103.78 200 800   600 -0.65     0.42
#> SATQ    2 442 596.00 113.07    600  602.21 133.43 200 800   600 -0.58     0.13
#>        se
#> SATV 5.28
#> SATQ 5.38
#describeBy(sat.act,list(sat.act$gender,sat.act$education))  #two grouping variables
describeBy(sat.act ~ gender +  education) #two grouping variables
#> 
#>  Descriptive statistics by group 
#> gender: 1
#> education: 0
#>           vars  n   mean     sd median trimmed    mad min max range  skew
#> gender       1 27   1.00   0.00      1    1.00   0.00   1   1     0   NaN
#> education    2 27   0.00   0.00      0    0.00   0.00   0   0     0   NaN
#> age          3 27  16.93   1.04     17   17.04   1.48  14  18     4 -0.86
#> ACT          4 27  29.04   5.00     29   29.22   5.93  20  36    16 -0.30
#> SATV         5 27 640.07 132.24    670  646.17 177.91 400 800   400 -0.29
#> SATQ         6 27 642.67 127.90    660  647.91 177.91 400 800   400 -0.24
#>           kurtosis    se
#> gender         NaN  0.00
#> education      NaN  0.00
#> age           0.34  0.20
#> ACT          -1.13  0.96
#> SATV         -1.40 25.45
#> SATQ         -1.36 24.61
#> ------------------------------------------------------------ 
#> gender: 2
#> education: 0
#>           vars  n   mean     sd median trimmed    mad min max range  skew
#> gender       1 30   2.00   0.00      2    2.00   0.00   2   2     0   NaN
#> education    2 30   0.00   0.00      0    0.00   0.00   0   0     0   NaN
#> age          3 30  16.97   1.07     17   17.12   0.74  13  18     5 -1.75
#> ACT          4 30  26.07   5.06     26   25.92   5.93  15  36    21  0.08
#> SATV         5 30 595.30 123.46    595  597.08 148.26 350 800   450 -0.09
#> SATQ         6 29 599.72 123.20    600  600.96 148.26 333 800   467 -0.09
#>           kurtosis    se
#> gender         NaN  0.00
#> education      NaN  0.00
#> age           4.13  0.19
#> ACT          -0.56  0.92
#> SATV         -0.81 22.54
#> SATQ         -0.99 22.88
#> ------------------------------------------------------------ 
#> gender: 1
#> education: 1
#>           vars  n   mean     sd median trimmed    mad min max range  skew
#> gender       1 20   1.00   0.00      1    1.00   0.00   1   1     0   NaN
#> education    2 20   1.00   0.00      1    1.00   0.00   1   1     0   NaN
#> age          3 20  19.65   6.12     18   18.19   0.00  17  45    28  3.55
#> ACT          4 20  26.70   7.11     28   27.12   8.15  15  35    20 -0.30
#> SATV         5 20 603.00 141.24    600  611.25 185.32 300 780   480 -0.39
#> SATQ         6 19 625.84  95.87    650  630.94  88.96 400 765   365 -0.66
#>           kurtosis    se
#> gender         NaN  0.00
#> education      NaN  0.00
#> age          11.78  1.37
#> ACT          -1.51  1.59
#> SATV         -1.12 31.58
#> SATQ         -0.47 21.99
#> ------------------------------------------------------------ 
#> gender: 2
#> education: 1
#>           vars  n   mean     sd median trimmed    mad min max range  skew
#> gender       1 25   2.00   0.00      2    2.00   0.00   2   2     0   NaN
#> education    2 25   1.00   0.00      1    1.00   0.00   1   1     0   NaN
#> age          3 25  19.32   4.62     18   18.14   0.00  17  37    20  2.86
#> ACT          4 25  28.12   5.13     27   28.33   4.45  18  36    18 -0.21
#> SATV         5 25 597.00 119.38    610  600.76 133.43 350 799   449 -0.31
#> SATQ         6 24 592.54 140.83    625  606.60 111.19 230 799   569 -0.93
#>           kurtosis    se
#> gender         NaN  0.00
#> education      NaN  0.00
#> age           7.27  0.92
#> ACT          -0.78  1.03
#> SATV         -0.95 23.88
#> SATQ          0.20 28.75
#> ------------------------------------------------------------ 
#> gender: 1
#> education: 2
#>           vars  n   mean     sd median trimmed    mad min max range  skew
#> gender       1 23   1.00   0.00      1    1.00   0.00   1   1     0   NaN
#> education    2 23   2.00   0.00      2    2.00   0.00   2   2     0   NaN
#> age          3 23  25.26   8.68     22   23.58   4.45  18  55    37  1.94
#> ACT          4 23  26.65   6.39     28   27.68   4.45   3  32    29 -2.14
#> SATV         5 23 560.00 152.29    600  570.53 148.26 200 800   600 -0.53
#> SATQ         6 23 569.13 160.65    600  575.79 177.91 300 800   500 -0.36
#>           kurtosis    se
#> gender         NaN  0.00
#> education      NaN  0.00
#> age           3.63  1.81
#> ACT           5.39  1.33
#> SATV         -0.59 31.75
#> SATQ         -1.44 33.50
#> ------------------------------------------------------------ 
#> gender: 2
#> education: 2
#>           vars  n   mean     sd median trimmed    mad min max range  skew
#> gender       1 21   2.00   0.00      2    2.00   0.00   2   2     0   NaN
#> education    2 21   2.00   0.00      2    2.00   0.00   2   2     0   NaN
#> age          3 21  30.10  12.22     26   28.41  10.38  18  57    39  1.16
#> ACT          4 21  27.33   5.23     28   27.53   4.45  15  36    21 -0.32
#> SATV         5 21 593.57 115.34    600  598.24 118.61 375 770   395 -0.44
#> SATQ         6 20 586.50 120.96    585  587.81 163.09 375 800   425  0.01
#>           kurtosis    se
#> gender         NaN  0.00
#> education      NaN  0.00
#> age           0.01  2.67
#> ACT          -0.34  1.14
#> SATV         -0.91 25.17
#> SATQ         -1.11 27.05
#> ------------------------------------------------------------ 
#> gender: 1
#> education: 3
#>           vars  n   mean     sd median trimmed    mad min max range  skew
#> gender       1 80   1.00   0.00      1    1.00   0.00   1   1     0   NaN
#> education    2 80   3.00   0.00      3    3.00   0.00   3   3     0   NaN
#> age          3 80  20.81   3.06     20   20.28   1.48  17  34    17  2.00
#> ACT          4 80  28.56   5.03     30   28.84   5.19  17  36    19 -0.45
#> SATV         5 80 617.44 111.79    630  624.45 111.19 300 800   500 -0.62
#> SATQ         6 79 642.59 118.28    680  653.15 118.61 300 800   500 -0.81
#>           kurtosis    se
#> gender         NaN  0.00
#> education      NaN  0.00
#> age           4.55  0.34
#> ACT          -0.92  0.56
#> SATV         -0.06 12.50
#> SATQ         -0.17 13.31
#> ------------------------------------------------------------ 
#> gender: 2
#> education: 3
#>           vars   n   mean     sd median trimmed    mad min max range  skew
#> gender       1 195   2.00   0.00      2    2.00   0.00   2   2     0   NaN
#> education    2 195   3.00   0.00      3    3.00   0.00   3   3     0   NaN
#> age          3 195  21.09   4.75     20   20.04   1.48  17  46    29  3.41
#> ACT          4 195  28.18   4.78     29   28.43   4.45  16  36    20 -0.46
#> SATV         5 195 609.96 119.78    620  619.57 118.61 200 800   600 -0.81
#> SATQ         6 190 590.89 114.46    600  598.94 118.61 200 800   600 -0.72
#>           kurtosis   se
#> gender         NaN 0.00
#> education      NaN 0.00
#> age          12.83 0.34
#> ACT          -0.47 0.34
#> SATV          0.66 8.58
#> SATQ          0.38 8.30
#> ------------------------------------------------------------ 
#> gender: 1
#> education: 4
#>           vars  n   mean     sd median trimmed   mad min max range  skew
#> gender       1 51   1.00   0.00      1    1.00  0.00   1   1     0   NaN
#> education    2 51   4.00   0.00      4    4.00  0.00   4   4     0   NaN
#> age          3 51  32.22   9.03     29   30.78  8.90  23  57    34  1.20
#> ACT          4 51  28.94   4.42     29   29.34  4.45  16  36    20 -0.74
#> SATV         5 51 620.31  81.72    620  623.32 88.96 430 800   370 -0.26
#> SATQ         6 51 635.90 104.12    640  642.46 88.96 400 800   400 -0.46
#>           kurtosis    se
#> gender         NaN  0.00
#> education      NaN  0.00
#> age           0.63  1.27
#> ACT           0.12  0.62
#> SATV         -0.29 11.44
#> SATQ         -0.45 14.58
#> ------------------------------------------------------------ 
#> gender: 2
#> education: 4
#>           vars  n   mean     sd median trimmed    mad min max range  skew
#> gender       1 87   2.00   0.00      2    2.00   0.00   2   2     0   NaN
#> education    2 87   4.00   0.00      4    4.00   0.00   4   4     0   NaN
#> age          3 87  29.08   7.76     26   27.83   5.93  21  52    31  1.26
#> ACT          4 87  29.45   4.32     30   29.59   4.45  19  36    17 -0.27
#> SATV         5 87 614.98 106.62    620  621.39  88.96 300 800   500 -0.58
#> SATQ         6 86 597.59 106.24    600  605.76 118.61 300 800   500 -0.71
#>           kurtosis    se
#> gender         NaN  0.00
#> education      NaN  0.00
#> age           0.70  0.83
#> ACT          -0.67  0.46
#> SATV          0.28 11.43
#> SATQ          0.20 11.46
#> ------------------------------------------------------------ 
#> gender: 1
#> education: 5
#>           vars  n   mean    sd median trimmed    mad min max range  skew
#> gender       1 46   1.00  0.00    1.0    1.00   0.00   1   1     0   NaN
#> education    2 46   5.00  0.00    5.0    5.00   0.00   5   5     0   NaN
#> age          3 46  35.85 10.00   35.5   35.13  11.12  22  58    36  0.47
#> ACT          4 46  30.83  3.11   32.0   30.95   2.97  25  36    11 -0.38
#> SATV         5 46 623.48 99.58  645.0  631.18  96.37 390 770   380 -0.61
#> SATQ         6 46 657.83 89.61  680.0  661.71 103.78 475 800   325 -0.45
#>           kurtosis    se
#> gender         NaN  0.00
#> education      NaN  0.00
#> age          -0.67  1.48
#> ACT          -0.81  0.46
#> SATV         -0.43 14.68
#> SATQ         -0.77 13.21
#> ------------------------------------------------------------ 
#> gender: 2
#> education: 5
#>           vars  n   mean     sd median trimmed    mad min max range  skew
#> gender       1 95   2.00   0.00      2    2.00   0.00   2   2     0   NaN
#> education    2 95   5.00   0.00      5    5.00   0.00   5   5     0   NaN
#> age          3 95  34.34  10.67     30   32.74   8.90  22  65    43  1.18
#> ACT          4 95  29.01   4.19     29   29.14   4.45  18  36    18 -0.31
#> SATV         5 95 620.39  95.72    620  623.61  74.13 300 800   500 -0.46
#> SATQ         6 93 606.72 105.55    600  608.93 148.26 350 800   450 -0.14
#>           kurtosis    se
#> gender         NaN  0.00
#> education      NaN  0.00
#> age           0.61  1.09
#> ACT          -0.73  0.43
#> SATV          0.43  9.82
#> SATQ         -0.94 10.95
des.mat <- describeBy(age ~ education,mat=TRUE,data = sat.act) #matrix (data.frame) output 
des.mat <- describeBy(age ~ education + gender, data=sat.act,
               mat=TRUE,digits=2)  #matrix output  rounded to 2 decimals