Compute Summary Statistics on a Vector

A number of statistical summary functions are provided for use with summary.formula and summarize (as well as with tapply and on their own). smean.cl.normal computes 3 summary variables: the sample mean and the lower and upper Gaussian confidence limits based on the t-distribution. smean.sd computes the mean and standard deviation. smean.sdl computes the mean plus or minus a constant times the standard deviation. smean.cl.boot is a very fast implementation of the basic nonparametric bootstrap for obtaining confidence limits for the population mean without assuming normality. smedian.hilow computes the sample median and a selected pair of outer quantiles having equal tail areas. All of these functions delete NAs automatically.
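The interval formulas described above for smean.cl.normal and smean.sdl can be sketched in a few lines of base R. This is a minimal illustration under stated assumptions, not the Hmisc source: the helper names cl_normal_sketch and sdl_sketch are hypothetical, and the sketch assumes a numeric vector with at least two non-missing values.

```r
# Hypothetical base-R sketches of the formulas described above.
# These are NOT the Hmisc functions; the names are illustrative only.

# Mean with t-based confidence limits, as described for smean.cl.normal:
# mean +/- qt((1 + conf.int)/2, n - 1) * sd / sqrt(n)
cl_normal_sketch <- function(x, conf.int = 0.95) {
  x <- x[!is.na(x)]                 # drop NAs, mirroring na.rm = TRUE
  n <- length(x)
  m <- mean(x)
  half <- qt((1 + conf.int) / 2, n - 1) * sd(x) / sqrt(n)
  c(Mean = m, Lower = m - half, Upper = m + half)
}

# Mean plus or minus mult standard deviations, as described for smean.sdl
sdl_sketch <- function(x, mult = 2) {
  x <- x[!is.na(x)]                 # drop NAs
  m <- mean(x)
  s <- sd(x)
  c(Mean = m, Lower = m - mult * s, Upper = m + mult * s)
}

set.seed(1)
x <- c(rnorm(100), NA)              # the trailing NA is ignored
round(cl_normal_sketch(x), 4)
round(sdl_sketch(x), 3)
```

Because the seed and the first 100 draws are the same as in the Examples, these sketches should give values close to the smean.cl.normal and smean.sdl output shown there; the t-based half-width is the same one t.test uses for a one-sample interval.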

smean.cl.normal(x, mult=qt((1+conf.int)/2,n-1), conf.int=.95, na.rm=TRUE)

smean.sd(x, na.rm=TRUE)

smean.sdl(x, mult=2, na.rm=TRUE)

smean.cl.boot(x, conf.int=.95, B=1000, na.rm=TRUE, reps=FALSE)

smedian.hilow(x, conf.int=.95, na.rm=TRUE)

Arguments

x: for the summary functions smean.* and smedian.hilow, a numeric vector from which NAs are removed automatically
na.rm: defaults to TRUE, unlike most built-in functions such as mean, so that NAs are removed by default
mult: for smean.cl.normal, the multiplier of the standard error of the mean used to obtain confidence limits for the population mean (the default is the appropriate quantile of the t distribution with n-1 degrees of freedom). For smean.sdl, the multiplier of the standard deviation used to obtain a coverage interval about the sample mean; the default, mult=2, gives plus or minus 2 standard deviations.
conf.int: for smean.cl.normal and smean.cl.boot, the confidence level (between 0 and 1) for interval estimation of the population mean. For smedian.hilow, the coverage probability the outer quantiles should target; with the default of 0.95, the lower and upper quantiles computed are 0.025 and 0.975.
B: number of bootstrap resamples for smean.cl.boot
reps: set to TRUE to have smean.cl.boot return the vector of bootstrapped means as the reps attribute of the returned object
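The conf.int, B and reps arguments can be illustrated with two small base-R sketches. For smedian.hilow, the outer quantiles follow from conf.int as (1 - conf.int)/2 and (1 + conf.int)/2; for smean.cl.boot, a plain percentile bootstrap shows what B resamples and the reps attribute mean. The helper names median_hilow_sketch and mean_boot_sketch are hypothetical, and Hmisc's fast bootstrap may resample and compute quantiles differently, so the numbers are not expected to match it exactly.

```r
# Hypothetical helpers illustrating the arguments above (not Hmisc code).

# smedian.hilow-style summary: median plus quantiles with equal tail areas
median_hilow_sketch <- function(x, conf.int = 0.95) {
  probs <- c(0.5, (1 - conf.int) / 2, (1 + conf.int) / 2)
  q <- quantile(x, probs = probs, na.rm = TRUE, names = FALSE)
  c(Median = q[1], Lower = q[2], Upper = q[3])
}

# Percentile bootstrap for the mean using B resamples; when reps = TRUE
# the vector of resampled means is attached as the "reps" attribute.
mean_boot_sketch <- function(x, conf.int = 0.95, B = 1000, reps = FALSE) {
  x <- x[!is.na(x)]
  n <- length(x)
  # Resample n values with replacement B times, recording each mean
  means <- replicate(B, mean(x[sample.int(n, n, replace = TRUE)]))
  lim <- quantile(means, c((1 - conf.int) / 2, (1 + conf.int) / 2), names = FALSE)
  res <- c(Mean = mean(x), Lower = lim[1], Upper = lim[2])
  if (reps) attr(res, "reps") <- means
  res
}

set.seed(1)
x <- rnorm(100)
median_hilow_sketch(x, conf.int = 0.5)    # median, 25th and 75th percentiles
b <- mean_boot_sketch(x, B = 2000, reps = TRUE)
length(attr(b, "reps"))                   # one mean per resample: 2000
```

Larger B reduces the Monte Carlo noise in the percentile limits at the cost of more computation.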

Value

a named numeric vector of summary statistics (for example Mean, Lower and Upper, as shown in the Examples)

Author

Frank Harrell
Department of Biostatistics
Vanderbilt University
fh@fharrell.com

Examples

library(Hmisc)
set.seed(1)
x <- rnorm(100)
smean.sd(x)
#>  Mean    SD 
#> 0.109 0.898 
smean.sdl(x)
#>   Mean  Lower  Upper 
#>  0.109 -1.688  1.905 
smean.cl.normal(x)
#>    Mean   Lower   Upper 
#>  0.1089 -0.0693  0.2871 
smean.cl.boot(x)
#>    Mean   Lower   Upper 
#>  0.1089 -0.0582  0.2770 
smedian.hilow(x, conf.int=.5)  # 25th and 75th percentiles
#> Median  Lower  Upper 
#>  0.114 -0.494  0.692 

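# Function to compute a 0.95 confidence interval for the difference in two means
# y is the response; g is a grouping variable with two levels
bootdif <- function(y, g) {
 g <- as.factor(g)
 # bootstrapped means (2000 resamples) for each group
 a <- attr(smean.cl.boot(y[g==levels(g)[1]], B=2000, reps=TRUE),'reps')
 b <- attr(smean.cl.boot(y[g==levels(g)[2]], B=2000, reps=TRUE),'reps')
 # observed difference: mean of the second level minus mean of the first
 meandif <- diff(tapply(y, g, mean, na.rm=TRUE))
 # percentile interval from the bootstrap distribution of the difference
 a.b <- quantile(b-a, c(.025,.975))
 res <- c(meandif, a.b)
 names(res) <- c('Mean Difference','.025','.975')
 res
}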
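The same idea as bootdif can be sketched in base R for readers who want to see the mechanics without Hmisc: resample each group independently, take the difference of the resampled means (second level minus first, matching meandif above), and read off the 2.5th and 97.5th percentiles. The function name bootdif_base and the simulated data are hypothetical; the default of 2000 resamples mirrors B=2000 in bootdif.

```r
# Hypothetical base-R analogue of bootdif (does not require Hmisc).
# Returns the observed difference (second level minus first level)
# and a 0.95 percentile bootstrap interval for it.
bootdif_base <- function(y, g, B = 2000) {
  g <- as.factor(g)
  lev <- levels(g)
  keep <- !is.na(y) & !is.na(g)            # drop missing responses/groups
  ya <- y[keep & g == lev[1]]
  yb <- y[keep & g == lev[2]]
  # B bootstrap means for one group, resampling with replacement
  boot_mean <- function(v) {
    replicate(B, mean(v[sample.int(length(v), replace = TRUE)]))
  }
  a <- boot_mean(ya)
  b <- boot_mean(yb)
  res <- c(mean(yb) - mean(ya), quantile(b - a, c(0.025, 0.975), names = FALSE))
  names(res) <- c("Mean Difference", ".025", ".975")
  res
}

set.seed(1)
y <- c(rnorm(40, mean = 0), rnorm(40, mean = 1))
g <- rep(c("control", "treated"), each = 40)
bootdif_base(y, g)
```

Because resampling is random, the interval limits vary with the seed and with B, while the observed difference does not.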