Interval statistics

Calculate coverage intervals and confidence intervals for the sample mean, median, sd, proportion, ... Typically, these will be used within df_stats(). For the mean, median, and sd, the variable x must be quantitative. For proportions, x can be of any type; use the success argument to specify which value the proportion is computed for. The default for success is TRUE when x is logical, or the first value returned by unique() for categorical or numerical variables.

coverage(x, level = 0.95, na.rm = TRUE)

ci.mean(x, level = 0.95, na.rm = TRUE)

ci.median(x, level = 0.9, na.rm = TRUE)

ci.sd(x, level = 0.95, na.rm = TRUE)

ci.prop(
  x,
  success = NULL,
  level = 0.95,
  method = c("Clopper-Pearson", "binom.test", "Score", "Wilson", "prop.test", "Wald",
    "Agresti-Coull", "Plus4")
)

Arguments

x: a variable.
level: a number between 0 and 1 specifying the confidence level for the interval. (Default: 0.95, except for ci.median(), where it is 0.9.)
na.rm: if TRUE, disregard missing data.
success: for proportions, the value of x whose proportion is calculated. Defaults to TRUE for logical x, or to the first value returned by unique() otherwise.
method: for ci.prop(), the method to use in calculating the confidence interval. See mosaic::binom.test() for details.
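
As a rough illustration of what the method argument controls, the sketch below computes two of the listed intervals in base R for the proportion of 6-cylinder cars in mtcars. It assumes that the default "Clopper-Pearson" method corresponds to the exact interval from stats::binom.test() and that "Wald" is the usual normal-approximation interval; it is not the package's own code, and the variable names are illustrative.

```r
# Base-R sketch only; assumes "Clopper-Pearson" matches stats::binom.test()
# and "Wald" is the plain normal-approximation interval.
x <- mtcars$cyl
success <- 6
level <- 0.90

k <- sum(x == success)  # number of successes
n <- length(x)          # number of observations
p_hat <- k / n          # sample proportion (the "center" component)

# Exact (Clopper-Pearson) interval
exact <- stats::binom.test(k, n, conf.level = level)
cp_ci <- c(lower = exact$conf.int[1], center = p_hat, upper = exact$conf.int[2])

# Wald interval: estimate +/- z * standard error
z <- qnorm(1 - (1 - level) / 2)
se <- sqrt(p_hat * (1 - p_hat) / n)
wald_ci <- c(lower = p_hat - z * se, center = p_hat, upper = p_hat + z * se)

cp_ci
wald_ci
```

If the assumption holds, cp_ci should agree with the ci.prop(success = 6, level = 0.90) result shown in the Examples, while the Wald interval is narrower and symmetric about the center.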

Value

a named numerical vector with components lower and upper, and, in the case of ci.prop(), center. When used with df_stats(), these components are formed into a data frame.

Details

Methods: ci.mean() uses the standard t confidence interval. ci.median() uses the normal approximation method. ci.sd() uses the chi-squared method. ci.prop() uses the binomial method.

In the usual situation where the mosaic package is available, ci.prop() uses mosaic::binom.test() internally, which provides several methods for the calculation. See the documentation for binom.test() for details about the available methods. Clopper-Pearson is the default method.

When used with df_stats(), the confidence interval is calculated for each group separately. For "pooled" confidence intervals, see methods such as lm() or glm().
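
The t and chi-squared methods named above can be sketched in base R as follows. This is a minimal illustration of the textbook formulas, assuming that ci.mean() and ci.sd() follow them and that coverage() returns the central quantiles of the data; it is not the package source, and the variable names are illustrative.

```r
# Base-R sketch of the textbook formulas; not the package's implementation.
x <- mtcars$hp
level <- 0.95
alpha <- 1 - level
n <- length(x)

# t interval for the mean: mean +/- t quantile * standard error
t_margin <- qt(1 - alpha / 2, df = n - 1) * sd(x) / sqrt(n)
mean_ci <- c(lower = mean(x) - t_margin, upper = mean(x) + t_margin)

# Chi-squared interval for the sd, based on (n - 1) * s^2 / sigma^2 ~ chisq(n - 1)
sd_ci <- c(
  lower = sqrt((n - 1) * var(x) / qchisq(1 - alpha / 2, df = n - 1)),
  upper = sqrt((n - 1) * var(x) / qchisq(alpha / 2, df = n - 1))
)

# Central coverage interval: the alpha/2 and 1 - alpha/2 sample quantiles
cov_int <- quantile(x, probs = c(alpha / 2, 1 - alpha / 2), names = FALSE)

mean_ci
sd_ci
cov_int
```

If the assumption holds, mean_ci should agree, up to rounding, with the ci.mean(mtcars$hp) result shown in the Examples.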

Note

When using these functions with df_stats(), omit the x argument, which will be supplied automatically by df_stats(). See examples.

Examples

# The central 95% interval
df_stats(hp ~ cyl, data = mtcars, c95 = coverage(0.95))
#>   response cyl c95_lower c95_upper
#> 1       hp   4     54.50   112.000
#> 2       hp   6    105.75   167.200
#> 3       hp   8    150.00   311.925
# The confidence interval on the mean
df_stats(hp ~ cyl, data = mtcars, mean, ci.mean)
#>   response cyl      mean     lower     upper
#> 1       hp   4  82.63636  68.57236  96.70037
#> 2       hp   6 122.28571  99.84850 144.72293
#> 3       hp   8 209.21429 179.78111 238.64746
# What fraction of cars have 6 cylinders?
df_stats(mtcars, ~ cyl, six_cyl_prop = ci.prop(success = 6, level = 0.90))
#>   response six_cyl_prop_lower six_cyl_prop_center six_cyl_prop_upper
#> 1      cyl          0.1074469             0.21875          0.3718991
# Use without `df_stats()` (rare)
ci.mean(mtcars$hp)
#>    lower    upper 
#> 121.9679 171.4071