Cramer's V for chi-square goodness-of-fit tests

Calculates Cramer's V for a vector of observed counts and a vector of expected probabilities, with optional confidence intervals by bootstrap.

Usage

cramerVFit(
  x,
  p = rep(1/length(x), length(x)),
  ci = FALSE,
  conf = 0.95,
  type = "perc",
  R = 1000,
  histogram = FALSE,
  digits = 4,
  reportIncomplete = FALSE,
  verbose = FALSE,
  ...
)

Arguments

x: A vector of observed counts.
p: A vector of expected probabilities. Defaults to equal probabilities across all categories.
ci: If TRUE, returns confidence intervals by bootstrap. May be slow.
conf: The level for the confidence interval.
type: The type of confidence interval to use. Can be any of "norm", "basic", "perc", or "bca". Passed to boot.ci.
R: The number of replications to use for bootstrap.
histogram: If TRUE, produces a histogram of bootstrapped values.
digits: The number of significant digits in the output.
reportIncomplete: If FALSE (the default), NA will be reported if the calculation of the statistic fails for any replicate during the bootstrap procedure.
verbose: If TRUE, prints additional statistics.
...: Additional arguments passed to chisq.test.

Value

A single statistic, Cramer's V, or, if ci = TRUE, a small data frame consisting of Cramer's V and the lower and upper confidence limits.

Details

This modification of Cramer's V could be used to indicate an effect size in cases where a chi-square goodness-of-fit test might be used. It indicates the degree of deviation of observed counts from the expected probabilities.
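
As a rough sketch of the calculation, and assuming the statistic takes the form V = sqrt(X-squared / (n * (k - 1))), where n is the total count and k is the number of categories, V can be computed by hand from chisq.test. This form reproduces the values reported in the Examples section, but the helper name manualV is hypothetical and is not part of the package.

```r
# Hypothetical helper, not part of rcompanion.
# Assumes V = sqrt(X-squared / (n * (k - 1))) for a goodness-of-fit test.
manualV <- function(x, p = rep(1 / length(x), length(x))) {
  # Small expected counts trigger an approximation warning;
  # the warning does not change the X-squared statistic itself.
  chi2 <- suppressWarnings(chisq.test(x = x, p = p))$statistic
  n <- sum(x)      # total number of observations
  k <- length(x)   # number of categories
  unname(sqrt(chi2 / (n * (k - 1))))
}

observed <- c(19, 3, 1, 1, 2, 2)
v_equal <- manualV(observed)
signif(v_equal, 4)   # 0.6178, the value shown for the equal probabilities example

v_unequal <- manualV(c(20, 9, 9, 1, 1, 1),
                     p = c(0.775, 0.132, 0.012, 0.054, 0.002, 0.025))
signif(v_unequal, 4)
```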

In the case of equally-distributed expected frequencies, Cramer's V will be equal to 1 when all counts are in one category, and it will be equal to 0 when the counts are equally distributed across categories. This does not hold if the expected frequencies are not equally-distributed.
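
To illustrate why the 0 to 1 range depends on equal expected probabilities, the following sketch computes V by hand for unequal probabilities with all counts in the least likely category. It assumes the form V = sqrt(X-squared / (n * (k - 1))) and does not call cramerVFit; the counts and probabilities are made up for illustration.

```r
# Illustrative data, not from the package documentation
x <- c(0, 10)        # all counts fall in the second category
p <- c(0.9, 0.1)     # unequal expected probabilities

# Assumed form: V = sqrt(X-squared / (n * (k - 1))), computed without rcompanion
n <- sum(x)
k <- length(x)
expected_counts <- n * p
chi2 <- sum((x - expected_counts)^2 / expected_counts)
v <- sqrt(chi2 / (n * (k - 1)))
v
```

Under this assumption the result is 3 (up to floating-point rounding), well above 1, so the usual 0 to 1 interpretation applies only when the expected probabilities are equal.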

Because V is never negative, the confidence interval with type="perc" will never cross zero, and so should not be used for statistical inference. However, if type="norm", the confidence interval may cross zero.
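
The contrast between the two interval types can be sketched with a hand-rolled bootstrap in base R. This is not the package's implementation, which relies on boot.ci; the helper vStat, the counts, and the seed are illustrative assumptions, and the normal interval here omits the bias correction that boot.ci applies.

```r
set.seed(1)                       # arbitrary seed so the sketch is repeatable
x <- c(10, 11, 9, 10, 10)         # nearly uniform counts, so V is close to 0
k <- length(x)
p <- rep(1 / k, k)                # equal expected probabilities

# Hypothetical helper, assuming V = sqrt(X-squared / (n * (k - 1)))
vStat <- function(counts) {
  e <- sum(counts) * p
  sqrt(sum((counts - e)^2 / e) / (sum(counts) * (k - 1)))
}

# Expand counts to one category label per observation,
# then resample observations with replacement and recount
obs <- rep(seq_len(k), times = x)
boots <- replicate(1000, vStat(tabulate(sample(obs, replace = TRUE), nbins = k)))

v_hat <- vStat(x)
perc_ci <- quantile(boots, c(0.025, 0.975))             # percentile interval
norm_ci <- v_hat + c(-1, 1) * qnorm(0.975) * sd(boots)  # simplified normal interval

perc_ci
norm_ci
```

Every bootstrapped V is at least 0, so the percentile limits cannot fall below 0, whereas the lower normal limit is the estimate minus a multiple of the standard error and can fall below 0 when V is small.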

When V is close to 0 or 1, or with small counts, the confidence intervals determined by this method may not be reliable, or the procedure may fail.

In addition, the function will not return a confidence interval if there are zeros in any cell.

References

https://rcompanion.org/handbook/H_03.html

Author

Salvatore Mangiafico, mangiafico@njaes.rutgers.edu

Examples

### Equal probabilities example
### From https://rcompanion.org/handbook/H_03.html
nail.color = c("Red", "None", "White", "Green", "Purple", "Blue")
observed   = c( 19,    3,      1,       1,       2,        2    )
expected   = c( 1/6,   1/6,    1/6,     1/6,     1/6,      1/6  )
chisq.test(x = observed, p = expected)
#> Warning: Chi-squared approximation may be incorrect
#> 
#> 	Chi-squared test for given probabilities
#> 
#> data:  observed
#> X-squared = 53.429, df = 5, p-value = 2.746e-10
#> 
cramerVFit(x = observed, p = expected)
#> Cramer V 
#>   0.6178 

### Unequal probabilities example
### From https://rcompanion.org/handbook/H_03.html
race = c("White", "Black", "American Indian", "Asian", "Pacific Islander",
          "Two or more races")
observed = c(20, 9, 9, 1, 1, 1)
expected = c(0.775, 0.132, 0.012, 0.054, 0.002, 0.025)
chisq.test(x = observed, p = expected)
#> Warning: Chi-squared approximation may be incorrect
#> 
#> 	Chi-squared test for given probabilities
#> 
#> data:  observed
#> X-squared = 164.81, df = 5, p-value < 2.2e-16
#> 
cramerVFit(x = observed, p = expected)
#> Cramer V 
#>   0.8966 

### Examples of V = 1 and V = 0 with equal expected probabilities
cramerVFit(c(100, 0, 0, 0, 0))
#> Cramer V 
#>        1 
cramerVFit(c(10, 10, 10, 10, 10))
#> Cramer V 
#>        0