Correlation and measures of association

Produces measures of association for all variables in a data frame with confidence intervals when available.

Usage

correlation(
  data = NULL,
  printClasses = FALSE,
  progress = TRUE,
  methodNum = "pearson",
  methodOrd = "kendall",
  methodNumOrd = "spearman",
  methodNumNom = "eta",
  methodNumBin = "pearson",
  testChisq = "chisq",
  ci = FALSE,
  conf = 0.95,
  R = 1000,
  correct = FALSE,
  reportIncomplete = TRUE,
  na.action = "na.omit",
  digits = 3,
  pDigits = 4,
  ...
)

Arguments

data

A data frame.

printClasses

If TRUE, prints a table of classes for all variables.

progress

If TRUE, prints a progress bar when bootstrap methods are called.

methodNum

The method for the correlation for two numeric variables. The default is "pearson". Other options are "spearman" and "kendall".

methodOrd

The method for the correlation for two ordinal variables. The default is "kendall", with Kendall's tau-c used. The other option is "spearman".

methodNumOrd

The method for the correlation of a numeric and an ordinal variable. The default is "spearman". Other options are "pearson" and "kendall".

methodNumNom

The method for the correlation of a numeric and a nominal variable.

The default is "eta", which is the square root of the r-squared value from anova. The other option is "epsilon", which is the same, except with the numeric variable rank-transformed.

methodNumBin

The method for the correlation of a numeric and a binary variable. The default is "pearson". The other option is "glass", which uses the Glass rank biserial correlation.

testChisq

The method for the test of two nominal variables. The default is "chisq". The other option is "fisher".

ci

If TRUE, calculates confidence intervals for methods requiring bootstrap. If FALSE, returns only the confidence intervals from methods that do not require bootstrap.

conf

The confidence level for confidence intervals.

R

The number of replications to use for bootstrap confidence intervals for applicable methods.

correct

Passed to chisq.test, where it controls whether the continuity correction is applied for 2 x 2 tables.

reportIncomplete

If FALSE, NA is reported whenever the calculation of the statistic fails in any replicate of the bootstrap procedure.

na.action

If "na.omit", the function will use only complete cases, assessed on a bivariate basis. The other option is "na.pass".

digits

The number of decimal places in the output of most statistics.

pDigits

The number of decimal places in the output for p-values.

...

Other arguments.
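
As a rough illustration of the "eta" and "epsilon" definitions given for methodNumNom, the sketch below computes both with base R only. It assumes that taking the square root of the r-squared from a one-way lm() fit is equivalent to what the function does internally; the variable names fit, fitRank, eta, and epsilon are chosen here for illustration. The data are the Length and Color variables from the Examples section.

```r
# Data copied from the Examples section (one missing Length value).
Length <- c(0.29, 0.25, NA, 0.40, 0.50, 0.57, 0.62, 0.88, 0.99, 0.90)
Color  <- factor(rep(c("Red", "Green", "Blue"), c(4, 4, 2)))

# Eta: square root of r-squared from a one-way model of the numeric
# variable on the nominal variable. lm() drops the incomplete case.
fit <- lm(Length ~ Color)
eta <- sqrt(summary(fit)$r.squared)

# Epsilon (assumed analogue): same calculation on the rank-transformed
# numeric variable, keeping the NA in place so it is dropped by lm().
fitRank <- lm(rank(Length, na.last = "keep") ~ Color)
epsilon <- sqrt(summary(fitRank)$r.squared)

print(round(c(eta = eta, epsilon = epsilon), 3))
```

The eta value obtained this way rounds to 0.913, which agrees with the Length x Color row in the Examples output.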

Value

A data frame of variables, association statistics, p-values, and confidence intervals.

Details

It’s important that variables are assigned the correct class to get an appropriate measure of association. That is, factor variables should be of class "factor", not "character". Ordinal variables should be ordered factors (with their levels in the correct order!).
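
A minimal sketch of preparing classes before calling the function, using base R only. The data frame Raw and its values are hypothetical and stand in for data that were read in as character.

```r
# Hypothetical raw data in which every column is of class "character".
Raw <- data.frame(
  Color  = c("Red", "Green", "Blue", "Red"),
  Rating = c("Low", "High", "Medium", "Low"),
  stringsAsFactors = FALSE
)

# Nominal variable: convert to a plain factor.
Raw$Color <- factor(Raw$Color)

# Ordinal variable: convert to an ordered factor, listing the levels
# from lowest to highest so they are not sorted alphabetically.
Raw$Rating <- factor(Raw$Rating,
                     levels  = c("Low", "Medium", "High"),
                     ordered = TRUE)

# Check the first class of each column before computing associations.
Classes <- sapply(Raw, function(x) class(x)[1])
print(Classes)
```

Setting printClasses = TRUE in the call offers a similar check, since it prints a table of classes for all variables.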

Date variables are treated as numeric.
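
To show what treating a date as numeric implies, the sketch below converts the Start variable from the Examples section with as.numeric() and computes a Pearson correlation with Length in base R. That the function performs the conversion in exactly this way is an assumption; StartNum and r are illustrative names.

```r
# Data copied from the Examples section.
Length <- c(0.29, 0.25, NA, 0.40, 0.50, 0.57, 0.62, 0.88, 0.99, 0.90)
Start  <- seq(as.Date("2024-01-01"), by = "month", length.out = 10)

# A Date converted to numeric is the number of days since 1970-01-01.
StartNum <- as.numeric(Start)

# Pearson correlation on complete pairs only.
r <- cor(Length, StartNum, use = "complete.obs")
print(round(r, 3))
```

The result rounds to 0.959, which agrees with the Length x Start row in the Examples output.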

The defaults for measures of association tend to be of the "parametric" type, e.g. Pearson correlation where appropriate.

Nonparametric measures of association will be reported with the options methodNum = "spearman", methodNumNom = "epsilon", methodNumBin = "glass", methodNumOrd = "spearman".
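
A call requesting the nonparametric measures listed above might look like the following sketch. It assumes the function is the one exported by the rcompanion package and is skipped if that package is not installed. The data are a subset of the variables from the Examples section, and Result is an illustrative name.

```r
Result <- NULL

# Run only if rcompanion (assumed to provide correlation()) is available.
if (requireNamespace("rcompanion", quietly = TRUE)) {
  Length <- c(0.29, 0.25, NA, 0.40, 0.50, 0.57, 0.62, 0.88, 0.99, 0.90)
  Rating <- factor(rep(c("Low", "Medium", "High"), c(3, 3, 4)),
                   levels = c("Low", "Medium", "High"), ordered = TRUE)
  Color  <- factor(rep(c("Red", "Green", "Blue"), c(4, 4, 2)))
  Answer <- factor(rep(c("Yes", "No", "Yes"), c(4, 3, 3)),
                   levels = c("Yes", "No"))
  Data   <- data.frame(Length, Rating, Color, Answer)

  # Swap the "parametric" defaults for their nonparametric counterparts.
  Result <- rcompanion::correlation(
    Data,
    methodNum    = "spearman",
    methodNumNom = "epsilon",
    methodNumBin = "glass",
    methodNumOrd = "spearman",
    progress     = FALSE
  )
  print(Result)
}
```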

References

https://rcompanion.org/handbook/I_14.html

Author

Salvatore Mangiafico, mangiafico@njaes.rutgers.edu

Examples


Length   = c(0.29, 0.25, NA, 0.40, 0.50, 0.57, 0.62, 0.88, 0.99, 0.90)
Rating   = factor(ordered=TRUE, levels=c("Low", "Medium", "High"),
                  x = rep(c("Low", "Medium", "High"), c(3,3,4)))
Color    = factor(rep(c("Red", "Green", "Blue"), c(4,4,2)))
Flag     = factor(rep(c(TRUE, FALSE, TRUE), c(5,4,1)))
Answer   = factor(rep(c("Yes", "No", "Yes"), c(4,3,3)), levels=c("Yes", "No"))
Location = factor(rep(c("Home", "Away", "Other"), c(2,4,4)))
Distance = factor(ordered=TRUE, levels=c("Low", "Medium", "High"),
                  x = rep(c("Low", "Medium", "High"), c(5,2,3))) 
Start    = seq(as.Date("2024-01-01"), by = "month", length.out = 10)
Data = data.frame(Length, Rating, Color, Flag, Answer, Location, Distance, Start)  
correlation(Data)
#>        Var1     Var2              Type  N             Measure Statistic
#> 1    Length   Rating Numeric x Ordinal  9            Spearman     0.935
#> 2    Length    Color Numeric x Nominal  9                 Eta     0.913
#> 3    Length     Flag  Numeric x Binary  9             Pearson    -0.576
#> 4    Length   Answer  Numeric x Binary  9             Pearson    -0.101
#> 5    Length Location Numeric x Nominal  9                 Eta     0.919
#> 6    Length Distance Numeric x Ordinal  9            Spearman     0.935
#> 7    Length    Start Numeric x Numeric  9             Pearson     0.959
#> 8    Rating    Color Ordinal x Nominal 10             Freeman     0.812
#> 9    Rating     Flag  Ordinal x Binary 10 Glass rank biserial    -0.333
#> 10   Rating   Answer  Ordinal x Binary 10 Glass rank biserial     0.667
#> 11   Rating Location Ordinal x Nominal 10             Freeman     0.938
#> 12   Rating Distance Ordinal x Ordinal 10             Kendall     0.780
#> 13   Rating    Start Ordinal x Numeric 10            Spearman     0.944
#> 14    Color     Flag  Nominal x Binary 10              Cramer     0.692
#> 15    Color   Answer  Nominal x Binary 10              Cramer     0.802
#> 16    Color Location Nominal x Nominal 10              Cramer     0.612
#> 17    Color Distance Nominal x Ordinal 10             Freeman     0.812
#> 18    Color    Start Nominal x Numeric 10                 Eta     0.935
#> 19     Flag   Answer   Binary x Binary 10                 Phi    -0.356
#> 20     Flag Location  Binary x Nominal 10              Cramer     0.612
#> 21     Flag Distance  Binary x Ordinal 10 Glass rank biserial    -0.750
#> 22     Flag    Start  Binary x Numeric 10             Pearson    -0.569
#> 23   Answer Location  Binary x Nominal 10              Cramer     0.408
#> 24   Answer Distance  Binary x Ordinal 10 Glass rank biserial    -0.048
#> 25   Answer    Start  Binary x Numeric 10             Pearson     0.111
#> 26 Location Distance Nominal x Ordinal 10             Freeman     0.781
#> 27 Location    Start Nominal x Numeric 10                 Eta     0.933
#> 28 Distance    Start Ordinal x Numeric 10            Spearman     0.921
#>    Lower.CL Upper.CL             Test p.value Signif
#> 1     0.716    0.987         cor.test  0.0002    ***
#> 2     0.812    1.000            Anova  0.0047     **
#> 3    -0.897    0.142         cor.test  0.1044   n.s.
#> 4    -0.717    0.603         cor.test  0.7955   n.s.
#> 5     0.827    1.000            Anova  0.0037     **
#> 6     0.716    0.987         cor.test  0.0002    ***
#> 7     0.812    0.992         cor.test  0.0000   ****
#> 8        NA       NA Cochran-Armitage  0.0239      *
#> 9        NA       NA      wilcox.test  0.0708   n.s.
#> 10       NA       NA      wilcox.test  0.7172   n.s.
#> 11       NA       NA Cochran-Armitage  0.0116      *
#> 12    0.641    0.919 Linear by linear  0.0102      *
#> 13    0.775    0.987         cor.test  0.0000   ****
#> 14       NA       NA       chisq.test  0.0911   n.s.
#> 15       NA       NA       chisq.test  0.0402      *
#> 16       NA       NA       chisq.test  0.1117   n.s.
#> 17       NA       NA Cochran-Armitage  0.0251      *
#> 18    0.885    0.982            Anova  0.0007    ***
#> 19       NA       NA       chisq.test  0.2598   n.s.
#> 20       NA       NA       chisq.test  0.1534   n.s.
#> 21       NA       NA      wilcox.test  0.0491      *
#> 22   -0.882    0.095         cor.test  0.0862   n.s.
#> 23       NA       NA       chisq.test  0.4346   n.s.
#> 24       NA       NA      wilcox.test  1.0000   n.s.
#> 25   -0.557    0.692         cor.test  0.7597   n.s.
#> 26       NA       NA Cochran-Armitage  0.0181      *
#> 27    0.883    0.981            Anova  0.0008    ***
#> 28    0.694    0.982         cor.test  0.0002    ***