Pseudo r-squared measures for various models

Produces McFadden, Cox and Snell, and Nagelkerke pseudo r-squared measures, along with the p-value from a likelihood ratio test against the null model, for a variety of model object types.

Usage

nagelkerke(fit, null = NULL, restrictNobs = FALSE)

Arguments

fit: The fitted model object for which to determine pseudo r-squared.
null: The null model object against which to compare the fitted model object. The null model must be nested in the fitted model to be valid. Specifying the null is optional for some model object types and is required for others.
restrictNobs: If TRUE, limits the observations for the null model to those used in the fitted model. Works with only some model object types.

Value

A list of six objects describing the models used, the pseudo r-squared values, the likelihood ratio test for the model, the number of observations for the models, messages, and any warnings.

Details

Pseudo R-squared values are not directly comparable to the R-squared for OLS models, nor can they be interpreted as the proportion of the variability in the dependent variable that is explained by the model. Instead, pseudo R-squared measures are relative measures among similar models, indicating how well each model explains the data.

Cox and Snell is also referred to as ML. Nagelkerke is also referred to as Cragg and Uhler.
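
For reference, the three measures are commonly defined as below. This is a sketch of the standard textbook definitions rather than a transcription of the function's source. The symbols L_M (likelihood of the fitted model), L_0 (likelihood of the null model), and n (number of observations) are notation introduced here for illustration.

```latex
\begin{aligned}
R^2_{\mathrm{McFadden}} &= 1 - \frac{\ln L_M}{\ln L_0} \\[4pt]
R^2_{\mathrm{Cox\ and\ Snell}} &= 1 - \left(\frac{L_0}{L_M}\right)^{2/n}
  = 1 - \exp\!\left(\frac{2\,(\ln L_0 - \ln L_M)}{n}\right) \\[4pt]
R^2_{\mathrm{Nagelkerke}} &= \frac{R^2_{\mathrm{Cox\ and\ Snell}}}{1 - L_0^{2/n}}
  = \frac{R^2_{\mathrm{Cox\ and\ Snell}}}{1 - \exp\!\left(2 \ln L_0 / n\right)}
\end{aligned}
```

As a rough check against the nls example in the Examples section, a log likelihood difference of -45.001 with n = 45 gives 1 - exp(-2 * 45.001 / 45), or approximately 0.8647, which agrees with the Cox and Snell value reported there.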

Model objects accepted are lm, glm, gls, lme, lmer, lmerTest, nls, clm, clmm, vglm, glmer, glmmTMB, negbin, zeroinfl, betareg, and rq.

Model objects that require the null model to be defined are nls, lmer, glmer, and clmm. Other objects use the update function to define the null model.
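
As a minimal sketch of how update can produce a null model, the following refits a model with an intercept-only formula. The data set (mtcars) and the object names are illustrative assumptions, and the exact call used inside nagelkerke may differ.

```r
# Illustrative fitted model on a built-in data set (not from the package)
fit = lm(mpg ~ wt + hp, data = mtcars)

# Refit the same call with all predictors dropped, keeping only the intercept
null = update(fit, . ~ 1)

formula(null)   # the intercept-only formula
coef(null)      # a single coefficient: the intercept
```

For model types where update cannot build a sensible intercept-only model, the null model has to be fitted by hand and passed through the null argument.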

Likelihoods are found using ML (REML = FALSE).
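
The difference between ML and REML log likelihoods can be seen with the logLik method for lm objects in base R, which accepts a REML argument. This is only an illustration on a built-in data set. Whether nagelkerke extracts likelihoods in exactly this way for every model type is an assumption.

```r
# Illustrative linear model on a built-in data set
fit = lm(mpg ~ wt, data = mtcars)

# Log likelihood under maximum likelihood (the default for logLik on lm)
ll.ml = logLik(fit, REML = FALSE)

# Log likelihood under restricted maximum likelihood
ll.reml = logLik(fit, REML = TRUE)

# The two values differ, so they should not be mixed within one comparison
as.numeric(ll.ml)
as.numeric(ll.reml)
```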

The fitted model and the null model should be properly nested. That is, the terms of one need to be a subset of the terms of the other, and both models should be fitted to the same set of observations. One issue arises when there are NA values in one variable but not another, and observations with NA are removed during model fitting. The result may be fitted and null models with different sets of observations. Setting restrictNobs to TRUE ensures that only observations used in the fitted model are used in the null model. This appears to work for lm and some glm models, but causes the function to fail for other model object types.
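
The NA problem described above can be reproduced with base R alone. The sketch below uses mtcars with a few values set to NA purely for illustration. Refitting the null model on the fitted model's model frame is one way to align the observations, and is not necessarily what restrictNobs does internally.

```r
# Copy a built-in data set and introduce missing values in one predictor
dat = mtcars
dat$wt[c(1, 5, 9)] = NA

# The fitted model drops the three rows where wt is NA
fit = lm(mpg ~ wt, data = dat)

# An intercept-only model on the full data keeps those rows,
# because mpg itself has no missing values
null.all = lm(mpg ~ 1, data = dat)

nobs(fit)        # 29
nobs(null.all)   # 32

# Refit the null model on only the rows the fitted model actually used
null.restricted = lm(mpg ~ 1, data = model.frame(fit))

nobs(null.restricted)   # 29
```

Comparing fit against null.all would mix likelihoods computed on different data, whereas fit and null.restricted share the same 29 observations.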

Some pseudo R-squared measures may not be appropriate or useful for some model types.

Calculations are based on the log likelihood values of the models. Results may differ from those based on deviance.
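
A minimal sketch of a log-likelihood-based calculation, using the standard textbook definitions of the three measures, is given below. The Poisson model on mtcars and all object names are illustrative assumptions, and this is not the function's actual source. The last lines show how a deviance-based analogue of McFadden's measure can give a different value.

```r
# Illustrative count model on a built-in data set
fit  = glm(carb ~ wt + hp, family = poisson, data = mtcars)
null = update(fit, . ~ 1)

ll.fit  = as.numeric(logLik(fit))
ll.null = as.numeric(logLik(null))
n       = nobs(fit)

# Pseudo R-squared measures from the two log likelihoods
mcfadden      = 1 - ll.fit / ll.null
cox.snell     = 1 - exp(2 * (ll.null - ll.fit) / n)
nagelkerke.r2 = cox.snell / (1 - exp(2 * ll.null / n))

# Likelihood ratio test of the fitted model against the null model
chisq   = 2 * (ll.fit - ll.null)
df.diff = attr(logLik(fit), "df") - attr(logLik(null), "df")
p.value = pchisq(chisq, df = df.diff, lower.tail = FALSE)

# Deviance-based analogue, which also involves the saturated model
deviance.r2 = 1 - deviance(fit) / deviance(null)

c(McFadden = mcfadden, CoxSnell = cox.snell,
  Nagelkerke = nagelkerke.r2, Deviance = deviance.r2)
```

The two versions disagree here because the deviance is measured relative to a saturated model whose log likelihood is not zero for Poisson data.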

Acknowledgments

My thanks to Jan-Herman Kuiper of Keele University for suggesting the restrictNobs fix.

References

https://rcompanion.org/handbook/G_10.html

Author

Salvatore Mangiafico, mangiafico@njaes.rutgers.edu

Examples

### Logistic regression example
library(rcompanion)   # provides nagelkerke() and the example data sets
data(AndersonBias)
model = glm(Result ~ County + Gender + County:Gender,
           weights = Count,
           data = AndersonBias,
           family = binomial(link="logit"))
nagelkerke(model)
#> $Models
#>                                                                                                        
#> Model: "glm, Result ~ County + Gender + County:Gender, binomial(link = \"logit\"), AndersonBias, Count"
#> Null:  "glm, Result ~ 1, binomial(link = \"logit\"), AndersonBias, Count"                              
#> 
#> $Pseudo.R.squared.for.model.vs.null
#>                              Pseudo.R.squared
#> McFadden                            0.0797857
#> Cox and Snell (ML)                  0.7136520
#> Nagelkerke (Cragg and Uhler)        0.7136520
#> 
#> $Likelihood.ratio.test
#>  Df.diff LogLik.diff  Chisq   p.value
#>       -7     -10.004 20.009 0.0055508
#> 
#> $Number.of.observations
#>          
#> Model: 16
#> Null:  16
#> 
#> $Messages
#> [1] "Note: For models fit with REML, these statistics are based on refitting with ML"
#> 
#> $Warnings
#> [1] "None"
#> 

### Quadratic plateau example
### With nls, the null model needs to be defined explicitly
data(BrendonSmall)
# Quadratic curve below the critical x value (clx), constant plateau above it
quadplat = function(x, a, b, clx) {
          ifelse(x  < clx, a + b * x   + (-0.5*b/clx) * x   * x,
                           a + b * clx + (-0.5*b/clx) * clx * clx)}
model = nls(Sodium ~ quadplat(Calories, a, b, clx),
            data = BrendonSmall,
            start = list(a   = 519,
                         b   = 0.359,
                         clx = 2304))
# Null model: a constant mean m that ignores x
nullfunct = function(x, m){m}
null.model = nls(Sodium ~ nullfunct(Calories, m),
             data = BrendonSmall,
             start = list(m   = 1346))
nagelkerke(model, null=null.model)
#> $Models
#>                                                                                                                                                                                
#> Model: "nls, Sodium ~ quadplat(Calories, a, b, clx), BrendonSmall, list(a = 519, b = 0.359, clx = 2304), default, list(50, 1e-05, 0.0009765625, FALSE, FALSE, 0, FALSE), FALSE"
#> Null:  "nls, Sodium ~ nullfunct(Calories, m), BrendonSmall, list(m = 1346), default, list(50, 1e-05, 0.0009765625, FALSE, FALSE, 0, FALSE), FALSE"                             
#> 
#> $Pseudo.R.squared.for.model.vs.null
#>                              Pseudo.R.squared
#> McFadden                             0.175609
#> Cox and Snell (ML)                   0.864674
#> Nagelkerke (Cragg and Uhler)         0.864683
#> 
#> $Likelihood.ratio.test
#>  Df.diff LogLik.diff  Chisq    p.value
#>       -2     -45.001 90.003 2.8583e-20
#> 
#> $Number.of.observations
#>          
#> Model: 45
#> Null:  45
#> 
#> $Messages
#> [1] "Note: For models fit with REML, these statistics are based on refitting with ML"
#> 
#> $Warnings
#> [1] "None"
#>