Validation of a Quantile Regression Model

When applied to an object created by Rq, the validate function performs resampling validation of a quantile regression model, with or without backward step-down variable deletion. It uses resampling to estimate the optimism in several measures of predictive accuracy: mean absolute prediction error (MAD), Spearman's rho, the \(g\)-index, and the intercept and slope of an overall calibration \(a + b\hat{y}\). The "corrected" slope can be thought of as a shrinkage factor that takes overfitting into account. validate.Rq can also be used when a model for a continuous response is going to be applied to a binary response. In that case a Somers' \(D_{xy}\) is computed for each resample by dichotomizing y, and the area under an ordinary receiver operating characteristic curve can be obtained as \(0.5(D_{xy} + 1)\). See predab.resample for information about confidence limits and for the list of resampling methods.
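
To make the accuracy measures named above more concrete, here is a minimal base-R sketch of three of them on simulated data. It is an illustration only, not the code validate.Rq runs internally: the vectors `yhat` and `y` are hypothetical, and the \(g\)-index is computed here as Gini's mean difference of the predictions, which is how rms generally defines it.

```r
# Hypothetical predictions and responses (not from any fitted model)
set.seed(3)
yhat <- rnorm(50)
y    <- yhat + rnorm(50)

# Mean absolute prediction error
mad <- mean(abs(y - yhat))

# Spearman rank correlation between predicted and observed values
rho <- cor(yhat, y, method = "spearman")

# Gini's mean difference of the predictions:
# the average of |yhat_i - yhat_j| over all pairs with i != j
n <- length(yhat)
d <- abs(outer(yhat, yhat, "-"))
g <- sum(d) / (n * (n - 1))

c(MAD = mad, rho = rho, g = g)
```

The calibration intercept and slope are left out of the sketch because they depend on how the calibration line is fitted.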

The LaTeX needspace package must be in effect to use the latex method.

# fit <- fitting.function(formula=response ~ terms, x=TRUE, y=TRUE)
# S3 method for class 'Rq'
validate(fit, method="boot", B=40,
         bw=FALSE, rule="aic", type="residual", sls=0.05, aics=0, 
         force=NULL, estimates=TRUE, pr=FALSE, u=NULL, rel=">",
         tolerance=1e-7, ...)

Arguments

fit: a fit derived by Rq. The options x=TRUE and y=TRUE must have been specified. See validate for a description of arguments method - pr.
method,B,bw,rule,type,sls,aics,force,estimates,pr: see validate and predab.resample and fastbw
u: If specified, y is also dichotomized at the cutoff u for the purpose of getting a bias-corrected estimate of \(D_{xy}\).
rel: relationship for dichotomizing predicted y. Defaults to ">" to use y>u. rel can also be "<", ">=", and "<=".
tolerance: ignored
...: other arguments to pass to predab.resample, such as group, cluster, and subset
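
The role of u and rel can be illustrated with a small base-R sketch: the response is dichotomized at a cutoff, the concordance probability is computed from ranks, and \(D_{xy}\) and the ROC area are related through \(0.5(D_{xy} + 1)\). The data, the cutoff value, and the rank-sum computation are assumptions made for illustration, not the package's internal code.

```r
# Hypothetical continuous predictions and responses
set.seed(2)
yhat <- rnorm(100)
y    <- yhat + rnorm(100)

u    <- 0                      # hypothetical cutoff
ybin <- as.integer(y > u)      # dichotomize as with rel = ">"

# ROC area (concordance probability) via the rank-sum identity
n1  <- sum(ybin == 1)
n0  <- sum(ybin == 0)
auc <- (sum(rank(yhat)[ybin == 1]) - n1 * (n1 + 1) / 2) / (n1 * n0)

# Somers' Dxy is a rescaling of the ROC area to the range [-1, 1]
dxy <- 2 * (auc - 0.5)

# Converting back with the formula from the description recovers the area
auc_from_dxy <- 0.5 * (dxy + 1)
```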

Value

A matrix with rows corresponding to the various indexes (and optionally \(D_{xy}\)), and columns for the original index, the resample estimates, the indexes applied to the whole or omitted sample using the model derived from each resample, the average optimism, the corrected index, and the number of successful resamples.
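
The columns are linked by simple arithmetic: the optimism is the training value minus the test value, and the corrected index is the original index minus the optimism. The sketch below checks this using the rounded values printed for the first validate call in the Examples. The numbers are copied from that output, so agreement holds only up to rounding.

```r
# Values copied from the printed output of validate(f, B=20, subset=x1 < .75)
index.orig <- c(MAD = 0.618, rho = 0.254, g = 0.198,
                Intercept = 0.155, Slope = 0.815)
optimism   <- c(MAD = -0.0312, rho = 0.0677, g = 0.0906,
                Intercept = -0.1979, Slope = 0.2107)

# Bias-corrected index: subtract the estimated optimism
index.corrected <- index.orig - optimism
round(index.corrected, 3)
```

Note that a negative optimism, as for MAD, makes the corrected index larger than the original one, meaning the apparent error was too optimistic.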

Side Effects

prints a summary, and optionally statistics for each re-fit

Author

Frank Harrell
Department of Biostatistics, Vanderbilt University
fh@fharrell.com

Examples

set.seed(1)
x1 <- runif(200)
x2 <- sample(0:3, 200, TRUE)
x3 <- rnorm(200)
distance <- (x1 + x2/3 + rnorm(200))^2

f <- Rq(sqrt(distance) ~ rcs(x1,4) + scored(x2) + x3, x=TRUE, y=TRUE)
#> Warning: Solution may be nonunique

#Validate full model fit (from all observations) but for x1 < .75
validate(f, B=20, subset=x1 < .75)   # normally B=300
#> Warning: Solution may be nonunique
#> Warning: Solution may be nonunique
#> Warning: Solution may be nonunique
#> Warning: Solution may be nonunique
#> Warning: Solution may be nonunique
#> Warning: Solution may be nonunique
#>           index.orig training  test optimism index.corrected  n
#> MAD            0.618   0.6109 0.642  -0.0312           0.649 20
#> rho            0.254   0.2760 0.208   0.0677           0.186 20
#> g              0.198   0.2807 0.190   0.0906           0.107 20
#> Intercept      0.155   0.0973 0.295  -0.1979           0.353 20
#> Slope          0.815   0.8827 0.672   0.2107           0.604 20

#Validate stepwise model with typical (not so good) stopping rule
validate(f, B=20, bw=TRUE, rule="p", sls=.1, type="individual")
#> 
#> 		Backwards Step-down - Original Model
#> 
#>  Deleted Chi-Sq d.f. P      Residual d.f. P      AIC  
#>  x3      0.38   1    0.5382 0.38     1    0.5382 -1.62
#>  x2      3.24   3    0.3565 3.62     4    0.4605 -4.38
#>  x1      5.49   3    0.1391 9.11     7    0.2449 -4.89
#> 
#> Approximate Estimates after Deleting Factors
#> 
#>        Coef    S.E. Wald Z P
#> [1,] 0.9961 0.07502  13.28 0
#> 
#> Factors in Final Model
#> 
#> None
#> Warning: Solution may be nonunique
#> Warning: Solution may be nonunique
#> Warning: Solution may be nonunique
#> Warning: Solution may be nonunique
#> Warning: Solution may be nonunique
#> Warning: 1 non-positive fis
#> Warning: Solution may be nonunique
#> Warning: 1 non-positive fis
#> Warning: Solution may be nonunique
#> Warning: 2 non-positive fis
#> Warning: Solution may be nonunique
#> Warning: 2 non-positive fis
#> Warning: 2 non-positive fis
#> Warning: Solution may be nonunique
#> Warning: Solution may be nonunique
#> Warning: Solution may be nonunique
#> Warning: Solution may be nonunique
#> Warning: Solution may be nonunique
#> Warning: Solution may be nonunique
#> Warning: 1 non-positive fis
#> Warning: Solution may be nonunique
#> Warning: Solution may be nonunique
#> Warning: Solution may be nonunique
#> Warning: Solution may be nonunique
#> Warning: Solution may be nonunique
#> Warning: 1 non-positive fis
#> Warning: Solution may be nonunique
#> Warning: Solution may be nonunique
#> Warning: 4 non-positive fis
#> Warning: Solution may be nonunique
#> Warning: 4 non-positive fis
#>           index.orig training  test optimism index.corrected  n
#> MAD            0.688    0.615 0.666  -0.0513          0.7390 20
#> rho            0.000    0.326 0.259   0.0672         -0.0672 20
#> g              0.000    0.352 0.253   0.0994         -0.0994 20
#> Intercept      0.000    0.000 0.264  -0.2636          0.2636 20
#> Slope          1.000    1.000 0.757   0.2432          0.7568 20
#> 
#> Factors Retained in Backwards Elimination
#> 
#>  x1 x2 x3
#>  *  *    
#>  *  *    
#>  *     * 
#>  *  *    
#>  *  *    
#>  *       
#>  *  *  * 
#>  *  *  * 
#>  *  *    
#>     *    
#>  *  *    
#>     *  * 
#>  *  *  * 
#>  *       
#>  *  *  * 
#>  *       
#>  *       
#>  *       
#>  *       
#>  *  *  * 
#> 
#> Frequencies of Numbers of Factors Retained
#> 
#> 1 2 3 
#> 7 8 5