This is the version of the validate function specific to models
fitted with cph or psm. Also included is a small
function dxy.cens that retrieves \(D_{xy}\) and its
standard error from the survival package's
concordancefit function. This allows for incredibly fast
computation of \(D_{xy}\) or the c-index even for hundreds of
thousands of observations. dxy.cens negates \(D_{xy}\)
if log relative hazard is being predicted. If y is a
left-censored Surv object, times are negated and a
right-censored object is created, then \(D_{xy}\) is negated.
See predab.resample for information about confidence limits.
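As a minimal sketch of the negation behaviour described above, the following calls dxy.cens directly on simulated data. The variable names (lp, tt, cens, S) and the simulation itself are illustrative assumptions and are not part of the package; only the first element of the result (\(D_{xy}\)) is inspected.

```r
library(rms)        # dxy.cens; also attaches Hmisc
library(survival)   # Surv

set.seed(1)
n    <- 200
lp   <- rnorm(n)                         # simulated log relative hazard (assumption)
tt   <- rexp(n, rate = exp(lp))          # higher hazard gives shorter event times
cens <- runif(n, 0, 3)                   # independent censoring times
S    <- Surv(pmin(tt, cens), as.integer(tt <= cens))

# lp is on the log hazard scale, so ask dxy.cens to negate the index
d_hazard <- dxy.cens(lp, S, type = "hazard")

# Same predictor treated as if it predicted survival time: sign flips
d_time <- dxy.cens(lp, S, type = "time")

d_hazard
d_time
```

With type="hazard" the index should come out positive here, because larger values of lp go with shorter survival; with type="time" the same magnitude is reported with the opposite sign.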
# fit <- cph(formula=Surv(ftime,event) ~ terms, x=TRUE, y=TRUE, ...)
# S3 method for class 'cph'
validate(fit, method="boot", B=40, bw=FALSE, rule="aic",
type="residual", sls=.05, aics=0, force=NULL, estimates=TRUE,
pr=FALSE, dxy=TRUE, u, tol=1e-9, ...)
# S3 method for class 'psm'
validate(fit, method="boot",B=40,
bw=FALSE, rule="aic", type="residual", sls=.05, aics=0,
force=NULL, estimates=TRUE, pr=FALSE,
dxy=TRUE, tol=1e-12, rel.tolerance=1e-5, maxiter=15, ...)
dxy.cens(x, y, type=c('time','hazard'))
fit: a fit derived by cph. The options x=TRUE and y=TRUE
must have been specified. If the model contains any stratification factors
and dxy=TRUE,
the options surv=TRUE and time.inc=u must also have been given,
where u is the same value of u given to validate.
method: see validate
B: number of repetitions. For method="crossvalidation", this is the
number of groups of omitted observations.
bw: TRUE to do fast step-down using the fastbw function,
for both the overall model and for each repetition. fastbw
keeps parameters together that represent the same factor.
rule: applies if bw=TRUE. "aic" to use Akaike's information criterion as a
stopping rule (i.e., a factor is deleted if the \(\chi^2\) falls below
twice its degrees of freedom), or "p" to use \(P\)-values.
type: "residual" or "individual". The stopping rule is for the residual
\(\chi^2\) for all variables deleted or for individual factors,
respectively. For dxy.cens, specify
type="hazard" if x is on the hazard or cumulative
hazard (or their logs) scale, causing negation of the correlation index.
sls: significance level for a factor to be kept in a model, or for judging the residual \(\chi^2\).
aics: cutoff on AIC when rule="aic".
force: see fastbw
estimates: see print.fastbw
pr: TRUE to print results of each repetition
tol, ...: see validate or predab.resample
dxy: set to TRUE to validate Somers' \(D_{xy}\) using
dxy.cens, which is fast until n > 500,000. Uses the
survival package's concordancefit service
function for concordance.
u: must be specified if the model has any stratification factors and
dxy=TRUE.
In that case, strata are not included in \(X\beta\) and the
survival curves may cross. Predictions at time t=u are
correlated with observed survival times. Does not apply to
validate.psm.
x: a numeric vector
y: a Surv object that may be uncensored or
right-censored
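The stopping rules selected by rule, aics, and sls can be illustrated with a one-step comparison in base R. This is only a sketch of the comparison for individual factors: the \(\chi^2\) values and degrees of freedom are made up, the variable names are hypothetical, and fastbw itself deletes factors iteratively rather than in a single pass.

```r
# Hypothetical Wald chi-square statistics and degrees of freedom for three factors
chisq <- c(age = 9.4, sex = 1.1, race = 3.2)
df    <- c(age = 1,   sex = 1,   race = 3)

# rule = "aic" with aics = 0: a factor is a deletion candidate when its
# chi-square falls below twice its degrees of freedom
aics     <- 0
drop_aic <- (chisq - 2 * df) < aics

# rule = "p" with sls = 0.05: a factor is a deletion candidate when its
# P-value exceeds the significance level
sls    <- 0.05
p      <- pchisq(chisq, df, lower.tail = FALSE)
drop_p <- p > sls

data.frame(chisq, df, p = round(p, 4), drop_aic, drop_p)
```

In this made-up example both rules keep age and flag sex and race, but the two rules need not agree in general, since the AIC rule is equivalent to a P-value threshold that varies with the degrees of freedom.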
Statistics validated include the Nagelkerke \(R^2\),
\(D_{xy}\), slope shrinkage, the discrimination index \(D\)
[(model L.R. \(\chi^2\) - 1)/L], the unreliability index
\(U\) = (difference in -2 log likelihood between uncalibrated
\(X\beta\) and
\(X\beta\) with overall slope calibrated to test sample) / L,
and the overall quality index \(Q = D - U\). \(g\) is the
\(g\)-index on the log relative hazard (linear predictor) scale.
L is -2 log likelihood with beta=0. The "corrected" slope
can be thought of as a shrinkage factor that takes into account overfitting.
See predab.resample for the list of resampling methods.
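For reference, the index definitions given in prose above can be written out as follows. This is a transcription of the text, not of the package source; \(\chi^2_{\mathrm{LR}}\) denotes the model likelihood ratio \(\chi^2\), \(\mathcal{L}\) the likelihood, and \(\gamma\) is notation introduced here for the overall slope calibrated to the test sample.

```latex
\[
\begin{aligned}
D &= \frac{\chi^2_{\mathrm{LR}} - 1}{L}, \\
U &= \frac{\left[-2\log \mathcal{L}(X\beta)\right] - \left[-2\log \mathcal{L}(\gamma X\beta)\right]}{L}, \\
Q &= D - U,
\end{aligned}
\]
```

where \(L\) is \(-2\) log likelihood with \(\beta = 0\).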
matrix with rows corresponding to \(D_{xy}\), Slope, \(D\),
\(U\), and \(Q\), and columns for the original index, resample estimates,
indexes applied to whole or omitted sample using model derived from
resample, average optimism, corrected index, and number of successful
resamples.
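The relationship between the columns can be checked by hand. The sketch below uses the \(D_{xy}\) row of the first example output on this page, with the values as printed (rounded to four decimals), so the result agrees with the printed optimism only up to rounding. The variable names are illustrative.

```r
# Dxy row of the first example output (values as printed)
index_orig <- 0.3852
training   <- 0.3937   # average index of resample-derived models on their own resamples
test       <- 0.3786   # average index of the same models applied to the original sample

# Average optimism is the gap between apparent and test performance
optimism <- training - test

# The corrected index subtracts the estimated optimism from the original index
index_corrected <- index_orig - optimism

round(c(optimism = optimism, index.corrected = index_corrected), 4)
```

From the rounded inputs this gives an optimism of about 0.0151 (printed as 0.0150 in the output, which was computed from unrounded values) and a corrected index of about 0.3701.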
The values corresponding to the row \(D_{xy}\) are equal to \(2 * (C - 0.5)\) where C is the C-index or concordance probability. If the user is correlating the linear predictor (predicted log hazard) with survival time, \(D_{xy}\) is automatically negated.
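To make the relation \(D_{xy} = 2(C - 0.5)\) and the sign convention concrete, here is a small base R sketch for uncensored data. The helper c_index is hypothetical and uses brute-force pair counting; it is not the algorithm used by concordancefit and it ignores censoring.

```r
# Brute-force concordance probability for uncensored times (illustrative helper)
c_index <- function(pred, time) {
  idx  <- combn(length(time), 2)            # all pairs (i, j), i < j
  dp   <- pred[idx[1, ]] - pred[idx[2, ]]
  dt   <- time[idx[1, ]] - time[idx[2, ]]
  keep <- dt != 0                           # pairs tied on time are not usable
  s    <- sign(dp[keep]) * sign(dt[keep])   # +1 concordant, -1 discordant, 0 tied on pred
  mean((s + 1) / 2)                         # ties on pred count as 1/2
}

time <- c(2, 5, 7, 11, 13)
lp   <- c(1.2, 0.9, 1.0, 0.1, -0.4)         # log relative hazard: high = short survival

# Correlating the raw linear predictor with survival time gives a negative index
dxy_raw <- 2 * (c_index(lp, time) - 0.5)

# Negating it puts the index on the "higher is better" scale
dxy <- -dxy_raw

c(C = c_index(-lp, time), Dxy = dxy)
```

Nine of the ten pairs are ordered consistently with the negated linear predictor, so \(C = 0.9\) and \(D_{xy} = 0.8\); without the negation the index would be \(-0.8\).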
prints a summary, and optionally statistics for each re-fit (if
pr=TRUE)
require(rms)       # cph and validate; attaches Hmisc for label() and units()
require(survival)
n <- 1000
set.seed(731)
age <- 50 + 12*rnorm(n)
label(age) <- "Age"
sex <- factor(sample(c('Male','Female'), n, TRUE))
cens <- 15*runif(n)
h <- .02*exp(.04*(age-50)+.8*(sex=='Female'))
dt <- -log(runif(n))/h
e <- ifelse(dt <= cens,1,0)
dt <- pmin(dt, cens)
units(dt) <- "Year"
S <- Surv(dt,e)
f <- cph(S ~ age*sex, x=TRUE, y=TRUE)
# Validate full model fit
validate(f, B=10) # normally B=150
#>       index.orig training   test optimism index.corrected  n
#> Dxy       0.3852   0.3937 0.3786   0.0150          0.3701 10
#> R2        0.0811   0.0903 0.0778   0.0125          0.0686 10
#> Slope     1.0000   1.0000 0.9130   0.0870          0.9130 10
#> D         0.0312   0.0342 0.0298   0.0044          0.0268 10
#> U        -0.0008  -0.0008 0.0005  -0.0013          0.0005 10
#> Q         0.0320   0.0350 0.0293   0.0057          0.0263 10
#> g         0.7388   0.7924 0.7228   0.0696          0.6692 10
# Validate a model with stratification. Dxy is the only
# discrimination measure for such models, but Dxy requires
# one to choose a single time at which to predict S(t|X)
f <- cph(S ~ rcs(age)*strat(sex),
x=TRUE, y=TRUE, surv=TRUE, time.inc=2)
#> number of knots in rcs defaulting to 5
validate(f, u=2, B=10) # normally B=150
#>       index.orig training   test optimism index.corrected  n
#> Dxy       0.3491   0.3932 0.3578   0.0355          0.3137 10
#> R2        0.0759   0.0859 0.0695   0.0164          0.0595 10
#> Slope     1.0000   1.0000 0.9073   0.0927          0.9073 10
#> D         0.0317   0.0361 0.0289   0.0071          0.0246 10
#> U        -0.0009  -0.0009 0.0007  -0.0016          0.0007 10
#> Q         0.0326   0.0370 0.0282   0.0088          0.0239 10
#> g         0.6587   1.8035 1.6442   0.1593          0.4994 10
# Note u=time.inc
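The examples above cover cph only. A minimal sketch for the psm method follows, assuming a Weibull model fitted to data simulated in the same way as above; the choice of dist="weibull" and the object names g and v are illustrative, and B=10 is used only to keep the run short.

```r
library(rms)        # psm and validate; attaches Hmisc
library(survival)   # Surv

set.seed(731)
n    <- 1000
age  <- 50 + 12 * rnorm(n)
sex  <- factor(sample(c("Male", "Female"), n, TRUE))
cens <- 15 * runif(n)
h    <- 0.02 * exp(0.04 * (age - 50) + 0.8 * (sex == "Female"))
dt   <- -log(runif(n)) / h
e    <- ifelse(dt <= cens, 1, 0)
dt   <- pmin(dt, cens)

# x=TRUE, y=TRUE store the design matrix and response needed for resampling
g <- psm(Surv(dt, e) ~ age * sex, dist = "weibull", x = TRUE, y = TRUE)

# The u argument does not apply to validate.psm
v <- validate(g, B = 10)   # normally a larger B
v
```

The printed matrix has the same column layout as for cph (index.orig, training, test, optimism, index.corrected, n).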