survdiff.Rd
Tests if there is a difference between two or more survival curves using the \(G^\rho\) family of tests, or for a single curve against a known alternative.
survdiff(formula, data, subset, na.action, rho=0, timefix=TRUE)
a formula expression as for other survival models, of the form
Surv(time, status) ~ predictors
. For a one-sample test, the predictors
must consist of a single offset(sp)
term, where sp
is a vector giving the
survival probability of each subject. For a k-sample test, each unique
combination of predictors defines a subgroup.
A strata
term may be used to produce a stratified test.
To cause missing values in the predictors to be treated as a separate
group, rather than being omitted, use the strata
function with its
na.group=T
argument.
an optional data frame in which to interpret the variables occurring in the formula.
expression indicating which subset of the rows of data should be used in the fit. This can be a logical vector (which is replicated to have length equal to the number of observations), a numeric vector indicating which observation numbers are to be included (or excluded if negative), or a character vector of row names to be included. All observations are included by default.
a missing-data filter function. This is applied to the model.frame
after any
subset argument has been used. Default is options()$na.action
.
a scalar parameter that controls the type of test.
process times through the aeqSurv
function to
eliminate potential roundoff issues.
a list with components:
the number of subjects in each group.
the weighted observed number of events in each group. If there are strata, this will be a matrix with one column per stratum.
the weighted expected number of events in each group. If there are strata, this will be a matrix with one column per stratum.
the chisquare statistic for a test of equality.
the variance matrix of the test.
optionally, the number of subjects contained in each stratum.
the p-value corresponding to the Chisquare statistic
This function implements the G-rho family of
Harrington and Fleming (1982), with weights on each death of \(S(t)^\rho\),
where \(S(t)\) is the Kaplan-Meier estimate of survival.
With rho = 0
this is the log-rank or Mantel-Haenszel test,
and with rho = 1
it is equivalent to the Peto & Peto modification
of the Gehan-Wilcoxon test.
Peto and Peto show that the Gehan-Wilcoxon test can be badly biased if the two groups have different censoring patterns, and proposed an alternative. Prentice and Marek later showed an actual example where this issue occurs. For most data sets the Gehan-Wilcoxon and Peto-Peto-Prentice variant will hardly differ, however.
If the right hand side of the formula consists only of an offset term,
then a one sample test is done.
To cause missing values in the predictors to be treated as a separate
group, rather than being omitted, use the factor
function with its
exclude
argument to recode the righ-hand-side covariate.
Note that the ordinary log-rank test is equivalent to the score test from a Cox model, using the Breslow approximation for ties. Use the Cox model form for more complex models, e.g., time-dependent covariates.
Harrington, D. P. and Fleming, T. R. (1982). A class of rank test procedures for censored survival data. Biometrika, 553-566.
Peto R. Peto and Peto, J. (1972) Asymptotically efficient rank invariant test procedures (with discussion), JRSSA, 185-206.
Prentice, R. and Marek, P. (1979) A qualitative discrepancy between censored data rank tests, Biometics, 861–867.
## Two-sample test
survdiff(Surv(futime, fustat) ~ rx,data=ovarian)
#> Call:
#> survdiff(formula = Surv(futime, fustat) ~ rx, data = ovarian)
#>
#> N Observed Expected (O-E)^2/E (O-E)^2/V
#> rx=1 13 7 5.23 0.596 1.06
#> rx=2 13 5 6.77 0.461 1.06
#>
#> Chisq= 1.1 on 1 degrees of freedom, p= 0.3
check <- coxph(Surv(futime, fustat) ~ rx, data=ovarian, ties="breslow")
round(summary(check)$sctest, 3)
#> test df pvalue
#> 1.063 1.000 0.303
## Stratified 8-sample test (7 df)
survdiff(Surv(time, status) ~ pat.karno + strata(inst), data=lung)
#> Call:
#> survdiff(formula = Surv(time, status) ~ pat.karno + strata(inst),
#> data = lung)
#>
#> n=224, 4 observations deleted due to missingness.
#>
#> N Observed Expected (O-E)^2/E (O-E)^2/V
#> pat.karno=30 2 1 0.692 0.13720 0.15752
#> pat.karno=40 2 1 1.099 0.00889 0.00973
#> pat.karno=50 4 4 1.166 6.88314 7.45359
#> pat.karno=60 30 27 16.298 7.02790 9.57333
#> pat.karno=70 41 31 26.358 0.81742 1.14774
#> pat.karno=80 50 38 41.938 0.36978 0.60032
#> pat.karno=90 60 38 47.242 1.80800 3.23078
#> pat.karno=100 35 21 26.207 1.03451 1.44067
#>
#> Chisq= 21.4 on 7 degrees of freedom, p= 0.003
check <- coxph(Surv(time, status) ~ factor(pat.karno) + strata(inst), lung)
round(summary(check)$sctest, 3)
#> test df pvalue
#> 21.368 7.000 0.003
## Expected survival for heart transplant patients based on
## US mortality tables
expect <- survexp(futime ~ 1, data=jasa, cohort=FALSE,
rmap= list(age=(accept.dt - birth.dt), sex=1, year=accept.dt),
ratetable=survexp.us)
## actual survival is much worse (no surprise)
survdiff(Surv(jasa$futime, jasa$fustat) ~ offset(expect))
#> Call:
#> survdiff(formula = Surv(jasa$futime, jasa$fustat) ~ offset(expect))
#>
#> Observed Expected Z p
#> 75.000 0.644 -92.681 0.000
# The free light chain data set is close to the population.
e2 <- survexp(futime ~ 1, data=flchain, cohort=FALSE,
rmap= list(age= age*365.25, sex=sex,
year=as.Date(paste0(sample.yr, "-07-01"))),
ratetable= survexp.mn)
survdiff(Surv(futime, death) ~ offset(e2), flchain)
#> Call:
#> survdiff(formula = Surv(futime, death) ~ offset(e2), data = flchain)
#>
#> Observed Expected Z p
#> 2169.0000 2076.8776 -2.0214 0.0432