Creates several new variables which help set up a dataset with an ordinal response variable \(y\) for use in fitting a forward continuation ratio (CR) model. The CR model can be fitted with binary logistic regression if each input observation is replicated the proper number of times according to the \(y\) value, a new binary \(y\) is computed that has at most one \(y=1\) per subject, and if a cohort variable is used to define the current qualifying condition for a cohort of subjects, e.g., \(y\geq 2\). cr.setup creates the needed auxilliary variables. See predab.resample and validate.lrm for information about validating CR models (e.g., using the bootstrap to sample with replacement from the original subjects instead of the records used in the fit, validating the model separately for user-specified values of cohort).

cr.setup(y)

Arguments

y

a character, numeric, category, or factor vector containing values of the response variable. For category or factor variables, the levels of the variable are assumed to be listed in an ordinal way.

Value

a list with components y, cohort, subs, reps. y is a new binary variable that is to be used in the binary logistic fit. cohort is a factor vector specifying which cohort condition currently applies. subs is a vector of subscripts that can be used to replicate other variables the same way y was replicated. reps specifies how many times each original observation was replicated. y, cohort, subs are all the same length and are longer than the original y vector. reps is the same length as the original y vector. The subs vector is suitable for passing to validate.lrm or calibrate, which pass this vector under the name cluster on to predab.resample so that bootstrapping can be done by sampling with replacement from the original subjects rather than from the individual records created by cr.setup.

Author

Frank Harrell
Department of Biostatistics
Vanderbilt University
fh@fharrell.com

References

Berridge DM, Whitehead J: Analysis of failure time data with ordinal categories of response. Stat in Med 10:1703–1710, 1991.

See also

Examples

y <- c(NA, 10, 21, 32, 32)
cr.setup(y)
#> $y
#> [1] NA  1  0  1  0  0  0  0
#> 
#> $cohort
#> [1] <NA>  all   all   y>=21 all   y>=21 all   y>=21
#> Levels: all y>=21
#> 
#> $subs
#> [1] 1 2 3 3 4 4 5 5
#> 
#> $reps
#> [1] 1 1 2 2 2
#> 


set.seed(171)
y <- sample(0:2, 100, rep=TRUE)
sex <- sample(c("f","m"),100,rep=TRUE)
sex <- factor(sex)
table(sex, y)
#>    y
#> sex  0  1  2
#>   f 23 15 13
#>   m 12 23 14
options(digits=5)
tapply(y==0, sex, mean)
#>       f       m 
#> 0.45098 0.24490 
tapply(y==1, sex, mean)
#>       f       m 
#> 0.29412 0.46939 
tapply(y==2, sex, mean)
#>       f       m 
#> 0.25490 0.28571 
cohort <- y>=1
tapply(y[cohort]==1, sex[cohort], mean)
#>       f       m 
#> 0.53571 0.62162 

u <- cr.setup(y)
Y <- u$y
cohort <- u$cohort
sex <- sex[u$subs]

lrm(Y ~ cohort + sex)
#> Logistic Regression Model
#> 
#> lrm(formula = Y ~ cohort + sex)
#> 
#>                       Model Likelihood      Discrimination    Rank Discrim.    
#>                             Ratio Test             Indexes          Indexes    
#> Obs           165    LR chi2     10.32      R2       0.081    C       0.645    
#>  0             92    d.f.            2      R2(2,165)0.049    Dxy     0.291    
#>  1             73    Pr(> chi2) 0.0057    R2(2,122.1)0.066    gamma   0.382    
#> max |deriv| 5e-08                           Brier    0.232    tau-a   0.144    
#> 
#>             Coef    S.E.   Wald Z Pr(>|Z|)
#> Intercept   -0.4299 0.2586 -1.66  0.0964  
#> cohort=y>=1  1.0018 0.3317  3.02  0.0025  
#> sex=m       -0.3983 0.3263 -1.22  0.2223  
#> 
 
f <- lrm(Y ~ cohort*sex)   # saturated model - has to fit all data cells
f
#> Logistic Regression Model
#> 
#> lrm(formula = Y ~ cohort * sex)
#> 
#>                       Model Likelihood      Discrimination    Rank Discrim.    
#>                             Ratio Test             Indexes          Indexes    
#> Obs           165    LR chi2     14.03      R2       0.109    C       0.659    
#>  0             92    d.f.            3      R2(3,165)0.065    Dxy     0.317    
#>  1             73    Pr(> chi2) 0.0029    R2(3,122.1)0.086    gamma   0.417    
#> max |deriv| 1e-05                           Brier    0.226    tau-a   0.157    
#> 
#>                     Coef    S.E.   Wald Z Pr(>|Z|)
#> Intercept           -0.1967 0.2814 -0.70  0.4845  
#> cohort=y>=1          0.3398 0.4720  0.72  0.4716  
#> sex=m               -0.9293 0.4354 -2.13  0.0328  
#> cohort=y>=1 * sex=m  1.2826 0.6694  1.92  0.0553  
#> 

#Prob(y=0|female):
# plogis(-.50078)
#Prob(y=0|male):
# plogis(-.50078+.11301)
#Prob(y=1|y>=1, female):
plogis(-.50078+.31845)
#> [1] 0.45454
#Prob(y=1|y>=1, male):
plogis(-.50078+.31845+.11301-.07379)
#> [1] 0.46428

combinations <- expand.grid(cohort=levels(cohort), sex=levels(sex))
combinations
#>   cohort sex
#> 1    all   f
#> 2   y>=1   f
#> 3    all   m
#> 4   y>=1   m
p <- predict(f, combinations, type="fitted")
p
#>       1       2       3       4 
#> 0.45098 0.53571 0.24490 0.62162 
p0 <- p[c(1,3)]
p1 <- p[c(2,4)]
p1.unconditional <- (1 - p0) *p1
p1.unconditional
#>       1       3 
#> 0.29412 0.46939 
p2.unconditional <- 1 - p0 - p1.unconditional
p2.unconditional
#>       1       3 
#> 0.25490 0.28571 


if (FALSE) { # \dontrun{
dd <- datadist(inputdata)   # do this on non-replicated data
options(datadist='dd')
pain.severity <- inputdata$pain.severity
u <- cr.setup(pain.severity)
# inputdata frame has age, sex with pain.severity
attach(inputdata[u$subs,])  # replicate age, sex
# If age, sex already available, could do age <- age[u$subs] etc., or
# age <- rep(age, u$reps), etc.
y      <- u$y
cohort <- u$cohort
dd     <- datadist(dd, cohort)       # add to dd
f <- lrm(y ~ cohort + age*sex)       # ordinary cont. ratio model
g <- lrm(y ~ cohort*sex + age, x=TRUE,y=TRUE) # allow unequal slopes for
                                     # sex across cutoffs
cal <- calibrate(g, cluster=u$subs, subset=cohort=='all')  
# subs makes bootstrap sample the correct units, subset causes
# Predicted Prob(pain.severity=0) to be checked for calibration
} # }