cr.setup.RdCreates several new variables which help set up a dataset with an
ordinal response variable \(y\) for use in fitting a forward continuation
ratio (CR) model. The CR model can be fitted with binary logistic
regression if each input observation is replicated the proper
number of times according to the \(y\) value, a new binary \(y\)
is computed that has at most one \(y=1\) per subject,
and if a cohort variable is used to define the current
qualifying condition for a cohort of subjects, e.g., \(y\geq 2\).
cr.setup creates the needed auxilliary variables. See
predab.resample and validate.lrm for information about
validating CR models (e.g., using the bootstrap to sample with
replacement from the original subjects instead of the records used in
the fit, validating the model separately for user-specified values of
cohort).
cr.setup(y)a list with components y, cohort, subs, reps. y is a new binary
variable that is to be used in the binary logistic fit. cohort is
a factor vector specifying which cohort condition currently applies.
subs is a vector of subscripts that can be used to replicate other
variables the same way y was replicated. reps specifies how many
times each original observation was replicated. y, cohort, subs are
all the same length and are longer than the original y vector.
reps is the same length as the original y vector.
The subs vector is suitable for passing to validate.lrm or calibrate,
which pass this vector under the name cluster on to predab.resample so that bootstrapping can be
done by sampling with replacement from the original subjects rather than
from the individual records created by cr.setup.
Berridge DM, Whitehead J: Analysis of failure time data with ordinal categories of response. Stat in Med 10:1703–1710, 1991.
y <- c(NA, 10, 21, 32, 32)
cr.setup(y)
#> $y
#> [1] NA 1 0 1 0 0 0 0
#>
#> $cohort
#> [1] <NA> all all y>=21 all y>=21 all y>=21
#> Levels: all y>=21
#>
#> $subs
#> [1] 1 2 3 3 4 4 5 5
#>
#> $reps
#> [1] 1 1 2 2 2
#>
set.seed(171)
y <- sample(0:2, 100, rep=TRUE)
sex <- sample(c("f","m"),100,rep=TRUE)
sex <- factor(sex)
table(sex, y)
#> y
#> sex 0 1 2
#> f 23 15 13
#> m 12 23 14
options(digits=5)
tapply(y==0, sex, mean)
#> f m
#> 0.45098 0.24490
tapply(y==1, sex, mean)
#> f m
#> 0.29412 0.46939
tapply(y==2, sex, mean)
#> f m
#> 0.25490 0.28571
cohort <- y>=1
tapply(y[cohort]==1, sex[cohort], mean)
#> f m
#> 0.53571 0.62162
u <- cr.setup(y)
Y <- u$y
cohort <- u$cohort
sex <- sex[u$subs]
lrm(Y ~ cohort + sex)
#> Logistic Regression Model
#>
#> lrm(formula = Y ~ cohort + sex)
#>
#> Model Likelihood Discrimination Rank Discrim.
#> Ratio Test Indexes Indexes
#> Obs 165 LR chi2 10.32 R2 0.081 C 0.645
#> 0 92 d.f. 2 R2(2,165)0.049 Dxy 0.291
#> 1 73 Pr(> chi2) 0.0057 R2(2,122.1)0.066 gamma 0.382
#> max |deriv| 5e-08 Brier 0.232 tau-a 0.144
#>
#> Coef S.E. Wald Z Pr(>|Z|)
#> Intercept -0.4299 0.2586 -1.66 0.0964
#> cohort=y>=1 1.0018 0.3317 3.02 0.0025
#> sex=m -0.3983 0.3263 -1.22 0.2223
#>
f <- lrm(Y ~ cohort*sex) # saturated model - has to fit all data cells
f
#> Logistic Regression Model
#>
#> lrm(formula = Y ~ cohort * sex)
#>
#> Model Likelihood Discrimination Rank Discrim.
#> Ratio Test Indexes Indexes
#> Obs 165 LR chi2 14.03 R2 0.109 C 0.659
#> 0 92 d.f. 3 R2(3,165)0.065 Dxy 0.317
#> 1 73 Pr(> chi2) 0.0029 R2(3,122.1)0.086 gamma 0.417
#> max |deriv| 1e-05 Brier 0.226 tau-a 0.157
#>
#> Coef S.E. Wald Z Pr(>|Z|)
#> Intercept -0.1967 0.2814 -0.70 0.4845
#> cohort=y>=1 0.3398 0.4720 0.72 0.4716
#> sex=m -0.9293 0.4354 -2.13 0.0328
#> cohort=y>=1 * sex=m 1.2826 0.6694 1.92 0.0553
#>
#Prob(y=0|female):
# plogis(-.50078)
#Prob(y=0|male):
# plogis(-.50078+.11301)
#Prob(y=1|y>=1, female):
plogis(-.50078+.31845)
#> [1] 0.45454
#Prob(y=1|y>=1, male):
plogis(-.50078+.31845+.11301-.07379)
#> [1] 0.46428
combinations <- expand.grid(cohort=levels(cohort), sex=levels(sex))
combinations
#> cohort sex
#> 1 all f
#> 2 y>=1 f
#> 3 all m
#> 4 y>=1 m
p <- predict(f, combinations, type="fitted")
p
#> 1 2 3 4
#> 0.45098 0.53571 0.24490 0.62162
p0 <- p[c(1,3)]
p1 <- p[c(2,4)]
p1.unconditional <- (1 - p0) *p1
p1.unconditional
#> 1 3
#> 0.29412 0.46939
p2.unconditional <- 1 - p0 - p1.unconditional
p2.unconditional
#> 1 3
#> 0.25490 0.28571
if (FALSE) { # \dontrun{
dd <- datadist(inputdata) # do this on non-replicated data
options(datadist='dd')
pain.severity <- inputdata$pain.severity
u <- cr.setup(pain.severity)
# inputdata frame has age, sex with pain.severity
attach(inputdata[u$subs,]) # replicate age, sex
# If age, sex already available, could do age <- age[u$subs] etc., or
# age <- rep(age, u$reps), etc.
y <- u$y
cohort <- u$cohort
dd <- datadist(dd, cohort) # add to dd
f <- lrm(y ~ cohort + age*sex) # ordinary cont. ratio model
g <- lrm(y ~ cohort*sex + age, x=TRUE,y=TRUE) # allow unequal slopes for
# sex across cutoffs
cal <- calibrate(g, cluster=u$subs, subset=cohort=='all')
# subs makes bootstrap sample the correct units, subset causes
# Predicted Prob(pain.severity=0) to be checked for calibration
} # }