binom2.or.RdFits a Palmgren (bivariate odds-ratio model, or bivariate logistic regression) model to two binary responses. Actually, a bivariate logistic/probit/cloglog/cauchit model can be fitted. The odds ratio is used as a measure of dependency.
binom2.or(lmu = "logitlink", lmu1 = lmu, lmu2 = lmu, loratio = "loglink",
imu1 = NULL, imu2 = NULL, ioratio = NULL, zero = "oratio",
exchangeable = FALSE, tol = 0.001, more.robust = FALSE)Link function applied to the two marginal probabilities.
See Links for more choices.
See the note below.
Link function applied to the first and second of the two marginal probabilities.
Link function applied to the odds ratio.
See Links for more choices.
Optional initial values for the marginal probabilities and odds
ratio. See CommonVGAMffArguments for more details.
In general good initial values are often required so use these
arguments if convergence failure occurs.
Which linear/additive predictor is modelled as an intercept only?
The default is for the odds ratio.
A NULL means none.
See CommonVGAMffArguments for more details.
Logical.
If TRUE, the two marginal probabilities are constrained
to be equal.
Tolerance for testing independence. Should be some small positive numerical value.
Logical. If TRUE then some measures are taken to compute the
derivatives and working weights more robustly, i.e., in an attempt
to avoid numerical problems. Currently this feature is not debugged
if set TRUE.
Also known informally as the Palmgren model,
the bivariate logistic model is
a full-likelihood based model defined as two logistic regressions plus
log(oratio) = eta3 where eta3 is the third linear/additive
predictor relating the odds ratio to explanatory variables.
Explicitly, the default model is
$$logit[P(Y_j=1)] = \eta_j,\ \ \ j=1,2$$
for the marginals, and
$$\log[P(Y_{00}=1) P(Y_{11}=1) / (P(Y_{01}=1) P(Y_{10}=1))] = \eta_3,$$
specifies the dependency between the two responses. Here, the responses
equal 1 for a success and a 0 for a failure, and the odds ratio is often
written \(\psi=p_{00}p_{11}/(p_{10}p_{01})\).
The model is fitted by maximum likelihood estimation since the full
likelihood is specified.
The two binary responses are independent if and only if the odds ratio
is unity, or equivalently, the log odds ratio is 0. Fisher scoring
is implemented.
The default models \(\eta_3\) as a single parameter only,
i.e., an intercept-only model, but this can be circumvented by
setting zero = NULL in order to model the odds ratio as
a function of all the explanatory variables.
The function binom2.or() can handle other
probability link functions such as probitlink,
clogloglink and cauchitlink links
as well, so is quite general. In fact, the two marginal
probabilities can each have a different link function.
A similar model is the bivariate probit model
(binom2.rho), which is based on a standard
bivariate normal distribution, but the bivariate probit model
is less interpretable and flexible.
The exchangeable argument should be used when the error
structure is exchangeable, e.g., with eyes or ears data.
An object of class "vglmff"
(see vglmff-class).
The object is used by modelling functions
such as vglm
and vgam.
When fitted, the fitted.values slot of the
object contains the four joint probabilities, labelled
as \((Y_1,Y_2)\) = (0,0), (0,1), (1,0), (1,1),
respectively. These estimated probabilities should be extracted
with the fitted generic function.
McCullagh, P. and Nelder, J. A. (1989). Generalized Linear Models, 2nd ed. London: Chapman & Hall.
le Cessie, S. and van Houwelingen, J. C. (1994). Logistic regression for correlated binary data. Applied Statistics, 43, 95–108.
Palmgren, J. (1989). Regression Models for Bivariate Binary Responses. Technical Report no. 101, Department of Biostatistics, University of Washington, Seattle.
Yee, T. W. and Dirnbock, T. (2009). Models for analysing species' presence/absence data at two time points. Journal of Theoretical Biology, 259(4), 684–694.
At present we call binom2.or families a
bivariate odds-ratio model.
The response should be either a 4-column matrix of counts
(whose columns correspond
to \((Y_1,Y_2)\) = (0,0), (0,1), (1,0),
(1,1) respectively), or a two-column matrix where each column
has two distinct values, or a factor with four levels.
The function rbinom2.or may be used to generate
such data. Successful convergence requires at least one case
of each of the four possible outcomes.
By default, a constant odds ratio is fitted because zero
= 3. Set zero = NULL if you want the odds ratio to be
modelled as a function of the explanatory variables; however,
numerical problems are more likely to occur.
The argument lmu, which is actually redundant, is used for
convenience and for upward compatibility: specifying lmu
only means the link function will be applied to lmu1
and lmu2. Users who want a different link function for
each of the two marginal probabilities should use the lmu1
and lmu2 arguments, and the argument lmu is then
ignored. It doesn't make sense to specify exchangeable =
TRUE and have different link functions for the two marginal
probabilities.
Regarding Yee and Dirnbock (2009),
the xij (see vglm.control) argument enables
environmental variables with different values at the two time
points to be entered into an exchangeable binom2.or
model. See the author's webpage for sample code.
# Fit the model in Table 6.7 in McCullagh and Nelder (1989)
coalminers <- transform(coalminers, Age = (age - 42) / 5)
fit <- vglm(cbind(nBnW, nBW, BnW, BW) ~ Age,
binom2.or(zero = NULL), data = coalminers)
fitted(fit)
#> 00 01 10 11
#> 1 0.9374725 0.04940880 0.004635613 0.008483108
#> 2 0.9146106 0.06363617 0.006975651 0.014777591
#> 3 0.8841071 0.08002862 0.010496469 0.025367860
#> 4 0.8439351 0.09748397 0.015813823 0.042767142
#> 5 0.7918821 0.11383853 0.023859791 0.070419602
#> 6 0.7257847 0.12591033 0.035968350 0.112336597
#> 7 0.6440431 0.13037828 0.053762296 0.171816282
#> 8 0.5467652 0.12560781 0.078375013 0.249252018
#> 9 0.4378039 0.11312565 0.108557133 0.340513348
summary(fit)
#>
#> Call:
#> vglm(formula = cbind(nBnW, nBW, BnW, BW) ~ Age, family = binom2.or(zero = NULL),
#> data = coalminers)
#>
#> Coefficients:
#> Estimate Std. Error z value Pr(>|z|)
#> (Intercept):1 -2.262468 0.029892 -75.688 < 2e-16 ***
#> (Intercept):2 -1.487760 0.020559 -72.364 < 2e-16 ***
#> (Intercept):3 3.021908 0.069732 43.336 < 2e-16 ***
#> Age:1 0.514510 0.012071 42.623 < 2e-16 ***
#> Age:2 0.325446 0.008869 36.697 < 2e-16 ***
#> Age:3 -0.131365 0.028442 -4.619 3.86e-06 ***
#> ---
#> Signif. codes: 0 ‘***’ 0.001 ‘**’ 0.01 ‘*’ 0.05 ‘.’ 0.1 ‘ ’ 1
#>
#> Names of linear predictors: logitlink(mu1), logitlink(mu2), loglink(oratio)
#>
#> Residual deviance: 30.3939 on 21 degrees of freedom
#>
#> Log-likelihood: -100.5292 on 21 degrees of freedom
#>
#> Number of Fisher scoring iterations: 4
#>
#> No Hauck-Donner effect found in any of the estimates
#>
coef(fit, matrix = TRUE)
#> logitlink(mu1) logitlink(mu2) loglink(oratio)
#> (Intercept) -2.2624682 -1.4877603 3.0219085
#> Age 0.5145103 0.3254455 -0.1313647
c(weights(fit, type = "prior")) * fitted(fit) # Table 6.8
#> 00 01 10 11
#> 1 1829.9463 96.44597 9.048717 16.55903
#> 2 1638.0676 113.97238 12.493391 26.46667
#> 3 1868.1182 169.10047 22.179039 53.60229
#> 4 2348.6713 271.29790 44.009870 119.02096
#> 5 1800.7399 258.86881 54.257166 160.13417
#> 6 1736.8028 301.30342 86.072262 268.82148
#> 7 1346.0502 272.49061 112.363199 359.09603
#> 8 956.8390 219.81367 137.156273 436.19103
#> 9 497.3452 128.51074 123.320903 386.82316
if (FALSE) with(coalminers, matplot(Age, fitted(fit), type = "l", las = 1,
xlab = "(age - 42) / 5", lwd = 2))
with(coalminers, matpoints(Age, depvar(fit), col=1:4))
#> Error in plot.xy(xy.coords(x, y), type = type, ...): plot.new has not been called yet
legend(x = -4, y = 0.5, lty = 1:4, col = 1:4, lwd = 2,
legend = c("1 = (Breathlessness=0, Wheeze=0)",
"2 = (Breathlessness=0, Wheeze=1)",
"3 = (Breathlessness=1, Wheeze=0)",
"4 = (Breathlessness=1, Wheeze=1)")) # \dontrun{}
#> Error in (function (s, units = "user", cex = NULL, font = NULL, vfont = NULL, ...) { if (!is.null(vfont)) vfont <- c(typeface = pmatch(vfont[1L], Hershey$typeface), fontindex = pmatch(vfont[2L], Hershey$fontindex)) .External.graphics(C_strWidth, as.graphicsAnnot(s), pmatch(units, c("user", "figure", "inches")), cex, font, vfont, ...)})(dots[[1L]][[1L]], cex = dots[[2L]][[1L]], font = dots[[3L]][[1L]], units = "user"): plot.new has not been called yet
# Another model: pet ownership
if (FALSE) data(xs.nz, package = "VGAMdata")
# More homogeneous:
petdata <- subset(xs.nz, ethnicity == "European" & age < 70 &
sex == "M")
#> Error: object 'xs.nz' not found
petdata <- na.omit(petdata[, c("cat", "dog", "age")])
#> Error: object 'petdata' not found
summary(petdata)
#> Error in h(simpleError(msg, call)): error in evaluating the argument 'object' in selecting a method for function 'summary': object 'petdata' not found
with(petdata, table(cat, dog)) # Can compute the odds ratio
#> Error: object 'petdata' not found
fit <- vgam(cbind((1-cat) * (1-dog), (1-cat) * dog,
cat * (1-dog), cat * dog) ~ s(age, df = 5),
binom2.or(zero = 3), data = petdata, trace = TRUE)
#> Error: object 'petdata' not found
colSums(depvar(fit))
#> nBnW nBW BnW BW
#> 6.7416488 0.9008492 0.3315192 1.0259828
coef(fit, matrix = TRUE)
#> logitlink(mu1) logitlink(mu2) loglink(oratio)
#> (Intercept) -2.2624682 -1.4877603 3.0219085
#> Age 0.5145103 0.3254455 -0.1313647
# \dontrun{}
if (FALSE) # Plot the estimated probabilities
ooo <- order(with(petdata, age))
matplot(with(petdata, age)[ooo], fitted(fit)[ooo, ], type = "l",
xlab = "Age", ylab = "Probability", main = "Pet ownership",
ylim = c(0, max(fitted(fit))), las = 1, lwd = 1.5)
#> Error: object 'petdata' not found
legend("topleft", col=1:4, lty = 1:4, leg = c("no cat or dog ",
"dog only", "cat only", "cat and dog"), lwd = 1.5) # \dontrun{}
#> Error in (function (s, units = "user", cex = NULL, font = NULL, vfont = NULL, ...) { if (!is.null(vfont)) vfont <- c(typeface = pmatch(vfont[1L], Hershey$typeface), fontindex = pmatch(vfont[2L], Hershey$fontindex)) .External.graphics(C_strWidth, as.graphicsAnnot(s), pmatch(units, c("user", "figure", "inches")), cex, font, vfont, ...)})(dots[[1L]][[1L]], cex = dots[[2L]][[1L]], font = dots[[3L]][[1L]], units = "user"): plot.new has not been called yet