quasibin.RdThe function fits the generalized linear model “II” proposed by Williams (1982) accounting for overdispersion in clustered binomial data \((n, y)\).
quasibin(formula, data, link = c("logit", "cloglog"), phi = NULL, tol = 0.001)A formula for the fixed effects. The left-hand side of the formula must be of the form
cbind(y, n - y) where the modelled probability is y/n.
The link function for the mean \(p\): “logit” or “cloglog”.
A data frame containing the response (n and y) and explanatory variable(s).
When phi is NULL (the default), the overdispersion parameter \(\phi\) is estimated from the data.
Otherwise, its value is considered as fixed.
A positive scalar (default to 0.001). The algorithm stops at iteration \(r + 1\) when the condition \(\chi{^2}[r+1] - \chi{^2}[r] <= tol\) is met by the \(\chi^2\) statistics .
For a given cluster \((n, y)\), the model is:
$$y~|~\lambda \sim Binomial(n,~\lambda)$$
with \(\lambda\) a random variable of mean \(E[\lambda] = p\)
and variance \(Var[\lambda] = \phi * p * (1 - p)\).
The marginal mean and variance are:
$$E[y] = p$$
$$Var[y] = p * (1 - p) * [1 + (n - 1) * \phi]$$
The overdispersion parameter \(\phi\) corresponds to the intra-cluster correlation coefficient,
which is here restricted to be positive.
The function uses the function glm and the parameterization: \(p = h(X b) = h(\eta)\), where \(h\) is the
inverse of a given link function, \(X\) is a design-matrix, \(b\) is a vector of fixed effects and \(\eta = X b\)
is the linear predictor.
The estimate of \(b\) maximizes the quasi log-likelihood of the marginal model.
The parameter \(\phi\) is estimated with the moment method or can be set to a constant
(a regular glim is fitted when \(\phi\) is set to zero). The literature recommends to estimate \(\phi\)
from the saturated model. Several explanatory variables are allowed in \(b\). None is allowed in \(\phi\).
An object of formal class “glimQL”: see glimQL-class for details.
Moore, D.F., 1987, Modelling the extraneous variance in the presence of extra-binomial variation.
Appl. Statist. 36, 8-14.
Williams, D.A., 1982, Extra-binomial variation in logistic linear models. Appl. Statist. 31, 144-148.
glm, geese in the contributed package geepack,
glm.binomial.disp in the contributed package dispmod.
data(orob2)
fm1 <- glm(cbind(y, n - y) ~ seed * root,
family = binomial, data = orob2)
fm2 <- quasibin(cbind(y, n - y) ~ seed * root,
data = orob2, phi = 0)
fm3 <- quasibin(cbind(y, n - y) ~ seed * root,
data = orob2)
rbind(fm1 = coef(fm1), fm2 = coef(fm2), fm3 = coef(fm3))
#> (Intercept) seedO75 rootCUCUMBER seedO75:rootCUCUMBER
#> fm1 -0.4122448 -0.14592695 0.5400782 0.7781037
#> fm2 -0.4122448 -0.14592695 0.5400782 0.7781037
#> fm3 -0.4653218 -0.07008965 0.5102360 0.8195562
# show the model
fm3
#> Quasi-likelihood generalized linear model
#> -----------------------------------------
#> quasibin(formula = cbind(y, n - y) ~ seed * root, data = orob2)
#>
#> Fixed-effect coefficients:
#> Estimate Std. Error z value Pr(>|z|)
#> (Intercept) -0.4653 0.2439 -1.9081 0.0564
#> seedO75 -0.0701 0.3115 -0.2250 0.8219
#> rootCUCUMBER 0.5102 0.3347 1.5244 0.1274
#> seedO75:rootCUCUMBER 0.8196 0.4352 1.8831 0.0597
#>
#> Overdispersion parameter:
#> phi
#> 0.0249
#>
#> Pearson's chi-squared goodness-of-fit statistic = 17.0007
#>
# dispersion parameter and goodness-of-fit statistic
c(phi = fm3@phi,
X2 = sum(residuals(fm3, type = "pearson")^2))
#> phi X2
#> 0.02493559 17.00065130
# model predictions
predfm1 <- predict(fm1, type = "response", se = TRUE)
predfm3 <- predict(fm3, type = "response", se = TRUE)
New <- expand.grid(seed = levels(orob2$seed),
root = levels(orob2$root))
predict(fm3, New, se = TRUE, type = "response")
#> $fit
#> 1 2 3 4
#> 0.3857241 0.3692557 0.5112267 0.6887712
#>
#> $se.fit
#> 1 2 3 4
#> 0.05778166 0.04512309 0.05728607 0.04278711
#>
#> $residual.scale
#> [1] 1
#>
data.frame(orob2, p1 = predfm1$fit,
se.p1 = predfm1$se.fit,
p3 = predfm3$fit,
se.p3 = predfm3$se.fit)
#> seed root n y p1 se.p1 p3 se.p3
#> 1 O75 BEAN 39 10 0.3639706 0.02917342 0.3692557 0.04512309
#> 2 O75 BEAN 62 23 0.3639706 0.02917342 0.3692557 0.04512309
#> 3 O75 BEAN 81 23 0.3639706 0.02917342 0.3692557 0.04512309
#> 4 O75 BEAN 51 26 0.3639706 0.02917342 0.3692557 0.04512309
#> 5 O75 BEAN 39 17 0.3639706 0.02917342 0.3692557 0.04512309
#> 6 O75 CUCUMBER 6 5 0.6813559 0.02712870 0.6887712 0.04278711
#> 7 O75 CUCUMBER 74 53 0.6813559 0.02712870 0.6887712 0.04278711
#> 8 O75 CUCUMBER 72 55 0.6813559 0.02712870 0.6887712 0.04278711
#> 9 O75 CUCUMBER 51 32 0.6813559 0.02712870 0.6887712 0.04278711
#> 10 O75 CUCUMBER 79 46 0.6813559 0.02712870 0.6887712 0.04278711
#> 11 O75 CUCUMBER 13 10 0.6813559 0.02712870 0.6887712 0.04278711
#> 12 O73 BEAN 16 8 0.3983740 0.04414243 0.3857241 0.05778166
#> 13 O73 BEAN 30 10 0.3983740 0.04414243 0.3857241 0.05778166
#> 14 O73 BEAN 28 8 0.3983740 0.04414243 0.3857241 0.05778166
#> 15 O73 BEAN 45 23 0.3983740 0.04414243 0.3857241 0.05778166
#> 16 O73 BEAN 4 0 0.3983740 0.04414243 0.3857241 0.05778166
#> 17 O73 CUCUMBER 12 3 0.5319149 0.04202173 0.5112267 0.05728607
#> 18 O73 CUCUMBER 41 22 0.5319149 0.04202173 0.5112267 0.05728607
#> 19 O73 CUCUMBER 30 15 0.5319149 0.04202173 0.5112267 0.05728607
#> 20 O73 CUCUMBER 51 32 0.5319149 0.04202173 0.5112267 0.05728607
#> 21 O73 CUCUMBER 7 3 0.5319149 0.04202173 0.5112267 0.05728607
fm4 <- quasibin(cbind(y, n - y) ~ seed + root,
data = orob2, phi = fm3@phi)
# Pearson's chi-squared goodness-of-fit statistic
# compare with fm3's X2
sum(residuals(fm4, type = "pearson")^2)
#> [1] 20.72527