betabin.RdFits a beta-binomial generalized linear model accounting for overdispersion in clustered binomial data \((n, y)\).
A formula for the fixed effects b. The left-hand side of the formula must be of the form
cbind(y, n - y) where the modelled probability is y/n.
A right-hand formula for the overdispersion parameter(s) \(\phi\).
The link function for the mean \(p\): “logit” or “cloglog”.
A data frame containing the response (n and y) and explanatory variable(s).
Initial values for the overdispersion parameter(s) \(\phi\). Default to 0.1.
Logical to control the printing of warnings occurring during log-likelihood maximization.
Default to FALSE (no printing).
A function name: which action should be taken in the case of missing value(s).
A list with 2 components (scalars or vectors) of the same size, indicating which parameters are
fixed (i.e., not optimized) in the global parameter vector \((b, \phi)\) and the corresponding fixed values.
For example, fixpar = list(c(4, 5), c(0, 0)) means that 4th and 5th parameters of the model are set to 0.
A logical. When set to FALSE, the hessian and the variances-covariances matrices of the
parameters are not computed.
A list to control the optimization parameters. See optim. By default, set the maximum number of iterations to 2000.
Further arguments passed to optim.
For a given cluster \((n, y)\), the model is:
$$y~|~\lambda \sim Binomial(n,~\lambda)$$
with \(\lambda\) following a Beta distribution \(Beta(a1,~a2)\).
If \(B\) denotes the beta function, then:
$$P(\lambda) = \frac{\lambda^{a1~-~1} * (1~-~\lambda)^{a2 - 1}}{B(a1,~a2)}$$
$$E[\lambda] = \frac{a1}{a1 + a2}$$
$$Var[\lambda] = \frac{a1 * a2}{(a1 + a2 + 1) * (a1 + a2)^2}$$
The marginal beta-binomial distribution is:
$$P(y) = \frac{C(n,~y) * B(a1 + y, a2 + n - y)}{B(a1,~a2)}$$
The function uses the parameterization \(p = \frac{a1}{a1 + a2} = h(X b) = h(\eta)\) and \(\phi = \frac{1}{a1 + a2 + 1}\),
where \(h\) is the inverse of the link function (logit or complementary log-log), \(X\) is a design-matrix, \(b\)
is a vector of fixed effects, \(\eta = X b\) is the linear predictor and \(\phi\) is the overdispersion
parameter (i.e., the intracluster correlation coefficient, which is here restricted to be positive).
The marginal mean and variance are:
$$E[y] = n * p$$
$$Var[y] = n * p * (1 - p) * [1 + (n - 1) * \phi]$$
The parameters \(b\) and \(\phi\) are estimated by maximizing the log-likelihood of the marginal model (using the
function optim). Several explanatory variables are allowed in \(b\), only one in \(\phi\).
An object of formal class “glimML”: see glimML-class for details.
Crowder, M.J., 1978. Beta-binomial anova for proportions. Appl. Statist. 27, 34-37.
Griffiths, D.A., 1973. Maximum likelihood estimation for the beta-binomial distribution and an application
to the household distribution of the total number of cases of disease. Biometrics 29, 637-648.
Prentice, R.L., 1986. Binary regression using an extended beta-binomial distribution, with discussion of
correlation induced by covariate measurement errors. J.A.S.A. 81, 321-327.
Williams, D.A., 1975. The analysis of binary responses from toxicological experiments involving
reproduction and teratogenicity. Biometrics 31, 949-952.
glimML-class, glm and optim
data(orob2)
fm1 <- betabin(cbind(y, n - y) ~ seed, ~ 1, data = orob2)
fm2 <- betabin(cbind(y, n - y) ~ seed + root, ~ 1, data = orob2)
fm3 <- betabin(cbind(y, n - y) ~ seed * root, ~ 1, data = orob2)
# show the model
fm1; fm2; fm3
#> Beta-binomial model
#> -------------------
#> betabin(formula = cbind(y, n - y) ~ seed, random = ~1, data = orob2)
#>
#> Convergence was obtained after 80 iterations.
#>
#> Fixed-effect coefficients:
#> Estimate Std. Error z value Pr(> |z|)
#> (Intercept) -2.602e-01 2.262e-01 -1.15e+00 2.500e-01
#> seedO75 4.130e-01 2.993e-01 1.38e+00 1.676e-01
#>
#> Overdispersion coefficients:
#> Estimate Std. Error z value Pr(> z)
#> phi.(Intercept) 7.805e-02 3.015e-02 2.589e+00 4.82e-03
#>
#> Log-likelihood statistics
#> Log-lik nbpar df res. Deviance AIC AICc
#> -6.355e+01 3 18 5.051e+01 1.331e+02 1.345e+02
#> Beta-binomial model
#> -------------------
#> betabin(formula = cbind(y, n - y) ~ seed + root, random = ~1,
#> data = orob2)
#>
#> Convergence was obtained after 167 iterations.
#>
#> Fixed-effect coefficients:
#> Estimate Std. Error z value Pr(> |z|)
#> (Intercept) -7.280e-01 1.995e-01 -3.649e+00 2.634e-04
#> seedO75 3.427e-01 2.106e-01 1.627e+00 1.038e-01
#> rootCUCUMBER 1.011e+00 2.020e-01 5.004e+00 5.625e-07
#>
#> Overdispersion coefficients:
#> Estimate Std. Error z value Pr(> z)
#> phi.(Intercept) 1.937e-02 1.416e-02 1.368e+00 8.569e-02
#>
#> Log-likelihood statistics
#> Log-lik nbpar df res. Deviance AIC AICc
#> -5.583e+01 4 17 3.507e+01 1.197e+02 1.222e+02
#> Beta-binomial model
#> -------------------
#> betabin(formula = cbind(y, n - y) ~ seed * root, random = ~1,
#> data = orob2)
#>
#> Convergence was obtained after 196 iterations.
#>
#> Fixed-effect coefficients:
#> Estimate Std. Error z value Pr(> |z|)
#> (Intercept) -4.456e-01 2.183e-01 -2.041e+00 4.124e-02
#> seedO75 -9.612e-02 2.737e-01 -3.512e-01 7.255e-01
#> rootCUCUMBER 5.235e-01 2.968e-01 1.764e+00 7.780e-02
#> seedO75:rootCUCUMBER 7.962e-01 3.779e-01 2.107e+00 3.514e-02
#>
#> Overdispersion coefficients:
#> Estimate Std. Error z value Pr(> z)
#> phi.(Intercept) 1.236e-02 1.131e-02 1.093e+00 1.373e-01
#>
#> Log-likelihood statistics
#> Log-lik nbpar df res. Deviance AIC AICc
#> -5.377e+01 5 16 3.094e+01 1.175e+02 1.215e+02
# AIC
AIC(fm1, fm2, fm3)
#> df AIC AICc
#> fm1 3 133.1054 134.5172
#> fm2 4 119.6637 122.1637
#> fm3 5 117.5336 121.5336
summary(AIC(fm1, fm2, fm3), which = "AICc")
#> df AICc Delta W Cum.W
#> fm3 5 121.5336 0.0000000 0.5776126568 0.5776127
#> fm2 4 122.1637 0.6301115 0.4215117935 0.9991245
#> fm1 3 134.5172 12.9836137 0.0008755497 1.0000000
# Wald test for root effect
wald.test(b = coef(fm3), Sigma = vcov(fm3), Terms = 3:4)
#> Wald test:
#> ----------
#>
#> Chi-squared test:
#> X2 = 34.9, df = 2, P(> X2) = 2.6e-08
# likelihood ratio test for root effect
anova(fm1, fm3)
#> Analysis of Deviance Table (beta-binomial models)
#>
#> fm1: fixed = cbind(y, n - y) ~ seed; random = ~1
#> fm3: fixed = cbind(y, n - y) ~ seed * root; random = ~1
#>
#> logL k AIC AICc BIC Resid. dev. Resid. Df Test Deviance Df
#> fm1 -63.55 3 133.1 134.5 136.2 50.51 18
#> fm3 -53.77 5 117.5 121.5 122.8 30.94 16 fm1-fm3 19.57 2
#> P(> Chi2)
#> fm1
#> fm3 5.624e-05
# model predictions
New <- expand.grid(seed = levels(orob2$seed),
root = levels(orob2$root))
data.frame(New, predict(fm3, New, se = TRUE, type = "response"))
#> seed root fit se.fit
#> 1 O73 BEAN 0.3904148 0.05195385
#> 2 O75 BEAN 0.3677958 0.03808118
#> 3 O73 CUCUMBER 0.5194646 0.05118758
#> 4 O75 CUCUMBER 0.6852506 0.03616440
# Djallonke sheep data
data(dja)
betabin(cbind(y, n - y) ~ group, ~ 1, dja)
#> Beta-binomial model
#> -------------------
#> betabin(formula = cbind(y, n - y) ~ group, random = ~1, data = dja)
#>
#> Convergence was obtained after 68 iterations.
#>
#> Fixed-effect coefficients:
#> Estimate Std. Error z value Pr(> |z|)
#> (Intercept) -5.846e-01 1.861e-01 -3.141e+00 1.685e-03
#> groupTREAT -8.522e-01 2.527e-01 -3.372e+00 7.471e-04
#>
#> Overdispersion coefficients:
#> Estimate Std. Error z value Pr(> z)
#> phi.(Intercept) 5.602e-02 3.013e-02 1.859e+00 3.149e-02
#>
#> Log-likelihood statistics
#> Log-lik nbpar df res. Deviance AIC AICc
#> -1.071e+02 3 72 1.084e+02 2.202e+02 2.205e+02
# heterogeneous phi
betabin(cbind(y, n - y) ~ group, ~ group, dja,
control = list(maxit = 1000))
#> Beta-binomial model
#> -------------------
#> betabin(formula = cbind(y, n - y) ~ group, random = ~group, data = dja,
#> control = list(maxit = 1000))
#>
#> Convergence was obtained after 567 iterations.
#>
#> Fixed-effect coefficients:
#> Estimate Std. Error z value Pr(> |z|)
#> (Intercept) -6.076e-01 2.254e-01 -2.695e+00 7.028e-03
#> groupTREAT -8.425e-01 2.648e-01 -3.181e+00 1.465e-03
#>
#> Overdispersion coefficients:
#> Estimate Std. Error z value Pr(> z)
#> phi.groupCTRL 1.642e-01 7.443e-02 2.206e+00 1.369e-02
#> phi.groupTREAT 4.249e-09 2.000e-13 0.000e+00 1.000e+00
#>
#> Log-likelihood statistics
#> Log-lik nbpar df res. Deviance AIC AICc
#> -1.029e+02 4 71 9.996e+01 2.138e+02 2.143e+02
# phi fixed to zero in group TREAT
betabin(cbind(y, n - y) ~ group, ~ group, dja,
fixpar = list(4, 0))
#> Beta-binomial model
#> -------------------
#> betabin(formula = cbind(y, n - y) ~ group, random = ~group, data = dja,
#> fixpar = list(4, 0))
#>
#> Convergence was obtained after 85 iterations.
#>
#> Fixed-effect coefficients:
#> Estimate Std. Error z value Pr(> |z|)
#> (Intercept) -6.081e-01 2.249e-01 -2.704e+00 6.860e-03
#> groupTREAT -8.425e-01 2.644e-01 -3.187e+00 1.439e-03
#>
#> Overdispersion coefficients:
#> Estimate Std. Error z value Pr(> z)
#> phi.groupCTRL 1.624e-01 7.374e-02 2.203e+00 1.381e-02
#>
#> Overdispersion coefficients set to fixed values:
#> Value
#> phi.groupTREAT 0
#>
#> Log-likelihood statistics
#> Log-lik nbpar df res. Deviance AIC AICc
#> -1.029e+02 3 72 9.996e+01 2.118e+02 2.121e+02
# glim without overdispersion
summary(glm(cbind(y, n - y) ~ group,
family = binomial, data = dja))
#>
#> Call:
#> glm(formula = cbind(y, n - y) ~ group, family = binomial, data = dja)
#>
#> Coefficients:
#> Estimate Std. Error z value Pr(>|z|)
#> (Intercept) -0.5217 0.1477 -3.531 0.000414 ***
#> groupTREAT -0.9289 0.2028 -4.581 4.63e-06 ***
#> ---
#> Signif. codes: 0 ‘***’ 0.001 ‘**’ 0.01 ‘*’ 0.05 ‘.’ 0.1 ‘ ’ 1
#>
#> (Dispersion parameter for binomial family taken to be 1)
#>
#> Null deviance: 136.68 on 74 degrees of freedom
#> Residual deviance: 115.57 on 73 degrees of freedom
#> AIC: 225.38
#>
#> Number of Fisher Scoring iterations: 4
#>
# phi fixed to zero in both groups
betabin(cbind(y, n - y) ~ group, ~ group, dja,
fixpar = list(c(3, 4), c(0, 0)))
#> Beta-binomial model
#> -------------------
#> betabin(formula = cbind(y, n - y) ~ group, random = ~group, data = dja,
#> fixpar = list(c(3, 4), c(0, 0)))
#>
#> Convergence was obtained after 41 iterations.
#>
#> Fixed-effect coefficients:
#> Estimate Std. Error z value Pr(> |z|)
#> (Intercept) -5.217e-01 1.477e-01 -3.531e+00 4.136e-04
#> groupTREAT -9.289e-01 2.028e-01 -4.581e+00 4.632e-06
#>
#> Overdispersion coefficients set to fixed values:
#> Value
#> phi.groupCTRL 0
#> phi.groupTREAT 0
#>
#> Log-likelihood statistics
#> Log-lik nbpar df res. Deviance AIC AICc
#> -1.107e+02 2 73 1.156e+02 2.254e+02 2.256e+02