Beta-Binomial Model for Proportions

Fits a beta-binomial generalized linear model accounting for overdispersion in clustered binomial data $(n, y)$.

betabin(formula, random, data, link = c("logit", "cloglog"), phi.ini = NULL,
          warnings = FALSE, na.action = na.omit, fixpar = list(),
          hessian = TRUE, control = list(maxit = 2000), ...)

Arguments

formula: A formula for the fixed effects b. The left-hand side of the formula must be of the form cbind(y, n - y) where the modelled probability is y/n.
random: A right-hand formula for the overdispersion parameter(s) $\phi$.
link: The link function for the mean $p$: “logit” or “cloglog”.
data: A data frame containing the response (n and y) and explanatory variable(s).
phi.ini: Initial values for the overdispersion parameter(s) $\phi$. Default to 0.1.
warnings: Logical to control the printing of warnings occurring during log-likelihood maximization. Default to FALSE (no printing).
na.action: A function name: which action should be taken in the case of missing value(s).
fixpar: A list with 2 components (scalars or vectors) of the same size, indicating which parameters are fixed (i.e., not optimized) in the global parameter vector $(b, \phi)$ and the corresponding fixed values.
For example, fixpar = list(c(4, 5), c(0, 0)) means that 4th and 5th parameters of the model are set to 0.
hessian: A logical. When set to FALSE, the hessian and the variances-covariances matrices of the parameters are not computed.
control: A list to control the optimization parameters. See optim. By default, set the maximum number of iterations to 2000.
...: Further arguments passed to optim.

Details

For a given cluster $(n, y)$, the model is: $$y~|~\lambda \sim Binomial(n,~\lambda)$$ with $\lambda$ following a Beta distribution $Beta(a1,~a2)$.
If $B$ denotes the beta function, then: $$P(\lambda) = \frac{\lambda^{a1~-~1} * (1~-~\lambda)^{a2 - 1}}{B(a1,~a2)}$$ $$E[\lambda] = \frac{a1}{a1 + a2}$$ $$Var[\lambda] = \frac{a1 * a2}{(a1 + a2 + 1) * (a1 + a2)^2}$$ The marginal beta-binomial distribution is: $$P(y) = \frac{C(n,~y) * B(a1 + y, a2 + n - y)}{B(a1,~a2)}$$ The function uses the parameterization $p = \frac{a1}{a1 + a2} = h(X b) = h(\eta)$ and $\phi = \frac{1}{a1 + a2 + 1}$, where $h$ is the inverse of the link function (logit or complementary log-log), $X$ is a design-matrix, $b$ is a vector of fixed effects, $\eta = X b$ is the linear predictor and $\phi$ is the overdispersion parameter (i.e., the intracluster correlation coefficient, which is here restricted to be positive).
The marginal mean and variance are: $$E[y] = n * p$$ $$Var[y] = n * p * (1 - p) * [1 + (n - 1) * \phi]$$ The parameters $b$ and $\phi$ are estimated by maximizing the log-likelihood of the marginal model (using the function optim). Several explanatory variables are allowed in $b$, only one in $\phi$.

Value

An object of formal class “glimML”: see glimML-class for details.

References

Crowder, M.J., 1978. Beta-binomial anova for proportions. Appl. Statist. 27, 34-37.
Griffiths, D.A., 1973. Maximum likelihood estimation for the beta-binomial distribution and an application to the household distribution of the total number of cases of disease. Biometrics 29, 637-648.
Prentice, R.L., 1986. Binary regression using an extended beta-binomial distribution, with discussion of correlation induced by covariate measurement errors. J.A.S.A. 81, 321-327.
Williams, D.A., 1975. The analysis of binary responses from toxicological experiments involving reproduction and teratogenicity. Biometrics 31, 949-952.

Author

Matthieu Lesnoff matthieu.lesnoff@cirad.fr, Renaud Lancelot renaud.lancelot@cirad.fr

Examples

  data(orob2)
  fm1 <- betabin(cbind(y, n - y) ~ seed, ~ 1, data = orob2)
  fm2 <- betabin(cbind(y, n - y) ~ seed + root, ~ 1, data = orob2)
  fm3 <- betabin(cbind(y, n - y) ~ seed * root, ~ 1, data = orob2)
  # show the model
  fm1; fm2; fm3
#> Beta-binomial model
#> -------------------
#> betabin(formula = cbind(y, n - y) ~ seed, random = ~1, data = orob2)
#> 
#> Convergence was obtained after 80 iterations.
#> 
#> Fixed-effect coefficients:
#>               Estimate Std. Error   z value Pr(> |z|)
#> (Intercept) -2.602e-01  2.262e-01 -1.15e+00 2.500e-01
#> seedO75      4.130e-01  2.993e-01  1.38e+00 1.676e-01
#> 
#> Overdispersion coefficients:
#>                  Estimate Std. Error   z value  Pr(> z)
#> phi.(Intercept) 7.805e-02  3.015e-02 2.589e+00 4.82e-03
#> 
#> Log-likelihood statistics
#>    Log-lik      nbpar    df res.   Deviance        AIC       AICc 
#> -6.355e+01          3         18  5.051e+01  1.331e+02  1.345e+02 
#> Beta-binomial model
#> -------------------
#> betabin(formula = cbind(y, n - y) ~ seed + root, random = ~1, 
#>     data = orob2)
#> 
#> Convergence was obtained after 167 iterations.
#> 
#> Fixed-effect coefficients:
#>                Estimate Std. Error    z value Pr(> |z|)
#> (Intercept)  -7.280e-01  1.995e-01 -3.649e+00 2.634e-04
#> seedO75       3.427e-01  2.106e-01  1.627e+00 1.038e-01
#> rootCUCUMBER  1.011e+00  2.020e-01  5.004e+00 5.625e-07
#> 
#> Overdispersion coefficients:
#>                  Estimate Std. Error   z value   Pr(> z)
#> phi.(Intercept) 1.937e-02  1.416e-02 1.368e+00 8.569e-02
#> 
#> Log-likelihood statistics
#>    Log-lik      nbpar    df res.   Deviance        AIC       AICc 
#> -5.583e+01          4         17  3.507e+01  1.197e+02  1.222e+02 
#> Beta-binomial model
#> -------------------
#> betabin(formula = cbind(y, n - y) ~ seed * root, random = ~1, 
#>     data = orob2)
#> 
#> Convergence was obtained after 196 iterations.
#> 
#> Fixed-effect coefficients:
#>                        Estimate Std. Error    z value Pr(> |z|)
#> (Intercept)          -4.456e-01  2.183e-01 -2.041e+00 4.124e-02
#> seedO75              -9.612e-02  2.737e-01 -3.512e-01 7.255e-01
#> rootCUCUMBER          5.235e-01  2.968e-01  1.764e+00 7.780e-02
#> seedO75:rootCUCUMBER  7.962e-01  3.779e-01  2.107e+00 3.514e-02
#> 
#> Overdispersion coefficients:
#>                  Estimate Std. Error   z value   Pr(> z)
#> phi.(Intercept) 1.236e-02  1.131e-02 1.093e+00 1.373e-01
#> 
#> Log-likelihood statistics
#>    Log-lik      nbpar    df res.   Deviance        AIC       AICc 
#> -5.377e+01          5         16  3.094e+01  1.175e+02  1.215e+02 
  # AIC
  AIC(fm1, fm2, fm3)
#>     df      AIC     AICc
#> fm1  3 133.1054 134.5172
#> fm2  4 119.6637 122.1637
#> fm3  5 117.5336 121.5336
  summary(AIC(fm1, fm2, fm3), which = "AICc")
#>     df     AICc      Delta            W     Cum.W
#> fm3  5 121.5336  0.0000000 0.5776126568 0.5776127
#> fm2  4 122.1637  0.6301115 0.4215117935 0.9991245
#> fm1  3 134.5172 12.9836137 0.0008755497 1.0000000
  # Wald test for root effect
  wald.test(b = coef(fm3), Sigma = vcov(fm3), Terms = 3:4)
#> Wald test:
#> ----------
#> 
#> Chi-squared test:
#> X2 = 34.9, df = 2, P(> X2) = 2.6e-08
  # likelihood ratio test for root effect
  anova(fm1, fm3)
#> Analysis of Deviance Table (beta-binomial models)
#> 
#> fm1: fixed = cbind(y, n - y) ~ seed; random = ~1 
#> fm3: fixed = cbind(y, n - y) ~ seed * root; random = ~1 
#> 
#>       logL k   AIC  AICc   BIC Resid. dev. Resid. Df    Test Deviance Df
#> fm1 -63.55 3 133.1 134.5 136.2       50.51        18                    
#> fm3 -53.77 5 117.5 121.5 122.8       30.94        16 fm1-fm3    19.57  2
#>     P(> Chi2)
#> fm1          
#> fm3 5.624e-05
  # model predictions
  New <- expand.grid(seed = levels(orob2$seed),
                     root = levels(orob2$root))
  data.frame(New, predict(fm3, New, se = TRUE, type = "response"))
#>   seed     root       fit     se.fit
#> 1  O73     BEAN 0.3904148 0.05195385
#> 2  O75     BEAN 0.3677958 0.03808118
#> 3  O73 CUCUMBER 0.5194646 0.05118758
#> 4  O75 CUCUMBER 0.6852506 0.03616440
  # Djallonke sheep data
  data(dja)
  betabin(cbind(y, n - y) ~ group, ~ 1, dja)
#> Beta-binomial model
#> -------------------
#> betabin(formula = cbind(y, n - y) ~ group, random = ~1, data = dja)
#> 
#> Convergence was obtained after 68 iterations.
#> 
#> Fixed-effect coefficients:
#>               Estimate Std. Error    z value Pr(> |z|)
#> (Intercept) -5.846e-01  1.861e-01 -3.141e+00 1.685e-03
#> groupTREAT  -8.522e-01  2.527e-01 -3.372e+00 7.471e-04
#> 
#> Overdispersion coefficients:
#>                  Estimate Std. Error   z value   Pr(> z)
#> phi.(Intercept) 5.602e-02  3.013e-02 1.859e+00 3.149e-02
#> 
#> Log-likelihood statistics
#>    Log-lik      nbpar    df res.   Deviance        AIC       AICc 
#> -1.071e+02          3         72  1.084e+02  2.202e+02  2.205e+02 
  # heterogeneous phi
  betabin(cbind(y, n - y) ~ group, ~ group, dja,
          control = list(maxit = 1000))
#> Beta-binomial model
#> -------------------
#> betabin(formula = cbind(y, n - y) ~ group, random = ~group, data = dja, 
#>     control = list(maxit = 1000))
#> 
#> Convergence was obtained after 567 iterations.
#> 
#> Fixed-effect coefficients:
#>               Estimate Std. Error    z value Pr(> |z|)
#> (Intercept) -6.076e-01  2.254e-01 -2.695e+00 7.028e-03
#> groupTREAT  -8.425e-01  2.648e-01 -3.181e+00 1.465e-03
#> 
#> Overdispersion coefficients:
#>                 Estimate Std. Error   z value   Pr(> z)
#> phi.groupCTRL  1.642e-01  7.443e-02 2.206e+00 1.369e-02
#> phi.groupTREAT 4.249e-09  2.000e-13 0.000e+00 1.000e+00
#> 
#> Log-likelihood statistics
#>    Log-lik      nbpar    df res.   Deviance        AIC       AICc 
#> -1.029e+02          4         71  9.996e+01  2.138e+02  2.143e+02 
  # phi fixed to zero in group TREAT
   betabin(cbind(y, n - y) ~ group, ~ group, dja,
    fixpar = list(4, 0))
#> Beta-binomial model
#> -------------------
#> betabin(formula = cbind(y, n - y) ~ group, random = ~group, data = dja, 
#>     fixpar = list(4, 0))
#> 
#> Convergence was obtained after 85 iterations.
#> 
#> Fixed-effect coefficients:
#>               Estimate Std. Error    z value Pr(> |z|)
#> (Intercept) -6.081e-01  2.249e-01 -2.704e+00 6.860e-03
#> groupTREAT  -8.425e-01  2.644e-01 -3.187e+00 1.439e-03
#> 
#> Overdispersion coefficients:
#>                Estimate Std. Error   z value   Pr(> z)
#> phi.groupCTRL 1.624e-01  7.374e-02 2.203e+00 1.381e-02
#> 
#> Overdispersion coefficients set to fixed values:
#>                Value
#> phi.groupTREAT     0
#> 
#> Log-likelihood statistics
#>    Log-lik      nbpar    df res.   Deviance        AIC       AICc 
#> -1.029e+02          3         72  9.996e+01  2.118e+02  2.121e+02 
  # glim without overdispersion
  summary(glm(cbind(y, n - y) ~ group,
    family = binomial, data = dja))
#> 
#> Call:
#> glm(formula = cbind(y, n - y) ~ group, family = binomial, data = dja)
#> 
#> Coefficients:
#>             Estimate Std. Error z value Pr(>|z|)    
#> (Intercept)  -0.5217     0.1477  -3.531 0.000414 ***
#> groupTREAT   -0.9289     0.2028  -4.581 4.63e-06 ***
#> ---
#> Signif. codes:  0 ‘***’ 0.001 ‘**’ 0.01 ‘*’ 0.05 ‘.’ 0.1 ‘ ’ 1
#> 
#> (Dispersion parameter for binomial family taken to be 1)
#> 
#>     Null deviance: 136.68  on 74  degrees of freedom
#> Residual deviance: 115.57  on 73  degrees of freedom
#> AIC: 225.38
#> 
#> Number of Fisher Scoring iterations: 4
#> 
  # phi fixed to zero in both groups
  betabin(cbind(y, n - y) ~ group, ~ group, dja,
    fixpar = list(c(3, 4), c(0, 0))) 
#> Beta-binomial model
#> -------------------
#> betabin(formula = cbind(y, n - y) ~ group, random = ~group, data = dja, 
#>     fixpar = list(c(3, 4), c(0, 0)))
#> 
#> Convergence was obtained after 41 iterations.
#> 
#> Fixed-effect coefficients:
#>               Estimate Std. Error    z value Pr(> |z|)
#> (Intercept) -5.217e-01  1.477e-01 -3.531e+00 4.136e-04
#> groupTREAT  -9.289e-01  2.028e-01 -4.581e+00 4.632e-06
#> 
#> Overdispersion coefficients set to fixed values:
#>                Value
#> phi.groupCTRL      0
#> phi.groupTREAT     0
#> 
#> Log-likelihood statistics
#>    Log-lik      nbpar    df res.   Deviance        AIC       AICc 
#> -1.107e+02          2         73  1.156e+02  2.254e+02  2.256e+02