Fits a beta-binomial generalized linear model accounting for overdispersion in clustered binomial data \((n, y)\).

betabin(formula, random, data, link = c("logit", "cloglog"), phi.ini = NULL,
          warnings = FALSE, na.action = na.omit, fixpar = list(),
          hessian = TRUE, control = list(maxit = 2000), ...)

Arguments

formula

A formula for the fixed effects b. The left-hand side of the formula must be of the form cbind(y, n - y) where the modelled probability is y/n.

random

A right-hand formula for the overdispersion parameter(s) \(\phi\).

The link function for the mean \(p\): “logit” or “cloglog”.

data

A data frame containing the response (n and y) and explanatory variable(s).

phi.ini

Initial values for the overdispersion parameter(s) \(\phi\). Default to 0.1.

warnings

Logical to control the printing of warnings occurring during log-likelihood maximization. Default to FALSE (no printing).

na.action

A function name: which action should be taken in the case of missing value(s).

fixpar

A list with 2 components (scalars or vectors) of the same size, indicating which parameters are fixed (i.e., not optimized) in the global parameter vector \((b, \phi)\) and the corresponding fixed values.
For example, fixpar = list(c(4, 5), c(0, 0)) means that 4th and 5th parameters of the model are set to 0.

hessian

A logical. When set to FALSE, the hessian and the variances-covariances matrices of the parameters are not computed.

control

A list to control the optimization parameters. See optim. By default, set the maximum number of iterations to 2000.

...

Further arguments passed to optim.

Details

For a given cluster \((n, y)\), the model is: $$y~|~\lambda \sim Binomial(n,~\lambda)$$ with \(\lambda\) following a Beta distribution \(Beta(a1,~a2)\).
If \(B\) denotes the beta function, then: $$P(\lambda) = \frac{\lambda^{a1~-~1} * (1~-~\lambda)^{a2 - 1}}{B(a1,~a2)}$$ $$E[\lambda] = \frac{a1}{a1 + a2}$$ $$Var[\lambda] = \frac{a1 * a2}{(a1 + a2 + 1) * (a1 + a2)^2}$$ The marginal beta-binomial distribution is: $$P(y) = \frac{C(n,~y) * B(a1 + y, a2 + n - y)}{B(a1,~a2)}$$ The function uses the parameterization \(p = \frac{a1}{a1 + a2} = h(X b) = h(\eta)\) and \(\phi = \frac{1}{a1 + a2 + 1}\), where \(h\) is the inverse of the link function (logit or complementary log-log), \(X\) is a design-matrix, \(b\) is a vector of fixed effects, \(\eta = X b\) is the linear predictor and \(\phi\) is the overdispersion parameter (i.e., the intracluster correlation coefficient, which is here restricted to be positive).
The marginal mean and variance are: $$E[y] = n * p$$ $$Var[y] = n * p * (1 - p) * [1 + (n - 1) * \phi]$$ The parameters \(b\) and \(\phi\) are estimated by maximizing the log-likelihood of the marginal model (using the function optim). Several explanatory variables are allowed in \(b\), only one in \(\phi\).

Value

An object of formal class “glimML”: see glimML-class for details.

References

Crowder, M.J., 1978. Beta-binomial anova for proportions. Appl. Statist. 27, 34-37.
Griffiths, D.A., 1973. Maximum likelihood estimation for the beta-binomial distribution and an application to the household distribution of the total number of cases of disease. Biometrics 29, 637-648.
Prentice, R.L., 1986. Binary regression using an extended beta-binomial distribution, with discussion of correlation induced by covariate measurement errors. J.A.S.A. 81, 321-327.
Williams, D.A., 1975. The analysis of binary responses from toxicological experiments involving reproduction and teratogenicity. Biometrics 31, 949-952.

Author

Matthieu Lesnoff matthieu.lesnoff@cirad.fr, Renaud Lancelot renaud.lancelot@cirad.fr

See also

Examples

  data(orob2)
  fm1 <- betabin(cbind(y, n - y) ~ seed, ~ 1, data = orob2)
  fm2 <- betabin(cbind(y, n - y) ~ seed + root, ~ 1, data = orob2)
  fm3 <- betabin(cbind(y, n - y) ~ seed * root, ~ 1, data = orob2)
  # show the model
  fm1; fm2; fm3
#> Beta-binomial model
#> -------------------
#> betabin(formula = cbind(y, n - y) ~ seed, random = ~1, data = orob2)
#> 
#> Convergence was obtained after 80 iterations.
#> 
#> Fixed-effect coefficients:
#>               Estimate Std. Error   z value Pr(> |z|)
#> (Intercept) -2.602e-01  2.262e-01 -1.15e+00 2.500e-01
#> seedO75      4.130e-01  2.993e-01  1.38e+00 1.676e-01
#> 
#> Overdispersion coefficients:
#>                  Estimate Std. Error   z value  Pr(> z)
#> phi.(Intercept) 7.805e-02  3.015e-02 2.589e+00 4.82e-03
#> 
#> Log-likelihood statistics
#>    Log-lik      nbpar    df res.   Deviance        AIC       AICc 
#> -6.355e+01          3         18  5.051e+01  1.331e+02  1.345e+02 
#> Beta-binomial model
#> -------------------
#> betabin(formula = cbind(y, n - y) ~ seed + root, random = ~1, 
#>     data = orob2)
#> 
#> Convergence was obtained after 167 iterations.
#> 
#> Fixed-effect coefficients:
#>                Estimate Std. Error    z value Pr(> |z|)
#> (Intercept)  -7.280e-01  1.995e-01 -3.649e+00 2.634e-04
#> seedO75       3.427e-01  2.106e-01  1.627e+00 1.038e-01
#> rootCUCUMBER  1.011e+00  2.020e-01  5.004e+00 5.625e-07
#> 
#> Overdispersion coefficients:
#>                  Estimate Std. Error   z value   Pr(> z)
#> phi.(Intercept) 1.937e-02  1.416e-02 1.368e+00 8.569e-02
#> 
#> Log-likelihood statistics
#>    Log-lik      nbpar    df res.   Deviance        AIC       AICc 
#> -5.583e+01          4         17  3.507e+01  1.197e+02  1.222e+02 
#> Beta-binomial model
#> -------------------
#> betabin(formula = cbind(y, n - y) ~ seed * root, random = ~1, 
#>     data = orob2)
#> 
#> Convergence was obtained after 196 iterations.
#> 
#> Fixed-effect coefficients:
#>                        Estimate Std. Error    z value Pr(> |z|)
#> (Intercept)          -4.456e-01  2.183e-01 -2.041e+00 4.124e-02
#> seedO75              -9.612e-02  2.737e-01 -3.512e-01 7.255e-01
#> rootCUCUMBER          5.235e-01  2.968e-01  1.764e+00 7.780e-02
#> seedO75:rootCUCUMBER  7.962e-01  3.779e-01  2.107e+00 3.514e-02
#> 
#> Overdispersion coefficients:
#>                  Estimate Std. Error   z value   Pr(> z)
#> phi.(Intercept) 1.236e-02  1.131e-02 1.093e+00 1.373e-01
#> 
#> Log-likelihood statistics
#>    Log-lik      nbpar    df res.   Deviance        AIC       AICc 
#> -5.377e+01          5         16  3.094e+01  1.175e+02  1.215e+02 
  # AIC
  AIC(fm1, fm2, fm3)
#>     df      AIC     AICc
#> fm1  3 133.1054 134.5172
#> fm2  4 119.6637 122.1637
#> fm3  5 117.5336 121.5336
  summary(AIC(fm1, fm2, fm3), which = "AICc")
#>     df     AICc      Delta            W     Cum.W
#> fm3  5 121.5336  0.0000000 0.5776126568 0.5776127
#> fm2  4 122.1637  0.6301115 0.4215117935 0.9991245
#> fm1  3 134.5172 12.9836137 0.0008755497 1.0000000
  # Wald test for root effect
  wald.test(b = coef(fm3), Sigma = vcov(fm3), Terms = 3:4)
#> Wald test:
#> ----------
#> 
#> Chi-squared test:
#> X2 = 34.9, df = 2, P(> X2) = 2.6e-08
  # likelihood ratio test for root effect
  anova(fm1, fm3)
#> Analysis of Deviance Table (beta-binomial models)
#> 
#> fm1: fixed = cbind(y, n - y) ~ seed; random = ~1 
#> fm3: fixed = cbind(y, n - y) ~ seed * root; random = ~1 
#> 
#>       logL k   AIC  AICc   BIC Resid. dev. Resid. Df    Test Deviance Df
#> fm1 -63.55 3 133.1 134.5 136.2       50.51        18                    
#> fm3 -53.77 5 117.5 121.5 122.8       30.94        16 fm1-fm3    19.57  2
#>     P(> Chi2)
#> fm1          
#> fm3 5.624e-05
  # model predictions
  New <- expand.grid(seed = levels(orob2$seed),
                     root = levels(orob2$root))
  data.frame(New, predict(fm3, New, se = TRUE, type = "response"))
#>   seed     root       fit     se.fit
#> 1  O73     BEAN 0.3904148 0.05195385
#> 2  O75     BEAN 0.3677958 0.03808118
#> 3  O73 CUCUMBER 0.5194646 0.05118758
#> 4  O75 CUCUMBER 0.6852506 0.03616440
  # Djallonke sheep data
  data(dja)
  betabin(cbind(y, n - y) ~ group, ~ 1, dja)
#> Beta-binomial model
#> -------------------
#> betabin(formula = cbind(y, n - y) ~ group, random = ~1, data = dja)
#> 
#> Convergence was obtained after 68 iterations.
#> 
#> Fixed-effect coefficients:
#>               Estimate Std. Error    z value Pr(> |z|)
#> (Intercept) -5.846e-01  1.861e-01 -3.141e+00 1.685e-03
#> groupTREAT  -8.522e-01  2.527e-01 -3.372e+00 7.471e-04
#> 
#> Overdispersion coefficients:
#>                  Estimate Std. Error   z value   Pr(> z)
#> phi.(Intercept) 5.602e-02  3.013e-02 1.859e+00 3.149e-02
#> 
#> Log-likelihood statistics
#>    Log-lik      nbpar    df res.   Deviance        AIC       AICc 
#> -1.071e+02          3         72  1.084e+02  2.202e+02  2.205e+02 
  # heterogeneous phi
  betabin(cbind(y, n - y) ~ group, ~ group, dja,
          control = list(maxit = 1000))
#> Beta-binomial model
#> -------------------
#> betabin(formula = cbind(y, n - y) ~ group, random = ~group, data = dja, 
#>     control = list(maxit = 1000))
#> 
#> Convergence was obtained after 567 iterations.
#> 
#> Fixed-effect coefficients:
#>               Estimate Std. Error    z value Pr(> |z|)
#> (Intercept) -6.076e-01  2.254e-01 -2.695e+00 7.028e-03
#> groupTREAT  -8.425e-01  2.648e-01 -3.181e+00 1.465e-03
#> 
#> Overdispersion coefficients:
#>                 Estimate Std. Error   z value   Pr(> z)
#> phi.groupCTRL  1.642e-01  7.443e-02 2.206e+00 1.369e-02
#> phi.groupTREAT 4.249e-09  2.000e-13 0.000e+00 1.000e+00
#> 
#> Log-likelihood statistics
#>    Log-lik      nbpar    df res.   Deviance        AIC       AICc 
#> -1.029e+02          4         71  9.996e+01  2.138e+02  2.143e+02 
  # phi fixed to zero in group TREAT
   betabin(cbind(y, n - y) ~ group, ~ group, dja,
    fixpar = list(4, 0))
#> Beta-binomial model
#> -------------------
#> betabin(formula = cbind(y, n - y) ~ group, random = ~group, data = dja, 
#>     fixpar = list(4, 0))
#> 
#> Convergence was obtained after 85 iterations.
#> 
#> Fixed-effect coefficients:
#>               Estimate Std. Error    z value Pr(> |z|)
#> (Intercept) -6.081e-01  2.249e-01 -2.704e+00 6.860e-03
#> groupTREAT  -8.425e-01  2.644e-01 -3.187e+00 1.439e-03
#> 
#> Overdispersion coefficients:
#>                Estimate Std. Error   z value   Pr(> z)
#> phi.groupCTRL 1.624e-01  7.374e-02 2.203e+00 1.381e-02
#> 
#> Overdispersion coefficients set to fixed values:
#>                Value
#> phi.groupTREAT     0
#> 
#> Log-likelihood statistics
#>    Log-lik      nbpar    df res.   Deviance        AIC       AICc 
#> -1.029e+02          3         72  9.996e+01  2.118e+02  2.121e+02 
  # glim without overdispersion
  summary(glm(cbind(y, n - y) ~ group,
    family = binomial, data = dja))
#> 
#> Call:
#> glm(formula = cbind(y, n - y) ~ group, family = binomial, data = dja)
#> 
#> Coefficients:
#>             Estimate Std. Error z value Pr(>|z|)    
#> (Intercept)  -0.5217     0.1477  -3.531 0.000414 ***
#> groupTREAT   -0.9289     0.2028  -4.581 4.63e-06 ***
#> ---
#> Signif. codes:  0 ‘***’ 0.001 ‘**’ 0.01 ‘*’ 0.05 ‘.’ 0.1 ‘ ’ 1
#> 
#> (Dispersion parameter for binomial family taken to be 1)
#> 
#>     Null deviance: 136.68  on 74  degrees of freedom
#> Residual deviance: 115.57  on 73  degrees of freedom
#> AIC: 225.38
#> 
#> Number of Fisher Scoring iterations: 4
#> 
  # phi fixed to zero in both groups
  betabin(cbind(y, n - y) ~ group, ~ group, dja,
    fixpar = list(c(3, 4), c(0, 0))) 
#> Beta-binomial model
#> -------------------
#> betabin(formula = cbind(y, n - y) ~ group, random = ~group, data = dja, 
#>     fixpar = list(c(3, 4), c(0, 0)))
#> 
#> Convergence was obtained after 41 iterations.
#> 
#> Fixed-effect coefficients:
#>               Estimate Std. Error    z value Pr(> |z|)
#> (Intercept) -5.217e-01  1.477e-01 -3.531e+00 4.136e-04
#> groupTREAT  -9.289e-01  2.028e-01 -4.581e+00 4.632e-06
#> 
#> Overdispersion coefficients set to fixed values:
#>                Value
#> phi.groupCTRL      0
#> phi.groupTREAT     0
#> 
#> Log-likelihood statistics
#>    Log-lik      nbpar    df res.   Deviance        AIC       AICc 
#> -1.107e+02          2         73  1.156e+02  2.254e+02  2.256e+02