Zero-Altered Binomial Distribution

Fits a zero-altered binomial distribution based on a conditional model involving a Bernoulli distribution and a positive-binomial distribution.

zabinomial(lpobs0 = "logitlink", lprob = "logitlink",
     type.fitted = c("mean", "prob", "pobs0"),
     ipobs0 = NULL, iprob = NULL, imethod = 1, zero = NULL)
zabinomialff(lprob = "logitlink", lonempobs0 = "logitlink",
     type.fitted = c("mean", "prob", "pobs0", "onempobs0"),
     iprob = NULL, ionempobs0 = NULL, imethod = 1, zero = "onempobs0")

Arguments

lprob

Parameter link function applied to the probability parameter of the binomial distribution. See Links for more choices.

lpobs0

Link function for the parameter $p_0$, called pobs0 here. See Links for more choices.

type.fitted

See CommonVGAMffArguments and fittedvlm for information.

iprob, ipobs0

See CommonVGAMffArguments.

lonempobs0, ionempobs0

Corresponding arguments for the other parameterization, used by zabinomialff(). See Details below.

imethod, zero

See CommonVGAMffArguments.

Details

The response $Y$ is zero with probability $p_0$; otherwise, with probability $1-p_0$, $Y$ has a positive-binomial distribution. Thus $0 < p_0 < 1$, and $p_0$ may be modelled as a function of the covariates. The zero-altered binomial distribution differs from the zero-inflated binomial distribution in that the former has zeros coming from one source only, whereas the latter has zeros coming from the binomial distribution too. The zero-inflated binomial distribution is implemented in zibinomial. Some people call the zero-altered binomial a hurdle model.
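The conditional structure described above can be written out directly. The following is a minimal sketch in base R, not VGAM code: the helper name dzabinom_sketch is hypothetical, and the values of size, prob and pobs0 are arbitrary. It places mass $p_0$ at zero and spreads the remaining $1-p_0$ over $1, \ldots, N$ in proportion to the binomial probabilities.

```r
# Hypothetical helper (not a VGAM function): pmf of a zero-altered binomial
# with 'size' trials, binomial probability 'prob', and P(Y = 0) = 'pobs0'.
dzabinom_sketch <- function(y, size, prob, pobs0) {
  # Positive-binomial pmf: binomial pmf renormalised over y = 1, ..., size
  pos <- dbinom(y, size, prob) / (1 - dbinom(0, size, prob))
  ifelse(y == 0, pobs0, (1 - pobs0) * pos)
}

probs <- dzabinom_sketch(0:10, size = 10, prob = 0.3, pobs0 = 0.4)
sum(probs)   # the masses over 0, ..., 10 add up to 1
probs[1]     # the mass at zero is pobs0, whatever the value of prob
```

Because the mass at zero does not depend on prob, the zeros come from a single source, which is the property that distinguishes this model from the zero-inflated one.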

The input is currently a vector or one-column matrix. By default, the two linear/additive predictors for zabinomial() are $(\mathrm{logit}(p_0), \mathrm{logit}(p))^T$, where $p$ is the binomial probability.

The VGAM family function zabinomialff() has a few changes compared to zabinomial(). These are: (i) the order of the linear/additive predictors is switched so that the binomial probability comes first; (ii) the parameter onempobs0 is 1 minus the probability of an observed 0, i.e., the probability of the positive-binomial distribution, so that onempobs0 is 1-pobs0; (iii) argument zero has a new default so that onempobs0 is intercept-only by default. zabinomialff() is now generally recommended over zabinomial(). Both functions implement Fisher scoring and neither can handle multiple responses.
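As an illustration of the differences listed above, here is a minimal sketch of a zabinomialff() fit. The data frame sdata, the seed, the sample size and the constant pobs0 = 0.3 are all assumptions made for this sketch; a constant pobs0 is used because onempobs0 is intercept-only under the default value of zero.

```r
library(VGAM)

set.seed(1)  # arbitrary seed, only so that this sketch is reproducible
n <- 500
sdata <- data.frame(x2 = runif(n), size = 10)
sdata <- transform(sdata,
                   prob = logitlink(-2 + 3 * x2, inverse = TRUE))
# Constant pobs0, matching the intercept-only default for onempobs0
sdata <- transform(sdata,
                   y1 = rzabinom(n, size = size, prob = prob, pobs0 = 0.3))

fit_ff <- vglm(cbind(y1, size - y1) ~ x2, zabinomialff(), data = sdata)
cm <- coef(fit_ff, matrix = TRUE)
cm  # first column is for prob, second for onempobs0
```

In the coefficient matrix the x2 entry of the onempobs0 column should be zero, reflecting the intercept-only constraint. Passing zero = NULL to zabinomialff() would let onempobs0 depend on x2 as well.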

Value

An object of class "vglmff" (see vglmff-class). The object is used by modelling functions such as vglm and vgam.

The fitted.values slot of the fitted object, which should be extracted by the generic function fitted, returns the mean $\mu$ (default) which is given by $$\mu = (1-p_0) \mu_{b} / [1 - (1 - \mu_{b})^N]$$ where $\mu_{b}$ is the usual binomial mean. If type.fitted = "pobs0" then $p_0$ is returned.
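The formula for $\mu$ can be checked numerically. The sketch below uses base R only and assumes that $\mu_{b}$ is taken on the proportion scale, i.e., it equals the binomial probability; this reading is an assumption, chosen because the fitted values in the examples lie between 0 and 1. The values of N, p and p0 are arbitrary.

```r
N  <- 10    # number of trials (assumed value)
p  <- 0.3   # binomial probability, playing the role of mu_b
p0 <- 0.4   # probability of an observed zero

y <- 1:N
# Zero-altered pmf on the positive support 1, ..., N
pmf_pos <- (1 - p0) * dbinom(y, N, p) / (1 - (1 - p)^N)

mean_direct  <- sum((y / N) * pmf_pos)          # E(Y / N) by direct summation
mean_formula <- (1 - p0) * p / (1 - (1 - p)^N)  # closed form given above
all.equal(mean_direct, mean_formula)
```

The zero outcome contributes nothing to the sum, so only the positive support is needed.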

Author

T. W. Yee

Note

The response should be a two-column matrix of counts, with first column giving the number of successes.

Note that zabinomial() allows $p_0$ to be modelled as a function of the covariates when zero = NULL. It is a conditional model, not a mixture model.

These family functions effectively combine posbinomial and binomialff into one family function.

Examples

zdata <- data.frame(x2 = runif(nn <- 1000))
zdata <- transform(zdata, size  = 10,
                          prob  = logitlink(-2 + 3*x2, inverse = TRUE),
                          pobs0 = logitlink(-1 + 2*x2, inverse = TRUE))
zdata <- transform(zdata,
                   y1 = rzabinom(nn, size = size, prob = prob, pobs0 = pobs0))
with(zdata, table(y1))
#> y1
#>   0   1   2   3   4   5   6   7   8   9 
#> 508  94 108  79  69  51  31  30  23   7 

zfit <- vglm(cbind(y1, size - y1) ~ x2, zabinomial(zero = NULL),
             data = zdata, trace = TRUE)
#> Iteration 1: loglikelihood = -1536.577
#> Iteration 2: loglikelihood = -1456.4377
#> Iteration 3: loglikelihood = -1455.6176
#> Iteration 4: loglikelihood = -1455.6158
#> Iteration 5: loglikelihood = -1455.6158
coef(zfit, matrix = TRUE)
#>             logitlink(pobs0) logitlink(prob)
#> (Intercept)       -0.8885453       -2.099015
#> x2                 1.8716549        3.123550
head(fitted(zfit))
#>        [,1]
#> 1 0.2058659
#> 2 0.1166219
#> 3 0.1675483
#> 4 0.2056937
#> 5 0.1795679
#> 6 0.2017297
head(predict(zfit))
#>   logitlink(pobs0) logitlink(prob)
#> 1       0.87234171      0.83967738
#> 2      -0.80313655     -1.95647922
#> 3      -0.11118247     -0.80169731
#> 4       0.39369692      0.04088111
#> 5       0.02097625     -0.58114149
#> 6       0.31517222     -0.09016647
summary(zfit)
#> 
#> Call:
#> vglm(formula = cbind(y1, size - y1) ~ x2, family = zabinomial(zero = NULL), 
#>     data = zdata, trace = TRUE)
#> 
#> Coefficients: 
#>               Estimate Std. Error z value Pr(>|z|)    
#> (Intercept):1 -0.88855    0.13273  -6.695 2.16e-11 ***
#> (Intercept):2 -2.09902    0.07711 -27.222  < 2e-16 ***
#> x2:1           1.87165    0.23566   7.942 1.98e-15 ***
#> x2:2           3.12355    0.13564  23.027  < 2e-16 ***
#> ---
#> Signif. codes:  0 ‘***’ 0.001 ‘**’ 0.01 ‘*’ 0.05 ‘.’ 0.1 ‘ ’ 1
#> 
#> Names of linear predictors: logitlink(pobs0), logitlink(prob)
#> 
#> Log-likelihood: -1455.616 on 1996 degrees of freedom
#> 
#> Number of Fisher scoring iterations: 5 
#> 
#> No Hauck-Donner effect found in any of the estimates
#>