Zero-Inflated Negative Binomial Distribution Family Function

Fits a zero-inflated negative binomial distribution by full maximum likelihood estimation.

zinegbinomial(zero = "size",
              type.fitted = c("mean", "munb", "pobs0", "pstr0",
                              "onempstr0"),
              mds.min = 1e-3, nsimEIM = 500, cutoff.prob = 0.999,
              eps.trig = 1e-7, max.support = 4000, max.chunk.MB = 30,
              lpstr0 = "logitlink", lmunb = "loglink", lsize = "loglink",
              imethod = 1, ipstr0 = NULL, imunb = NULL,
              iprobs.y = NULL, isize = NULL,
              gprobs.y = (0:9)/10,
              gsize.mux = exp(c(-30, -20, -15, -10, -6:3)))
zinegbinomialff(lmunb = "loglink", lsize = "loglink", lonempstr0 = "logitlink",
                type.fitted = c("mean", "munb", "pobs0", "pstr0",
                                "onempstr0"),
                imunb = NULL, isize = NULL, ionempstr0 = NULL,
                zero = c("size", "onempstr0"), imethod = 1,
                iprobs.y = NULL, cutoff.prob = 0.999,
                eps.trig = 1e-7, max.support = 4000, max.chunk.MB = 30,
                gprobs.y = (0:9)/10, gsize.mux = exp((-12:6)/2),
                mds.min = 1e-3, nsimEIM = 500)

Arguments

lpstr0, lmunb, lsize

Link functions for the parameters $\phi$, $\mu$ (the mean of the NB component, munb) and $k$ (size); see negbinomial for details, and Links for more choices. For the zero-deflated model, see the Note below.

type.fitted

See CommonVGAMffArguments and fittedvlm for more information.

ipstr0, isize, imunb

Optional initial values for $\phi$, $k$ and $\mu$. By default, initial values are computed internally for all three. If a vector is supplied, it is recycled.

lonempstr0, ionempstr0

Corresponding arguments for the other parameterization, used by zinegbinomialff(), in which $1-\phi$ (onempstr0) replaces $\phi$. See Details below.

imethod

An integer with value 1, 2 or 3 which specifies the initialization method for the mean parameter. If the fit fails to converge, try another value. See CommonVGAMffArguments for more information.

zero

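Specifies which linear/additive predictors are to be modelled as intercept-only. If given as integers, their absolute values should be 1, 2 or 3. For zinegbinomial() the default makes $k$ intercept-only (zero = "size"), and for zinegbinomialff() the default makes both $k$ and $1-\phi$ intercept-only (zero = c("size", "onempstr0")), in each case for every response. Setting zero = NULL lets all parameters depend on the covariates. See CommonVGAMffArguments for more information.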

nsimEIM

See CommonVGAMffArguments for information.

iprobs.y, cutoff.prob, max.support, max.chunk.MB

See negbinomial and/or posnegbinomial for details.

mds.min, eps.trig

See negbinomial for details.

gprobs.y, gsize.mux

These arguments relate to grid searching in the initialization process. See negbinomial and/or posnegbinomial for details.

Details

These functions are based on
$$P(Y=0) = \phi + (1-\phi) \left(\frac{k}{k+\mu}\right)^{k},$$
and, for $y=1,2,\ldots$,
$$P(Y=y) = (1-\phi)\, f_{\mathrm{NB}}(y;\, \mu, k),$$
where $f_{\mathrm{NB}}(y;\, \mu, k)$ is the negative binomial probability mass function with mean $\mu$ and size parameter $k$ (dnbinom(y, size = k, mu = mu) in R). The parameter $\phi$ satisfies $0 < \phi < 1$. The mean of $Y$ is $(1-\phi)\mu$, which is returned as the fitted values by default. By default, the three linear/additive predictors for zinegbinomial() are $(\mathrm{logit}(\phi), \log(\mu), \log(k))^T$. See negbinomial, another VGAM family function, for the formula of the probability mass function and other details of the negative binomial distribution.
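As a minimal sketch, the probability mass function above can be coded with base R alone. The helper name dzinb_sketch is hypothetical (it is not part of VGAM); it relies only on stats::dnbinom with the mu parameterization, and checks numerically that the probabilities sum to 1, that the mean is $(1-\phi)\mu$, and that $P(Y=0)$ matches the closed form.

```r
# Hypothetical helper (not VGAM code): ZINB probability mass function in base R.
# phi = probability of a structural zero, mu = NB mean, k = NB size.
dzinb_sketch <- function(y, phi, mu, k) {
  nb <- dnbinom(y, size = k, mu = mu)      # parent NB probabilities
  ifelse(y == 0, phi + (1 - phi) * nb, (1 - phi) * nb)
}

phi <- 0.3; mu <- 4; k <- 1.5
y <- 0:2000                                # wide enough that the tail is negligible
p <- dzinb_sketch(y, phi, mu, k)

# Closed form for P(Y = 0): phi + (1 - phi) * (k / (k + mu))^k
p0_closed <- phi + (1 - phi) * (k / (k + mu))^k

c(sum = sum(p), mean = sum(y * p), target_mean = (1 - phi) * mu)
```

The same idea carries over to any $\phi$, $\mu$ and $k$; only the truncation point of y needs to grow with $\mu$ and with small $k$.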

Multiple independent responses are handled. In that case, arguments ipstr0 and isize may be vectors with length equal to the number of responses.

The VGAM family function zinegbinomialff() differs from zinegbinomial() in a few ways: (i) the order of the linear/additive predictors is switched so that the NB mean comes first; (ii) the structural-zero parameter is replaced by onempstr0 = 1 - pstr0, which is the probability of the parent (NB) component; (iii) argument zero has a new default so that onempstr0 is intercept-only by default. zinegbinomialff() is now generally recommended over zinegbinomial(). Both functions implement Fisher scoring and can handle multiple responses.
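A small base-R sketch of point (ii): because onempstr0 = 1 - pstr0 and $\mathrm{logit}(1-\phi) = -\mathrm{logit}(\phi)$, the logit predictor for onempstr0 is the negative of the logit predictor for pstr0. So, for the same model, the logit coefficients of one parameterization should be the negatives of the other's. Here qlogis() and plogis() are base R's logit and inverse logit; the values of pstr0 are arbitrary illustrations.

```r
# Structural-zero probability and its complement (the NB-component probability)
pstr0     <- c(0.05, 0.25, 0.5, 0.8)
onempstr0 <- 1 - pstr0

# Default logit links used by lpstr0 and lonempstr0
eta_pstr0     <- qlogis(pstr0)       # zinegbinomial():   logit(phi)
eta_onempstr0 <- qlogis(onempstr0)   # zinegbinomialff(): logit(1 - phi)

# The two predictors differ only in sign
cbind(pstr0, eta_pstr0, eta_onempstr0)

# Back-transforming either predictor recovers the same phi
all.equal(plogis(eta_pstr0), 1 - plogis(eta_onempstr0))
```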

Value

An object of class "vglmff" (see vglmff-class). The object is used by modelling functions such as vglm and vgam.

Author

T. W. Yee

Note

Estimated probabilities of a structural zero and of an observed zero can be returned (type.fitted = "pstr0" and "pobs0" respectively), as in zipoisson; see fittedvlm for more information.
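The five choices offered by type.fitted can each be written in closed form from the formulas in Details. The base-R sketch below computes them for given $\phi$, $\mu$ and $k$. The function name fitted_types_sketch is hypothetical, and this is not necessarily how VGAM computes them internally.

```r
# Hypothetical helper: the quantities named in type.fitted, from the model formulas
fitted_types_sketch <- function(phi, mu, k) {
  p0_nb <- (k / (k + mu))^k                   # P(Y = 0) under the parent NB
  cbind(mean      = (1 - phi) * mu,           # E(Y)
        munb      = mu,                       # mean of the NB component
        pobs0     = phi + (1 - phi) * p0_nb,  # probability of an observed zero
        pstr0     = phi,                      # probability of a structural zero
        onempstr0 = 1 - phi)                  # probability of the NB component
}

fitted_types_sketch(phi = c(0.1, 0.4), mu = c(2, 10), k = c(0.5, 5))
```

Note that pobs0 is never smaller than pstr0, because observed zeros comprise structural zeros plus zeros from the NB component.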

If $k$ is large, then the VGAM family function zipoisson is probably preferable. This is because the Poisson is the limiting distribution of a negative binomial as $k$ tends to infinity with the mean held fixed.
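The Poisson limit can be checked numerically in base R: holding the mean fixed, the NB probabilities from dnbinom(y, size = k, mu = mu) approach dpois(y, lambda = mu) as $k$ grows. The mean and the values of $k$ below are arbitrary illustrations.

```r
mu <- 3
y  <- 0:50
k_values <- c(1, 10, 100, 1e4)

# Largest absolute difference between NB(mu, k) and Poisson(mu) probabilities
max_diff <- sapply(k_values, function(k)
  max(abs(dnbinom(y, size = k, mu = mu) - dpois(y, lambda = mu))))
names(max_diff) <- paste0("k=", k_values)
max_diff   # shrinks towards 0 as k increases
```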

The zero-deflated negative binomial distribution might be fitted by setting lpstr0 = identitylink, albeit not entirely reliably. See zipoisson for information that can be applied here. Otherwise, try the zero-altered negative binomial distribution (see zanegbinomial).
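With an identity link, $\phi$ is no longer confined to $(0,1)$. Requiring $P(Y=0) = \phi + (1-\phi)p_0 \ge 0$, where $p_0 = (k/(k+\mu))^k$, gives the lower bound $\phi \ge -p_0/(1-p_0)$. The base-R sketch below evaluates this bound; it is derived directly from the formula in Details rather than taken from VGAM's code, and the helper name phi_lower_bound is hypothetical.

```r
# Hypothetical helper: smallest admissible phi for a zero-deflated NB,
# i.e. the value at which P(Y = 0) = phi + (1 - phi) * p0 reaches 0
phi_lower_bound <- function(mu, k) {
  p0 <- dnbinom(0, size = k, mu = mu)   # equals (k / (k + mu))^k
  -p0 / (1 - p0)
}

phi_lower_bound(mu = c(1, 3, 10), k = 2)

# At the bound, the probability of a zero is 0 (up to rounding)
mu <- 3; k <- 2
phi_min <- phi_lower_bound(mu, k)
phi_min + (1 - phi_min) * dnbinom(0, size = k, mu = mu)
```

Estimates of $\phi$ near this bound are one reason why fitting the zero-deflated model is not entirely reliable.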

Warning

This model can be difficult to fit to data, and this family function is fragile. The model is especially difficult to fit reliably when the estimated $k$ parameter is very large (so the model approaches a zero-inflated Poisson distribution) or much less than 1 (becoming harder still as $k$ approaches 0). Numerical problems can also occur, e.g., when the probability of a zero is actually less than, not more than, the nominal NB probability of zero. Similarly, numerical problems can occur if there is little or no zero-inflation, or when the sample size is small. Half-stepping is not uncommon. Successful convergence is sensitive to the initial values; therefore, if the fit fails to converge, try combinations of the arguments stepsize (in vglm.control), imethod, imunb, ipstr0, isize and/or zero (the last if there are explanatory variables). Otherwise, try fitting an ordinary negbinomial model or a zipoisson model.
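Because little or no zero-inflation (or outright zero-deflation) is a source of numerical trouble, it can help to compare the observed proportion of zeros with the proportion implied by a plain negative binomial fit before fitting this model. The base-R sketch below uses simple method-of-moments estimates of $\mu$ and $k$ (an assumption made for speed, not VGAM's maximum likelihood fit). A ratio near 1 suggests little excess of zeros to model, and a ratio below 1 suggests zero-deflation. The helper name zero_excess_ratio and the simulated settings are hypothetical.

```r
# Hypothetical diagnostic: observed vs NB-implied proportion of zeros,
# using method-of-moments estimates (valid only when var(y) > mean(y))
zero_excess_ratio <- function(y) {
  m <- mean(y); v <- var(y)
  if (v <= m) stop("no overdispersion: the NB moment fit is not defined")
  k_hat <- m^2 / (v - m)                # from var = mu + mu^2 / k
  p0_nb <- dnbinom(0, size = k_hat, mu = m)
  c(observed = mean(y == 0), nb_implied = p0_nb, ratio = mean(y == 0) / p0_nb)
}

set.seed(1)
n <- 5000
y_nb   <- rnbinom(n, size = 2, mu = 5)                              # no zero-inflation
y_zinb <- ifelse(runif(n) < 0.3, 0, rnbinom(n, size = 2, mu = 5))   # 30% structural zeros

ratios <- rbind(nb = zero_excess_ratio(y_nb), zinb = zero_excess_ratio(y_zinb))
ratios
```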

This VGAM family function can be computationally expensive and can run slowly; setting trace = TRUE is useful for monitoring convergence.

Examples

if (FALSE) { # \dontrun{
library(VGAM)  # provides vglm(), rrvglm(), zinegbinomial() and rzinegbin()
# Example 1
ndata <- data.frame(x2 = runif(nn <- 1000))
ndata <- transform(ndata, pstr0 = logitlink(-0.5 + 1 * x2, inverse = TRUE),
                          munb  =   exp( 3   + 1 * x2),
                          size  =   exp( 0   + 2 * x2))
ndata <- transform(ndata,
                   y1 = rzinegbin(nn, mu = munb, size = size, pstr0 = pstr0))
with(ndata, table(y1)["0"] / sum(table(y1)))  # observed proportion of zeros
nfit <- vglm(y1 ~ x2, zinegbinomial(zero = NULL), data = ndata)
coef(nfit, matrix = TRUE)
summary(nfit)
head(cbind(fitted(nfit), with(ndata, (1 - pstr0) * munb)))  # fitted vs. true means
round(vcov(nfit), 3)


# Example 2: RR-ZINB could also be called a COZIVGLM-ZINB-2
ndata <- data.frame(x2 = runif(nn <- 2000))
ndata <- transform(ndata, x3 = runif(nn))
ndata <- transform(ndata, eta1 =          3   + 1   * x2 + 2 * x3)
ndata <- transform(ndata, pstr0  = logitlink(-1.5 + 0.5 * eta1, inverse = TRUE),
                          munb = exp(eta1),
                          size = exp(4))
ndata <- transform(ndata,
                   y1 = rzinegbin(nn, pstr0 = pstr0, mu = munb, size = size))
with(ndata, table(y1)["0"] / sum(table(y1)))  # observed proportion of zeros
rrzinb <- rrvglm(y1 ~ x2 + x3, zinegbinomial(zero = NULL), data = ndata,
                 Index.corner = 2, str0 = 3, trace = TRUE)
coef(rrzinb, matrix = TRUE)
Coef(rrzinb)
} # }