Zero-Inflated Poisson Distribution Family Function

Fits a zero-inflated or zero-deflated Poisson distribution by full maximum likelihood estimation.

zipoisson(lpstr0 = "logitlink", llambda = "loglink", type.fitted =
   c("mean", "lambda", "pobs0", "pstr0", "onempstr0"),
   ipstr0 = NULL, ilambda = NULL, gpstr0 = NULL, imethod = 1,
   ishrinkage = 0.95, probs.y = 0.35, parallel = FALSE, zero = NULL)
zipoissonff(llambda = "loglink", lonempstr0 = "logitlink",
  type.fitted = c("mean", "lambda", "pobs0", "pstr0", "onempstr0"),
  ilambda = NULL, ionempstr0 = NULL, gonempstr0 = NULL,
  imethod = 1, ishrinkage = 0.95, probs.y = 0.35, zero = "onempstr0")

Arguments

lpstr0, llambda

Link function for the parameter $\phi$ and the usual $\lambda$ parameter. See Links for more choices; see CommonVGAMffArguments for more information. For the zero-deflated model see below.

ipstr0, ilambda

Optional initial values for $\phi$, whose values must lie between 0 and 1. Optional initial values for $\lambda$, whose values must be positive. The defaults are to compute an initial value internally for each. If a vector then recycling is used.

lonempstr0, ionempstr0

Corresponding arguments for the other parameterization. See details below.

type.fitted

Character. The type of fitted value to be returned. The first choice (the expected value) is the default. The estimated probability of an observed 0 is an alternative, else the estimated probability of a structural 0, or one minus the estimated probability of a structural 0. See CommonVGAMffArguments and fittedvlm for more information.

imethod

An integer with value 1 or 2 which specifies the initialization method for $\lambda$. If failure to converge occurs try another value and/or else specify a value for ishrinkage and/or else specify a value for ipstr0. See CommonVGAMffArguments for more information.

ishrinkage

How much shrinkage is used when initializing $\lambda$. The value must be between 0 and 1 inclusive, and a value of 0 means the individual response values are used, and a value of 1 means the median or mean is used. This argument is used in conjunction with imethod. See CommonVGAMffArguments for more information.

zero

Specifies which linear/additive predictors are to be modelled as intercept-only. If given, the value can be either 1 or 2, and the default is none of them. Setting zero = 1 makes $\phi$ a single parameter. See CommonVGAMffArguments for more information.

gpstr0, gonempstr0, probs.y

Details at CommonVGAMffArguments.

parallel

Details at CommonVGAMffArguments, but unlikely to be practically used actually.

Details

These models are a mixture of a Poisson distribution and the value 0; it has value 0 with probability $\phi$ else is Poisson($\lambda$) distributed. Thus there are two sources for zero values, and $\phi$ is the probability of a structural zero. The model for zipoisson() can be written $$P(Y = 0) = \phi + (1-\phi) \exp(-\lambda),$$ and for $y=1,2,\ldots$, $$P(Y = y) = (1-\phi) \exp(-\lambda) \lambda^y / y!.$$ Here, the parameter $\phi$ satisfies $0 < \phi < 1$. The mean of $Y$ is $(1-\phi) \lambda$ and these are returned as the fitted values, by default. The variance of $Y$ is $(1-\phi) \lambda (1 + \phi \lambda)$. By default, the two linear/additive predictors of zipoisson() are $(logit(\phi), \log(\lambda))^T$.

The VGAM family function zipoissonff() has a few changes compared to zipoisson(). These are: (i) the order of the linear/additive predictors is switched so the Poisson mean comes first; (ii) onempstr0 is now 1 minus the probability of a structural 0, i.e., the probability of the parent (Poisson) component, i.e., onempstr0 is 1-pstr0; (iii) argument zero has a new default so that the onempstr0 is intercept-only by default. Now zipoissonff() is generally recommended over zipoisson() (and definitely recommended over yip88). Both functions implement Fisher scoring and can handle multiple responses.

Both family functions can fit the zero-modified Poisson (ZMP), which is a combination of the ZIP and zero-deflated Poisson (ZDP); see Zipois for some details and the example below. The key is to set the link function to be identitylink. However, problems might occur when iterations get close to or go past the boundary of the parameter space, especially when there are covariates. The PMF of the ZMP is best written not as above but in terms of onempstr0 which may be greater than unity; when using pstr0 the above PMF is negative for non-zero values.

Value

An object of class "vglmff" (see vglmff-class). The object is used by modelling functions such as vglm, rrvglm and vgam.

References

Thas, O. and Rayner, J. C. W. (2005). Smooth tests for the zero-inflated Poisson distribution. Biometrics, 61, 808–815.

Data: Angers, J-F. and Biswas, A. (2003). A Bayesian analysis of zero-inflated generalized Poisson model. Computational Statistics & Data Analysis, 42, 37–46.

Cameron, A. C. and Trivedi, P. K. (1998). Regression Analysis of Count Data. Cambridge University Press: Cambridge.

M'Kendrick, A. G. (1925). Applications of mathematics to medical problems. Proc. Edinb. Math. Soc., 44, 98–130.

Yee, T. W. (2014). Reduced-rank vector generalized linear models with two linear predictors. Computational Statistics and Data Analysis, 71, 889–902.

Author

T. W. Yee

Note

This family function can be used to estimate the 0-deflated model, hence pstr0 is not to be interpreted as a probability. One should set, e.g., lpstr0 = "identitylink". Likewise, the functions in Zipois can handle the zero-deflated Poisson distribution too. Although the iterations might fall outside the parameter space, the validparams slot should keep them inside. A (somewhat) similar alternative for zero-deflation is to try the zero-altered Poisson model (see zapoisson).

The use of this VGAM family function with rrvglm can result in a so-called COZIGAM or COZIGLM. That is, a reduced-rank zero-inflated Poisson model (RR-ZIP) is a constrained zero-inflated generalized linear model. See what used to be COZIGAM on CRAN. A RR-ZINB model can also be fitted easily; see zinegbinomial. Jargon-wise, a COZIGLM might be better described as a COZIVGLM-ZIP.

Warning

Numerical problems can occur, e.g., when the probability of zero is actually less than, not more than, the nominal probability of zero. For example, in the Angers and Biswas (2003) data below, replacing 182 by 1 results in nonconvergence. Half-stepping is not uncommon. If failure to converge occurs, try using combinations of imethod, ishrinkage, ipstr0, and/or zipoisson(zero = 1) if there are explanatory variables. The default for zipoissonff() is to model the structural zero probability as an intercept-only.

Examples

if (FALSE) { # \dontrun{
# Example 1: simulated ZIP data
zdata <- data.frame(x2 = runif(nn <- 1000))
zdata <- transform(zdata,
           pstr01  = logitlink(-0.5 + 1*x2, inverse = TRUE),
           pstr02  = logitlink( 0.5 - 1*x2, inverse = TRUE),
           Ps01    = logitlink(-0.5       , inverse = TRUE),
           Ps02    = logitlink( 0.5       , inverse = TRUE),
           lambda1 =   loglink(-0.5 + 2*x2, inverse = TRUE),
           lambda2 =   loglink( 0.5 + 2*x2, inverse = TRUE))
zdata <- transform(zdata, y1 = rzipois(nn, lambda1, pstr0 = Ps01),
                          y2 = rzipois(nn, lambda2, pstr0 = Ps02))

with(zdata, table(y1))  # Eyeball the data
with(zdata, table(y2))
fit1 <- vglm(y1 ~ x2, zipoisson(zero = 1), zdata, crit = "coef")
fit2 <- vglm(y2 ~ x2, zipoisson(zero = 1), zdata, crit = "coef")
coef(fit1, matrix = TRUE)  # Should agree with the above values
coef(fit2, matrix = TRUE)  # Should agree with the above values

# Fit all two simultaneously, using a different parameterization:
fit12 <- vglm(cbind(y1, y2) ~ x2, zipoissonff, zdata, crit = "coef")
coef(fit12, matrix = TRUE)  # Should agree with the above values

# For the first observation compute the probability that y1 is
# due to a structural zero.
(fitted(fit1, type = "pstr0") / fitted(fit1, type = "pobs0"))[1]


# Example 2: McKendrick (1925). From 223 Indian village households
cholera <- data.frame(ncases = 0:4,  # Number of cholera cases,
                      wfreq  = c(168, 32, 16, 6, 1))  # Frequencies
fit <- vglm(ncases ~ 1, zipoisson, wei = wfreq, cholera)
coef(fit, matrix = TRUE)
with(cholera, cbind(actual = wfreq,
                    fitted = round(dzipois(ncases, Coef(fit)[2],
                                           pstr0 = Coef(fit)[1]) *
                                   sum(wfreq), digits = 2)))

# Example 3: data from Angers and Biswas (2003)
abdata <- data.frame(y = 0:7, w = c(182, 41, 12, 2, 2, 0, 0, 1))
abdata <- subset(abdata, w > 0)
fit3 <- vglm(y ~ 1, zipoisson(lpstr0 = probitlink, ipstr0 = 0.8),
             data = abdata, weight = w, trace = TRUE)
fitted(fit3, type = "pobs0")  # Estimate of P(Y = 0)
coef(fit3, matrix = TRUE)
Coef(fit3)  # Estimate of pstr0 and lambda
fitted(fit3)
with(abdata, weighted.mean(y, w))  # Compare this with fitted(fit)
summary(fit3)

# Example 4: zero-deflated (ZDP) model for intercept-only data
zdata <- transform(zdata, lambda3 = loglink(0.0, inverse = TRUE))
zdata <- transform(zdata, deflat.limit=-1/expm1(lambda3))  # Bndy
# The 'pstr0' parameter is negative and in parameter space:
# Not too near the boundary:
zdata <- transform(zdata, usepstr0 = deflat.limit / 2)
zdata <- transform(zdata,
                   y3 = rzipois(nn, lambda3, pstr0 = usepstr0))
head(zdata)
with(zdata, table(y3))  # A lot of deflation
fit4 <- vglm(y3 ~ 1, data = zdata, trace = TRUE, crit = "coef",
             zipoisson(lpstr0 = "identitylink"))
coef(fit4, matrix = TRUE)
# Check how accurate it was:
zdata[1, "usepstr0"]  # Answer
coef(fit4)[1]         # Estimate
Coef(fit4)
vcov(fit4)  # Is positive-definite

# Example 5: RR-ZIP
set.seed(123)
rrzip <- rrvglm(Alopacce ~ sm.bs(WaterCon, df = 3),
                zipoisson(zero = NULL),
                data = hspider, trace = TRUE, Index.corner = 2)
coef(rrzip, matrix = TRUE)
Coef(rrzip)
summary(rrzip)
plotvgam(rrzip, lcol = "blue")
} # }