Zero-Altered Geometric Distribution

Fits a zero-altered geometric distribution based on a conditional model involving a Bernoulli distribution and a positive-geometric distribution.

zageometric(lpobs0 = "logitlink", lprob = "logitlink",
     type.fitted = c("mean", "prob", "pobs0", "onempobs0"),
     imethod = 1, ipobs0 = NULL, iprob = NULL, zero = NULL)
zageometricff(lprob = "logitlink", lonempobs0 = "logitlink",
     type.fitted = c("mean", "prob", "pobs0", "onempobs0"),
     imethod = 1, iprob = NULL, ionempobs0 = NULL, zero = "onempobs0")

Arguments

lpobs0

Link function for the parameter $p_0$ or $\phi$, called pobs0 or phi here. See Links for more choices.

lprob

Parameter link function applied to the probability of success, called prob or $p$. See Links for more choices.

type.fitted

See CommonVGAMffArguments and fittedvlm for information.

ipobs0, iprob

Optional initial values for the parameters. If given, they must be in range. For multi-column responses, these are recycled sideways.

lonempobs0, ionempobs0

Corresponding arguments for the other parameterization, used by zageometricff(). See Details below.

zero, imethod

See CommonVGAMffArguments.

Details

The response $Y$ is zero with probability $p_0$, or $Y$ has a positive-geometric distribution with probability $1-p_0$. Thus $0 < p_0 < 1$, which is modelled as a function of the covariates. The zero-altered geometric distribution differs from the zero-inflated geometric distribution in that the former has zeros coming from one source, whereas the latter has zeros coming from the geometric distribution too. The zero-inflated geometric distribution is implemented in the VGAM package. Some people call the zero-altered geometric a hurdle model.
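The two-stage (hurdle) mechanism described above can be sketched in base R. This is only an illustration: rzageom_sketch() is a hypothetical helper, not a VGAM function, and it assumes the geometric distribution counts failures before the first success (as rgeom() does), so that the positive-geometric part has support 1, 2, ....

```r
# Hypothetical helper (not part of VGAM): a two-stage hurdle sampler.
rzageom_sketch <- function(n, prob, pobs0) {
  # Stage 1 (Bernoulli): decide whether the observation is a zero.
  is_zero <- rbinom(n, size = 1, prob = pobs0) == 1
  # Stage 2 (positive geometric): rgeom() has support 0, 1, 2, ...,
  # and adding 1 has the same distribution as truncating it at zero.
  positive <- rgeom(n, prob = prob) + 1
  ifelse(is_zero, 0, positive)
}

set.seed(1)
y <- rzageom_sketch(1e5, prob = 0.3, pobs0 = 0.4)
mean(y == 0)   # should be close to pobs0 = 0.4
mean(y)        # should be close to (1 - 0.4) / 0.3 = 2
```

All zeros come from the first stage only, which is what distinguishes this from a zero-inflated sampler.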

The input can be a matrix (multiple responses). By default, the two linear/additive predictors of zageometric are $(\mathrm{logit}(\phi), \mathrm{logit}(p))^T$.

The VGAM family function zageometricff() has a few changes compared to zageometric(). These are: (i) the order of the linear/additive predictors is switched so the geometric probability comes first; (ii) the second parameter, onempobs0, is 1 minus the probability of an observed 0, i.e., the probability of the positive-geometric distribution, so onempobs0 is 1-pobs0; (iii) argument zero has a new default so that onempobs0 (and therefore pobs0) is intercept-only by default. zageometricff() is now generally recommended over zageometric(). Both functions implement Fisher scoring and can handle multiple responses.
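A minimal sketch of fitting with zageometricff() follows. The simulated data, the seed, and the object names (zdata_ff, fit_ff) are illustrative assumptions; pobs0 is held constant so that the intercept-only default for onempobs0 matches the data.

```r
library(VGAM)

set.seed(123)
nn <- 1000
zdata_ff <- data.frame(x2 = runif(nn))
zdata_ff <- transform(zdata_ff,
                      prob = logitlink(-2 + 3 * x2, inverse = TRUE))
# P(Y = 0) is a constant 0.3 here, i.e. onempobs0 = 0.7 for every row.
zdata_ff <- transform(zdata_ff,
                      y1 = rzageom(nn, prob = prob, pobs0 = 0.3))

fit_ff <- vglm(y1 ~ x2, zageometricff, data = zdata_ff)

# First column: linear predictor for prob; second: for onempobs0.
# With the default zero = "onempobs0", x2 does not enter the second one.
coef(fit_ff, matrix = TRUE)
```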

Value

An object of class "vglmff" (see vglmff-class). The object is used by modelling functions such as vglm and vgam.

The fitted.values slot of the fitted object, which should be extracted by the generic function fitted, returns the mean $\mu$ (default) which is given by $$\mu = (1-\phi) / p.$$ If type.fitted = "pobs0" then $p_0$ is returned.
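The formula for the mean can be checked with a short derivation. It assumes the geometric distribution counts the number of failures before the first success, so that truncating it at zero gives support $1, 2, \ldots$; under that assumption the probability function is $$P(Y = 0) = \phi, \qquad P(Y = y) = (1-\phi)\, p\, (1-p)^{y-1}, \quad y = 1, 2, \ldots,$$ and the zero term contributes nothing to the expectation, so $$\mu = E(Y) = (1-\phi) \sum_{y=1}^{\infty} y\, p\, (1-p)^{y-1} = (1-\phi) / p.$$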

Warning

Convergence for this VGAM family function seems to depend quite strongly on providing good initial values.
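One way to supply starting values is through ipobs0 and iprob. The sketch below uses crude moment-type estimates computed from the response; the data, seed, and object names are illustrative assumptions, and this is not necessarily how the family function chooses its own defaults.

```r
library(VGAM)

set.seed(2024)
zdata_init <- data.frame(y = rzageom(500, prob = 0.25, pobs0 = 0.4))

# Crude starting values (both must lie strictly between 0 and 1):
ipobs0 <- with(zdata_init, mean(y == 0))        # observed proportion of zeros
iprob  <- with(zdata_init, 1 / mean(y[y > 0]))  # positive part has mean 1/prob

fit_init <- vglm(y ~ 1, zageometric(ipobs0 = ipobs0, iprob = iprob),
                 data = zdata_init)
Coef(fit_init)  # estimates back-transformed to the parameter scale
```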

Inference obtained from summary.vglm and summary.vgam may or may not be correct. In particular, the p-values, standard errors and degrees of freedom may need adjustment. Use simulation on artificial data to check that these are reasonable.
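A simulation check of this kind might look like the following sketch, which compares the empirical standard deviation of the estimates over repeated artificial data sets with the average reported standard error. The true parameter values, sample size, and number of replicates are arbitrary assumptions chosen to keep the run short.

```r
library(VGAM)

set.seed(42)
nsim <- 100   # number of artificial data sets (small, for speed)
nn   <- 500   # observations per data set
est <- se <- matrix(NA_real_, nsim, 2)

for (i in seq_len(nsim)) {
  d <- data.frame(y = rzageom(nn, prob = 0.3, pobs0 = 0.4))
  f <- vglm(y ~ 1, zageometric, data = d)
  est[i, ] <- coef(f)              # estimates on the link (logit) scale
  se[i, ]  <- sqrt(diag(vcov(f)))  # reported standard errors
}

# If the reported standard errors are reasonable, the two rows
# should be of similar size.
rbind(empirical_sd     = apply(est, 2, sd),
      mean_reported_se = colMeans(se))
```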

Author

T. W. Yee

Note

Note that this family function allows $p_0$ to be modelled as a function of the covariates. It is a conditional model, not a mixture model.

This family function effectively combines binomialff, posgeometric() and geometric into one family function. However, posgeometric() has not been written because it is trivially related to geometric.

Examples

zdata <- data.frame(x2 = runif(nn <- 1000))
zdata <- transform(zdata, pobs0 = logitlink(-1 + 2*x2, inverse = TRUE),
                          prob  = logitlink(-2 + 3*x2, inverse = TRUE))
zdata <- transform(zdata, y1 = rzageom(nn, prob = prob, pobs0 = pobs0),
                          y2 = rzageom(nn, prob = prob, pobs0 = pobs0))
with(zdata, table(y1))
#> y1
#>   0   1   2   3   4   5   6   7   8   9  10  11  12  13  14  16  17  18  19  20 
#> 546 159  88  64  30  20  27   5  12  12   3   8   1   6   5   2   1   3   1   2 
#>  21  23  24 
#>   1   2   2 

fit <- vglm(cbind(y1, y2) ~ x2, zageometric, data = zdata, trace = TRUE)
#> Iteration 1: loglikelihood = -3366.4351
#> Iteration 2: loglikelihood = -3249.489
#> Iteration 3: loglikelihood = -3224.5153
#> Iteration 4: loglikelihood = -3223.346
#> Iteration 5: loglikelihood = -3223.343
#> Iteration 6: loglikelihood = -3223.343
coef(fit, matrix = TRUE)
#>             logitlink(pobs01) logitlink(prob1) logitlink(pobs02)
#> (Intercept)        -0.7904093        -1.973762         -1.090994
#> x2                  1.9747830         2.992545          2.143130
#>             logitlink(prob2)
#> (Intercept)        -2.167299
#> x2                  3.354059
head(fitted(fit))
#>          y1        y2
#> 1 0.7261674 0.7948155
#> 2 3.0129591 3.6744562
#> 3 1.2055685 1.3627938
#> 4 2.5100170 3.0116937
#> 5 2.6577542 3.2051170
#> 6 0.4903941 0.5270491
head(predict(fit))
#>      logitlink(pobs01) logitlink(prob1) logitlink(pobs02) logitlink(prob2)
#> [1,]         0.5349262       0.03462333        0.34732427       0.08370952
#> [2,]        -0.4060872      -1.39136880       -0.67390880      -1.51454945
#> [3,]         0.1813044      -0.50124781       -0.03644316      -0.51689748
#> [4,]        -0.2923377      -1.21899514       -0.55046234      -1.32135220
#> [5,]        -0.3280844      -1.27316500       -0.58925641      -1.38206604
#> [6,]         0.8312928       0.48373110        0.66895557       0.58707171
summary(fit)
#> 
#> Call:
#> vglm(formula = cbind(y1, y2) ~ x2, family = zageometric, data = zdata, 
#>     trace = TRUE)
#> 
#> Coefficients: 
#>               Estimate Std. Error z value Pr(>|z|)    
#> (Intercept):1 -0.79041    0.13399  -5.899 3.66e-09 ***
#> (Intercept):2 -1.97376    0.09949 -19.839  < 2e-16 ***
#> (Intercept):3 -1.09099    0.13816  -7.896 2.87e-15 ***
#> (Intercept):4 -2.16730    0.09440 -22.959  < 2e-16 ***
#> x2:1           1.97478    0.23955   8.244  < 2e-16 ***
#> x2:2           2.99255    0.23410  12.783  < 2e-16 ***
#> x2:3           2.14313    0.24145   8.876  < 2e-16 ***
#> x2:4           3.35406    0.22517  14.895  < 2e-16 ***
#> ---
#> Signif. codes:  0 ‘***’ 0.001 ‘**’ 0.01 ‘*’ 0.05 ‘.’ 0.1 ‘ ’ 1
#> 
#> Names of linear predictors: logitlink(pobs01), logitlink(prob1), 
#> logitlink(pobs02), logitlink(prob2)
#> 
#> Log-likelihood: -3223.343 on 3992 degrees of freedom
#> 
#> Number of Fisher scoring iterations: 6 
#> 
#> No Hauck-Donner effect found in any of the estimates
#>