Zero-Inflated Geometric Distribution Family Function

Fits a zero-inflated geometric distribution by maximum likelihood estimation.

zigeometric(lpstr0  = "logitlink", lprob = "logitlink",
            type.fitted = c("mean", "prob", "pobs0", "pstr0", "onempstr0"),
            ipstr0  = NULL, iprob = NULL,
            imethod = 1, bias.red = 0.5, zero = NULL)
zigeometricff(lprob = "logitlink", lonempstr0 = "logitlink",
              type.fitted = c("mean", "prob", "pobs0", "pstr0", "onempstr0"),
              iprob = NULL, ionempstr0 = NULL,
              imethod = 1, bias.red = 0.5, zero = "onempstr0")

Arguments

lpstr0, lprob

Link functions for the parameters $\phi$ and $p$ (prob). The usual geometric probability parameter is the latter. The probability of a structural zero is the former. See Links for more choices. For the zero-deflated model see below.

lonempstr0, ionempstr0

Corresponding arguments for the other parameterization. See details below.

bias.red

A constant used in the initialization process of pstr0. It should lie between 0 and 1, with 1 having no effect.

type.fitted

See CommonVGAMffArguments and fittedvlm for information.

ipstr0, iprob

See CommonVGAMffArguments for information.

zero, imethod

See CommonVGAMffArguments for information.

Details

Function zigeometric() is based on $$P(Y=0) = \phi + (1-\phi) p,$$ for $y=0$, and $$P(Y=y) = (1-\phi) p (1 - p)^{y}.$$ for $y=1,2,\ldots$. The parameter $\phi$ satisfies $0 < \phi < 1$. The mean of $Y$ is $E(Y)=(1-\phi) p / (1-p)$ and these are returned as the fitted values by default. By default, the two linear/additive predictors are $(logit(\phi), logit(p))^T$. Multiple responses are handled.

Estimated probabilities of a structural zero and an observed zero can be returned, as in zipoisson; see fittedvlm for information.

The VGAM family function zigeometricff() has a few changes compared to zigeometric(). These are: (i) the order of the linear/additive predictors is switched so the geometric probability comes first; (ii) argument onempstr0 is now 1 minus the probability of a structural zero, i.e., the probability of the parent (geometric) component, i.e., onempstr0 is 1-pstr0; (iii) argument zero has a new default so that the onempstr0 is intercept-only by default. Now zigeometricff() is generally recommended over zigeometric(). Both functions implement Fisher scoring and can handle multiple responses.

Value

An object of class "vglmff" (see vglmff-class). The object is used by modelling functions such as vglm and vgam.

Author

T. W. Yee

Note

The zero-deflated geometric distribution might be fitted by setting lpstr0 = identitylink, albeit, not entirely reliably. See zipoisson for information that can be applied here. Else try the zero-altered geometric distribution (see zageometric).

Examples

gdata <- data.frame(x2 = runif(nn <- 1000) - 0.5)
gdata <- transform(gdata, x3 = runif(nn) - 0.5,
                          x4 = runif(nn) - 0.5)
gdata <- transform(gdata, eta1 =  1.0 - 1.0 * x2 + 2.0 * x3,
                          eta2 = -1.0,
                          eta3 =  0.5)
gdata <- transform(gdata, prob1 = logitlink(eta1, inverse = TRUE),
                          prob2 = logitlink(eta2, inverse = TRUE),
                          prob3 = logitlink(eta3, inverse = TRUE))
gdata <- transform(gdata, y1 = rzigeom(nn, prob1, pstr0 = prob3),
                          y2 = rzigeom(nn, prob2, pstr0 = prob3),
                          y3 = rzigeom(nn, prob2, pstr0 = prob3))
with(gdata, table(y1))
#> y1
#>   0   1   2   3   4   5   6   7 
#> 894  76  15   8   3   2   1   1 
with(gdata, table(y2))
#> y2
#>   0   1   2   3   4   5   6   7   8   9  10  11  12  13  14  15  21 
#> 716  83  46  39  32  14  24  11  10   7   6   3   1   3   3   1   1 
with(gdata, table(y3))
#> y3
#>   0   1   2   3   4   5   6   7   8   9  10  11  12  13  16  17  22 
#> 729  83  46  38  29  20  13   9  10   6   7   5   1   1   1   1   1 
head(gdata)
#>            x2          x3           x4      eta1 eta2 eta3     prob1     prob2
#> 1 -0.30236980 -0.27207140 -0.142621831 0.7582270   -1  0.5 0.6809687 0.2689414
#> 2  0.06008219  0.13106537 -0.119558207 1.2020486   -1  0.5 0.7688890 0.2689414
#> 3 -0.46791286  0.21845620 -0.489926074 1.9048253   -1  0.5 0.8704367 0.2689414
#> 4  0.01148459 -0.09301155  0.019333617 0.8024923   -1  0.5 0.6905074 0.2689414
#> 5  0.23885518 -0.14448582 -0.006219018 0.4721732   -1  0.5 0.6158980 0.2689414
#> 6 -0.40702990  0.06383865  0.415824160 1.5347072   -1  0.5 0.8226940 0.2689414
#>       prob3 y1 y2 y3
#> 1 0.6224593  0  0  2
#> 2 0.6224593  0  7  0
#> 3 0.6224593  0  0  0
#> 4 0.6224593  0  3  1
#> 5 0.6224593  0  0  1
#> 6 0.6224593  0  0 10

fit1 <- vglm(y1 ~ x2 + x3 + x4, zigeometric(zero = 1), data = gdata, trace = TRUE)
#> Iteration 1: loglikelihood = -460.92614
#> Iteration 2: loglikelihood = -439.10057
#> Iteration 3: loglikelihood = -428.24199
#> Iteration 4: loglikelihood = -426.58465
#> Iteration 5: loglikelihood = -426.53816
#> Iteration 6: loglikelihood = -426.53765
#> Iteration 7: loglikelihood = -426.53764
#> Iteration 8: loglikelihood = -426.53764
coef(fit1, matrix = TRUE)
#>             logitlink(pstr0) logitlink(prob)
#> (Intercept)        0.4650275       1.0761843
#> x2                 0.0000000      -0.9496978
#> x3                 0.0000000       2.1503183
#> x4                 0.0000000      -0.2276549
head(fitted(fit1, type = "pstr0"))
#>           [,1]
#> [1,] 0.6142062
#> [2,] 0.6142062
#> [3,] 0.6142062
#> [4,] 0.6142062
#> [5,] 0.6142062
#> [6,] 0.6142062

fit2 <- vglm(cbind(y2, y3) ~ 1, zigeometric(zero = 1), data = gdata, trace = TRUE)
#> Iteration 1: loglikelihood = -2382.5343
#> Iteration 2: loglikelihood = -2381.4135
#> Iteration 3: loglikelihood = -2381.4108
#> Iteration 4: loglikelihood = -2381.4108
coef(fit2, matrix = TRUE)
#>             logitlink(pstr01) logitlink(prob1) logitlink(pstr02)
#> (Intercept)         0.4648395        -1.025587         0.5156988
#>             logitlink(prob2)
#> (Intercept)       -0.9687641
summary(fit2)
#> 
#> Call:
#> vglm(formula = cbind(y2, y3) ~ 1, family = zigeometric(zero = 1), 
#>     data = gdata, trace = TRUE)
#> 
#> Coefficients: 
#>               Estimate Std. Error z value Pr(>|z|)    
#> (Intercept):1  0.46484    0.08699   5.344 9.11e-08 ***
#> (Intercept):2 -1.02559    0.06916 -14.828  < 2e-16 ***
#> (Intercept):3  0.51570    0.08857   5.823 5.79e-09 ***
#> (Intercept):4 -0.96876    0.07135 -13.578  < 2e-16 ***
#> ---
#> Signif. codes:  0 ‘***’ 0.001 ‘**’ 0.01 ‘*’ 0.05 ‘.’ 0.1 ‘ ’ 1
#> 
#> Names of linear predictors: logitlink(pstr01), logitlink(prob1), 
#> logitlink(pstr02), logitlink(prob2)
#> 
#> Log-likelihood: -2381.411 on 3996 degrees of freedom
#> 
#> Number of Fisher scoring iterations: 4 
#> 
#> No Hauck-Donner effect found in any of the estimates
#>