Fits a zero-altered geometric distribution based on a conditional model involving a Bernoulli distribution and a positive-geometric distribution.
zageometric(lpobs0 = "logitlink", lprob = "logitlink",
type.fitted = c("mean", "prob", "pobs0", "onempobs0"),
imethod = 1, ipobs0 = NULL, iprob = NULL, zero = NULL)
zageometricff(lprob = "logitlink", lonempobs0 = "logitlink",
type.fitted = c("mean", "prob", "pobs0", "onempobs0"),
imethod = 1, iprob = NULL, ionempobs0 = NULL, zero = "onempobs0")
lpobs0
Link function for the parameter \(p_0\) or \(\phi\),
called pobs0 or phi here.
See Links for more choices.
lprob
Parameter link function applied to the probability of success,
called prob or \(p\).
See Links for more choices.
type.fitted
See CommonVGAMffArguments
and fittedvlm for information.
The response \(Y\) is zero with probability \(p_0\), or \(Y\) has a positive-geometric distribution with probability \(1-p_0\). Thus \(0 < p_0 < 1\), which is modelled as a function of the covariates. The zero-altered geometric distribution differs from the zero-inflated geometric distribution in that the former has zeros coming from one source, whereas the latter has zeros coming from the geometric distribution too. The zero-inflated geometric distribution is implemented in the VGAM package. Some people call the zero-altered geometric a hurdle model.
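To make the definition concrete, here is a minimal base-R sketch of the probability function implied by the paragraph above. The helper name dzageom_sketch() and the values p0 = 0.3 and prob = 0.4 are illustrative assumptions, not part of VGAM (which supplies its own dzageom()). The geometric part is assumed to follow the stats::dgeom() convention of counting failures before the first success.

```r
# Hypothetical helper (not a VGAM function): pmf of the zero-altered geometric.
dzageom_sketch <- function(y, prob, pobs0) {
  # Positive-geometric pmf: a geometric on 0, 1, 2, ... truncated to y >= 1.
  posgeom <- dgeom(y, prob) / (1 - dgeom(0, prob))
  ifelse(y == 0, pobs0, (1 - pobs0) * posgeom)
}

p0 <- 0.3; prob <- 0.4            # illustrative values only
y <- 0:200
pmf <- dzageom_sketch(y, prob, p0)

sum(pmf)                          # numerically 1
pmf[1]                            # equals p0: zeros come from one source only

# Zero-inflated counterpart, for contrast: zeros arise from both sources,
# so P(Y = 0) is larger than p0.
pzero_zi <- p0 + (1 - p0) * dgeom(0, prob)
pzero_zi > pmf[1]
```

In the zero-altered case the probability of a zero is exactly \(p_0\), whereas in the zero-inflated case the geometric component adds its own zeros on top.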
The input can be a matrix (multiple responses).
By default, the two linear/additive predictors
of zageometric
are \((\mathrm{logit}(\phi), \mathrm{logit}(p))^T\).
The VGAM family function zageometricff() has a few
changes compared to zageometric().
These are:
(i) the order of the linear/additive predictors is switched so the
geometric probability comes first;
(ii) argument onempobs0 is now 1 minus the probability of an observed 0,
i.e., the probability of the positive geometric distribution,
i.e., onempobs0 is 1-pobs0;
(iii) argument zero has a new default, "onempobs0", so that onempobs0
(and hence pobs0) is intercept-only by default.
Now zageometricff() is generally recommended over
zageometric().
Both functions implement Fisher scoring and can handle
multiple responses.
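The differences listed above can be seen by fitting both families to the same data. The following is a rough sketch on simulated data; the seed, sample size, coefficient values and object names (sdata, fit_za, fit_zaff) are arbitrary choices for illustration.

```r
library(VGAM)

set.seed(1)                        # arbitrary seed for this illustration
n <- 500
sdata <- data.frame(x2 = runif(n))
sdata <- transform(sdata,
                   y = rzageom(n, prob = logitlink(-1 + x2, inverse = TRUE),
                               pobs0 = 0.3))

fit_za   <- vglm(y ~ x2, zageometric,   data = sdata)
fit_zaff <- vglm(y ~ x2, zageometricff, data = sdata)

# The order of the linear predictors differs between the two families.
colnames(coef(fit_za,   matrix = TRUE))
colnames(coef(fit_zaff, matrix = TRUE))

# With the default zero = "onempobs0", the x2 coefficient of that
# predictor is constrained to 0, i.e. it is intercept-only.
coef(fit_zaff, matrix = TRUE)
```

Passing zero = NULL to zageometricff() would let onempobs0 depend on x2 as well.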
An object of class "vglmff" (see vglmff-class).
The object is used by modelling functions such as vglm,
and vgam.
The fitted.values slot of the fitted object,
which should be extracted by the generic function fitted, returns
the mean \(\mu\) (default) which is given by
$$\mu = (1-\phi) / p.$$
If type.fitted = "pobs0" then \(p_0\) is returned.
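As a quick arithmetic check of the mean formula, the sketch below plugs in the two linear predictors printed for the first observation of response y1 in the example on this page (the first row of the predict(fit) output) and compares the result with the corresponding fitted value. The variable names are illustrative, and plogis() is used as the inverse logit.

```r
# Linear predictors copied from the first row of the printed predict(fit)
# output in the example (response y1).
eta_pobs0 <- 0.5349262
eta_prob  <- 0.03462333

pobs0 <- plogis(eta_pobs0)   # inverse logit gives phi (i.e. p_0)
prob  <- plogis(eta_prob)    # inverse logit gives p

mu <- (1 - pobs0) / prob
mu                           # compare with 0.7261674 in head(fitted(fit))
```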
Convergence for this VGAM family function seems to depend quite strongly on providing good initial values.
Inference obtained from summary.vglm and summary.vgam
may or may not be correct. In particular, the p-values, standard errors
and degrees of freedom may need adjustment. Use simulation on artificial
data to check that these are reasonable.
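One way such a simulation check might look is sketched below: data are repeatedly generated from known parameter values, an intercept-only model is fitted each time, and the proportion of nominal 95% Wald intervals that cover the truth is recorded. The number of replicates, sample size, seed and true values are arbitrary assumptions, kept small so that the sketch runs quickly; a serious check would use more replicates and a model closer to the one of interest.

```r
library(VGAM)

set.seed(123)                      # arbitrary seed
n_rep <- 50; n <- 400              # small values to keep the run short
true_pobs0 <- 0.3; true_prob <- 0.4
true_eta <- qlogis(c(true_pobs0, true_prob))   # logit scale, same order as coef()

covered <- matrix(NA, n_rep, 2)
for (r in seq_len(n_rep)) {
  sim   <- data.frame(y = rzageom(n, prob = true_prob, pobs0 = true_pobs0))
  fit_r <- vglm(y ~ 1, zageometric, data = sim)
  est   <- coef(fit_r)
  se    <- sqrt(diag(vcov(fit_r)))
  # Does the Wald interval est +/- 1.96 * se contain the true value?
  covered[r, ] <- abs(est - true_eta) <= qnorm(0.975) * se
}

# Empirical coverage for logit(pobs0) and logit(prob); values far from
# 0.95 would suggest that the reported standard errors need adjustment.
colMeans(covered)
```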
Note that this family function allows \(p_0\) to be modelled as a function of the covariates. It is a conditional model, not a mixture model.
This family function effectively combines
binomialff and
a positive-geometric counterpart of geometric (call it posgeometric()) into
one family function.
However, posgeometric() is not written because it
is trivially related to geometric.
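The decomposition described above can be illustrated numerically. Because the likelihood of a conditional (hurdle) model factorizes, the zero part should agree with a logistic regression on the indicator of a zero, and the positive part with a geometric fit to y - 1 on the positive observations, up to convergence tolerance. The sketch below assumes simulated data with arbitrary coefficients and seed; the object names are illustrative.

```r
library(VGAM)

set.seed(2)                        # arbitrary seed
n <- 1000
hdata <- data.frame(x2 = runif(n))
hdata <- transform(hdata,
                   y = rzageom(n, prob  = logitlink(-2 + 3 * x2, inverse = TRUE),
                               pobs0 = logitlink(-1 + 2 * x2, inverse = TRUE)))

fit_h <- vglm(y ~ x2, zageometric, data = hdata)

# Zero part: plain logistic regression on the indicator of a zero.
fit_bin <- glm(I(y == 0) ~ x2, family = binomial, data = hdata)

# Positive part: an ordinary geometric fit to y - 1 for the positive counts.
pos <- transform(subset(hdata, y > 0), ym1 = y - 1)
fit_geo <- vglm(ym1 ~ x2, geometric, data = pos)

cm <- coef(fit_h, matrix = TRUE)
cbind(za_pobs0 = cm[, 1], glm_binomial = coef(fit_bin))
cbind(za_prob  = cm[, 2], geometric    = coef(fit_geo))
```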
library(VGAM)  # provides vglm(), zageometric(), rzageom() and logitlink()
zdata <- data.frame(x2 = runif(nn <- 1000))
zdata <- transform(zdata, pobs0 = logitlink(-1 + 2*x2, inverse = TRUE),
                          prob  = logitlink(-2 + 3*x2, inverse = TRUE))
zdata <- transform(zdata, y1 = rzageom(nn, prob = prob, pobs0 = pobs0),
                          y2 = rzageom(nn, prob = prob, pobs0 = pobs0))
with(zdata, table(y1))
#> y1
#>   0   1   2   3   4   5   6   7   8   9  10  11  12  13  14  16  17  18  19  20
#> 546 159  88  64  30  20  27   5  12  12   3   8   1   6   5   2   1   3   1   2
#>  21  23  24
#>   1   2   2
fit <- vglm(cbind(y1, y2) ~ x2, zageometric, data = zdata, trace = TRUE)
#> Iteration 1: loglikelihood = -3366.4351
#> Iteration 2: loglikelihood = -3249.489
#> Iteration 3: loglikelihood = -3224.5153
#> Iteration 4: loglikelihood = -3223.346
#> Iteration 5: loglikelihood = -3223.343
#> Iteration 6: loglikelihood = -3223.343
coef(fit, matrix = TRUE)
#>             logitlink(pobs01) logitlink(prob1) logitlink(pobs02)
#> (Intercept)        -0.7904093        -1.973762         -1.090994
#> x2                  1.9747830         2.992545          2.143130
#>             logitlink(prob2)
#> (Intercept)        -2.167299
#> x2                  3.354059
head(fitted(fit))
#>          y1        y2
#> 1 0.7261674 0.7948155
#> 2 3.0129591 3.6744562
#> 3 1.2055685 1.3627938
#> 4 2.5100170 3.0116937
#> 5 2.6577542 3.2051170
#> 6 0.4903941 0.5270491
head(predict(fit))
#>      logitlink(pobs01) logitlink(prob1) logitlink(pobs02) logitlink(prob2)
#> [1,]         0.5349262       0.03462333        0.34732427       0.08370952
#> [2,]        -0.4060872      -1.39136880       -0.67390880      -1.51454945
#> [3,]         0.1813044      -0.50124781       -0.03644316      -0.51689748
#> [4,]        -0.2923377      -1.21899514       -0.55046234      -1.32135220
#> [5,]        -0.3280844      -1.27316500       -0.58925641      -1.38206604
#> [6,]         0.8312928       0.48373110        0.66895557       0.58707171
summary(fit)
#>
#> Call:
#> vglm(formula = cbind(y1, y2) ~ x2, family = zageometric, data = zdata,
#> trace = TRUE)
#>
#> Coefficients:
#>               Estimate Std. Error z value Pr(>|z|)
#> (Intercept):1 -0.79041    0.13399  -5.899 3.66e-09 ***
#> (Intercept):2 -1.97376    0.09949 -19.839  < 2e-16 ***
#> (Intercept):3 -1.09099    0.13816  -7.896 2.87e-15 ***
#> (Intercept):4 -2.16730    0.09440 -22.959  < 2e-16 ***
#> x2:1           1.97478    0.23955   8.244  < 2e-16 ***
#> x2:2           2.99255    0.23410  12.783  < 2e-16 ***
#> x2:3           2.14313    0.24145   8.876  < 2e-16 ***
#> x2:4           3.35406    0.22517  14.895  < 2e-16 ***
#> ---
#> Signif. codes: 0 ‘***’ 0.001 ‘**’ 0.01 ‘*’ 0.05 ‘.’ 0.1 ‘ ’ 1
#>
#> Names of linear predictors: logitlink(pobs01), logitlink(prob1),
#> logitlink(pobs02), logitlink(prob2)
#>
#> Log-likelihood: -3223.343 on 3992 degrees of freedom
#>
#> Number of Fisher scoring iterations: 6
#>
#> No Hauck-Donner effect found in any of the estimates
#>