zigeometric.RdFits a zero-inflated geometric distribution by maximum likelihood estimation.
zigeometric(lpstr0 = "logitlink", lprob = "logitlink",
type.fitted = c("mean", "prob", "pobs0", "pstr0", "onempstr0"),
ipstr0 = NULL, iprob = NULL,
imethod = 1, bias.red = 0.5, zero = NULL)
zigeometricff(lprob = "logitlink", lonempstr0 = "logitlink",
type.fitted = c("mean", "prob", "pobs0", "pstr0", "onempstr0"),
iprob = NULL, ionempstr0 = NULL,
imethod = 1, bias.red = 0.5, zero = "onempstr0")Link functions for the parameters
\(\phi\)
and
\(p\) (prob).
The usual geometric probability parameter is the latter.
The probability of a structural zero is the former.
See Links for more choices.
For the zero-deflated model see below.
Corresponding arguments for the other parameterization. See details below.
A constant used in the initialization process of pstr0.
It should lie between 0 and 1, with 1 having no effect.
See CommonVGAMffArguments
and fittedvlm for information.
See CommonVGAMffArguments for information.
See CommonVGAMffArguments for information.
Function zigeometric() is based on
$$P(Y=0) = \phi + (1-\phi) p,$$
for \(y=0\), and
$$P(Y=y) = (1-\phi) p (1 - p)^{y}.$$
for \(y=1,2,\ldots\).
The parameter \(\phi\) satisfies \(0 < \phi < 1\). The mean of \(Y\) is \(E(Y)=(1-\phi) p / (1-p)\) and these are returned as the fitted values
by default.
By default, the two linear/additive predictors
are \((logit(\phi), logit(p))^T\).
Multiple responses are handled.
Estimated probabilities of a structural zero and an
observed zero can be returned, as in zipoisson;
see fittedvlm for information.
The VGAM family function zigeometricff() has a few
changes compared to zigeometric().
These are:
(i) the order of the linear/additive predictors is switched so the
geometric probability comes first;
(ii) argument onempstr0 is now 1 minus
the probability of a structural zero, i.e.,
the probability of the parent (geometric) component,
i.e., onempstr0 is 1-pstr0;
(iii) argument zero has a new default so that the onempstr0
is intercept-only by default.
Now zigeometricff() is generally recommended over
zigeometric().
Both functions implement Fisher scoring and can handle
multiple responses.
An object of class "vglmff" (see vglmff-class).
The object is used by modelling functions such as vglm
and vgam.
The zero-deflated geometric distribution might
be fitted by setting lpstr0 = identitylink, albeit,
not entirely reliably. See zipoisson
for information that can be applied here. Else
try the zero-altered geometric distribution (see
zageometric).
gdata <- data.frame(x2 = runif(nn <- 1000) - 0.5)
gdata <- transform(gdata, x3 = runif(nn) - 0.5,
x4 = runif(nn) - 0.5)
gdata <- transform(gdata, eta1 = 1.0 - 1.0 * x2 + 2.0 * x3,
eta2 = -1.0,
eta3 = 0.5)
gdata <- transform(gdata, prob1 = logitlink(eta1, inverse = TRUE),
prob2 = logitlink(eta2, inverse = TRUE),
prob3 = logitlink(eta3, inverse = TRUE))
gdata <- transform(gdata, y1 = rzigeom(nn, prob1, pstr0 = prob3),
y2 = rzigeom(nn, prob2, pstr0 = prob3),
y3 = rzigeom(nn, prob2, pstr0 = prob3))
with(gdata, table(y1))
#> y1
#> 0 1 2 3 4 5 6 7
#> 894 76 15 8 3 2 1 1
with(gdata, table(y2))
#> y2
#> 0 1 2 3 4 5 6 7 8 9 10 11 12 13 14 15 21
#> 716 83 46 39 32 14 24 11 10 7 6 3 1 3 3 1 1
with(gdata, table(y3))
#> y3
#> 0 1 2 3 4 5 6 7 8 9 10 11 12 13 16 17 22
#> 729 83 46 38 29 20 13 9 10 6 7 5 1 1 1 1 1
head(gdata)
#> x2 x3 x4 eta1 eta2 eta3 prob1 prob2
#> 1 -0.30236980 -0.27207140 -0.142621831 0.7582270 -1 0.5 0.6809687 0.2689414
#> 2 0.06008219 0.13106537 -0.119558207 1.2020486 -1 0.5 0.7688890 0.2689414
#> 3 -0.46791286 0.21845620 -0.489926074 1.9048253 -1 0.5 0.8704367 0.2689414
#> 4 0.01148459 -0.09301155 0.019333617 0.8024923 -1 0.5 0.6905074 0.2689414
#> 5 0.23885518 -0.14448582 -0.006219018 0.4721732 -1 0.5 0.6158980 0.2689414
#> 6 -0.40702990 0.06383865 0.415824160 1.5347072 -1 0.5 0.8226940 0.2689414
#> prob3 y1 y2 y3
#> 1 0.6224593 0 0 2
#> 2 0.6224593 0 7 0
#> 3 0.6224593 0 0 0
#> 4 0.6224593 0 3 1
#> 5 0.6224593 0 0 1
#> 6 0.6224593 0 0 10
fit1 <- vglm(y1 ~ x2 + x3 + x4, zigeometric(zero = 1), data = gdata, trace = TRUE)
#> Iteration 1: loglikelihood = -460.92614
#> Iteration 2: loglikelihood = -439.10057
#> Iteration 3: loglikelihood = -428.24199
#> Iteration 4: loglikelihood = -426.58465
#> Iteration 5: loglikelihood = -426.53816
#> Iteration 6: loglikelihood = -426.53765
#> Iteration 7: loglikelihood = -426.53764
#> Iteration 8: loglikelihood = -426.53764
coef(fit1, matrix = TRUE)
#> logitlink(pstr0) logitlink(prob)
#> (Intercept) 0.4650275 1.0761843
#> x2 0.0000000 -0.9496978
#> x3 0.0000000 2.1503183
#> x4 0.0000000 -0.2276549
head(fitted(fit1, type = "pstr0"))
#> [,1]
#> [1,] 0.6142062
#> [2,] 0.6142062
#> [3,] 0.6142062
#> [4,] 0.6142062
#> [5,] 0.6142062
#> [6,] 0.6142062
fit2 <- vglm(cbind(y2, y3) ~ 1, zigeometric(zero = 1), data = gdata, trace = TRUE)
#> Iteration 1: loglikelihood = -2382.5343
#> Iteration 2: loglikelihood = -2381.4135
#> Iteration 3: loglikelihood = -2381.4108
#> Iteration 4: loglikelihood = -2381.4108
coef(fit2, matrix = TRUE)
#> logitlink(pstr01) logitlink(prob1) logitlink(pstr02)
#> (Intercept) 0.4648395 -1.025587 0.5156988
#> logitlink(prob2)
#> (Intercept) -0.9687641
summary(fit2)
#>
#> Call:
#> vglm(formula = cbind(y2, y3) ~ 1, family = zigeometric(zero = 1),
#> data = gdata, trace = TRUE)
#>
#> Coefficients:
#> Estimate Std. Error z value Pr(>|z|)
#> (Intercept):1 0.46484 0.08699 5.344 9.11e-08 ***
#> (Intercept):2 -1.02559 0.06916 -14.828 < 2e-16 ***
#> (Intercept):3 0.51570 0.08857 5.823 5.79e-09 ***
#> (Intercept):4 -0.96876 0.07135 -13.578 < 2e-16 ***
#> ---
#> Signif. codes: 0 ‘***’ 0.001 ‘**’ 0.01 ‘*’ 0.05 ‘.’ 0.1 ‘ ’ 1
#>
#> Names of linear predictors: logitlink(pstr01), logitlink(prob1),
#> logitlink(pstr02), logitlink(prob2)
#>
#> Log-likelihood: -2381.411 on 3996 degrees of freedom
#>
#> Number of Fisher scoring iterations: 4
#>
#> No Hauck-Donner effect found in any of the estimates
#>