Geometric (Truncated and Untruncated) Distributions

Maximum likelihood estimation for the geometric and truncated geometric distributions.

geometric(link = "logitlink", expected = TRUE, imethod = 1,
          iprob = NULL, zero = NULL)
truncgeometric(upper.limit = Inf,
               link = "logitlink", expected = TRUE, imethod = 1,
               iprob = NULL, zero = NULL)

Arguments

link

Parameter link function applied to the probability parameter \(p\), which lies in the unit interval. See Links for more choices.

expected

Logical. Fisher scoring is used if expected = TRUE, else Newton-Raphson.

iprob, imethod, zero

See CommonVGAMffArguments for details.

upper.limit

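Numeric. Upper truncation value(s): the response may take integer values from 0 to upper.limit. As a vector, it is recycled across responses first. With the default value of Inf there is no truncation, so both family functions should give the same result.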
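To illustrate what expected = TRUE refers to, the following is a minimal base-R sketch of Fisher scoring for an intercept-only geometric model with a logit link. It is not VGAM's implementation. The toy data, the starting value and the variable names are assumptions made for illustration, and the score and expected information are derived by hand from the log-likelihood \(\log p + y \log(1-p)\).

```r
# Toy data (assumed for illustration); the sample mean is 3
y <- c(0, 1, 2, 3, 4, 5, 6)

eta <- 0  # starting value on the logit scale
for (iter in 1:50) {
  p <- plogis(eta)                  # inverse logit
  score <- sum((1 - p) - y * p)     # d loglik / d eta
  einfo <- length(y) * (1 - p)      # expected (Fisher) information for eta
  # Newton-Raphson would instead divide by the observed information,
  # sum((1 + y) * p * (1 - p))
  step <- score / einfo
  eta <- eta + step
  if (abs(step) < 1e-10) break
}

p_hat <- plogis(eta)
# Compare with the closed-form MLE for this simple case
c(p_hat = p_hat, closed_form = 1 / (1 + mean(y)))
```

For this intercept-only case the iteration should settle on \(1/(1+\bar{y})\); with covariates the same idea is applied through iteratively reweighted least squares.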

Details

A random variable \(Y\) has a 1-parameter geometric distribution if \(P(Y=y) = p (1-p)^y\) for \(y=0,1,2,\ldots\). Here, \(p\) is the probability of success, and \(Y\) is the number of failures in a sequence of independent trials before the first success occurs. Thus the response \(Y\) should be a non-negative integer. The mean of \(Y\) is \(E(Y) = (1-p)/p\) and its variance is \(Var(Y) = (1-p)/p^2\). The geometric distribution is a special case of the negative binomial distribution (see negbinomial). The geometric distribution is also a special case of the Borel distribution, which is a Lagrangian distribution. If \(Y\) has a geometric distribution with parameter \(p\) then \(Y+1\) has a positive-geometric distribution with the same parameter. Multiple responses are permitted.
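The formulas above can be checked numerically in base R. The sketch below assumes an illustrative value p = 0.3 and truncates the infinite sum at y = 500, where the remaining tail mass is negligible. It relies on stats::dgeom(), which uses the same failures-before-first-success parameterisation.

```r
p <- 0.3    # illustrative value, not taken from the help page
y <- 0:500  # the tail beyond 500 is negligible for this p

# Geometric pmf P(Y = y) = p * (1 - p)^y
pmf <- p * (1 - p)^y

# Agrees with the built-in density
stopifnot(isTRUE(all.equal(pmf, dgeom(y, prob = p))))

# Numerical mean and variance against the closed forms
mean_y <- sum(y * pmf)
var_y  <- sum((y - mean_y)^2 * pmf)
c(mean = mean_y, closed_form = (1 - p) / p)
c(var = var_y, closed_form = (1 - p) / p^2)
```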

For truncgeometric(), the (upper) truncated geometric distribution can have integer response values from 0 to upper.limit. It has density \(P(Y=y) = p (1-p)^y / [1-(1-p)^{1+U}]\) for \(y=0,1,\ldots,U\), where \(p\) is prob and \(U\) is upper.limit.
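As a rough check of the truncated density, the base-R sketch below evaluates it on its support and confirms that it sums to one. The helper name dtruncgeom_sketch is hypothetical and not part of VGAM, and the parameter values are assumptions made for illustration.

```r
# Hypothetical helper (not a VGAM function): pmf of the upper-truncated
# geometric distribution on 0, 1, ..., upper.limit
dtruncgeom_sketch <- function(y, prob, upper.limit) {
  inside <- y >= 0 & y <= upper.limit & y == round(y)
  dens <- prob * (1 - prob)^y / (1 - (1 - prob)^(1 + upper.limit))
  ifelse(inside, dens, 0)  # zero outside the support
}

prob <- 0.25      # illustrative values
upper.limit <- 5
dens <- dtruncgeom_sketch(0:upper.limit, prob, upper.limit)

sum(dens)  # should equal 1 up to floating-point error

# Equivalent form: renormalise the untruncated pmf by P(Y <= upper.limit)
all.equal(dens, dgeom(0:upper.limit, prob) / pgeom(upper.limit, prob))
```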

For a generalized truncated geometric distribution with integer values \(L\) to \(U\), say, subtract \(L\) from the response and feed in \(U-L\) as the upper limit.
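The shift described above works because the factor \((1-p)^L\) cancels between numerator and denominator. A small base-R sketch, using assumed illustrative values for \(L\), \(U\) and the probability, compares the two forms:

```r
L <- 1; U <- 8; prob <- 0.3  # illustrative values
y <- L:U

# pmf of a geometric variable conditioned on L <= Y <= U
direct <- dgeom(y, prob) / sum(dgeom(L:U, prob))

# pmf of the shifted response y - L with upper limit U - L
shifted <- prob * (1 - prob)^(y - L) / (1 - (1 - prob)^(1 + U - L))

all.equal(direct, shifted)
```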

Value

An object of class "vglmff" (see vglmff-class). The object is used by modelling functions such as vglm and vgam.

References

Forbes, C., Evans, M., Hastings, N. and Peacock, B. (2011). Statistical Distributions, 4th edition. Hoboken, NJ, USA: John Wiley and Sons.

Author

T. W. Yee. Help from Viet Hoang Quoc is gratefully acknowledged.

Examples

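library(VGAM)  # provides vglm(), geometric(), truncgeometric(), logitlink()
nn <- 1000     # number of simulated observations
gdata <- data.frame(x2 = runif(nn) - 0.5)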
gdata <- transform(gdata, x3 = runif(nn) - 0.5,
                          x4 = runif(nn) - 0.5)
gdata <- transform(gdata, eta  = -1.0 - 1.0 * x2 + 2.0 * x3)
gdata <- transform(gdata, prob = logitlink(eta, inverse = TRUE))
gdata <- transform(gdata, y1 = rgeom(nn, prob))
with(gdata, table(y1))
#> y1
#>   0   1   2   3   4   5   6   7   8   9  10  11  12  13  14  15  16  17  18  19 
#> 270 196 125  94  85  53  28  22  21  19  10  13   8  11   4   5   6   1   6   3 
#>  22  23  24  27  28  29  30  31  33  35  39  40  45  55 
#>   1   2   3   1   1   1   2   1   1   1   1   2   2   1 
fit1 <- vglm(y1 ~ x2 + x3 + x4, geometric, data = gdata, trace = TRUE)
#> Iteration 1: loglikelihood = -2253.5284
#> Iteration 2: loglikelihood = -2239.0745
#> Iteration 3: loglikelihood = -2238.9309
#> Iteration 4: loglikelihood = -2238.9309
#> Iteration 5: loglikelihood = -2238.9309
coef(fit1, matrix = TRUE)
#>             logitlink(prob)
#> (Intercept)      -1.0424831
#> x2               -1.1031388
#> x3                2.0009718
#> x4               -0.1133101
summary(fit1)
#> 
#> Call:
#> vglm(formula = y1 ~ x2 + x3 + x4, family = geometric, data = gdata, 
#>     trace = TRUE)
#> 
#> Coefficients: 
#>             Estimate Std. Error z value Pr(>|z|)    
#> (Intercept) -1.04248    0.03782 -27.565   <2e-16 ***
#> x2          -1.10314    0.12796  -8.621   <2e-16 ***
#> x3           2.00097    0.13473  14.852   <2e-16 ***
#> x4          -0.11331    0.12652  -0.896     0.37    
#> ---
#> Signif. codes:  0 ‘***’ 0.001 ‘**’ 0.01 ‘*’ 0.05 ‘.’ 0.1 ‘ ’ 1
#> 
#> Name of linear predictor: logitlink(prob) 
#> 
#> Log-likelihood: -2238.931 on 996 degrees of freedom
#> 
#> Number of Fisher scoring iterations: 5 
#> 
#> No Hauck-Donner effect found in any of the estimates
#> 

# Truncated geometric (between 0 and upper.limit)
upper.limit <- 5
tdata <- subset(gdata, y1 <= upper.limit)
nrow(tdata)  # Less than nn
#> [1] 823
fit2 <- vglm(y1 ~ x2 + x3 + x4, truncgeometric(upper.limit),
             data = tdata, trace = TRUE)
#> Iteration 1: loglikelihood = -1330.0291
#> Iteration 2: loglikelihood = -1328.6492
#> Iteration 3: loglikelihood = -1328.6364
#> Iteration 4: loglikelihood = -1328.6363
#> Iteration 5: loglikelihood = -1328.6363
coef(fit2, matrix = TRUE)
#>             logitlink(prob)
#> (Intercept)      -1.1591259
#> x2               -0.6966647
#> x3                2.6156253
#> x4               -0.1506785

# Generalized truncated geometric (between lower.limit and upper.limit)
lower.limit <- 1
upper.limit <- 8
gtdata <- subset(gdata, lower.limit <= y1 & y1 <= upper.limit)
with(gtdata, table(y1))
#> y1
#>   1   2   3   4   5   6   7   8 
#> 196 125  94  85  53  28  22  21 
nrow(gtdata)  # Less than nn
#> [1] 624
fit3 <- vglm(y1 - lower.limit ~ x2 + x3 + x4,
             truncgeometric(upper.limit - lower.limit),
             data = gtdata, trace = TRUE)
#> Iteration 1: loglikelihood = -1121.4393
#> Iteration 2: loglikelihood = -1120.8169
#> Iteration 3: loglikelihood = -1120.8051
#> Iteration 4: loglikelihood = -1120.8049
#> Iteration 5: loglikelihood = -1120.8049
coef(fit3, matrix = TRUE)
#>             logitlink(prob)
#> (Intercept)      -0.9214736
#> x2               -0.7255879
#> x3                1.7786057
#> x4                0.2922239