The Two-stage Sequential Binomial Distribution Family Function

Estimation of the probabilities of a two-stage binomial distribution.

seq2binomial(lprob1 = "logitlink", lprob2 = "logitlink",
             iprob1 = NULL,    iprob2 = NULL,
             parallel = FALSE, zero = NULL)

Arguments

lprob1, lprob2

Parameter link functions applied to the two probabilities, called $p$ and $q$ below. See Links for more choices.

iprob1, iprob2

Optional initial value for the first and second probabilities respectively. A NULL means a value is obtained in the initialize slot.

parallel, zero

Details at Links. If parallel = TRUE then the constraint also applies to the intercept. See CommonVGAMffArguments for details.

Details

This VGAM family function fits the model described by Crowder and Sweeting (1989) which is described as follows. Each of $m$ spores has a probability $p$ of germinating. Of the $y_1$ spores that germinate, each has a probability $q$ of bending in a particular direction. Let $y_2$ be the number that bend in the specified direction. The probability model for this data is $P(y_1,y_2) =$ $$ {m \choose y_1} p^{y_1} (1-p)^{m-y_1} {y_1 \choose y_2} q^{y_2} (1-q)^{y_1-y_2}$$ for $0 < p < 1$, $0 < q < 1$, $y_1=1,\ldots,m$ and $y_2=1,\ldots,y_1$. Here, $p$ is prob1, $q$ is prob2.

Although the Authors refer to this as the bivariate binomial model, I have named it the (two-stage) sequential binomial model. Fisher scoring is used.

Value

An object of class "vglmff" (see vglmff-class). The object is used by modelling functions such as vglm and vgam.

References

Crowder, M. and Sweeting, T. (1989). Bayesian inference for a bivariate binomial distribution. Biometrika, 76, 599–603.

Author

Thomas W. Yee

Note

The response must be a two-column matrix of sample proportions corresponding to $y_1$ and $y_2$. The $m$ values should be inputted with the weights argument of vglm and vgam. The fitted value is a two-column matrix of estimated probabilities $p$ and $q$. A common form of error is when there are no trials for $y_1$, e.g., if mvector below has some values which are zero.

Examples

sdata <- data.frame(mvector = round(rnorm(nn <- 100, m = 10, sd = 2)),
                    x2 = runif(nn))
sdata <- transform(sdata, prob1 = logitlink(+2 - x2, inverse = TRUE),
                          prob2 = logitlink(-2 + x2, inverse = TRUE))
sdata <- transform(sdata, successes1 = rbinom(nn, size = mvector,    prob = prob1))
sdata <- transform(sdata, successes2 = rbinom(nn, size = successes1, prob = prob2))
sdata <- transform(sdata, y1 = successes1 / mvector)
sdata <- transform(sdata, y2 = successes2 / successes1)
fit <- vglm(cbind(y1, y2) ~ x2, seq2binomial, weight = mvector,
            data = sdata, trace = TRUE)
#> Iteration 1: loglikelihood = -297.55139
#> Iteration 2: loglikelihood = -297.41156
#> Iteration 3: loglikelihood = -297.41154
#> Iteration 4: loglikelihood = -297.41154
coef(fit)
#> (Intercept):1 (Intercept):2          x2:1          x2:2 
#>     1.9763016    -1.8181994    -1.2847068     0.4717196 
coef(fit, matrix = TRUE)
#>             logitlink(prob1) logitlink(prob2)
#> (Intercept)         1.976302       -1.8181994
#> x2                 -1.284707        0.4717196
head(fitted(fit))
#>       prob1     prob2
#> 1 0.8374123 0.1552000
#> 2 0.8536777 0.1492937
#> 3 0.8266989 0.1589279
#> 4 0.7893381 0.1711409
#> 5 0.6876752 0.2006326
#> 6 0.7891598 0.1711967
head(depvar(fit))
#>          y1         y2
#> 1 0.8461538 0.09090909
#> 2 0.9090909 0.10000000
#> 3 0.8888889 0.25000000
#> 4 0.3333333 0.00000000
#> 5 0.5833333 0.14285714
#> 6 0.6000000 0.16666667
head(weights(fit, type = "prior"))  # Same as with(sdata, mvector)
#>      [,1]
#> [1,]   13
#> [2,]   11
#> [3,]    9
#> [4,]    6
#> [5,]   12
#> [6,]   10
# Number of first successes:
head(depvar(fit)[, 1] * c(weights(fit, type = "prior")))
#>  1  2  3  4  5  6 
#> 11 10  8  2  7  6 
# Number of second successes:
head(depvar(fit)[, 2] * c(weights(fit, type = "prior")) *
                          depvar(fit)[, 1])
#> 1 2 3 4 5 6 
#> 1 1 2 0 1 1