Linear Model and Binomial Mixed Data Type Family Function

Estimate the four parameters of the (bivariate) \(N_1\)–binomial copula mixed data type model by maximum likelihood estimation.

N1binomial(lmean = "identitylink", lsd = "loglink",
    lvar = "loglink", lprob = "logitlink", lapar = "rhobitlink",
    zero = c(if (var.arg) "var" else "sd", "apar"),
    nnodes = 20, copula = "gaussian", var.arg = FALSE,
    imethod = 1, isd = NULL, iprob = NULL, iapar = NULL)

Arguments

lmean, lsd, lvar, lprob, lapar

Details at CommonVGAMffArguments. See Links for more link function choices.

imethod, isd, iprob, iapar

Initial values. Details at CommonVGAMffArguments.

zero

Details at CommonVGAMffArguments.

nnodes

Number of nodes and weights for the Gauss–Hermite (GH) quadrature. While a higher value should be more accurate, setting an excessive value runs the risk of evaluating some special functions near the boundary of the parameter space and producing numerical problems.

copula

Type of copula used. Currently only the bivariate normal is used but more might be implemented in the future.

var.arg

See uninormal.

Details

The bivariate response comprises \(Y_1\) from the linear model having parameters mean and sd for \(\mu_1\) and \(\sigma_1\), and the binary \(Y_2\) having parameter prob for its mean \(\mu_2\). The joint probability density/mass function is \(P(y_1, Y_2 = 0) = \phi_1(y_1; \mu_1, \sigma_1) (1 - \Delta)\) and \(P(y_1, Y_2 = 1) = \phi_1(y_1; \mu_1, \sigma_1) \Delta\) where \(\Delta\) adjusts \(\mu_2\) according to the association parameter \(\alpha\). The quantity \(\Delta\) is \(\Phi((\Phi^{-1}(\mu_2) - \alpha Z_1)/ \sqrt{1 - \alpha^2})\). The quantity \(Z_1\) is \((Y_1-\mu_1) / \sigma_1\). Thus there is an underlying bivariate normal distribution, and a copula is used to bring the two marginal distributions together. Here, \(-1 < \alpha < 1\), and \(\Phi\) is the cumulative distribution function pnorm of a standard univariate normal.

The first marginal distribution is a normal distribution for the linear model. The second column of the response must have values 0 or 1, e.g., Bernoulli random variables. When \(\alpha = 0\) then \(\Delta=\mu_2\). Together, this family function combines uninormal and binomialff. If the response are correlated then a more efficient joint analysis should result.

This VGAM family function cannot handle multiple responses. Only a two-column matrix is allowed. The two-column fitted value matrix has columns \(\mu_1\) and \(\mu_2\).

Value

An object of class "vglmff" (see vglmff-class). The object is used by modelling functions such as vglm and vgam.

References

Song, P. X.-K. (2007). Correlated Data Analysis: Modeling, Analytics, and Applications. Springer.

Author

T. W. Yee

Note

This VGAM family function is fragile. Because the EIMs are approximated by GH quadrature it is found that convergence may be a little slower than for other models whose EIM is tractable. Also, the log-likelihood may be flat at the MLE with respect to \(\alpha\) especially because the correlation between the two mixed data types may be weak.

It pays to set trace = TRUE to monitor convergence, especially when abs(apar) is high.

Examples

nn <- 1000; mymu <- 1; sdev <- exp(1)
apar <- rhobitlink(0.5, inverse = TRUE)
prob <-  logitlink(0.5, inverse = TRUE)
mat <- rN1binom(nn, mymu, sdev, prob, apar)
nbdata <- data.frame(y1 = mat[, 1], y2 = mat[, 2])
fit1 <- vglm(cbind(y1, y2) ~ 1, N1binomial,
             nbdata, trace = TRUE)
#> Iteration 1: loglikelihood = -6125.0207
#> Iteration 2: loglikelihood = -6124.1736
#> Iteration 3: loglikelihood = -6124.128
#> Iteration 4: loglikelihood = -6124.1258
#> Iteration 5: loglikelihood = -6124.1257
#> Iteration 6: loglikelihood = -6124.1257
coef(fit1, matrix = TRUE)
#>                  mean loglink(sd) logitlink(prob) rhobitlink(apar)
#> (Intercept) 0.9993132    1.004576       0.4807467        0.5943222
Coef(fit1)
#>      mean        sd      prob      apar 
#> 0.9993132 2.7307496 0.6179242 0.2887125 
head(fitted(fit1))
#>          y1        y2
#> 1 0.9993132 0.6179242
#> 2 0.9993132 0.6179242
#> 3 0.9993132 0.6179242
#> 4 0.9993132 0.6179242
#> 5 0.9993132 0.6179242
#> 6 0.9993132 0.6179242
summary(fit1)
#> 
#> Call:
#> vglm(formula = cbind(y1, y2) ~ 1, family = N1binomial, data = nbdata, 
#>     trace = TRUE)
#> 
#> Coefficients: 
#>               Estimate Std. Error z value Pr(>|z|)    
#> (Intercept):1  0.99931    0.08635  11.572  < 2e-16 ***
#> (Intercept):2  1.00458    0.02236  44.926  < 2e-16 ***
#> (Intercept):3  0.48075    0.07137   6.736 1.63e-11 ***
#> (Intercept):4  0.59432    0.09024   6.586 4.51e-11 ***
#> ---
#> Signif. codes:  0 ‘***’ 0.001 ‘**’ 0.01 ‘*’ 0.05 ‘.’ 0.1 ‘ ’ 1
#> 
#> Names of linear predictors: mean, loglink(sd), logitlink(prob), 
#> rhobitlink(apar)
#> 
#> Log-likelihood: -6124.126 on 3996 degrees of freedom
#> 
#> Number of Fisher scoring iterations: 6 
#> 
confint(fit1)
#>                   2.5 %    97.5 %
#> (Intercept):1 0.8300627 1.1685637
#> (Intercept):2 0.9607500 1.0484023
#> (Intercept):3 0.3408653 0.6206282
#> (Intercept):4 0.4174590 0.7711853