N1binomial.RdEstimate the four parameters of the (bivariate) \(N_1\)–binomial copula mixed data type model by maximum likelihood estimation.
N1binomial(lmean = "identitylink", lsd = "loglink",
lvar = "loglink", lprob = "logitlink", lapar = "rhobitlink",
zero = c(if (var.arg) "var" else "sd", "apar"),
nnodes = 20, copula = "gaussian", var.arg = FALSE,
imethod = 1, isd = NULL, iprob = NULL, iapar = NULL)Details at CommonVGAMffArguments.
See Links for more link function choices.
Initial values.
Details at CommonVGAMffArguments.
Details at CommonVGAMffArguments.
Number of nodes and weights for the Gauss–Hermite (GH) quadrature. While a higher value should be more accurate, setting an excessive value runs the risk of evaluating some special functions near the boundary of the parameter space and producing numerical problems.
Type of copula used. Currently only the bivariate normal is used but more might be implemented in the future.
See uninormal.
The bivariate response comprises \(Y_1\)
from the linear model having parameters
mean and sd for
\(\mu_1\) and \(\sigma_1\),
and the binary
\(Y_2\) having parameter
prob for its mean \(\mu_2\).
The
joint probability density/mass function is
\(P(y_1, Y_2 = 0) = \phi_1(y_1; \mu_1, \sigma_1)
(1 - \Delta)\)
and
\(P(y_1, Y_2 = 1) = \phi_1(y_1; \mu_1, \sigma_1)
\Delta\)
where \(\Delta\) adjusts \(\mu_2\)
according to the association parameter
\(\alpha\).
The quantity \(\Delta\) is
\(\Phi((\Phi^{-1}(\mu_2) - \alpha Z_1)/
\sqrt{1 - \alpha^2})\).
The quantity \(Z_1\) is \((Y_1-\mu_1) / \sigma_1\).
Thus there is an underlying bivariate normal
distribution, and a copula is used to bring the
two marginal distributions together.
Here,
\(-1 < \alpha < 1\), and
\(\Phi\) is the
cumulative distribution function
pnorm
of a standard univariate normal.
The first marginal
distribution is a normal distribution
for the linear model.
The second column of the response must
have values 0 or 1,
e.g.,
Bernoulli random variables.
When \(\alpha = 0\)
then \(\Delta=\mu_2\).
Together, this family function combines
uninormal and
binomialff.
If the response are correlated then
a more efficient joint analysis
should result.
This VGAM family function cannot handle multiple responses. Only a two-column matrix is allowed. The two-column fitted value matrix has columns \(\mu_1\) and \(\mu_2\).
An object of class "vglmff"
(see vglmff-class).
The object is used by modelling functions
such as vglm
and vgam.
Song, P. X.-K. (2007). Correlated Data Analysis: Modeling, Analytics, and Applications. Springer.
This VGAM family function is fragile. Because the EIMs are approximated by GH quadrature it is found that convergence may be a little slower than for other models whose EIM is tractable. Also, the log-likelihood may be flat at the MLE with respect to \(\alpha\) especially because the correlation between the two mixed data types may be weak.
It pays to set trace = TRUE to
monitor convergence, especially when
abs(apar) is high.
nn <- 1000; mymu <- 1; sdev <- exp(1)
apar <- rhobitlink(0.5, inverse = TRUE)
prob <- logitlink(0.5, inverse = TRUE)
mat <- rN1binom(nn, mymu, sdev, prob, apar)
nbdata <- data.frame(y1 = mat[, 1], y2 = mat[, 2])
fit1 <- vglm(cbind(y1, y2) ~ 1, N1binomial,
nbdata, trace = TRUE)
#> Iteration 1: loglikelihood = -6125.0207
#> Iteration 2: loglikelihood = -6124.1736
#> Iteration 3: loglikelihood = -6124.128
#> Iteration 4: loglikelihood = -6124.1258
#> Iteration 5: loglikelihood = -6124.1257
#> Iteration 6: loglikelihood = -6124.1257
coef(fit1, matrix = TRUE)
#> mean loglink(sd) logitlink(prob) rhobitlink(apar)
#> (Intercept) 0.9993132 1.004576 0.4807467 0.5943222
Coef(fit1)
#> mean sd prob apar
#> 0.9993132 2.7307496 0.6179242 0.2887125
head(fitted(fit1))
#> y1 y2
#> 1 0.9993132 0.6179242
#> 2 0.9993132 0.6179242
#> 3 0.9993132 0.6179242
#> 4 0.9993132 0.6179242
#> 5 0.9993132 0.6179242
#> 6 0.9993132 0.6179242
summary(fit1)
#>
#> Call:
#> vglm(formula = cbind(y1, y2) ~ 1, family = N1binomial, data = nbdata,
#> trace = TRUE)
#>
#> Coefficients:
#> Estimate Std. Error z value Pr(>|z|)
#> (Intercept):1 0.99931 0.08635 11.572 < 2e-16 ***
#> (Intercept):2 1.00458 0.02236 44.926 < 2e-16 ***
#> (Intercept):3 0.48075 0.07137 6.736 1.63e-11 ***
#> (Intercept):4 0.59432 0.09024 6.586 4.51e-11 ***
#> ---
#> Signif. codes: 0 ‘***’ 0.001 ‘**’ 0.01 ‘*’ 0.05 ‘.’ 0.1 ‘ ’ 1
#>
#> Names of linear predictors: mean, loglink(sd), logitlink(prob),
#> rhobitlink(apar)
#>
#> Log-likelihood: -6124.126 on 3996 degrees of freedom
#>
#> Number of Fisher scoring iterations: 6
#>
confint(fit1)
#> 2.5 % 97.5 %
#> (Intercept):1 0.8300627 1.1685637
#> (Intercept):2 0.9607500 1.0484023
#> (Intercept):3 0.3408653 0.6206282
#> (Intercept):4 0.4174590 0.7711853