Linear Model and Poisson Mixed Data Type Family Function

Estimate the four parameters of the (bivariate) \(N_1\)–Poisson copula mixed data type model by maximum likelihood estimation.

N1poisson(lmean = "identitylink", lsd = "loglink",
    lvar = "loglink", llambda = "loglink", lapar = "rhobitlink",
    zero = c(if (var.arg) "var" else "sd", "apar"),
    doff = 5, nnodes = 20, copula = "gaussian",
    var.arg = FALSE, imethod = 1, isd = NULL,
    ilambda = NULL, iapar = NULL)

Arguments

lmean, lsd, lvar, llambda, lapar

Details at CommonVGAMffArguments. See Links for more link function choices. The second response is primarily controlled by the parameter \(\lambda_2\).

imethod, isd, ilambda, iapar

Initial values. Details at CommonVGAMffArguments.

zero

Details at CommonVGAMffArguments.

doff

Numeric of unit length, the denominator offset \(\delta>0\). A monotonic transformation \(\Delta^* = \lambda_2^{2/3} / (|\delta| + \lambda_2^{2/3})\) is taken to map the Poisson mean onto the unit interval. This argument is \(\delta\). The default reflects the property that the normal approximation to the Poisson work wells for \(\lambda_2 \geq 10\) or thereabouts, hence that value is mapped to the origin by qnorm. That's because 10**(2/3) is approximately 5. It is known that the \(\lambda_2\) rate parameter raised to the power of \(2/3\) is a transformation that approximates the normal density more closely.

Alternatively, delta may be assigned a single negative value. If so, then \(\Delta^* = \log(1 + \lambda_2) / [|\delta| + \log(1 + \lambda_2)]\) is used. For this, doff = -log1p(10) is suggested.

nnodes, copula

Details at N1binomial.

var.arg

See uninormal.

Details

The bivariate response comprises \(Y_1\) from a linear model having parameters mean and sd for \(\mu_1\) and \(\sigma_1\), and the Poisson count \(Y_2\) having parameter lambda for its mean \(\lambda_2\). The joint probability density/mass function is \(P(y_1, Y_2 = y_2) = \phi_1(y_1; \mu_1, \sigma_1) \exp(-h^{-1}(\Delta)) [h^{-1}(\Delta)]^{y_2} / y_2!\) where \(\Delta\) adjusts \(\lambda_2\) according to the association parameter \(\alpha\). The quantity \(\Delta\) is \(\Phi((\Phi^{-1}(h(\lambda_2)) - \alpha Z_1) / \sqrt{1 - \alpha^2})\) where \(h\) maps \(\lambda_2\) onto the unit interval. The quantity \(Z_1\) is \((Y_1-\mu_1) / \sigma_1\). Thus there is an underlying bivariate normal distribution, and a copula is used to bring the two marginal distributions together. Here, \(-1 < \alpha < 1\), and \(\Phi\) is the cumulative distribution function pnorm of a standard univariate normal.

The first marginal distribution is a normal distribution for the linear model. The second column of the response must have nonnegative integer values. When \(\alpha = 0\) then \(\Delta=\Delta^*\). Together, this family function combines uninormal and poissonff. If the response are correlated then a more efficient joint analysis should result.

The second marginal distribution allows for overdispersion relative to an ordinary Poisson distribution—a property due to \(\alpha\).

This VGAM family function cannot handle multiple responses. Only a two-column matrix is allowed. The two-column fitted value matrix has columns \(\mu_1\) and \(\lambda_2\).

Value

An object of class "vglmff" (see vglmff-class). The object is used by modelling functions such as vglm and vgam.

Author

T. W. Yee

Note

This VGAM family function is based on N1binomial and shares many properties with it. It pays to set trace = TRUE to monitor convergence, especially when abs(apar) is high.

Examples

apar <- rhobitlink(0.3, inverse = TRUE)
nn <- 1000; mymu <- 1; sdev <- exp(1)
lambda <- loglink(1, inverse = TRUE)
mat <- rN1pois(nn, mymu, sdev, lambda, apar)
npdata <- data.frame(y1 = mat[, 1], y2 = mat[, 2])
with(npdata, var(y2) / mean(y2))  # Overdispersion
#> [1] 1.432781
fit1 <- vglm(cbind(y1, y2) ~ 1, N1poisson,
             npdata, trace = TRUE)
#> Iteration 1: loglikelihood = -8521.2793
#> Iteration 2: loglikelihood = -8499.6837
#> Iteration 3: loglikelihood = -8496.985
#> Iteration 4: loglikelihood = -8496.598
#> Iteration 5: loglikelihood = -8496.5331
#> Iteration 6: loglikelihood = -8496.521
#> Iteration 7: loglikelihood = -8496.5187
#> Iteration 8: loglikelihood = -8496.5182
#> Iteration 9: loglikelihood = -8496.5181
#> Iteration 10: loglikelihood = -8496.5181
coef(fit1, matrix = TRUE)
#>                  mean loglink(sd) loglink(lambda) rhobitlink(apar)
#> (Intercept) 0.9159481   0.9963348        1.008717         0.312851
Coef(fit1)
#>      mean        sd    lambda      apar 
#> 0.9159481 2.7083370 2.7420807 0.1551620 
head(fitted(fit1))
#>          y1       y2
#> 1 0.9159481 2.742081
#> 2 0.9159481 2.742081
#> 3 0.9159481 2.742081
#> 4 0.9159481 2.742081
#> 5 0.9159481 2.742081
#> 6 0.9159481 2.742081
summary(fit1)
#> 
#> Call:
#> vglm(formula = cbind(y1, y2) ~ 1, family = N1poisson, data = npdata, 
#>     trace = TRUE)
#> 
#> Coefficients: 
#>               Estimate Std. Error z value Pr(>|z|)    
#> (Intercept):1  0.91595    0.08565   10.70   <2e-16 ***
#> (Intercept):2  0.99633    0.02236   44.56   <2e-16 ***
#> (Intercept):3  1.00872    0.01990   50.68   <2e-16 ***
#> (Intercept):4  0.31285    0.01355   23.08   <2e-16 ***
#> ---
#> Signif. codes:  0 ‘***’ 0.001 ‘**’ 0.01 ‘*’ 0.05 ‘.’ 0.1 ‘ ’ 1
#> 
#> Names of linear predictors: mean, loglink(sd), loglink(lambda), 
#> rhobitlink(apar)
#> 
#> Log-likelihood: -8496.518 on 3996 degrees of freedom
#> 
#> Number of Fisher scoring iterations: 10 
#> 
confint(fit1)
#>                   2.5 %    97.5 %
#> (Intercept):1 0.7480867 1.0838094
#> (Intercept):2 0.9525087 1.0401609
#> (Intercept):3 0.9697088 1.0477253
#> (Intercept):4 0.2862852 0.3394167