dirmul.old.RdFits a Dirichlet-multinomial distribution to a matrix of non-negative integers.
dirmul.old(link = "loglink", ialpha = 0.01, parallel = FALSE,
zero = NULL)Link function applied to each of the \(M\) (positive)
shape parameters \(\alpha_j\) for \(j=1,\ldots,M\).
See Links for more choices.
Here, \(M\) is the number of columns of the response matrix.
Numeric vector. Initial values for the
alpha vector. Must be positive.
Recycled to length \(M\).
A logical, or formula specifying which terms have equal/unequal coefficients.
An integer-valued vector specifying which linear/additive
predictors are modelled as intercepts only. The values must
be from the set {1,2,...,\(M\)}.
See CommonVGAMffArguments for more information.
The Dirichlet-multinomial distribution, which is somewhat
similar to a Dirichlet distribution, has probability function
$$P(Y_1=y_1,\ldots,Y_M=y_M) =
{2y_{*} \choose {y_1,\ldots,y_M}}
\frac{\Gamma(\alpha_{+})}{\Gamma(2y_{*}+\alpha_{+})}
\prod_{j=1}^M \frac{\Gamma(y_j+\alpha_{j})}{\Gamma(\alpha_{j})}$$
for \(\alpha_j > 0\),
\(\alpha_+ = \alpha_1 +
\cdots + \alpha_M\),
and \(2y_{*} = y_1 + \cdots + y_M\).
Here, \(a \choose b\) means “\(a\) choose
\(b\)” and
refers to combinations (see choose).
The (posterior) mean is
$$E(Y_j) = (y_j + \alpha_j) / (2y_{*} + \alpha_{+})$$
for \(j=1,\ldots,M\), and these are returned
as the fitted values as a \(M\)-column matrix.
An object of class "vglmff"
(see vglmff-class).
The object is used by modelling functions
such as vglm,
rrvglm
and vgam.
Lange, K. (2002). Mathematical and Statistical Methods for Genetic Analysis, 2nd ed. New York: Springer-Verlag.
Forbes, C., Evans, M., Hastings, N. and Peacock, B. (2011). Statistical Distributions, Hoboken, NJ, USA: John Wiley and Sons, Fourth edition.
Paul, S. R., Balasooriya, U. and Banerjee, T. (2005). Fisher information matrix of the Dirichlet-multinomial distribution. Biometrical Journal, 47, 230–236.
Tvedebrink, T. (2010). Overdispersion in allelic counts and \(\theta\)-correction in forensic genetics. Theoretical Population Biology, 78, 200–210.
The response should be a matrix of non-negative values. Convergence seems to slow down if there are zero values. Currently, initial values can be improved upon.
This function is almost defunct and may be withdrawn soon.
Use dirmultinomial instead.
# Data from p.50 of Lange (2002)
alleleCounts <- c(2, 84, 59, 41, 53, 131, 2, 0,
0, 50, 137, 78, 54, 51, 0, 0,
0, 80, 128, 26, 55, 95, 0, 0,
0, 16, 40, 8, 68, 14, 7, 1)
dim(alleleCounts) <- c(8, 4)
alleleCounts <- data.frame(t(alleleCounts))
dimnames(alleleCounts) <- list(c("White","Black","Chicano","Asian"),
paste("Allele", 5:12, sep = ""))
set.seed(123) # @initialize uses random numbers
fit <- vglm(cbind(Allele5,Allele6,Allele7,Allele8,Allele9,
Allele10,Allele11,Allele12) ~ 1, dirmul.old,
trace = TRUE, crit = "c", data = alleleCounts)
#> Iteration 1: coefficients =
#> -3.8910869, -3.6350411, -3.6298790, -3.6403794, -3.6332751, -3.6337880,
#> -3.7360239, -3.9009179
#> Iteration 2: coefficients =
#> -3.4223445, -2.6980446, -2.6795530, -2.7170128, -2.6917385, -2.6935653,
#> -3.0178262, -3.4466109
#> Iteration 3: coefficients =
#> -3.1638955, -1.8107720, -1.7595230, -1.8622977, -1.7934316, -1.7984234,
#> -2.5168153, -3.2009785
#> Iteration 4: coefficients =
#> -2.98347967, -0.98412783, -0.85974357, -1.10397597, -0.94276612, -0.95448708,
#> -2.20846822, -3.02951199
#> Iteration 5: coefficients =
#> -2.80473181, -0.21810317, 0.03623887, -0.44935744, -0.13578181, -0.15814199,
#> -1.97869438, -2.85943102
#> Iteration 6: coefficients =
#> -2.61021035, 0.48663962, 0.87825275, 0.13503442, 0.61101386, 0.58101766,
#> -1.75021500, -2.67546526
#> Iteration 7: coefficients =
#> -2.41891431, 1.07169370, 1.52295278, 0.64663206, 1.21293124, 1.18749143,
#> -1.52499339, -2.49639970
#> Iteration 8: coefficients =
#> -2.28369573, 1.41960674, 1.87829275, 0.97499879, 1.55889636, 1.54374568,
#> -1.36193004, -2.37151446
#> Iteration 9: coefficients =
#> -2.2358175, 1.5253024, 1.9836330, 1.0793649, 1.6622051, 1.6517404,
#> -1.3026311, -2.3279204
#> Iteration 10: coefficients =
#> -2.2314875, 1.5338999, 1.9921631, 1.0879989, 1.6705571, 1.6605351,
#> -1.2971389, -2.3240277
#> Iteration 11: coefficients =
#> -2.2314584, 1.5339538, 1.9922165, 1.0880533, 1.6706093, 1.6605903,
#> -1.2971013, -2.3240019
#> Iteration 12: coefficients =
#> -2.2314584, 1.5339538, 1.9922165, 1.0880533, 1.6706093, 1.6605903,
#> -1.2971013, -2.3240019
(sfit <- summary(fit))
#>
#> Call:
#> vglm(formula = cbind(Allele5, Allele6, Allele7, Allele8, Allele9,
#> Allele10, Allele11, Allele12) ~ 1, family = dirmul.old, data = alleleCounts,
#> trace = TRUE, crit = "c")
#>
#> Coefficients:
#> Estimate Std. Error z value Pr(>|z|)
#> (Intercept):1 -2.2315 1.0060 -2.218 0.026546 *
#> (Intercept):2 1.5340 0.4113 3.729 0.000192 ***
#> (Intercept):3 1.9922 0.3978 5.008 5.51e-07 ***
#> (Intercept):4 1.0881 0.4283 2.540 0.011079 *
#> (Intercept):5 1.6706 0.4015 4.161 3.17e-05 ***
#> (Intercept):6 1.6606 0.4124 4.027 5.65e-05 ***
#> (Intercept):7 -1.2971 0.7089 -1.830 0.067284 .
#> (Intercept):8 -2.3240 1.0090 -2.303 0.021264 *
#> ---
#> Signif. codes: 0 ‘***’ 0.001 ‘**’ 0.01 ‘*’ 0.05 ‘.’ 0.1 ‘ ’ 1
#>
#> Number of linear predictors: 8
#>
#> Names of linear predictors: loglink(shape1), loglink(shape2), loglink(shape3),
#> loglink(shape4), loglink(shape5), loglink(shape6), loglink(shape7),
#> loglink(shape8)
#>
#> Log-likelihood: -47692.08 on 24 degrees of freedom
#>
#> Number of Fisher scoring iterations: 12
#>
#> No Hauck-Donner effect found in any of the estimates
#>
vcov(sfit)
#> (Intercept):1 (Intercept):2 (Intercept):3 (Intercept):4
#> (Intercept):1 1.01205315 0.04973858 0.05134910 0.04720810
#> (Intercept):2 0.04973858 0.16918837 0.11953160 0.10989208
#> (Intercept):3 0.05134910 0.11953160 0.15826836 0.11345035
#> (Intercept):4 0.04720810 0.10989208 0.11345035 0.18347177
#> (Intercept):5 0.04953016 0.11529742 0.11903071 0.10943159
#> (Intercept):6 0.05101272 0.11874857 0.12259362 0.11270716
#> (Intercept):7 0.02586997 0.06022070 0.06217063 0.05715694
#> (Intercept):8 0.01966148 0.04576844 0.04725041 0.04343995
#> (Intercept):5 (Intercept):6 (Intercept):7 (Intercept):8
#> (Intercept):1 0.04953016 0.05101272 0.02586997 0.01966148
#> (Intercept):2 0.11529742 0.11874857 0.06022070 0.04576844
#> (Intercept):3 0.11903071 0.12259362 0.06217063 0.04725041
#> (Intercept):4 0.10943159 0.11270716 0.05715694 0.04343995
#> (Intercept):5 0.16120336 0.11825097 0.05996835 0.04557666
#> (Intercept):6 0.11825097 0.17004926 0.06176336 0.04694089
#> (Intercept):7 0.05996835 0.06176336 0.50251874 0.02380503
#> (Intercept):8 0.04557666 0.04694089 0.02380503 1.01809210
round(eta2theta(coef(fit),
fit@misc$link,
fit@misc$earg), digits = 2) # not preferred
#> shape1 shape2 shape3 shape4 shape5 shape6 shape7 shape8
#> theta 0.11 4.64 7.33 2.97 5.32 5.26 0.27 0.1
round(Coef(fit), digits = 2) # preferred
#> shape1 shape2 shape3 shape4 shape5 shape6 shape7 shape8
#> 0.11 4.64 7.33 2.97 5.32 5.26 0.27 0.10
round(t(fitted(fit)), digits = 4) # 2nd row of Lange (2002, Table 3.5)
#> White Black Chicano Asian
#> Allele5 0.0053 0.0003 0.0003 0.0006
#> Allele6 0.2227 0.1380 0.2064 0.1147
#> Allele7 0.1667 0.3645 0.3301 0.2630
#> Allele8 0.1105 0.2045 0.0707 0.0609
#> Allele9 0.1465 0.1498 0.1471 0.4073
#> Allele10 0.3424 0.1421 0.2445 0.1070
#> Allele11 0.0057 0.0007 0.0007 0.0404
#> Allele12 0.0002 0.0002 0.0002 0.0061
coef(fit, matrix = TRUE)
#> loglink(shape1) loglink(shape2) loglink(shape3) loglink(shape4)
#> (Intercept) -2.231458 1.533954 1.992217 1.088053
#> loglink(shape5) loglink(shape6) loglink(shape7) loglink(shape8)
#> (Intercept) 1.670609 1.66059 -1.297101 -2.324002
pfit <- vglm(cbind(Allele5,Allele6,Allele7,Allele8,Allele9,
Allele10,Allele11,Allele12) ~ 1,
dirmul.old(parallel = TRUE), trace = TRUE,
data = alleleCounts)
#> Iteration 1: loglikelihood = -45179.1
#> Iteration 2: loglikelihood = -45237.27
#> Taking a modified step....................
round(eta2theta(coef(pfit, matrix = TRUE), pfit@misc$link,
pfit@misc$earg), digits = 2) # 'Right' answer
#> shape1 shape2 shape3 shape4 shape5 shape6 shape7 shape8
#> (Intercept) 0.03 0.03 0.03 0.03 0.03 0.03 0.03 0.03
round(Coef(pfit), digits = 2) # 'Wrong' due to parallelism constraint
#> (Intercept)
#> -3.66