dirichlet.RdFits a Dirichlet distribution to a matrix of compositions.
dirichlet(link = "loglink", parallel = FALSE, zero = NULL,
imethod = 1)Link function applied to each of the \(M\) (positive) shape
parameters \(\alpha_j\).
See Links for more choices.
The default gives
\(\eta_j=\log(\alpha_j)\).
See CommonVGAMffArguments for more information.
In this help file the response is assumed to be a \(M\)-column matrix with positive values and whose rows each sum to unity. Such data can be thought of as compositional data. There are \(M\) linear/additive predictors \(\eta_j\).
The Dirichlet distribution is commonly used to model compositional data, including applications in genetics. Suppose \((Y_1,\ldots,Y_{M})^T\) is the response. Then it has a Dirichlet distribution if \((Y_1,\ldots,Y_{M-1})^T\) has density $$\frac{\Gamma(\alpha_{+})} {\prod_{j=1}^{M} \Gamma(\alpha_{j})} \prod_{j=1}^{M} y_j^{\alpha_{j} -1}$$ where \(\alpha_+=\alpha_1+\cdots+ \alpha_M\), \(\alpha_j > 0\), and the density is defined on the unit simplex $$\Delta_{M} = \left\{ (y_1,\ldots,y_{M})^T : y_1 > 0, \ldots, y_{M} > 0, \sum_{j=1}^{M} y_j = 1 \right\}. $$ One has \(E(Y_j) = \alpha_j / \alpha_{+}\), which are returned as the fitted values. For this distribution Fisher scoring corresponds to Newton-Raphson.
The Dirichlet distribution can be motivated by considering the random variables \((G_1,\ldots,G_{M})^T\) which are each independent and identically distributed as a gamma distribution with density \(f(g_j)=g_j^{\alpha_j - 1} e^{-g_j} / \Gamma(\alpha_j)\). Then the Dirichlet distribution arises when \(Y_j=G_j / (G_1 + \cdots + G_M)\).
An object of class "vglmff"
(see vglmff-class).
The object is used by modelling functions
such as vglm,
rrvglm
and vgam.
When fitted, the fitted.values slot of the object
contains the \(M\)-column matrix of means.
Lange, K. (2002). Mathematical and Statistical Methods for Genetic Analysis, 2nd ed. New York: Springer-Verlag.
Forbes, C., Evans, M., Hastings, N. and Peacock, B. (2011). Statistical Distributions, Hoboken, NJ, USA: John Wiley and Sons, Fourth edition.
The response should be a matrix of positive values whose rows
each sum to unity. Similar to this is count data, where probably
a multinomial logit model (multinomial) may be
appropriate. Another similar distribution to the Dirichlet
is the Dirichlet-multinomial (see dirmultinomial).
ddata <- data.frame(rdiric(1000,
shape = exp(c(y1 = -1, y2 = 1, y3 = 0))))
fit <- vglm(cbind(y1, y2, y3) ~ 1, dirichlet,
data = ddata, trace = TRUE, crit = "coef")
#> Iteration 1: coefficients =
#> -1.78546549, 0.25678632, -0.76180660
#> Iteration 2: coefficients =
#> -1.25827320, 0.77949760, -0.25947647
#> Iteration 3: coefficients =
#> -1.020560060, 1.019505846, -0.025812238
#> Iteration 4: coefficients =
#> -0.980680649, 1.059723820, 0.014129391
#> Iteration 5: coefficients =
#> -0.979710286, 1.060700422, 0.015110409
#> Iteration 6: coefficients =
#> -0.979709728, 1.060700985, 0.015110978
#> Iteration 7: coefficients =
#> -0.979709728, 1.060700985, 0.015110978
Coef(fit)
#> shape1 shape2 shape3
#> 0.3754201 2.8883950 1.0152257
coef(fit, matrix = TRUE)
#> loglink(shape1) loglink(shape2) loglink(shape3)
#> (Intercept) -0.9797097 1.060701 0.01511098
head(fitted(fit))
#> y1 y2 y3
#> 1 0.08773463 0.6750099 0.2372554
#> 2 0.08773463 0.6750099 0.2372554
#> 3 0.08773463 0.6750099 0.2372554
#> 4 0.08773463 0.6750099 0.2372554
#> 5 0.08773463 0.6750099 0.2372554
#> 6 0.08773463 0.6750099 0.2372554