Estimates the parameter of the Zipf distribution.
zipf(N = NULL, lshape = "loglink", ishape = NULL)
N: Number of elements, an integer satisfying 1 < N < Inf.
The default is to use the maximum value of the response.
If given, N must be no less than the largest response value.
If N = Inf and \(s>1\) then this is the zeta
distribution (use zetaff instead).
lshape: Parameter link function applied to the (positive) shape parameter \(s\).
See Links for more choices.
ishape: Optional initial value for the parameter \(s\). The default is to choose an initial value internally. If convergence failure occurs, use this argument to supply a value.
The probability function for a response \(Y\) is $$P(Y=y) = y^{-s} / \sum_{i=1}^N i^{-s},\ \ s>0,\ \ y=1,2,\ldots,N,$$ where \(s\) is the exponent characterizing the distribution. The mean of \(Y\), which is returned as the fitted values, is \(\mu = H_{N,s-1} / H_{N,s}\) where \(H_{n,m}= \sum_{i=1}^n i^{-m}\) is the \(n\)th generalized harmonic number of order \(m\).
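As a minimal sketch of the formulas above, the probability function and the mean can be evaluated directly in base R. The helper names `gen_harmonic`, `zipf_pmf` and `zipf_mean` are hypothetical (they are not part of VGAM), and the values of N and s are illustrative choices.

```r
# Hypothetical helpers (not part of VGAM): direct evaluation of the Zipf formulas.

# Generalized harmonic number H_{n,m} = sum_{i=1}^n i^(-m)
gen_harmonic <- function(n, m) sum((1:n)^(-m))

# P(Y = y) = y^(-s) / H_{N,s} for y = 1, ..., N
zipf_pmf <- function(y, N, s) y^(-s) / gen_harmonic(N, s)

# E(Y) = H_{N,s-1} / H_{N,s}
zipf_mean <- function(N, s) gen_harmonic(N, s - 1) / gen_harmonic(N, s)

N <- 5
s <- 2.343904                 # illustrative value of the shape parameter
probs <- zipf_pmf(1:N, N, s)

sum(probs)                    # the probabilities sum to 1
sum((1:N) * probs)            # mean computed from the definition
zipf_mean(N, s)               # same mean via the harmonic-number formula
```

The last two lines should agree, since multiplying \(y\) by \(y^{-s}\) gives \(y^{-(s-1)}\), which is exactly the summand of \(H_{N,s-1}\).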
Zipf's law is an experimental law which is often applied
to the study of the frequency of words in a corpus of
natural language utterances. It states that the frequency
of any word is inversely proportional to its rank in the
frequency table. For example, "the" and "of"
are the two most common words, and Zipf's law states
that "the" is twice as common as "of".
Many other natural phenomena conform to Zipf's law.
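The "twice as common" statement corresponds to the special case \(s = 1\) of the probability function, because the ratio \(P(Y=1)/P(Y=2)\) equals \(2^s\) whatever the value of N. A small base R check, with N = 10 chosen arbitrarily for illustration:

```r
# Under the Zipf probability function, P(Y = y) is proportional to y^(-s).
N <- 10                       # arbitrary illustrative number of ranks
s <- 1                        # classical Zipf's law
p <- (1:N)^(-s) / sum((1:N)^(-s))

p[1] / p[2]                   # rank 1 vs rank 2: equals 2^s = 2
p[1] / p[3]                   # rank 1 vs rank 3: equals 3^s = 3
```

For other values of \(s\) the same ratios become \(2^s\) and \(3^s\), which is why estimating \(s\) from data is of interest.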
An object of class "vglmff" (see vglmff-class).
The object is used by modelling functions such as
vglm and vgam.
pp.526– of Chapter 11 of Johnson, N. L., Kemp, A. W. and Kotz, S. (2005). Univariate Discrete Distributions, 3rd edition, Hoboken, New Jersey, USA: Wiley.
Upon convergence, the value of N is stored in the fitted object as @misc$N.
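library(VGAM)  # provides vglm() and the zipf() family function
# Ranks y with their observed frequencies ofreq
zdata <- data.frame(y = 1:5, ofreq = c(63, 14, 5, 1, 2))
zfit <- vglm(y ~ 1, zipf, data = zdata, trace = TRUE, weights = ofreq)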
#> Iteration 1: loglikelihood = -70.96903
#> Iteration 2: loglikelihood = -70.96903
zfit <- vglm(y ~ 1, zipf(lshape = "identitylink", ishape = 3.4), data = zdata,
             trace = TRUE, weights = ofreq, crit = "coef")
#> Iteration 1: coefficients = 1.733347
#> Iteration 2: coefficients = 2.2463516
#> Iteration 3: coefficients = 2.3406871
#> Iteration 4: coefficients = 2.3439
#> Iteration 5: coefficients = 2.3439037
#> Iteration 6: coefficients = 2.3439037
zfit@misc$N
#> [1] 5
(shape.hat <- Coef(zfit))
#>    shape
#> 2.343904
with(zdata, weighted.mean(y, ofreq))
#> [1] 1.411765
fitted(zfit, matrix = FALSE)
#>       [,1]
#> 1 1.417752
#> 2 1.417752
#> 3 1.417752
#> 4 1.417752
#> 5 1.417752
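As a rough cross-check that does not rely on VGAM, the weighted log-likelihood implied by the probability function can be maximized directly with base R's `optimize`. This is only a sketch, assuming N is fixed at the largest response value. The function name `zipf_loglik` and the search interval are illustrative choices, and this is not how `vglm` fits the model internally.

```r
zdata <- data.frame(y = 1:5, ofreq = c(63, 14, 5, 1, 2))
N <- max(zdata$y)   # assumption: N fixed at the largest response value

# Weighted log-likelihood: sum over rows of ofreq * log P(Y = y)
zipf_loglik <- function(s) {
  log_H <- log(sum((1:N)^(-s)))          # log of the normalizing constant H_{N,s}
  sum(zdata$ofreq * (-s * log(zdata$y) - log_H))
}

# One-dimensional search over an illustrative interval for s
opt <- optimize(zipf_loglik, interval = c(0.01, 10), maximum = TRUE, tol = 1e-8)
s_hat <- opt$maximum
s_hat               # should be close to Coef(zfit)
opt$objective       # should be close to the log-likelihood reported by trace = TRUE
```

A one-dimensional search suffices here because the model is intercept-only with a single parameter. With covariates, the iterative fitting done by `vglm` would be needed instead.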