fitdistr.Rd
Maximum-likelihood fitting of univariate distributions, allowing parameters to be held fixed if desired.
fitdistr(x, densfun, start, ...)
x: A numeric vector of length at least one containing only finite values.
densfun: Either a character string or a function returning a density evaluated at its first argument. Distributions "beta", "cauchy", "chi-squared", "exponential", "gamma", "geometric", "log-normal", "lognormal", "logistic", "negative binomial", "normal", "Poisson", "t" and "weibull" are recognised, case being ignored.
start: A named list giving the parameters to be optimized, with initial values. It can be omitted for some of the named distributions and must be omitted for others (see below).
...: Additional parameters, either for densfun or for optim. In particular, it can be used to specify bounds via lower or upper or both. If arguments of densfun (or of the density function corresponding to a character-string specification) are included, they will be held fixed.
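A minimal sketch of holding a parameter fixed, assuming MASS is installed: `mean` is an argument of `dnorm`, so passing it through `...` fixes it, and only `sd`, the single parameter named in `start`, is optimized. The simulated data and the names `y` and `fit_sd` are illustrative only.

```r
library(MASS)

set.seed(1)
y <- rnorm(200, mean = 2, sd = 3)

# 'mean' is an argument of dnorm, so supplying it here holds it fixed at 2;
# only 'sd' (the sole entry of start) is estimated. 'lower' keeps sd positive
# and makes optim use L-BFGS-B.
fit_sd <- fitdistr(y, dnorm, start = list(sd = 1), mean = 2, lower = 0.01)
coef(fit_sd)

# With the mean known, the MLE of sd has the closed form sqrt(mean((y - 2)^2)),
# so the numerical estimate above should agree with it closely.
sqrt(mean((y - 2)^2))
```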
An object of class "fitdistr", a list whose components include estimate, the parameter estimates; sd, the estimated standard errors; vcov, the estimated variance-covariance matrix; and loglik, the log-likelihood.
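A short sketch of reading these components from a fitted object, assuming MASS is installed and reusing the gamma data from the examples. For fits obtained by optimization, the standard errors are the square roots of the diagonal of the variance-covariance matrix, which the last line checks.

```r
library(MASS)

set.seed(123)
x <- rgamma(100, shape = 5, rate = 0.1)
fit <- fitdistr(x, "gamma")

fit$estimate  # named vector of parameter estimates (shape, rate)
fit$sd        # standard errors, with the same names
fit$vcov      # estimated variance-covariance matrix
fit$loglik    # maximized log-likelihood

# The standard errors are the square roots of the diagonal of vcov.
all.equal(unname(fit$sd), unname(sqrt(diag(fit$vcov))))
```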
For the Normal, log-Normal, geometric, exponential and Poisson distributions, the closed-form MLEs (and exact standard errors) are used, and start should not be supplied.
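As an illustration, assuming MASS is installed, the sketch below compares two of the closed-form fits with the textbook formulas: for the normal, the sample mean and the standard deviation with divisor n; for the Poisson, the sample mean with standard error sqrt(lambda_hat / n). The data are simulated and the variable names are illustrative.

```r
library(MASS)

set.seed(42)
y <- rnorm(50, mean = 10, sd = 2)
z <- rpois(80, lambda = 3)

fit_norm <- fitdistr(y, "normal")
fit_pois <- fitdistr(z, "Poisson")

# Normal MLEs: sample mean and the divisor-n standard deviation.
c(mean = mean(y), sd = sqrt(mean((y - mean(y))^2)))
coef(fit_norm)

# Poisson MLE is the sample mean; its plug-in standard error is sqrt(mean / n).
c(lambda = mean(z), se = sqrt(mean(z) / length(z)))
c(coef(fit_pois), se = unname(fit_pois$sd))
```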
For all other distributions, the log-likelihood is maximized directly using optim. The estimated standard errors are taken from the observed information matrix, which is computed by numerical approximation. By default, the Nelder-Mead method is used for one-dimensional problems and the BFGS method for multi-dimensional ones; L-BFGS-B is used instead if arguments named lower or upper are supplied, and a method supplied explicitly takes precedence over both.
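To make the standard-error step concrete, the following sketch (assuming MASS is installed) recomputes the observed information for the gamma example by numerically differentiating a hand-written negative log-likelihood with stats::optimHess, inverts it, and compares the resulting standard errors with those stored by fitdistr. The helper `negloglik` is hypothetical; small discrepancies are expected because the finite-difference steps differ.

```r
library(MASS)

set.seed(123)
x <- rgamma(100, shape = 5, rate = 0.1)
fit <- fitdistr(x, "gamma")

# Hypothetical helper: negative log-likelihood of the gamma model in c(shape, rate).
negloglik <- function(p) -sum(dgamma(x, shape = p[1], rate = p[2], log = TRUE))

# Numerically approximate the observed information (Hessian of the negative
# log-likelihood) at the MLE, then invert it to obtain a covariance matrix.
H <- optimHess(coef(fit), negloglik)
se_manual <- sqrt(diag(solve(H)))

rbind(fitdistr = fit$sd, manual = se_manual)
```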
For the "t" named distribution, the density is taken to be the location-scale family with location m and scale s.
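The location-scale family here has density dt((x - m)/s, df)/s, the same form as mydt in the examples. The base-R sketch below, with a hypothetical helper `dlst` and arbitrary values of m, s and df, checks numerically that this is a proper density and that its distribution function matches that of m + s*T with T a standard t variate.

```r
# Hypothetical helper: location-scale t density with location m and scale s.
dlst <- function(x, m, s, df) dt((x - m) / s, df) / s

m <- 1; s <- 2; nu <- 5

# The density integrates to one ...
total <- integrate(dlst, -Inf, Inf, m = m, s = s, df = nu)$value

# ... and its CDF at q equals P(m + s * T <= q) = pt((q - m) / s, df).
q <- 2.5
cdf_numeric <- integrate(dlst, -Inf, q, m = m, s = s, df = nu)$value
cdf_exact <- pt((q - m) / s, nu)

c(total = total, cdf_numeric = cdf_numeric, cdf_exact = cdf_exact)
```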
For the following named distributions, reasonable starting values will be computed if start is omitted or only partially specified: "cauchy", "gamma", "logistic", "negative binomial" (parametrized by mu and size), "t" and "weibull". Note that these starting values may not be good enough if the fit is poor: in particular, they are not resistant to outliers unless the fitted distribution is long-tailed.
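When the automatic starting values are in doubt, they can be supplied by hand. A minimal sketch, assuming MASS is installed: method-of-moments values for the gamma (mean = shape/rate, variance = shape/rate^2) are passed as start, with small lower bounds to keep both parameters positive, and the result is compared with the default fit. The names `start_mom`, `fit_auto` and `fit_user` are illustrative.

```r
library(MASS)

set.seed(123)
x <- rgamma(100, shape = 5, rate = 0.1)

# Method-of-moments starting values: shape = m^2/v and rate = m/v.
m <- mean(x); v <- var(x)
start_mom <- list(shape = m^2 / v, rate = m / v)

fit_auto <- fitdistr(x, "gamma")                     # starting values computed internally
fit_user <- fitdistr(x, "gamma", start = start_mom,  # starting values supplied by hand
                     lower = c(0.001, 0.001))        # bounds switch optim to L-BFGS-B

rbind(auto = coef(fit_auto), user = coef(fit_user))
```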
There are print, coef, vcov and logLik methods for class "fitdistr".
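Because a logLik method exists, generic tools built on it, such as AIC(), also accept "fitdistr" objects. The sketch below, assuming MASS is installed, uses the methods on a gamma fit and compares it with a Weibull fit to the same simulated data; the object names are illustrative.

```r
library(MASS)

set.seed(123)
x <- rgamma(100, shape = 5, rate = 0.1)

fit_gamma   <- fitdistr(x, "gamma")
fit_weibull <- fitdistr(x, "weibull")

coef(fit_gamma)    # parameter estimates
vcov(fit_gamma)    # variance-covariance matrix
logLik(fit_gamma)  # "logLik" object; its df attribute is the number of parameters

# AIC() works through logLik(); smaller values indicate a better trade-off
# between fit and number of parameters.
AIC(fit_gamma, fit_weibull)
```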
Numerical optimization cannot work miracles: please note the comments in optim on scaling data. If the fitted parameters are far away from one, consider re-fitting specifying the control parameter parscale.
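A sketch of the re-fit suggested above, assuming MASS is installed: the Weibull data from the examples have a scale near 100 and a shape near 4, so the fit is repeated through a user-supplied density with control = list(parscale = c(1, 100)), which tells optim the typical magnitude of each parameter. The starting values and lower bounds are illustrative choices, not values taken from the package.

```r
library(MASS)

set.seed(123)
x3 <- rweibull(100, shape = 4, scale = 100)

# Default fit: shape is near 4 while scale is near 100.
fit_default <- fitdistr(x3, "weibull")

# Same model through dweibull, with parscale describing the parameter magnitudes.
# The lower bounds keep both parameters positive (and switch optim to L-BFGS-B).
fit_scaled <- fitdistr(x3, dweibull, start = list(shape = 2, scale = 80),
                       lower = c(0.1, 1),
                       control = list(parscale = c(1, 100)))

rbind(default = coef(fit_default), scaled = coef(fit_scaled))
```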
Venables, W. N. and Ripley, B. D. (2002) Modern Applied Statistics with S. Fourth edition. Springer.
library(MASS)  # provides fitdistr() and rnegbin()
## avoid spurious accuracy
op <- options(digits = 3)
set.seed(123)
x <- rgamma(100, shape = 5, rate = 0.1)
fitdistr(x, "gamma")
#> shape rate
#> 6.4870 0.1365
#> (0.8946) (0.0196)
## now do this directly with more control.
fitdistr(x, dgamma, list(shape = 1, rate = 0.1), lower = 0.001)
#> shape rate
#> 6.4869 0.1365
#> (0.8944) (0.0196)
set.seed(123)
x2 <- rt(250, df = 9)
fitdistr(x2, "t", df = 9)
#> m s
#> -0.0107 1.0441
#> ( 0.0722) ( 0.0543)
## allow df to vary: not a very good idea!
fitdistr(x2, "t")
#> Warning: NaNs produced
#> m s df
#> -0.00965 1.00617 6.62729
#> ( 0.07147) ( 0.07707) ( 2.71033)
## now do fixed-df fit directly with more control.
mydt <- function(x, m, s, df) dt((x-m)/s, df)/s
fitdistr(x2, mydt, list(m = 0, s = 1), df = 9, lower = c(-Inf, 0))
#> m s
#> -0.0107 1.0441
#> ( 0.0722) ( 0.0543)
set.seed(123)
x3 <- rweibull(100, shape = 4, scale = 100)
fitdistr(x3, "weibull")
#> shape scale
#> 4.080 99.984
#> ( 0.313) ( 2.582)
set.seed(123)
x4 <- rnegbin(500, mu = 5, theta = 4)
fitdistr(x4, "Negative Binomial")
#> size mu
#> 4.216 4.945
#> (0.504) (0.147)
options(op)