(Generalized) Huber M-estimator of location with MAD scale, which remains sensible even when the scale is zero, a case in which huber() (from MASS) signals an error.

huberM(x, k = 1.5, weights = NULL, tol = 1e-06,
       mu = if(is.null(weights)) median(x) else wgt.himedian(x, weights),
       s = if(is.null(weights)) mad(x, center = mu)
           else wgt.himedian(abs(x - mu), weights),
       se = FALSE,
       warn0scale = getOption("verbose"))
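
The zero-scale behaviour mentioned above can be illustrated directly: for a constant sample the MAD is zero, MASS's huber() stops with an error, while huberM() returns the location with s = 0. A minimal sketch, assuming packages robustbase and MASS are installed:

```r
library(robustbase)

x0 <- rep(9, 100)                 # constant sample: mad(x0) == 0
huberM(x0)[c("mu", "s", "it")]    # mu = 9, s = 0, 0 iterations
tryCatch(MASS::huber(x0),         # MASS::huber() errors when MAD is 0
         error = function(e) conditionMessage(e))
```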

Arguments

x

numeric vector.

k

positive factor; the algorithm winsorizes at k standard deviations.

weights

numeric vector of non-negative weights of same length as x, or NULL.

tol

convergence tolerance.

mu

initial location estimator.

s

scale estimator held constant through the iterations.

se

logical indicating if the standard error should be computed and returned (as SE component). Currently only available when weights is NULL.

warn0scale

logical; if true, a warning is issued when s is 0 and length(x) > 1.
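
The interplay of k, tol, mu, and s can be seen in the core fixed-scale iteration: replace mu by the mean of the sample winsorized at mu ± k*s, stopping once the update falls below tol * s. The following is an illustrative sketch only (the helper name hub1 is made up; the packaged implementation additionally handles weights and the standard error):

```r
## Illustrative fixed-scale Huber iteration (not the packaged code):
hub1 <- function(x, k = 1.5, tol = 1e-6,
                 mu = median(x), s = mad(x, center = mu)) {
  if (s == 0)                      # degenerate scale: nothing to iterate
    return(list(mu = mu, s = s, it = 0L))
  it <- 0L
  repeat {
    y   <- pmin(pmax(x, mu - k*s), mu + k*s)  # winsorize at mu +/- k*s
    mu1 <- mean(y)
    it  <- it + 1L
    if (abs(mu1 - mu) < tol * s) break        # convergence relative to s
    mu <- mu1
  }
  list(mu = mu1, s = s, it = it)
}
hub1(c(1:9, 1000))$mu  # close to the huberM() location in the examples
```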

Value

list of location and scale parameters, and number of iterations used.

mu

location estimate

s

the s argument, typically the mad.

it

the number of “Huber iterations” used.

SE

the standard error; NA unless se = TRUE (see the examples below).

Details

Note that currently, when non-NULL weights are specified, the defaults for the initial location mu and scale s use wgt.himedian(), whereas strictly speaking a weighted “non-hi” median should be used for consistency. Since s is not updated during the iterations, the results differ slightly; see the examples below.

When se = TRUE, the standard error is computed using the τ correction factor, but no finite-sample correction.
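
For the Huber ψ-function ψ(u) = max(-k, min(k, u)), the usual M-estimation sandwich formula gives the variance of the location estimate as s² τ / n, with τ estimated from the empirical ratio n Σψ(uᵢ)² / (Σψ′(uᵢ))². A hedged sketch of this computation (it mirrors the standard formula and is not necessarily the exact internal code path of huberM; the helper name seHuber is made up):

```r
## Hand-computed tau-corrected standard error for a Huber M-estimate
## (standard M-estimation sandwich formula, for illustration):
library(robustbase)

seHuber <- function(x, mu, s, k = 1.5) {
  u    <- (x - mu)/s
  psi  <- pmin(k, pmax(-k, u))          # Huber psi function
  dpsi <- abs(u) <= k                   # psi'(u): 1 inside, 0 outside
  n    <- length(x)
  tau  <- n * sum(psi^2) / sum(dpsi)^2  # empirical tau correction
  s * sqrt(tau / n)
}

x <- c(1:9, 1000)
h <- huberM(x, se = TRUE)
seHuber(x, mu = h$mu, s = h$s)  # compare with h$SE
```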

Author

Martin Maechler, building on the MASS code mentioned.

References

Huber, P. J. (1981) Robust Statistics. Wiley.

See also

hubers (and huber) in package MASS; mad.

Examples

huberM(c(1:9, 1000))
#> $mu
#> [1] 5.617749
#> 
#> $s
#> [1] 3.7065
#> 
#> $it
#> [1] 6
#> 
#> $SE
#> [1] NA
#> 
mad   (c(1:9, 1000))
#> [1] 3.7065
mad   (rep(9, 100))
#> [1] 0
huberM(rep(9, 100))
#> $mu
#> [1] 9
#> 
#> $s
#> [1] 0
#> 
#> $it
#> [1] 0
#> 
#> $SE
#> [1] NA
#> 

## When you have "binned" aka replicated observations:
set.seed(7)
x <- c(round(rnorm(1000),1), round(rnorm(50, m=10, sd = 10)))
t.x <- table(x) # -> unique values and multiplicities
x.uniq <- as.numeric(names(t.x)) ## == sort(unique(x))
x.mult <- unname(t.x)
str(Hx  <- huberM(x.uniq, weights = x.mult), digits = 7)
#> List of 4
#>  $ mu: num 0.05013769
#>  $ s : num 0.7
#>  $ it: int 11
#>  $ SE: num NA
str(Hx. <- huberM(x, s = Hx$s, se=TRUE), digits = 7) ## should be ~= Hx
#> List of 4
#>  $ mu: num 0.05013769
#>  $ s : num 0.7
#>  $ it: int 11
#>  $ SE: num 0.03382601
stopifnot(all.equal(Hx[-4], Hx.[-4]))
str(Hx2 <- huberM(x, se=TRUE), digits = 7)## somewhat different, since 's' differs
#> List of 4
#>  $ mu: num 0.05891084
#>  $ s : num 1.03782
#>  $ it: int 7
#>  $ SE: num 0.03379949

## Confirm correctness of std.error :
# \donttest{
system.time(
SS <- replicate(10000, vapply(huberM(rnorm(400), se=TRUE), as.double, 1.))
) # ~ 2.8 seconds (was 12.2 s)
#>    user  system elapsed 
#>   1.704   0.012   1.716 
rbind(mean(SS["SE",]), sd(SS["mu",]))# both ~ 0.0508
#>            [,1]
#> [1,] 0.05079230
#> [2,] 0.05084461
stopifnot(all.equal(mean(SS["SE",]),
                    sd ( SS["mu",]), tolerance= 0.002))
# }