Find the Winsorized scores, means, sds or variances for a vector, matrix, or data.frame

Among the robust estimates of central tendency are trimmed means and Winsorized means. This function finds the Winsorized scores: values below the trim quantile are set to the trim quantile, and values above the 1 - trim quantile are set to the 1 - trim quantile. Then means, sds, and variances are found.

winsor(x, trim = 0.2, na.rm = TRUE)
winsor.mean(x, trim = 0.2, na.rm = TRUE)
winsor.means(x, trim = 0.2, na.rm = TRUE)
winsor.sd(x, trim = 0.2, na.rm = TRUE)
winsor.var(x, trim = 0.2, na.rm = TRUE)

Arguments

x: A data vector, matrix or data frame
trim: Proportion of the data to move in from the top and from the bottom of each distribution (the default of 0.2 moves 20% at each end)
na.rm: If TRUE (the default), missing data are removed

Details

Among the many robust estimates of central tendency, some recommend the Winsorized mean. Rather than dropping the top and bottom trim proportion of the scores, as a trimmed mean does, Winsorizing replaces these extreme values with the values at the trim and 1 - trim quantiles.
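
The replacement rule can be illustrated with a few lines of base R. This is only a minimal sketch under stated assumptions, not the code used by winsor: the helper name winsorize_sketch is hypothetical, and it assumes the cut points come from quantile() with its default settings.

```r
# Hypothetical helper (not part of the package): Winsorize a numeric vector.
winsorize_sketch <- function(x, trim = 0.2, na.rm = TRUE) {
  # Cut points at the trim and 1 - trim quantiles (default quantile type assumed)
  cuts <- unname(quantile(x, probs = c(trim, 1 - trim), na.rm = na.rm))
  # Pull values outside the cut points in to the cut points;
  # the original order of the scores is preserved
  pmin(pmax(x, cuts[1]), cuts[2])
}

x <- 1:20
w <- winsorize_sketch(x)
w[1:5]    # 4.8 4.8 4.8 4.8 5.0
w[16:20]  # 16.0 16.2 16.2 16.2 16.2
mean(w)   # 10.5
```

For 1:20 the sketch gives the same scores as the first column of the winsor(x) matrix example in the Examples section, which suggests the assumption about the quantile definition is reasonable for this case.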

Value

A vector or matrix of Winsorized scores (for winsor), or a scalar or vector of Winsorized means, sds, or variances (depending upon the call).
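
To show how the shape of the result depends on the input, the following base R sketch Winsorizes each column of a matrix and then takes one mean per column. The helper winsorize_col is hypothetical and assumes the default quantile() definition; it is an illustration, not the package's implementation.

```r
# Hypothetical column-wise sketch in base R (not the package's code).
winsorize_col <- function(v, trim = 0.2, na.rm = TRUE) {
  # Cut points at the trim and 1 - trim quantiles of this column
  cuts <- unname(quantile(v, probs = c(trim, 1 - trim), na.rm = na.rm))
  # Replace values beyond the cut points with the cut points
  pmin(pmax(v, cuts[1]), cuts[2])
}

m <- matrix(1:100, ncol = 5)
scores <- apply(m, 2, winsorize_col)  # matrix with the same dimensions as m
dim(scores)                           # 20 5
col_means <- colMeans(scores)         # one Winsorized mean per column
col_means                             # 10.5 30.5 50.5 70.5 90.5
```

A matrix input yields a matrix of scores and a vector of means, one per column; a single vector would yield a vector of scores and a scalar mean.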

References

Wilcox, Rand R. (2005) Introduction to robust estimation and hypothesis testing. Elsevier/Academic Press, Amsterdam; Boston.

Author

William Revelle with modifications suggested by Joe Paxton and a further correction added (January, 2009) to preserve the original order for the winsor case.

Examples

data(sat.act)
winsor.means(sat.act) #compare with the means of the winsorized scores
#>     gender  education        age        ACT       SATV       SATQ 
#>   1.647143   3.391429  23.954286  28.957143 615.570000 614.521106 
y <- winsor(sat.act)
describe(y)
#>           vars   n   mean    sd median trimmed    mad   min max range  skew
#> gender       1 700   1.65  0.48      2    1.68   0.00   1.0   2   1.0 -0.61
#> education    2 700   3.39  1.03      3    3.36   1.48   2.0   5   3.0  0.27
#> age          3 700  23.95  5.11     22   23.57   4.45  19.0  32  13.0  0.56
#> ACT          4 700  28.96  3.18     29   28.97   4.45  24.8  33   8.2 -0.06
#> SATV         5 700 615.57 72.79    620  618.21 118.61 510.0 700 190.0 -0.24
#> SATQ         6 687 614.52 80.88    620  616.87 118.61 500.0 710 210.0 -0.24
#>           kurtosis   se
#> gender       -1.62 0.02
#> education    -1.07 0.04
#> age          -1.30 0.19
#> ACT          -1.56 0.12
#> SATV         -1.43 2.75
#> SATQ         -1.47 3.09
xy <- data.frame(sat.act,y)
#pairs.panels(xy) #to see the effect of winsorizing 
x <- matrix(1:100,ncol=5)
winsor(x)
#>       [,1] [,2] [,3] [,4] [,5]
#>  [1,]  4.8 24.8 44.8 64.8 84.8
#>  [2,]  4.8 24.8 44.8 64.8 84.8
#>  [3,]  4.8 24.8 44.8 64.8 84.8
#>  [4,]  4.8 24.8 44.8 64.8 84.8
#>  [5,]  5.0 25.0 45.0 65.0 85.0
#>  [6,]  6.0 26.0 46.0 66.0 86.0
#>  [7,]  7.0 27.0 47.0 67.0 87.0
#>  [8,]  8.0 28.0 48.0 68.0 88.0
#>  [9,]  9.0 29.0 49.0 69.0 89.0
#> [10,] 10.0 30.0 50.0 70.0 90.0
#> [11,] 11.0 31.0 51.0 71.0 91.0
#> [12,] 12.0 32.0 52.0 72.0 92.0
#> [13,] 13.0 33.0 53.0 73.0 93.0
#> [14,] 14.0 34.0 54.0 74.0 94.0
#> [15,] 15.0 35.0 55.0 75.0 95.0
#> [16,] 16.0 36.0 56.0 76.0 96.0
#> [17,] 16.2 36.2 56.2 76.2 96.2
#> [18,] 16.2 36.2 56.2 76.2 96.2
#> [19,] 16.2 36.2 56.2 76.2 96.2
#> [20,] 16.2 36.2 56.2 76.2 96.2
winsor.means(x)
#> [1] 10.5 30.5 50.5 70.5 90.5
y <- 1:11
winsor(y,trim=.5)
#>  [1] 6 6 6 6 6 6 6 6 6 6 6