Normal scores transformation
Normal scores transformation (Inverse normal transformation) by Elfving, Blom, van der Waerden, Tukey, and rankit methods, as well as z score transformation (standardization) and scaling to a range (normalization).
Usage
blom(
x,
method = "general",
alpha = pi/8,
complete = FALSE,
na.last = "keep",
na.rm = TRUE,
adjustN = TRUE,
min = 1,
max = 10,
...
)
Arguments
- x
A vector of numeric values.
- method
One of "general" (the default), "blom", "vdw", "tukey", "elfving", "rankit", "zscore", or "scale".
- alpha
A value used in the "general" method. If alpha = pi/8 (the default), the "general" method reduces to the "elfving" method. If alpha = 3/8, it reduces to the "blom" method; if alpha = 1/2, to the "rankit" method; if alpha = 1/3, to the "tukey" method; and if alpha = 0, to the "vdw" method.
- complete
If TRUE, NA values are removed before transformation. The default is FALSE.
- na.last
Passed to rank in the normal scores methods. See the documentation for the rank function. The default is "keep".
- na.rm
Used in the "zscore" and "scale" methods, where it is passed to the mean, min, and max functions. The default is TRUE.
- adjustN
If TRUE (the default), the normal scores methods use only non-NA values to determine the sample size, N. This seems to work well under the default conditions, where NA values are retained, even if there is a high percentage of NA values.
- min
For the "scale" method, the minimum value of the transformed values.
- max
For the "scale" method, the maximum value of the transformed values.
- ...
Additional arguments passed to rank.
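The reductions listed under alpha can be checked numerically. The sketch below assumes that the "general" method computes qnorm((r - alpha) / (N - 2*alpha + 1)), where r is the rank of a value and N is the number of non-NA values. That form is inferred from the listed reductions rather than taken from the package source, and general_score() is a hypothetical helper, not part of the package.

```r
# Hypothetical helper: assumed form of the "general" normal scores formula
general_score <- function(x, alpha) {
  r <- rank(x, na.last = "keep")   # NA values keep an NA rank
  N <- sum(!is.na(x))              # sample size from non-NA values only
  qnorm((r - alpha) / (N - 2 * alpha + 1))
}

x <- c(3.1, 0.4, 2.2, 5.9, 1.7, 4.8)
r <- rank(x)
N <- length(x)

# Textbook forms of the named methods, written out directly
blom_s   <- qnorm((r - 3/8) / (N + 1/4))
rankit_s <- qnorm((r - 1/2) / N)
tukey_s  <- qnorm((r - 1/3) / (N + 1/3))
vdw_s    <- qnorm(r / (N + 1))

all.equal(general_score(x, 3/8), blom_s)    # TRUE
all.equal(general_score(x, 1/2), rankit_s)  # TRUE
all.equal(general_score(x, 1/3), tukey_s)   # TRUE
all.equal(general_score(x, 0),   vdw_s)     # TRUE
```

The Elfving case (alpha = pi/8) is left out of the comparison because it has no separate textbook form written out here.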
Details
By default, NA values are retained in the output.
This behavior can be changed with the na.rm argument
for the "zscore" and "scale" methods, or
with na.last for the normal scores methods.
Alternatively, NA values can be removed from the input with
complete = TRUE.
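A minimal sketch of how retained NA values might propagate through a normal scores calculation, assuming the Blom formula and an N that counts only non-NA values (the behavior described for adjustN = TRUE). It uses base R only and does not call blom() itself.

```r
x <- c(2.5, NA, 0.8, 4.1, NA, 3.3)

# na.last = "keep": NA values get a rank of NA, so their positions are kept
r <- rank(x, na.last = "keep")    # ranks: 2, NA, 1, 4, NA, 3

# Assumed adjustN = TRUE behavior: N counts only the non-NA values
N <- sum(!is.na(x))               # 4

# Blom normal scores; positions 2 and 5 stay NA
scores <- qnorm((r - 3/8) / (N + 1/4))
scores

# Dropping NA values first (the effect described for complete = TRUE)
# shortens the vector instead of keeping NA placeholders
x_complete <- x[!is.na(x)]
length(x_complete)                # 4
```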
For the normal scores methods, if there are NA values
or tied values,
it is helpful to consult
the documentation for rank (for example, how tied values are ranked is controlled by its ties.method argument).
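The effect of ties can be seen directly with rank(). The sketch below applies the Blom formula to ranks from the default ties.method = "average" and from ties.method = "first". Since the ... argument is passed to rank, a call such as blom(x, ties.method = "first") would presumably select the second behavior, but that is an assumption not verified here.

```r
x <- c(5, 3, 3, 8, 1)
N <- length(x)

# Default ties.method = "average": the two 3s share a rank of 2.5
r_avg <- rank(x)                           # 4, 2.5, 2.5, 5, 1
# ties.method = "first": ties are broken by order of appearance
r_first <- rank(x, ties.method = "first")  # 4, 2, 3, 5, 1

blom_avg   <- qnorm((r_avg - 3/8) / (N + 1/4))
blom_first <- qnorm((r_first - 3/8) / (N + 1/4))

blom_avg[2] == blom_avg[3]       # TRUE: tied values keep identical scores
blom_first[2] == blom_first[3]   # FALSE: tied values are split apart
```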
For the normal scores methods, the method can be selected
with either the method argument or the alpha argument.
With the current algorithms, there is no need to use both.
Normal scores transformations return values that approximately follow a normal distribution, with a mean of about 0 and a standard deviation of about 1.
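A quick numerical check, using the Blom formula written out in base R as a stand-in for blom() (an assumption): for a skewed sample with no ties, the normal scores have a mean that is essentially 0 and a standard deviation close to, though not exactly, 1.

```r
set.seed(12345)
x <- rlnorm(100)   # right-skewed input, as in the Examples
N <- length(x)

# Blom normal scores from the ranks
z <- qnorm((rank(x) - 3/8) / (N + 1/4))

mean(z)   # essentially 0: with no ties the scores are symmetric about 0
sd(z)     # close to 1, but not exactly 1 in a finite sample
```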
The "scale" method converts values to the range specified
by min and max without changing the shape of the distribution
of values. By default, the "scale" method converts values
to a 1 to 10 range.
Using the "scale" method with
min = 0 and max = 1 is
sometimes called "normalization".
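The "scale" method can be sketched as a linear (min-max) rescaling. The helper scale_sketch() and its new_min and new_max argument names are hypothetical, and the formula is the standard min-max rescaling assumed to match the description above, not code copied from the package. Because the mapping is linear, the shape of the distribution is unchanged.

```r
# Hypothetical helper: linear rescaling of x to [new_min, new_max]
scale_sketch <- function(x, new_min = 1, new_max = 10, na.rm = TRUE) {
  lo <- min(x, na.rm = na.rm)
  hi <- max(x, na.rm = na.rm)
  new_min + (x - lo) * (new_max - new_min) / (hi - lo)
}

x <- c(2, 4, 6, 10)
scale_sketch(x)                             # 1, 3.25, 5.5, 10 (default 1 to 10)
scale_sketch(x, new_min = 0, new_max = 1)   # 0, 0.25, 0.5, 1 ("normalization")

# A linear map preserves the shape: correlation with the input is 1
cor(x, scale_sketch(x))
```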
The "zscore" method converts values by the usual formula
for z scores: (x - mean(x)) / sd(x). The transformed
values will have a mean of 0 and a standard deviation of
1, but they won't be coerced into a normal distribution.
This method is sometimes called "standardization".
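The z score formula above can be written out in base R. This sketch computes it directly and compares it with base R's scale(), which by default centers by the mean and divides by the standard deviation; it does not call blom(). The skewness check shows that the shape of the input is kept.

```r
set.seed(12345)
x <- rlnorm(100)    # right-skewed input

# z scores by the usual formula
z <- (x - mean(x)) / sd(x)

mean(z)   # essentially 0
sd(z)     # 1

# Same values as base R scale(), which returns a one-column matrix
all.equal(z, as.vector(scale(x)))   # TRUE

# z is a linear function of x, so the right skew remains
mean(z^3)   # positive
```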
Note
It's possible that Gustav Elfving didn't recommend the
formula used in this function for the Elfving method.
I would like to thank Terence Cooke
at the University of Exeter for their
diligence in trying to track down a reference for this formula.
References
Conover, 1995. Practical Nonparametric Statistics, 3rd ed.
Solomon and Sawilowsky, 2009. Impact of rank-based normalizing transformations on the accuracy of test scores.
Beasley and Erickson, 2009. Rank-based inverse normal transformations are increasingly used, but are they merited?
Author
Salvatore Mangiafico, mangiafico@njaes.rutgers.edu
Examples
set.seed(12345)
A = rlnorm(100)
if (FALSE) hist(A)  # not run
### Convert data to normal scores by the Elfving method (the default)
B = blom(A)
if (FALSE) hist(B)  # not run
### Convert data to z scores
C = blom(A, method="zscore")
if (FALSE) hist(C)  # not run
### Convert data to a scale of 1 to 10
D = blom(A, method="scale")
if (FALSE) hist(D)  # not run
### Data from Sokal and Rohlf, 1995,
### Biometry: The Principles and Practice of Statistics
### in Biological Research
Value = c(709,679,699,657,594,677,592,538,476,508,505,539)
Sex = c(rep("Male",3), rep("Female",3), rep("Male",3), rep("Female",3))
Fat = c(rep("Fresh", 6), rep("Rancid", 6))
ValueBlom = blom(Value)
Sokal = data.frame(ValueBlom, Sex, Fat)
model = lm(ValueBlom ~ Sex * Fat, data=Sokal)
anova(model)
#> Analysis of Variance Table
#>
#> Response: ValueBlom
#> Df Sum Sq Mean Sq F value Pr(>F)
#> Sex 1 0.5399 0.5399 2.0932 0.1859728
#> Fat 1 6.7936 6.7936 26.3374 0.0008939 ***
#> Sex:Fat 1 0.5938 0.5938 2.3022 0.1676690
#> Residuals 8 2.0636 0.2579
#> ---
#> Signif. codes: 0 ‘***’ 0.001 ‘**’ 0.01 ‘*’ 0.05 ‘.’ 0.1 ‘ ’ 1
if (FALSE) {  # not run
hist(residuals(model))
plot(predict(model), residuals(model))
}