na.roughfix: Impute Missing Values by Median/Mode
na.roughfix(object, ...)

Returns a completed data matrix or data frame. For numeric variables,
NAs are replaced with column medians. For factor variables,
NAs are replaced with the most frequent level (ties are broken
at random). If object contains no NAs, it is returned
unaltered.
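The replacement rule above can be illustrated with a minimal base-R sketch. It assumes a data frame whose columns are numeric or factor, and `rough_fix_sketch()` is a hypothetical name used only for illustration; it is not the randomForest implementation. It shows median fill for numeric columns and mode fill, with random tie-breaking, for factors.

```r
## Hypothetical sketch of the median/mode rule; not randomForest's code.
rough_fix_sketch <- function(df) {
  for (j in seq_along(df)) {
    x <- df[[j]]
    miss <- is.na(x)
    if (!any(miss)) next                   # columns without NAs stay unaltered
    if (is.numeric(x)) {
      x[miss] <- median(x, na.rm = TRUE)   # numeric: column median
    } else if (is.factor(x)) {
      counts <- table(x)                   # table() ignores NAs by default
      top <- names(counts)[counts == max(counts)]
      x[miss] <- top[sample.int(length(top), 1)]  # mode, ties broken at random
    } else {
      stop("this sketch only handles numeric and factor columns")
    }
    df[[j]] <- x
  }
  df
}

df <- data.frame(
  num = c(1, NA, 3, 10),
  fac = factor(c("a", "b", "b", NA))
)
out <- rough_fix_sketch(df)
out
```

Here the missing `num` value becomes 3 (the median of 1, 3 and 10) and the missing `fac` value becomes "b", its most frequent level.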
This is intended as a quick starting point for imputing missing values by random forest; rfImpute(), for example, starts from na.roughfix() and then refines the imputed values.
library(randomForest)
data(iris)
iris.na <- iris
set.seed(111)
## Artificially drop some data values: set a random number (1 to 20)
## of entries in each of the four numeric columns to NA.
for (i in 1:4) iris.na[sample(150, sample(20, 1)), i] <- NA
iris.roughfix <- na.roughfix(iris.na)
iris.narf <- randomForest(Species ~ ., iris.na, na.action=na.roughfix)
print(iris.narf)
#>
#> Call:
#> randomForest(formula = Species ~ ., data = iris.na, na.action = na.roughfix)
#> Type of random forest: classification
#> Number of trees: 500
#> No. of variables tried at each split: 2
#>
#> OOB estimate of error rate: 4.67%
#> Confusion matrix:
#>            setosa versicolor virginica class.error
#> setosa         50          0         0        0.00
#> versicolor      0         46         4        0.08
#> virginica       0          3        47        0.06
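Passing na.roughfix as na.action means the missing values in the model frame are filled in before fitting, so no rows are dropped. The sketch below illustrates the same mechanism with base R only: `fill_numeric_medians()` is a hypothetical helper (not part of randomForest) that fills numeric columns with their medians, and `lm()` stands in for randomForest() because it also hands its model frame to na.action via model.frame(). Note that the response column is filled too, since it is part of the model frame.

```r
## Hypothetical helper: fill numeric columns with medians, leave others as-is.
fill_numeric_medians <- function(df) {
  for (j in seq_along(df)) {
    if (is.numeric(df[[j]])) {
      miss <- is.na(df[[j]])
      df[[j]][miss] <- median(df[[j]], na.rm = TRUE)
    }
  }
  df
}

## Recreate the data with artificial NAs, as in the example above.
data(iris)
iris.na <- iris
set.seed(111)
for (i in 1:4) iris.na[sample(150, sample(20, 1)), i] <- NA

## model.frame() (called inside lm) applies na.action to the model's data frame.
fit_omit <- lm(Sepal.Length ~ ., data = iris.na, na.action = na.omit)
fit_fill <- lm(Sepal.Length ~ ., data = iris.na, na.action = fill_numeric_medians)

c(omit = nobs(fit_omit), fill = nobs(fit_fill))
```

The filled fit should use all 150 rows, while na.omit discards every row with at least one NA.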
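The summary numbers in the printed output can be checked by hand. In randomForest's confusion matrix, rows are the observed classes and columns the OOB predictions, so each class.error is the off-diagonal share of its row and the OOB error rate is the off-diagonal share of all 150 cases (7/150). The counts below are copied from the output above.

```r
## Counts copied from the printed confusion matrix (rows = observed class).
classes <- c("setosa", "versicolor", "virginica")
cm <- matrix(c(50,  0,  0,
                0, 46,  4,
                0,  3, 47),
             nrow = 3, byrow = TRUE,
             dimnames = list(classes, classes))

class_error <- 1 - diag(cm) / rowSums(cm)   # per-class OOB error
oob_error   <- 1 - sum(diag(cm)) / sum(cm)  # overall OOB error rate

round(class_error, 2)      # setosa 0.00, versicolor 0.08, virginica 0.06
round(100 * oob_error, 2)  # 4.67
```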