Splits the design matrix into categorical and continuous
predictors. Categorical variables are variables that are factors,
ordered factors, or character.
splitFrame(mf, x = model.matrix(mt, mf),
           type = c("f", "fi", "fii"))
mf: model frame (as returned by model.frame).
x: (optional) design matrix, defaulting to the derived model.matrix.
type: a character string specifying the split type (see details).
Which split type is used can be controlled with the setting
split.type in lmrob.control.
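As a hedged sketch of how this setting might be used, the snippet below builds a control object with split.type = "fi" and passes it to lmrob with the M-S initial estimator. It assumes that the robustbase package is installed and that init = "M-S" is the code path in which the split is applied. The formula is borrowed from the examples on this page, and the seed is arbitrary.

```r
# Sketch only: requires the robustbase package.
if (requireNamespace("robustbase", quietly = TRUE)) {
  library(robustbase)
  data(education)
  education <- within(education, Region <- factor(Region))

  # Request the "fi" split instead of the default "f".
  ctrl <- lmrob.control(split.type = "fi")

  set.seed(17)  # arbitrary seed; the M-S estimator uses random resampling
  fit <- lmrob(Y ~ Region + X1 + X2 + X3, data = education,
               init = "M-S", control = ctrl)
  print(coef(fit))
}
```

This model has no interactions, so all three split types give the same split. The setting only changes the result once the formula contains interactions between categorical and continuous variables.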
There are three split types, which differ only in how interactions between categorical and continuous variables are handled. The extra types of splitting can be used to avoid "Too many singular resamples" errors.
Type "f", the default, assigns only the intercept, categorical
variables, and interactions among categorical variables to x1.
Interactions between categorical and continuous variables are
assigned to x2.
Type "fi" also assigns interactions between categorical and
continuous variables to x1.
Type "fii" assigns not only interactions between categorical and
continuous variables to x1, but also the (corresponding)
continuous variables themselves.
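The rule for type "f" can be illustrated with a small base-R sketch. This is not the robustbase implementation: split_f_sketch is a hypothetical helper written for this illustration, and it uses the ToothGrowth data from the datasets package instead of the education data. It treats a term as categorical only if every variable involved in it is a factor or a character vector.

```r
# Hypothetical helper illustrating the type "f" rule; not robustbase code.
split_f_sketch <- function(mf) {
  mt <- attr(mf, "terms")
  x  <- model.matrix(mt, mf)

  # Variables that count as categorical: factors (incl. ordered) and character.
  is_cat <- vapply(mf, function(v) is.factor(v) || is.character(v), logical(1))

  # Variable-by-term incidence matrix (rows: variables, columns: terms).
  fac <- attr(mt, "factors")

  # A term goes to x1 only if every variable it involves is categorical.
  term_cat <- apply(fac > 0, 2, function(involved)
    all(is_cat[rownames(fac)[involved]]))

  # "assign" maps each design-matrix column to its term (0 = intercept).
  asgn   <- attr(x, "assign")
  x1.idx <- c(TRUE, term_cat)[asgn + 1]
  names(x1.idx) <- colnames(x)

  list(x1     = x[, x1.idx, drop = FALSE],
       x1.idx = x1.idx,
       x2     = x[, !x1.idx, drop = FALSE])
}

mf  <- model.frame(len ~ supp * dose, data = ToothGrowth)
res <- split_f_sketch(mf)
res$x1.idx
```

For this formula the intercept and the supp dummy column land in x1, while dose and the supp:dose interaction land in x2. Types "fi" and "fii" would move more columns into x1, and the sketch does not attempt to reproduce them.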
A list that includes the following components:
x1: design matrix containing only categorical variables
x1.idx: logical vector indicating which columns of the original design matrix are considered categorical
x2: design matrix containing the continuous variables
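The relationship between the three components can be checked directly. The sketch below assumes that robustbase is installed and that x1 and x2 are column subsets of the full design matrix selected by x1.idx, which is what the example output on this page suggests. The names fm, cat_cols and cont_cols are introduced here for illustration only.

```r
# Sketch only: requires the robustbase package.
if (requireNamespace("robustbase", quietly = TRUE)) {
  library(robustbase)
  data(education)
  education <- within(education, Region <- factor(Region))

  fm <- lm(Y ~ Region + X1 + X2 + X3, education)
  x  <- model.matrix(fm)      # full design matrix
  s  <- splitFrame(fm$model)  # default split type "f"

  # Column names of the two blocks, as selected by x1.idx.
  cat_cols  <- colnames(x)[s$x1.idx]
  cont_cols <- colnames(x)[!s$x1.idx]

  # x1 and x2 together should account for every column of x.
  stopifnot(identical(cat_cols,  colnames(s$x1)),
            identical(cont_cols, colnames(s$x2)),
            ncol(s$x1) + ncol(s$x2) == ncol(x))
}
```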
Maronna, R. A., and Yohai, V. J. (2000). Robust regression with both continuous and categorical predictors. Journal of Statistical Planning and Inference 89, 197–214.
data(education)
education <- within(education, Region <- factor(Region))
educaCh <- within(education, Region <- as.character(Region))
## no interactions -- same split for all types:
fm1 <- lm(Y ~ Region + X1 + X2 + X3, education)
fmC <- lm(Y ~ Region + X1 + X2 + X3, educaCh )
splt <- splitFrame(fm1$model) ; str(splt)
#> List of 3
#>  $ x1    : num [1:50, 1:4] 1 1 1 1 1 1 1 1 1 1 ...
#>   ..- attr(*, "dimnames")=List of 2
#>   .. ..$ : chr [1:50] "1" "2" "3" "4" ...
#>   .. ..$ : chr [1:4] "(Intercept)" "Region2" "Region3" "Region4"
#>  $ x1.idx: Named logi [1:7] TRUE TRUE TRUE TRUE FALSE FALSE ...
#>   ..- attr(*, "names")= chr [1:7] "(Intercept)" "Region2" "Region3" "Region4" ...
#>  $ x2    : num [1:50, 1:3] 508 564 322 846 871 774 856 889 715 753 ...
#>   ..- attr(*, "dimnames")=List of 2
#>   .. ..$ : chr [1:50] "1" "2" "3" "4" ...
#>   .. ..$ : chr [1:3] "X1" "X2" "X3"
splC <- splitFrame(fmC$model)
stopifnot(identical(splt, splC))
## with interactions:
fm2 <- lm(Y ~ Region:X1:X2 + X1*X2, education)
s1 <- splitFrame(fm2$model, type="f" )
s2 <- splitFrame(fm2$model, type="fi" )
s3 <- splitFrame(fm2$model, type="fii")
cbind(s1$x1.idx,
      s2$x1.idx,
      s3$x1.idx)
#>                [,1]  [,2]  [,3]
#> (Intercept)    TRUE  TRUE  TRUE
#> X1            FALSE FALSE FALSE
#> X2            FALSE FALSE FALSE
#> X1:X2         FALSE FALSE  TRUE
#> Region2:X1:X2 FALSE  TRUE  TRUE
#> Region3:X1:X2 FALSE  TRUE  TRUE
#> Region4:X1:X2 FALSE  TRUE  TRUE
rbind(p.x1 = c(ncol(s1$x1), ncol(s2$x1), ncol(s3$x1)),
      p.x2 = c(ncol(s1$x2), ncol(s2$x2), ncol(s3$x2)))
#>      [,1] [,2] [,3]
#> p.x1    1    4    5
#> p.x2    6    3    2