dummy.code.Rd
Given a variable x with n distinct values, create n new dummy coded variables coded 0/1 for presence (1) or absence (0) of each value. A typical application would be to create dummy coded college majors from a vector of college majors. Categories can also be combined by group. By default, NA values of x are returned as NA (added 10/20/17).
dummy.code(x,group=NULL,na.rm=TRUE,top=NULL,min=NULL)
When coding demographic information, it is typical to create one variable with multiple categorical values (e.g., ethnicity, college major, occupation). dummy.code will convert these categories into n distinct dummy coded variables.
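The idea can be illustrated in base R without calling dummy.code. This is only a sketch of what dummy coding means, not the function's implementation; the `majors` vector is hypothetical data made up for the illustration.

```r
# Hypothetical data: a small vector of college majors with one missing value
majors <- c("psych", "math", "psych", "bio", NA, "math")

# The distinct non-missing values, one per future column
levels_found <- sort(unique(majors[!is.na(majors)]))

# One 0/1 column per distinct value; a missing major stays NA in every column
codes <- sapply(levels_found, function(v) as.integer(majors == v))
codes
```

Each non-missing row has exactly one 1, and the row for the missing major is NA throughout, in line with the default NA handling described above.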
If there are many possible values (e.g., country in the SAPA data set), then specifying top will assign dummy codes to just the top most frequent values.
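A rough base R sketch of picking out the most frequent values follows. The `country` vector and the variable `k` are hypothetical, and rows holding a less frequent value are simply coded 0 here, which may differ from what dummy.code returns for them.

```r
# Hypothetical data: country of residence for seven respondents
country <- c("US", "US", "UK", "DE", "US", "UK", "FR")
k <- 2  # how many of the most frequent values to keep (plays the role of top)

# Frequencies in decreasing order, then the names of the k most frequent values
counts <- sort(table(country), decreasing = TRUE)
keep <- names(counts)[seq_len(k)]

# Dummy codes for the retained values only
top_codes <- sapply(keep, function(v) as.integer(country == v))
top_codes
```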
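If using dummy coded variables as predictors, remember to use n-1 variables: the n dummy codes sum to 1 for every non-missing case, so the full set is perfectly collinear with the intercept.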
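The reason for n-1 can be checked numerically. The sketch below uses a hypothetical three-group vector `g` and builds the dummy codes by hand in base R. It compares the rank of a design matrix holding all n codes with one holding n-1.

```r
# Hypothetical data: three groups, two cases each
g <- c("a", "a", "b", "b", "c", "c")
d <- sapply(c("a", "b", "c"), function(v) as.integer(g == v))

# Every row of the full set sums to 1, which duplicates the intercept column
rowSums(d)

# Intercept plus all three dummy codes: 4 columns but only rank 3
X_full <- cbind(intercept = 1, d)
qr(X_full)$rank

# Intercept plus n-1 dummy codes: full column rank
X_ok <- cbind(intercept = 1, d[, -1])
qr(X_ok)$rank == ncol(X_ok)
```

Dropping a column makes the omitted category the reference level against which the remaining coefficients are interpreted.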
If group is specified, then all values of x that are in group are given the value of 1, and all other values are given 0. This is useful, for example, for combining a range of science majors into STEM or not. The smoking example forms a dummy code of any smoking at all.
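The grouping logic amounts to a membership test. The base R sketch below uses a hypothetical `smoke` vector; it assumes missing values should stay missing, as the description states for the default, and it is not the code dummy.code runs internally.

```r
# Hypothetical data: smoking frequency category, 1 = never, 2 to 9 = some smoking
smoke <- c(1, 3, 9, 1, NA, 2)
group <- 2:9  # categories to be combined into a single code of 1

# 1 if the value is in group, 0 if not, NA if the value itself is missing
any_smoke <- ifelse(is.na(smoke), NA, as.integer(smoke %in% group))
any_smoke
```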
A matrix of dummy coded variables
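# dummy code the education levels in the sat.act data set
new <- dummy.code(sat.act$education)
# combine the dummy codes with the original variables
new.sat <- data.frame(new,sat.act)
# correlate the dummy codes with the original variables
round(cor(new.sat,use="pairwise"),2)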
#>              X3    X5    X4    X0    X1    X2 gender education   age   ACT
#> X3         1.00 -0.40 -0.40 -0.24 -0.21 -0.21   0.10     -0.09 -0.39 -0.04
#> X5        -0.40  1.00 -0.25 -0.15 -0.13 -0.13   0.03      0.65  0.49  0.11
#> X4        -0.40 -0.25  1.00 -0.15 -0.13 -0.13  -0.02      0.29  0.24  0.07
#> X0        -0.24 -0.15 -0.15  1.00 -0.08 -0.08  -0.08     -0.66 -0.27 -0.07
#> X1        -0.21 -0.13 -0.13 -0.08  1.00 -0.07  -0.05     -0.40 -0.17 -0.06
#> X2        -0.21 -0.13 -0.13 -0.08 -0.07  1.00  -0.09     -0.21  0.05 -0.08
#> gender     0.10  0.03 -0.02 -0.08 -0.05 -0.09   1.00      0.09 -0.02 -0.04
#> education -0.09  0.65  0.29 -0.66 -0.40 -0.21   0.09      1.00  0.55  0.15
#> age       -0.39  0.49  0.24 -0.27 -0.17  0.05  -0.02      0.55  1.00  0.11
#> ACT       -0.04  0.11  0.07 -0.07 -0.06 -0.08  -0.04      0.15  0.11  1.00
#> SATV       0.00  0.04  0.02  0.01 -0.03 -0.08  -0.02      0.05 -0.04  0.56
#> SATQ      -0.03  0.06  0.01  0.03 -0.01 -0.07  -0.17      0.03 -0.03  0.59
#>            SATV  SATQ
#> X3         0.00 -0.03
#> X5         0.04  0.06
#> X4         0.02  0.01
#> X0         0.01  0.03
#> X1        -0.03 -0.01
#> X2        -0.08 -0.07
#> gender    -0.02 -0.17
#> education  0.05  0.03
#> age       -0.04 -0.03
#> ACT        0.56  0.59
#> SATV       1.00  0.64
#> SATQ       0.64  1.00
#dum.smoke <- dummy.code(spi$smoke,group=2:9)
#table(dum.smoke,spi$smoke)
#dum.age <- dummy.code(round(spi$age/5)*5,top=5) #the most frequent five year blocks