Create dummy coded variables — dummy.code • psych

Given a variable x with n distinct values, create n new dummy coded variables coded 0/1 for presence (1) or absence (0) of each value. A typical application is to create dummy coded college majors from a vector of college majors. Categories can also be combined by group. By default, NA values of x are returned as NA (added 10/20/17).

dummy.code(x,group=NULL,na.rm=TRUE,top=NULL,min=NULL)

Arguments

x: A vector to be transformed into dummy codes
group: A vector of categories to be coded as 1, all others coded as 0.
na.rm: If TRUE, return NA for all codes with NA in x
top: If specified, dummy code only the top (most frequent) values and make the rest NA
min: If specified, then dummy code all values >= min

Details

When coding demographic information, it is typical to create one variable with multiple categorical values (e.g., ethnicity, college major, occupation). dummy.code will convert these categories into n distinct dummy coded variables.
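As a rough illustration of what dummy coding produces, the following base R sketch builds one 0/1 column per distinct value. It is not the psych implementation; the `major` vector is made-up data, and ordering the columns by frequency is a choice made for this sketch.

```r
# Hypothetical data: a small vector of college majors with one missing value
major <- c("psych", "bio", "psych", "math", NA, "bio", "psych")

# Distinct non-missing values, ordered from most to least frequent
counts <- sort(table(major), decreasing = TRUE)
values <- names(counts)

# One 0/1 column per distinct value; a row where major is NA is NA in every column
dummies <- sapply(values, function(v) as.integer(major == v))
dummies
```

Each non-missing row has exactly one 1, in the column that matches its value.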

If there are many possible values (e.g., country in the SAPA data set) then specifying top will assign dummy codes to just a subset of the data.
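The idea of restricting the codes to the most frequent values can be sketched in base R as follows. This is an illustration under stated assumptions rather than the function's own code: the `country` vector is invented, and setting the remaining values to NA follows the description of the top argument.

```r
# Hypothetical data: country codes with a tail of rare values
country <- c("US", "US", "CA", "US", "DE", "CA", "FR", "US", "CA", "IN")
top <- 2  # number of most frequent values to keep

# The `top` most frequent values
counts <- sort(table(country), decreasing = TRUE)
keep <- names(counts)[seq_len(top)]

# Values outside the retained set become NA before coding
trimmed <- ifelse(country %in% keep, country, NA)

# One 0/1 column per retained value
dummies <- sapply(keep, function(v) as.integer(trimmed == v))
dummies
```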

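If using dummy coded variables as predictors, remember to use n-1 variables. For every non-missing case the n dummy codes sum to 1, so the full set is perfectly collinear with the intercept.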
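The reason for using n-1 variables can be seen in a small base R regression. The data here are simulated, and the names `grp`, `y`, and `d` are hypothetical; the sketch only shows that a full set of dummy codes plus an intercept is rank deficient.

```r
set.seed(1)

# Hypothetical grouping variable with three values and a random outcome
grp <- rep(c("a", "b", "c"), each = 4)
y <- rnorm(12)

# Full set of n = 3 dummy codes
d <- sapply(unique(grp), function(v) as.integer(grp == v))

# Intercept plus all three dummies: the columns sum to the intercept,
# so lm() reports NA for the redundant coefficient
fit_all <- lm(y ~ d)
coef(fit_all)

# Dropping one dummy (here the first) leaves an estimable model
fit_drop <- lm(y ~ d[, -1])
coef(fit_drop)
```

The dropped category becomes the reference level, and the remaining coefficients are differences from it.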

If group is specified, then all values of x that are in group are given the value of 1; all other values are given 0. This is useful for combining a range of science majors into STEM or not. The example forms a dummy code of any smoking at all.
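The group logic amounts to a membership test, sketched here in base R. The `smoke` values are invented, and the reading that code 1 means "never smoked" while 2 through 9 mean some smoking is an assumption taken from the group=2:9 call in the examples; keeping NA as NA follows the default described above.

```r
# Hypothetical smoking codes; 1 is assumed to mean "never smoked"
smoke <- c(1, 3, 1, 9, NA, 2, 1)
group <- 2:9  # values to be combined into a single category

# 1 if the value is in group, 0 otherwise
any_smoke <- as.integer(smoke %in% group)

# %in% returns FALSE for NA, so restore missing values explicitly
any_smoke[is.na(smoke)] <- NA
any_smoke
```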

Value

A matrix of dummy coded variables

Author

William Revelle

Examples

new <- dummy.code(sat.act$education)
new.sat <- data.frame(new,sat.act)
round(cor(new.sat,use="pairwise"),2)
#>              X3    X5    X4    X0    X1    X2 gender education   age   ACT
#> X3         1.00 -0.40 -0.40 -0.24 -0.21 -0.21   0.10     -0.09 -0.39 -0.04
#> X5        -0.40  1.00 -0.25 -0.15 -0.13 -0.13   0.03      0.65  0.49  0.11
#> X4        -0.40 -0.25  1.00 -0.15 -0.13 -0.13  -0.02      0.29  0.24  0.07
#> X0        -0.24 -0.15 -0.15  1.00 -0.08 -0.08  -0.08     -0.66 -0.27 -0.07
#> X1        -0.21 -0.13 -0.13 -0.08  1.00 -0.07  -0.05     -0.40 -0.17 -0.06
#> X2        -0.21 -0.13 -0.13 -0.08 -0.07  1.00  -0.09     -0.21  0.05 -0.08
#> gender     0.10  0.03 -0.02 -0.08 -0.05 -0.09   1.00      0.09 -0.02 -0.04
#> education -0.09  0.65  0.29 -0.66 -0.40 -0.21   0.09      1.00  0.55  0.15
#> age       -0.39  0.49  0.24 -0.27 -0.17  0.05  -0.02      0.55  1.00  0.11
#> ACT       -0.04  0.11  0.07 -0.07 -0.06 -0.08  -0.04      0.15  0.11  1.00
#> SATV       0.00  0.04  0.02  0.01 -0.03 -0.08  -0.02      0.05 -0.04  0.56
#> SATQ      -0.03  0.06  0.01  0.03 -0.01 -0.07  -0.17      0.03 -0.03  0.59
#>            SATV  SATQ
#> X3         0.00 -0.03
#> X5         0.04  0.06
#> X4         0.02  0.01
#> X0         0.01  0.03
#> X1        -0.03 -0.01
#> X2        -0.08 -0.07
#> gender    -0.02 -0.17
#> education  0.05  0.03
#> age       -0.04 -0.03
#> ACT        0.56  0.59
#> SATV       1.00  0.64
#> SATQ       0.64  1.00
#dum.smoke <- dummy.code(spi$smoke,group=2:9)
#table(dum.smoke,spi$smoke)
#dum.age <- dummy.code(round(spi$age/5)*5,top=5)  #the most frequent five year blocks