Turns a categorical variable with n distinct levels into a tibble of n-1 dummy-coded columns (with the default drop_first = TRUE). If x is a factor, the first level is omitted and thus treated as the reference level (matching the behavior of lm() and related functions). If x is not a factor, the first value in x is treated as the reference level. The returned variable names combine a common prefix with a cleaned-up version of the level names (special characters removed, converted to snake_case).

dummy_code(x, prefix = NA, drop_first = TRUE, verbose = interactive())

Arguments

x

The categorical variable to be dummy-coded.

prefix

String to be prefixed to the new variable names (typically the name of the variable that is dummy-coded). If NULL, the variables are named with the levels only. If left as NA, the function will try to extract the name of the variable passed as x.

drop_first

Should the first level be dropped? Defaults to TRUE.

verbose

Should a message stating the reference level be displayed?

Examples

dummy_code(iris$Species)
#> # A tibble: 150 × 2
#>    species_versicolor species_virginica
#>    <lgl>              <lgl>            
#>  1 FALSE              FALSE            
#>  2 FALSE              FALSE            
#>  3 FALSE              FALSE            
#>  4 FALSE              FALSE            
#>  5 FALSE              FALSE            
#>  6 FALSE              FALSE            
#>  7 FALSE              FALSE            
#>  8 FALSE              FALSE            
#>  9 FALSE              FALSE            
#> 10 FALSE              FALSE            
#> # ℹ 140 more rows
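
To make the n-1 coding scheme described above concrete, here is a minimal base-R sketch of the same idea. It is not the implementation of dummy_code(): the vector x, its levels, and the "x_" prefix are made up for illustration, and no name cleaning is performed.

```r
# Base-R sketch of n-1 dummy coding; NOT the implementation of dummy_code().
# The data and the "x_" prefix are hypothetical.
x <- factor(c("low", "high", "mid", "low"), levels = c("low", "mid", "high"))

lvls <- levels(x)        # all levels, in factor order
reference <- lvls[1]     # the first level serves as the reference
others <- lvls[-1]       # every other level gets its own dummy column

# Build one logical column per non-reference level
dummies <- as.data.frame(
  setNames(
    lapply(others, function(l) x == l),
    paste0("x_", others)
  )
)

dummies
```

Rows where x equals the reference level are FALSE in every column, which is how the reference level stays identifiable even though it has no column of its own.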