cut() and similar functions can cut continuous variables by quantile;
other helper functions exist to cut variables into groups of the same size
or width. This function cuts a continuous variable into given proportions.
cut_p(x, p, ties.method = "random", fct_levels = NULL, verbose = TRUE)
x: A numeric variable to be cut into categories.
p: The proportion of cases to be allocated to each category, in ascending order. Should add up to one; otherwise, the values are scaled accordingly.
ties.method: Currently accepts only "random". This could be expanded in the future, though it is unclear what a better method would be.
fct_levels: Character vector with names for the levels. If it is NULL, the groups are labeled with their number and the cut-points employed.
verbose: Should the boundaries of the groups be reported as a message?
Factor variable with x cut into length(p) categories in given proportions
Ties within the continuous variable are allocated randomly, so this function should not be used if there are many ties. The number of observations per group is rounded up for even-numbered levels (second, fourth, etc.) and rounded down for the others, except for the last level, which is used to balance the total. For large numbers of observations, the distribution will be very close to what is desired; for very small numbers of observations, the resulting group sizes should be checked.
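The rounding and tie-breaking rules described above can be sketched in a few lines of base R. This is only an illustration under stated assumptions, not the function's actual implementation: the helper name sketch_cut_p is hypothetical, proportions are assumed to be rescaled by dividing by their sum, x is assumed to contain no missing values, and edge cases (such as proportions of zero) are not handled.

```r
# Hypothetical helper for illustration only; NOT the package's cut_p().
# Assumes x is numeric without missing values and p has positive entries.
sketch_cut_p <- function(x, p) {
  p <- p / sum(p)                       # assumed rescaling so p sums to one
  n <- length(x)
  k <- length(p)
  raw <- n * p                          # ideal, possibly fractional, group sizes
  sizes <- floor(raw)                   # odd-numbered levels: round down
  even <- seq_len(k) %% 2 == 0
  sizes[even] <- ceiling(raw[even])     # even-numbered levels: round up
  sizes[k] <- n - sum(sizes[-k])        # last level balances the total
  r <- rank(x, ties.method = "random")  # ties are broken at random
  # Label the sorted positions 1..n by group, then look up each rank
  grp <- rep(seq_len(k), times = sizes)[r]
  factor(grp, levels = seq_len(k))
}

res <- sketch_cut_p(iris$Sepal.Length, p = c(.25, .50, .25))
table(res)
```

For the 150 iris observations, the ideal sizes are 37.5, 75 and 37.5, so this sketch yields groups of 37, 75 and 38. Which of the tied observations end up on either side of a boundary changes between runs unless a seed is set.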
cut_p(iris$Sepal.Length, p = c(.25, .50, .25), fct_levels = c("short", "middling", "long"))
#> Factor levels were assigned as follows:
#> Group 1 (4.3 to 5.1): short
#> Group 2 (5.1 to 6.4): middling
#> Group 3 (6.4 to 7.9): long
#> [1] short short short short short middling short short
#> [9] short short middling short short short middling middling
#> [17] middling short middling short middling middling short short
#> [25] short short short middling middling short short middling
#> [33] middling middling short short middling short short middling
#> [41] short short short short short short short short
#> [49] middling short long middling long middling long middling
#> [57] middling short long middling short middling middling middling
#> [65] middling long middling middling middling middling middling middling
#> [73] middling middling long long long long middling middling
#> [81] middling middling middling middling middling middling long middling
#> [89] middling middling middling middling middling short middling middling
#> [97] middling middling middling middling middling middling long middling
#> [105] long long short long long long long middling
#> [113] long middling middling long long long long middling
#> [121] long middling long middling long long middling middling
#> [129] middling long long long middling middling middling long
#> [137] middling middling middling long long long middling long
#> [145] long long middling long middling middling
#> Levels: short middling long