cut() and similar functions can cut continuous variables by quantile, and other helper functions exist to cut variables into groups of the same size or width. This function instead cuts a continuous variable into groups that each contain a given proportion of the cases.

cut_p(
  x,
  p,
  ties.method = c("random", "in_order"),
  fct_levels = NULL,
  verbose = TRUE
)

Arguments

x

A numeric variable that is to be cut into categories

p

The proportion of cases to be allocated to each category, in ascending order. The values should sum to one; otherwise, they are rescaled to do so.

ties.method

Method for handling ties when ranking. Accepts "random" (ties allocated randomly) or "in_order" (ties allocated in order of appearance).

fct_levels

Character vector with names for levels. If it is NULL, the groups will be labeled with their number and the cut-points employed.

verbose

Should the boundaries of the groups be reported as a message?

Value

Factor variable with x cut into length(p) categories in given proportions

Details

When ties in the data span group boundaries, the tied cases are allocated according to ties.method: "random" assigns them randomly, while "in_order" assigns them in the order in which they appear.

The number of observations per group is rounded up for even-numbered levels (second, fourth, etc.) and rounded down for odd-numbered levels, except for the last level, which takes the remaining observations so that the group sizes sum to the number of cases. With a large number of observations, the resulting distribution will be very close to the requested proportions; with very few observations, it should be checked.
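The rounding rule described above can be illustrated with a minimal sketch in base R. This is not the package's implementation: sketch_group_sizes() and sketch_cut_p() are hypothetical helpers written for illustration only, the mapping of "in_order" to rank()'s "first" method is an assumption, and missing values in x are not handled.

```r
# Hypothetical helper (not part of the package): target group sizes
# following the rounding rule described in Details.
sketch_group_sizes <- function(n, p) {
  p <- p / sum(p)                        # rescale so proportions sum to one
  raw <- n * p                           # unrounded group sizes
  k <- length(p)
  even <- seq_len(k) %% 2 == 0
  sizes <- ifelse(even, ceiling(raw), floor(raw))  # even levels up, others down
  sizes[k] <- n - sum(sizes[-k])         # last level takes the remainder
  sizes
}

# Hypothetical helper: allocate cases to groups by rank.
# Assumption: "in_order" corresponds to ties.method = "first" in rank().
sketch_cut_p <- function(x, p, ties = c("random", "first")) {
  ties <- match.arg(ties)
  sizes <- sketch_group_sizes(length(x), p)
  r <- rank(x, ties.method = ties)       # unique integer ranks 1..n
  group <- rep(seq_along(sizes), times = sizes)[r]
  factor(group, levels = seq_along(sizes))
}

sketch_group_sizes(150, c(.25, .50, .25))
#> [1] 37 75 38
```

With 150 cases, the unrounded sizes are 37.5, 75 and 37.5: the first level is rounded down to 37, the second stays at 75, and the last level takes the remaining 38. Calling table(sketch_cut_p(iris$Sepal.Length, c(.25, .50, .25), ties = "first")) gives the same counts; with ties = "random", the counts are identical but the tied cases at the boundaries may land in different groups from run to run.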

Examples

cut_p(iris$Sepal.Length, p = c(.25, .50, .25), fct_levels = c("short", "middling", "long"))
#> Factor levels assigned as: Group 1 (4.3 to 5.1): short Group 2 (5.1 to 6.4):
#> middling Group 3 (6.4 to 7.9): long
#>   [1] short    short    short    short    short    middling short    short   
#>   [9] short    short    middling short    short    short    middling middling
#>  [17] middling middling middling short    middling short    short    short   
#>  [25] short    short    short    middling middling short    short    middling
#>  [33] middling middling short    short    middling short    short    middling
#>  [41] short    short    short    short    middling short    middling short   
#>  [49] middling short    long     middling long     middling long     middling
#>  [57] middling short    long     middling short    middling middling middling
#>  [65] middling long     middling middling middling middling middling middling
#>  [73] middling middling middling long     long     long     middling middling
#>  [81] middling middling middling middling middling middling long     middling
#>  [89] middling middling middling middling middling short    middling middling
#>  [97] middling middling short    middling middling middling long     middling
#> [105] long     long     short    long     long     long     long     long    
#> [113] long     middling middling middling long     long     long     middling
#> [121] long     middling long     middling long     long     middling middling
#> [129] long     long     long     long     long     middling middling long    
#> [137] middling middling middling long     long     long     middling long    
#> [145] long     long     middling long     middling middling
#> Levels: short middling long