Create a scale by calculating item means and return descriptives

This function creates a scale by calculating the mean of a set of items, and prints and returns descriptives that help to assess internal consistency and spread. It is primarily based on the psych::alpha function, with more parsimonious output and some added functionality. It also lets you specify a threshold for proration so that missing data can be dealt with easily and explicitly.

make_scale(
  data,
  scale_items,
  scale_name,
  reverse = c("auto", "none", "spec"),
  reverse_items = NULL,
  two_items_reliability = c("spearman_brown", "cron_alpha", "r"),
  r_key = NULL,
  proration_cutoff = 0.4,
  print_hist = TRUE,
  print_desc = TRUE,
  return_list = FALSE,
  harmonize_ranges = NULL
)

Arguments

data: A dataframe
scale_items: Character vector with names of scale items (variables in data)
scale_name: Name of the scale
reverse: Should scale items be reverse coded? One of "auto" - items are reversed if that contributes to scale consistency, "none" - no items are reversed, or "spec" - the items specified in reverse_items are reversed.
reverse_items: Character vector with names of scale items to be reversed (must be subset of scale_items)
two_items_reliability: How should the reliability of two-item scales be reported? "spearman_brown" is the recommended default, but "cron_alpha" (Cronbach's alpha) and "r" (Pearson's r) are also supported.
r_key: (optional) Numeric. Set to the possible maximum value of the scale if the whole scale should be reversed, or to -1 to reverse the scale based on the observed maximum.
proration_cutoff: Scale scores are only calculated for cases with at most this share of missing data - see details.
print_hist: Logical. Should histograms for items and resulting scale be printed?
print_desc: Logical. Should descriptives for scales be printed?
return_list: Logical. Should descriptives be returned along with the scale values (as a list), rather than the scale values only?
harmonize_ranges: Should items that have different ranges be rescaled? By default (NULL), items are not rescaled, but a message flags the potential issue - set to FALSE to suppress that message. If TRUE, items are rescaled to match the range of the first item given. Alternatively, pass a vector (c(min, max)) to specify the desired range.
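
To make the reverse and harmonize_ranges options more concrete, the following is a minimal sketch of the two transformations involved. The helpers reverse_item() and rescale_item() are hypothetical names introduced here for illustration; they are not part of the package, and make_scale() may implement these steps differently (for instance, by using observed rather than theoretical minima and maxima).

```r
# Hypothetical helpers for illustration only (not part of the package).

# Reverse-code an item, assuming its minimum and maximum are known
reverse_item <- function(x, min_val, max_val) {
  (max_val + min_val) - x
}

# Linearly rescale an item from its own range (`from`) to a target range (`to`),
# both given as c(min, max)
rescale_item <- function(x, from, to) {
  (x - from[1]) / (from[2] - from[1]) * (to[2] - to[1]) + to[1]
}

item_a <- c(1, 3, 5)    # answered on a 1-5 scale
item_b <- c(0, 5, 10)   # answered on a 0-10 scale

reversed_a <- reverse_item(item_a, min_val = 1, max_val = 5)
rescaled_b <- rescale_item(item_b, from = c(0, 10), to = c(1, 5))

print(reversed_a)  # 5 3 1
print(rescaled_b)  # 1 3 5
```

Rescaling before averaging matters because an item with a wider range would otherwise dominate the scale mean.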

Value

Depends on the return_list argument: either just the scale values, or a list of scale values and descriptives. If descriptives are returned, check the text element for a convenient summary.

Details

Proration is the easiest way to deal with missing data at the item level. Here, scale means are calculated for cases that have less than a given share of missing data. According to Wu et al. (2022), a 40% cut-off is defensible, so this is the default used. For more precise estimation, consider item-level multiple imputation, which can be done with make_scale_mi().
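
The proration logic described above can be sketched in a few lines of base R. This is an illustration under stated assumptions, not the package's implementation: prorated_mean() is a hypothetical helper, and it assumes that cases are dropped only when their share of missing items exceeds the cut-off.

```r
# Hypothetical helper for illustration only (not part of the package).
prorated_mean <- function(items, proration_cutoff = 0.4) {
  # Share of missing items per case (row)
  share_missing <- rowMeans(is.na(items))
  # Mean of the items that were answered
  scores <- rowMeans(items, na.rm = TRUE)
  # Assumption: cases exceeding the cut-off receive no scale score
  scores[share_missing > proration_cutoff] <- NA
  scores
}

items <- data.frame(
  q1 = c(4, 2, NA, NA),
  q2 = c(5, NA, NA, 3),
  q3 = c(3, 2, 1, NA),
  q4 = c(4, 3, NA, 2),
  q5 = c(4, 1, 2, NA)
)

scores <- prorated_mean(items)
print(unname(scores))  # 4 2 NA NA
```

The first case has complete data, the second is missing one of five items (20%) and is scored on the remaining four, and the last two are missing three of five items (60%) and therefore receive NA.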

Examples

scores <- make_scale(ess_health, scale_items = c("etfruit", "eatveg"), 
                     scale_name = "Healthy eating")
#>     
#> Descriptives for Healthy eating scale:
#> Mean: 3.033  SD: 1.107
#> spearman_brown: 0.66
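
For two-item scales such as the one above, the Spearman-Brown coefficient is conventionally derived from the Pearson correlation r between the two items as 2r / (1 + r). The sketch below illustrates that standard formula with made-up data; spearman_brown() is a hypothetical helper, and it is an assumption that make_scale() computes its reported value in exactly this way.

```r
# Hypothetical helper for illustration only (not part of the package).
spearman_brown <- function(x, y) {
  # Pearson correlation between the two items, ignoring incomplete pairs
  r <- cor(x, y, use = "pairwise.complete.obs")
  # Standard Spearman-Brown formula for a two-item scale
  2 * r / (1 + r)
}

# Made-up responses to two items
item_1 <- c(1, 2, 3, 4, 5)
item_2 <- c(2, 1, 4, 3, 5)

r  <- cor(item_1, item_2)
sb <- spearman_brown(item_1, item_2)

print(round(r, 3))   # 0.8
print(round(sb, 3))  # 0.889
```

Whenever r lies between 0 and 1, the Spearman-Brown coefficient is larger than r itself, because it estimates the reliability of the two-item composite rather than of a single item.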