This function creates a scale by calculating the mean of a set of items, and prints and returns descriptives that allow you to assess internal consistency and spread. It is primarily based on the psych::alpha function, with more parsimonious output and some added functionality. It also allows you to specify a threshold for proration so that missing data can be dealt with easily and explicitly.

make_scale(
  data,
  scale_items,
  scale_name,
  reverse = c("auto", "none", "spec"),
  reverse_items = NULL,
  two_items_reliability = c("spearman_brown", "cron_alpha", "r"),
  r_key = NULL,
  proration_cutoff = 0.4,
  print_hist = TRUE,
  print_desc = TRUE,
  return_list = FALSE,
  harmonize_ranges = NULL
)

Arguments

data

A data frame

scale_items

Character vector with names of scale items (variables in data)

scale_name

Name of the scale

reverse

Should scale items be reverse coded? One of "auto" - items are reversed if that contributes to scale consistency, "none" - no items are reversed, or "spec" - the items specified in reverse_items are reversed.

reverse_items

Character vector with names of scale items to be reversed (must be subset of scale_items)

two_items_reliability

How should the reliability of two-item scales be reported? "spearman_brown" is the recommended default, but "cron_alpha" (Cronbach's alpha) and "r" (Pearson's r) are also supported.

r_key

(optional) Numeric. Set to the possible maximum value of the scale if the whole scale should be reversed, or to -1 to reverse the scale based on the observed maximum.

proration_cutoff

Scale scores are only calculated for cases with at most this share of missing data - see details.

print_hist

Logical. Should histograms for items and resulting scale be printed?

print_desc

Logical. Should descriptives for scales be printed?

return_list

Logical. If FALSE (the default), only the scale values are returned; if TRUE, a list with scale values and descriptives is returned.

harmonize_ranges

Should items that have different ranges be rescaled? By default (NULL), items are not rescaled, but a message flags this potential issue - set to FALSE to suppress that message. If TRUE, items are rescaled to match the range of the first item given. Alternatively, pass a vector (c(min, max)) to specify the desired range.
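
To illustrate what rescaling items to a common range (as described for harmonize_ranges) can look like, here is a minimal base R sketch. It assumes a simple linear transformation based on the observed range of each item; the helper rescale_to() is hypothetical and not part of this package, and make_scale() may well implement rescaling differently.

```r
# Hypothetical helper (not part of the package): linearly map the
# observed range of x onto new_range = c(min, max).
rescale_to <- function(x, new_range) {
  old_range <- range(x, na.rm = TRUE)
  (x - old_range[1]) / diff(old_range) * diff(new_range) + new_range[1]
}

# Made-up item measured on a 1-5 scale, rescaled to 1-7
item_1to5 <- c(1, 3, 5, NA)
item_1to7 <- rescale_to(item_1to5, c(1, 7))
```

Missing values stay missing, and the end points of the old range map onto the end points of the new one.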

Value

Depends on the return_list argument: either just the scale values, or a list of scale values and descriptives. If descriptives are returned, check the text element for a convenient summary.

Details

Proration is the easiest way to deal with missing data at the item level. Here, scale means are calculated for cases that have less than a given share of missing data. According to Wu et al. (2022), a 40% cut-off is defensible, so this is the default used. For more precise estimation, consider item-level multiple imputation, which can be done with make_scale_mi().
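
The logic of proration can be sketched in base R as follows. This is an illustration on made-up data, not the function's actual implementation; in particular, it assumes that the cut-off is inclusive (cases with exactly that share of missing items still receive a score), following the description of the proration_cutoff argument.

```r
# Made-up data: three items, four cases, some values missing
items <- data.frame(
  a = c(1, 2, NA, 4),
  b = c(2, NA, NA, 4),
  c = c(3, 3, NA, 5)
)

proration_cutoff <- 0.4

# Share of missing items per case (row)
share_missing <- rowMeans(is.na(items))

# Mean of the available items; cases above the cut-off get NA
scale_score <- ifelse(
  share_missing <= proration_cutoff,
  rowMeans(items, na.rm = TRUE),
  NA_real_
)
```

Here the second case is missing one of three items (about 33%) and still gets a score based on the two available items, while the third case is missing everything and is set to NA.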

Examples

scores <- make_scale(ess_health, scale_items = c("etfruit", "eatveg"), 
                     scale_name = "Healthy eating")
#>     
#> Descriptives for Healthy eating scale:
#> Mean: 3.033  SD: 1.107
#> spearman_brown: 0.66
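
For two-item scales, the Spearman-Brown coefficient is derived from the Pearson correlation r between the two items as 2r / (1 + r). The following sketch shows that calculation on made-up vectors (not the ess_health data); it illustrates the standard formula rather than the function's internal code.

```r
# Made-up responses to two items
x <- c(1, 2, 3, 4, 5)
y <- c(2, 1, 4, 3, 5)

# Pearson correlation between the two items
r <- cor(x, y)

# Spearman-Brown coefficient for a two-item scale
spearman_brown <- 2 * r / (1 + r)
```

For positively correlated items, the coefficient is always at least as large as r itself, since it estimates the reliability of the two-item composite rather than of a single item.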