This function creates a scale from data that was multiply imputed at the item level and returns pooled descriptive statistics (including Cronbach's alpha). Note that Cronbach's alpha is the only reliability estimate this function supports, including for two-item scales.
make_scale_mi(
data,
scale_items,
scale_name,
proration_cutoff = 0,
seed = NULL,
alpha_ci = FALSE,
boot = 5000,
parallel = TRUE,
...
)
The approach to pooling Cronbach's alpha is taken from Dion Groothof on StackOverflow. The development of the function was motivated by Gottschall, West & Enders (2012), who showed that multiple imputation at the item level results in much higher statistical power than multiple imputation at the scale level.
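The pooling step can be illustrated with a minimal base-R sketch. It assumes that alpha is computed in each imputed dataset, that its sampling variance is bootstrapped within that dataset, and that the results are combined with Rubin's rules. The helper names (cronbach_alpha, boot_alpha_var, pool_rubin) and the simulated data are hypothetical, and the function's actual implementation may differ.

```r
# Cronbach's alpha for a complete set of items (rows = cases, columns = items)
cronbach_alpha <- function(items) {
  items <- as.matrix(items)
  k <- ncol(items)
  item_vars <- apply(items, 2, var)
  total_var <- var(rowSums(items))
  (k / (k - 1)) * (1 - sum(item_vars) / total_var)
}

# Bootstrap the sampling variance of alpha within one imputed dataset
boot_alpha_var <- function(items, n_boot = 200) {
  n <- nrow(items)
  estimates <- replicate(n_boot, {
    resampled <- items[sample.int(n, n, replace = TRUE), , drop = FALSE]
    cronbach_alpha(resampled)
  })
  var(estimates)
}

# Rubin's rules: combine within- and between-imputation variance
pool_rubin <- function(estimates, variances) {
  m <- length(estimates)
  within <- mean(variances)
  between <- var(estimates)
  list(
    estimate = mean(estimates),
    variance = within + (1 + 1 / m) * between
  )
}

set.seed(42)
# Simulated stand-in for m = 3 imputed datasets (assumption, not real data)
latent <- rnorm(200)
imputed <- lapply(1:3, function(i) {
  data.frame(
    a = latent + rnorm(200),
    b = latent + rnorm(200),
    c = latent + rnorm(200)
  )
})

alphas <- vapply(imputed, cronbach_alpha, numeric(1))
alpha_vars <- vapply(imputed, boot_alpha_var, numeric(1))
pooled <- pool_rubin(alphas, alpha_vars)
pooled$estimate
sqrt(pooled$variance)  # pooled standard error
```

Because the within-imputation variance comes from resampling, the pooled standard error changes from run to run unless a seed is set, which is what the seed argument is for.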
data
A dataframe containing multiple imputations, distinguished by a .imp variable. Typically the output from mice::complete(mids, "long").
scale_items
Character vector with names of scale items (variables in data)
scale_name
Name of the scale
proration_cutoff
Applies only to raw data (.imp == 0) in data. Scale scores are only calculated for cases with at most this share of missing data.
seed
For pooling, the variance of Cronbach's alpha is bootstrapped. Set a seed to make this reproducible.
alpha_ci
Should a confidence interval for Cronbach's alpha be returned? Note that this requires bootstrapping and thus makes the function much slower. TRUE corresponds to a 95% confidence interval; other widths can be specified as fractions, e.g., .9
boot
For pooling, the variance of Cronbach's alpha is bootstrapped. Set the number of bootstrap resamples here.
parallel
Should bootstrapping be conducted in parallel (using the parallel package)? Pass a number to select the number of cores; otherwise, the function will use all but one core.
...
Arguments passed on to make_scale
reverse
Should scale items be reverse coded? One of "auto" - items are reversed if that contributes to scale consistency, "none" - no items reversed, or "spec" - items specified in reverse_items are reversed.
reverse_items
Character vector with names of scale items to be reversed (must be subset of scale_items)
r_key
(optional) Numeric. Set to the possible maximum value of the scale if the whole scale should be reversed, or to -1 to reverse the scale based on the observed maximum.
print_desc
Logical. Should descriptives for scales be printed?
return_list
Logical. Should only scale values be returned, or descriptives as well?
harmonize_ranges
Should items that have different ranges be rescaled? By default, items are not rescaled, but a message flags the potential issue; set to FALSE to suppress that message. If TRUE, items are rescaled to match the first item given. Alternatively, pass a vector (c(min, max)) to specify the desired range.
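To make the reverse-coding and range-harmonizing options concrete, here is a small base-R sketch. It assumes that reversing mirrors values between the scale minimum and maximum and that harmonizing is a linear rescale onto a target range; reverse_item and rescale_item are hypothetical helpers, not functions from the package, and make_scale may handle details differently.

```r
# Hypothetical helper: mirror values so that min becomes max and vice versa
reverse_item <- function(x,
                         min_val = min(x, na.rm = TRUE),
                         max_val = max(x, na.rm = TRUE)) {
  max_val + min_val - x
}

# Hypothetical helper: linearly map values from their own range onto `to`
rescale_item <- function(x, to, from = range(x, na.rm = TRUE)) {
  (x - from[1]) / (from[2] - from[1]) * (to[2] - to[1]) + to[1]
}

# A 1-5 item reversed using the possible (not just observed) range
reverse_item(c(1, 2, 5), min_val = 1, max_val = 5)
#> [1] 5 4 1

# A 0-10 item rescaled to match a 1-5 item
rescale_item(c(0, 5, 10), to = c(1, 5))
#> [1] 1 3 5
```

Passing the possible minimum and maximum explicitly matters when no respondent used the endpoints, because reversing on the observed range would then shift the scores.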
Scale scores are returned for the raw data as well (if it is included in data).
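If the raw data should be part of the long-format input, mice::complete() can include it via its include argument; the raw rows then carry .imp == 0. The sketch below uses the nhanes example data that ships with mice, and the object names are only illustrative.

```r
library(mice)

# Impute a small example dataset (nhanes ships with mice)
imp <- mice(nhanes, m = 3, printFlag = FALSE, seed = 1)

# include = TRUE prepends the original data with missing values as .imp == 0
long_with_raw <- complete(imp, action = "long", include = TRUE)

# One block of rows per imputation, plus the raw data
table(long_with_raw$.imp)
```

Without include = TRUE, only the imputed datasets (.imp >= 1) are returned, so there are no raw rows for which scale scores could be computed.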
Descriptive statistics and reliability estimates are only based on the imputed datasets.
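For the raw rows, the proration cutoff decides which cases receive a score at all. The following sketch assumes that scale scores are item means and that a case is scored only if its share of missing items does not exceed the cutoff; prorated_mean is a hypothetical helper written for illustration, not the package's implementation.

```r
# Hypothetical helper: row means, set to NA where too many items are missing
prorated_mean <- function(items, cutoff = 0) {
  items <- as.matrix(items)
  share_missing <- rowMeans(is.na(items))
  scores <- rowMeans(items, na.rm = TRUE)
  # Drop cases above the cutoff, and cases with no observed items at all
  scores[share_missing > cutoff | share_missing == 1] <- NA
  scores
}

items <- data.frame(
  q1 = c(1, 2, NA, NA),
  q2 = c(3, NA, 4, NA),
  q3 = c(5, 6, 5, NA)
)

# cutoff = 0: only complete cases are scored
prorated_mean(items, cutoff = 0)
#> [1]  3 NA NA NA

# cutoff = 0.4: cases missing one of three items are scored as well
prorated_mean(items, cutoff = 0.4)
#> [1] 3.0 4.0 4.5  NA
```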
library(dplyr)
library(mice)
# Create Dataset with missing data
ess_health <- ess_health %>% sample_n(500) %>% select(etfruit, eatveg, dosprt, health)
add_missing <- function(x) {x[!rbinom(length(x), 1, .9)] <- NA; x}
ess_health <- ess_health %>% mutate(across(everything(), add_missing))
# Impute data
ess_health_mi <- mice(ess_health, printFlag = FALSE)
ess_health_mi <- complete(ess_health_mi, "long")
scale <- make_scale_mi(ess_health_mi, c("etfruit", "eatveg"), "healthy")
#>
#> Descriptives for healthy scale:
#> Mean: 2.959 SD: 1.114
#> Cronbach's alpha: 0.72