Topic 10 Survey research - testing scales

A lot of social science research is done with surveys. In this video, I am talking about some examples and challenges, as well as some ways of asking good questions. I am also talking about how to create scales that use multiple items to measure one construct, and how to test their internal consistency.

10.1 Creating scales

Often, surveys use multiple items to measure the same construct. This is particularly common when participants might be either unable or unwilling to answer a direct question (e.g., how racist are you?). In such cases, asking several related questions can help to get an overall measure of the construct, while reducing measurement error, because the errors on different items will cancel each other out (to some extent).

Once you have data on your items, the most common way of creating a scale is just to calculate the mean of the items. If some of them were worded in the opposite direction (known as reverse-coding, and often recommended to increase data quality), those items need to be reversed first, most easily by subtracting their value from 1 + the scale maximum.1

10.1.1 Testing internal consistency

We might expect that our items all measure the same construct (e.g., prejudice against foreigners). However, we should test whether that is the case. The most common measure to use is Cronbach’s \(\alpha\). It ranges from 0 to 1, and can be roughly understood as the intercorrelation of all items. Here is how to calculate it in R, using some data from the European Social Survey 2014.

pacman::p_load(psych, tidyverse)
ess <- read_rds(url("http://empower-training.de/Gold/round7b.RDS"))
ess %>% select(imtcjob, imbleco, imwbcrm) %>% psych::alpha()
## Number of categories should be increased  in order to count frequencies.
## 
## Reliability analysis   
## Call: psych::alpha(x = .)
## 
##   raw_alpha std.alpha G6(smc) average_r S/N    ase mean  sd median_r
##       0.71      0.71    0.63      0.45 2.4 0.0025  4.4 1.8     0.45
## 
##  lower alpha upper     95% confidence boundaries
## 0.7 0.71 0.71 
## 
##  Reliability if an item is dropped:
##         raw_alpha std.alpha G6(smc) average_r S/N alpha se var.r med.r
## imtcjob      0.62      0.62    0.45      0.45 1.6   0.0038    NA  0.45
## imbleco      0.53      0.53    0.36      0.36 1.1   0.0047    NA  0.36
## imwbcrm      0.69      0.69    0.53      0.53 2.2   0.0031    NA  0.53
## 
##  Item statistics 
##             n raw.r std.r r.cor r.drop mean  sd
## imtcjob 38737  0.82  0.79  0.63   0.53  4.8 2.3
## imbleco 37872  0.83  0.83  0.71   0.60  4.5 2.2
## imwbcrm 38025  0.75  0.76  0.55   0.46  3.7 2.0

The key part of the output is the standardised alpha value (std.alpha). This indicates how internally consistent the scale is, and is usually interpreted in line with established cut-offs (DeVellis 2016):

  • \(\alpha\) < .6: poor,
  • 0.6 ≤ \(\alpha\) ≤ .7: questionable,
  • 0.7 ≤ \(\alpha\) ≤ .8: acceptable,
  • 0.8 ≤ \(\alpha\) ≤ .9: good,
  • 0.9 ≤ \(\alpha\): excellent.

However, it should be noted that \(\alpha\) increases with the length of the scale, so that a a short scale (e.g., 3 items) might well be of sufficient quality with \(\alpha\) = .65, while ‘excellent’ levels of fit might require long scales that are costly or tiresome to administer.

The section on Reliability if an item is dropped can be helpful when you find low internal consistency; in those cases it might be necessary to exclude items from the scale that reduce its consistency. Decisions about droppping items should not be made just on these numbers as that would risk overfitting your scale to the particular sample at hand. Instead, always also consider the content of the items.

10.1.2 Calculating scale scores

The alpha() function from the psych package automatically calculates scores for each participant on the scale. To use them, you need to save the object that the function returns and then access the scores element.

scale_data <- ess %>% select(imtcjob, imbleco, imwbcrm) %>% psych::alpha()
#To print the details as above, just type scale_data

ess$immigrant_attitudes <- scale_data$scores

This calculates the mean of the items, automatically ignoring any missing values. If you prefer that participants who omitted one of the scale items have a missing value for the scale, it will be easiest to calculate the mean manually:

ess <- ess %>% mutate(immigrant_attitudes = (imbleco + imtcjob + imwbcrm)/3)

10.2 Further resources

  • You can check this chapter of the Research Methods Knowledge Base for more details on various measures of internal consistency (and of other forms of reliability)

  1. The alpha() function from the psych package can support reverse-coding - or even automate it. To learn more, look at ?psych::alpha↩︎