Topic 7 Interaction terms in linear models

Watch this video for an introduction to interaction terms in linear models:

The section below essentially contains the code needed for the examples in the video, with some further annotations. Before that, just one note on terminology:

  • Generally, if two variables interact, then the effect that one has on the outcome (the regression slope) depends on the value of the other. Mathematically, that relationship is symmetrical: if A interacts with B, then B interacts with A.
  • For interpretation, we often designate one of the variables as the moderator. That just means that we are primarily interested in the effect of the other variable on the outcome and in how the moderator changes that effect. Often, demographic variables such as age or gender serve as moderators.
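
To make the first point concrete, here is a minimal sketch of a model with two predictors and their interaction. The symbols Y, A, B and the β coefficients are generic notation chosen for illustration; they are not taken from the video.

```latex
\begin{aligned}
E[Y] &= \beta_0 + \beta_1 A + \beta_2 B + \beta_3 (A \times B) \\
\text{slope of } A &= \beta_1 + \beta_3 B \\
\text{slope of } B &= \beta_2 + \beta_3 A
\end{aligned}
```

The same coefficient β3 appears in both slopes, which is why the relationship is symmetrical. Treating B as the moderator simply means reading the second line: the effect of A changes by β3 for every one-unit change in B.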

7.1 Example 1: Memory and chess

The first example draws on research by Gobet & Simon (1996) but uses simulated data.

library(tidyverse)

# Generate data (errors committed) - *roughly* based on Gobet & Simon 1996
# Naming: first letter = player (E = expert, N = novice),
#         second letter = position type (R = real, F = fake)
set.seed(300688) # for reproducible results
ER <- rnorm(50, 4.9, 3.5) + rnorm(50, 0, 2)
NR <- rnorm(50, 15.7, 4.0) + rnorm(50, 0, 2)
EF <- rnorm(50, 21.4, 5) + rnorm(50, 0, 2)
NF <- rnorm(50, 21.8, 5) + rnorm(50, 0, 2)

obs <- data.frame(player = "expert", type = "real", errors = ER) %>% 
  rbind(data.frame(player = "novice", type = "real", errors = NR)) %>%
  rbind(data.frame(player = "expert", type = "fake", errors = EF)) %>%
  rbind(data.frame(player = "novice", type = "fake", errors = NF)) %>% 
  mutate(type=factor(type), 
         player = factor(player, levels = c("novice", "expert")))

#Adding the centrally mirrored condition
EM <- rnorm(50, 7.8, 3.5) + rnorm(50, 0, 2)
NM <- rnorm(50, 18, 3.5) + rnorm(50, 0, 2)

obs2 <- data.frame(player = "expert", type = "real", errors = ER) %>% 
  rbind(data.frame(player = "novice", type = "real", errors = NR)) %>%
  rbind(data.frame(player = "expert", type = "fake", errors = EF)) %>%
  rbind(data.frame(player = "novice", type = "fake", errors = NF)) %>% 
  rbind(data.frame(player = "expert", type = "mirrored", errors = EM)) %>% 
  rbind(data.frame(player = "novice", type = "mirrored", errors = NM)) %>% 
  mutate(type=factor(type), 
         player = factor(player, levels = c("novice", "expert")))

The first model tests whether player level and position type interact. The very small p-value of the interaction term indicates that they do. We then use a plot to understand the nature of that interaction; because the predictor is categorical, we use the cat_plot() function from the interactions package.

pacman::p_load(tidyverse)
mod <- lm(errors ~ player + type + player:type, obs)
summary(mod)
## 
## Call:
## lm(formula = errors ~ player + type + player:type, data = obs)
## 
## Residuals:
##      Min       1Q   Median       3Q      Max 
## -13.8157  -3.6637  -0.0812   3.6591  14.2213 
## 
## Coefficients:
##                       Estimate Std. Error t value Pr(>|t|)    
## (Intercept)            21.5744     0.7398  29.164  < 2e-16 ***
## playerexpert           -1.2240     1.0462  -1.170    0.243    
## typereal               -6.5092     1.0462  -6.222 2.91e-09 ***
## playerexpert:typereal  -8.0039     1.4795  -5.410 1.83e-07 ***
## ---
## Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
## 
## Residual standard error: 5.231 on 196 degrees of freedom
## Multiple R-squared:  0.5892,	Adjusted R-squared:  0.5829 
## F-statistic: 93.69 on 3 and 196 DF,  p-value: < 2.2e-16
pacman::p_load(interactions)
cat_plot(mod, pred="player", modx = "type", geom="line")

Figure 7.1: Simple interaction plot - note that lines are not parallel
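
To see what the coefficients mean, it can help to rebuild the four cell means by hand. The sketch below copies the estimates from the summary(mod) output above and assumes the default dummy (treatment) coding, with novice and fake as reference levels. The names b, cell_means and expert_effect are made up for this illustration.

```r
# Coefficients copied from the summary(mod) output above
b <- c(intercept = 21.5744, expert = -1.2240, real = -6.5092, expert_real = -8.0039)

# Predicted mean number of errors in each cell under treatment coding
cell_means <- c(
  novice_fake = b[["intercept"]],
  expert_fake = b[["intercept"]] + b[["expert"]],
  novice_real = b[["intercept"]] + b[["real"]],
  expert_real = b[["intercept"]] + b[["expert"]] + b[["real"]] + b[["expert_real"]]
)
round(cell_means, 2)

# Expert-vs-novice difference within each position type
expert_effect <- c(
  fake = b[["expert"]],
  real = b[["expert"]] + b[["expert_real"]]
)
round(expert_effect, 2)
```

On fake positions, experts make only about 1.2 fewer errors than novices; on real positions, the gap is about 9.2 errors. The interaction coefficient (about -8.0) is exactly the difference between those two gaps.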

Next we consider a third condition: mirrored chess positions, which are neither quite real nor entirely fake. Because type now has three levels, the model contains two dummy interaction terms. To test whether they are collectively significant, we can use the Anova() function from the car package.

mod <- lm(errors ~ player + type + player:type, obs2)
summary(mod)
## 
## Call:
## lm(formula = errors ~ player + type + player:type, data = obs2)
## 
## Residuals:
##      Min       1Q   Median       3Q      Max 
## -13.8157  -3.3006  -0.0503   3.4305  14.2213 
## 
## Coefficients:
##                           Estimate Std. Error t value Pr(>|t|)    
## (Intercept)                21.5744     0.6881  31.354  < 2e-16 ***
## playerexpert               -1.2240     0.9731  -1.258    0.209    
## typemirrored               -4.6796     0.9731  -4.809 2.43e-06 ***
## typereal                   -6.5092     0.9731  -6.689 1.13e-10 ***
## playerexpert:typemirrored  -8.3088     1.3762  -6.038 4.71e-09 ***
## playerexpert:typereal      -8.0039     1.3762  -5.816 1.57e-08 ***
## ---
## Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
## 
## Residual standard error: 4.866 on 294 degrees of freedom
## Multiple R-squared:  0.6085,	Adjusted R-squared:  0.6018 
## F-statistic: 91.38 on 5 and 294 DF,  p-value: < 2.2e-16
pacman::p_load(car)
car::Anova(mod, type=3)
## Anova Table (Type III tests)
## 
## Response: errors
##              Sum Sq  Df  F value    Pr(>F)    
## (Intercept) 23272.7   1 983.0610 < 2.2e-16 ***
## player         37.5   1   1.5821    0.2095    
## type         1126.9   2  23.8013 2.627e-10 ***
## player:type  1109.9   2  23.4420 3.580e-10 ***
## Residuals    6960.1 294                       
## ---
## Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
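
The joint test of the interaction terms can also be framed as a comparison of two nested models, one with and one without the interaction. The sketch below uses base R's anova() on a small made-up data set (demo, with invented values, standing in for obs2) so that it runs on its own. For the highest-order term, this comparison should give the same F test as the player:type row in the table above.

```r
# Hypothetical stand-in data with the same structure as obs2
# (values are invented for illustration, not the chapter's data)
set.seed(1)
demo <- expand.grid(player = c("novice", "expert"),
                    type = c("fake", "mirrored", "real"),
                    id = 1:20)
demo$errors <- rnorm(nrow(demo), mean = 15, sd = 5) -
  5 * (demo$player == "expert" & demo$type != "fake")

# player * type is shorthand for player + type + player:type
full    <- lm(errors ~ player * type, data = demo)
reduced <- lm(errors ~ player + type, data = demo)

# F test on the 2 interaction dummies taken together
cmp <- anova(reduced, full)
cmp
```

With the chapter's data, replacing demo with obs2 would be the natural next step.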

An interaction plot again helps to show what is going on. Here it shows that the real and mirrored conditions behave very similarly: their lines are nearly parallel, while the fake condition differs from both.

cat_plot(mod, pred="player", modx = "type", geom="line")

Figure 7.2: Simple interaction plot - now 2 lines are parallel