http://www.sthda.com/english/articles/40-regression-analysis/164-interaction-effect-in-multiple-regression-essentials/

This chapter describes how to compute multiple linear regression with interaction effects.

Previously, we described how to build a multiple linear regression model (Chapter @ref(linear-regression)) for predicting a continuous outcome variable (y) based on multiple predictor variables (x).

For example, to predict sales based on the advertising budgets spent on youtube and facebook, the model equation is sales = b0 + b1*youtube + b2*facebook, where b0 is the intercept, and b1 and b2 are the regression coefficients associated with the predictor variables youtube and facebook, respectively.

The above equation, also known as the additive model, investigates only the main effects of the predictors. It assumes that the relationship between a given predictor variable and the outcome is independent of the other predictor variables (James et al. 2014; Bruce and Bruce 2017).

In our example, the additive model assumes that the effect of youtube advertising on sales does not depend on the amount spent on facebook advertising.

This assumption might not be true. For example, spending money on facebook advertising may increase the effectiveness of youtube advertising on sales. In marketing, this is known as a synergy effect, and in statistics it is referred to as an interaction effect (James et al. 2014).

In this chapter, you’ll learn:

- the equation of multiple linear regression with interaction
- R code for computing the regression coefficients associated with the main effects and the interaction effects
- how to interpret the interaction effect

Contents:

- Equation
- Loading Required R packages
- Preparing the data
- Computation
    - Additive model
    - Interaction effects
- Interpretation
- Comparing the additive and the interaction models
- Discussion
- References

The Book: Machine Learning Essentials: Practical Guide in R
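To make the additive assumption concrete, here is a minimal sketch on simulated data. The data, the coefficient values and the helper `effect_of_youtube()` are hypothetical and are not part of the marketing data set used in this chapter. The simulated sales include a synergy term, yet the additive model still predicts the same effect of youtube at a low and at a high facebook budget, which is exactly the restriction described above.

```r
# Hypothetical simulated data (an assumption for illustration only)
set.seed(1)
n <- 200
youtube  <- runif(n, 0, 300)
facebook <- runif(n, 0, 60)
# The simulated sales contain a synergy (product) term on purpose
sales <- 8 + 0.02 * youtube + 0.03 * facebook +
  0.001 * youtube * facebook + rnorm(n, sd = 1)
sim <- data.frame(youtube, facebook, sales)

# Additive model: main effects only
additive <- lm(sales ~ youtube + facebook, data = sim)

# Predicted change in sales for a one-unit increase in youtube,
# evaluated at a given facebook budget (hypothetical helper)
effect_of_youtube <- function(model, fb) {
  before <- data.frame(youtube = 100, facebook = fb)
  after  <- data.frame(youtube = 101, facebook = fb)
  unname(predict(model, after) - predict(model, before))
}

effect_low  <- effect_of_youtube(additive, fb = 10)
effect_high <- effect_of_youtube(additive, fb = 50)

# Both effects coincide with the youtube coefficient of the additive model
all.equal(effect_low, effect_high)
all.equal(effect_low, unname(coef(additive)["youtube"]))
```

Whatever value is passed as `fb`, the predicted effect of youtube under the additive model stays equal to its coefficient. Capturing a synergy requires an extra term in the model.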
Equation

The multiple linear regression equation with an interaction effect between two predictors (x1 and x2) can be written as follows:

    y = b0 + b1*x1 + b2*x2 + b3*(x1*x2)

In our example, it becomes:

    sales = b0 + b1*youtube + b2*facebook + b3*(youtube*facebook)

This can also be written as:

    sales = b0 + (b1 + b3*facebook)*youtube + b2*facebook

or as:

    sales = b0 + b1*youtube + (b2 + b3*youtube)*facebook

b3 can be interpreted as the increase in the effectiveness of youtube advertising for a one-unit increase in facebook advertising (or vice versa).

In the following sections, you will learn how to compute the regression coefficients in R.

Loading Required R packages

- tidyverse for easy data manipulation and visualization
- caret for easy machine learning workflow

    library(tidyverse)
    library(caret)

Preparing the data

We’ll use the marketing data set, introduced in Chapter @ref(regression-analysis), for predicting sales units on the basis of the amount of money spent on three advertising media (youtube, facebook and newspaper).

We’ll randomly split the data into a training set (80%, for building a predictive model) and a test set (20%, for evaluating the model).

    # Load the data
    data("marketing", package = "datarium")
    # Inspect the data
    sample_n(marketing, 3)

    ##     youtube facebook newspaper sales
    ## 58    163.4     23.0      19.9  15.8
    ## 157   112.7     52.2      60.6  18.4
    ## 81     91.7     32.0      26.8  14.2

    # Split the data into training and test set
    set.seed(123)
    training.samples <- marketing$sales %>%
      createDataPartition(p = 0.8, list = FALSE)
    train.data <- marketing[training.samples, ]
    test.data <- marketing[-training.samples, ]

Computation

Additive model

The standard linear regression model can be computed as follows:

    # Build the model
    model1 <- lm(sales ~ youtube + facebook, data = train.data)
    # Summarize the model
    summary(model1)

    ##
    ## Call:
    ## lm(formula = sales ~ youtube + facebook, data = train.data)
    ##
    ## Residuals:
    ##     Min      1Q  Median      3Q     Max
    ## -10.481  -1.104   0.349   1.423   3.486
    ##
    ## Coefficients:
    ##             Estimate Std. Error t value Pr(>|t|)
    ## (Intercept)  3.43446    0.40877     8.4  2.3e-14 ***
    ## youtube      0.04558    0.00159    28.7  < 2e-16 ***
    ## facebook     0.18788    0.00920    20.4  < 2e-16 ***
    ## ---
    ## Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
    ##
    ## Residual standard error: 2.11 on 159 degrees of freedom
    ## Multiple R-squared:  0.89,  Adjusted R-squared:  0.889
    ## F-statistic:  644 on 2 and 159 DF,  p-value: <2e-16

    # Make predictions
    predictions <- model1 %>% predict(test.data)
    # Model performance
    # (a) Prediction error, RMSE
    RMSE(predictions, test.data$sales)

    ## [1] 1.58

    # (b) R-square
    R2(predictions, test.data$sales)

    ## [1] 0.938

Interaction effects

In R, you include interactions between variables using the * operator. The formula youtube*facebook expands to youtube + facebook + youtube:facebook, where the : operator denotes the interaction term alone:

    # Build the model
    # Use this:
    model2 <- lm(sales ~ youtube + facebook + youtube:facebook,
                 data = train.data)
    # Or simply, use this:
    model2 <- lm(sales ~ youtube*facebook, data = train.data)
    # Summarize the model
    summary(model2)

    ##
    ## Call:
    ## lm(formula = sales ~ youtube * facebook, data = train.data)
    ##
    ## Residuals:
    ##    Min     1Q Median     3Q    Max
    ## -7.438 -0.482  0.231  0.748  1.860
    ##
    ## Coefficients:
    ##                  Estimate Std. Error t value Pr(>|t|)
    ## (Intercept)      7.90e+00   3.28e-01   24.06   <2e-16 ***
    ## youtube          1.95e-02   1.64e-03   11.90   <2e-16 ***
    ## facebook         2.96e-02   9.83e-03    3.01    0.003 **
    ## youtube:facebook 9.12e-04   4.84e-05   18.86   <2e-16 ***
    ## ---
    ## Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
    ##
    ## Residual standard error: 1.18 on 158 degrees of freedom
    ## Multiple R-squared:  0.966,  Adjusted R-squared:  0.966
    ## F-statistic: 1.51e+03 on 3 and 158 DF,  p-value: <2e-16

    # Make predictions
    predictions <- model2 %>% predict(test.data)
    # Model performance
    # (a) Prediction error, RMSE
    RMSE(predictions, test.data$sales)

    ## [1] 0.963

    # (b) R-square
    R2(predictions, test.data$sales)

    ## [1] 0.982

Interpretation

It can be seen that all the coefficients, including the interaction term coefficient, are statistically significant. The significant interaction term suggests that there is an interaction between the two predictor variables (youtube and facebook advertising).

Our model equation looks like this:

    sales = 7.90 + 0.0195*youtube + 0.0296*facebook + 0.0009*youtube*facebook

We can interpret this as follows: an increase in youtube advertising of 1000 dollars is associated with increased sales of (b1 + b3*facebook)*1000 = 19.5 + 0.9*facebook units. Similarly, an increase in facebook advertising of 1000 dollars is associated with an increase in sales of (b2 + b3*youtube)*1000 = 29.6 + 0.9*youtube units. For example, with facebook = 50, the first expression gives 19.5 + 0.9*50 = 64.5 units.

Note that, sometimes, the interaction term is significant but the main effects are not. The hierarchical principle states that, if we include an interaction in a model, we should also include the main effects, even if the p-values associated with their coefficients are not significant (James et al. 2014).

Comparing the additive and the interaction models

The prediction error (RMSE) of the interaction model is 0.963, which is lower than the prediction error of the additive model (1.58).

Additionally, the R-square (R2) value of the interaction model on the test set is 98%, compared to 94% for the additive model.

These results suggest that the model with the interaction term is better than the model that contains only main effects. So, for this specific data, we should go for the model with the interaction term.

Discussion

This chapter describes how to compute multiple linear regression with interaction effects.
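Besides reading the t-test of the interaction coefficient in summary(), the significance of an interaction can also be checked by comparing the two nested models with an F-test via anova(). This check is not part of the analysis above. The sketch below is minimal and runs on simulated data, so the data frame `sim` and the coefficient values used to generate it are hypothetical.

```r
# Hypothetical simulated data standing in for train.data (an assumption)
set.seed(42)
n <- 160
sim <- data.frame(
  youtube  = runif(n, 0, 300),
  facebook = runif(n, 0, 60)
)
sim$sales <- 8 + 0.02 * sim$youtube + 0.03 * sim$facebook +
  0.001 * sim$youtube * sim$facebook + rnorm(n, sd = 1)

# Nested models: the additive model is the interaction model without
# the youtube:facebook term
additive_fit    <- lm(sales ~ youtube + facebook, data = sim)
interaction_fit <- lm(sales ~ youtube * facebook, data = sim)

# F-test: does adding the interaction term significantly reduce
# the residual sum of squares?
comparison <- anova(additive_fit, interaction_fit)
print(comparison)

# p-value of the test, stored in the second row of the table
p_value <- comparison[["Pr(>F)"]][2]
p_value
```

With the models fitted in this chapter, the equivalent call would be anova(model1, model2). A small p-value indicates that the interaction term improves the fit on the training data.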
Interaction terms should be included in the model if they are significant.

References

Bruce, Peter, and Andrew Bruce. 2017. Practical Statistics for Data Scientists. O’Reilly Media.

James, Gareth, Daniela Witten, Trevor Hastie, and Robert Tibshirani. 2014. An Introduction to Statistical Learning: With Applications in R. Springer Publishing Company, Incorporated.