I believe biostatisticians use it, especially with small data sets in the context of clinical trials, and especially with dummy variables (e.g., 0/1 for gender). I am wondering how decision trees and other models compare with interaction regression. I would think the interaction regression model suffers from the same issues as polynomial regression.

For those not familiar with the concept, below is an introduction to interaction regression from Stat Trek:

In regression, an interaction effect exists when the effect of an independent variable on a dependent variable changes, depending on the value(s) of one or more other independent variables.

Interaction Effects in Equations

In a regression equation, an interaction effect is represented as the product of two or more independent variables. For example, here is a typical regression equation without an interaction:

ŷ = b0 + b1*X1 + b2*X2

where ŷ is the predicted value of a dependent variable, X1 and X2 are independent variables, and b0, b1, and b2 are regression coefficients.

And here is the same regression equation with an interaction:

ŷ = b0 + b1*X1 + b2*X2 + b3*X1*X2

Here, b3 is a regression coefficient, and X1*X2 is the interaction term. The interaction between X1 and X2 is called a two-way interaction, because it is the interaction between two independent variables. Higher-order interactions are possible, as illustrated by the three-way interaction in the following equation:

ŷ = b0 + b1*X1 + b2*X2 + b3*X3 + b4*X1*X2 + b5*X1*X3 + b6*X2*X3 + b7*X1*X2*X3

Analysts usually steer clear of higher-order interactions, like X1*X2*X3, since they can be hard to interpret.
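To see why the product term captures an effect that changes, note that in ŷ = b0 + b1*X1 + b2*X2 + b3*X1*X2, raising X1 by one unit while holding X2 fixed changes ŷ by b1 + b3*X2. Without the interaction (b3 = 0) that change is always b1. The short Python sketch that follows checks this numerically; the coefficient values are made up for illustration, and predict and effect_of_x1 are hypothetical helper names, not part of any library.

```python
def predict(b0, b1, b2, b3, x1, x2):
    """Predicted value from y-hat = b0 + b1*X1 + b2*X2 + b3*X1*X2."""
    return b0 + b1 * x1 + b2 * x2 + b3 * x1 * x2


def effect_of_x1(b1, b3, x2):
    """Change in y-hat per one-unit increase in X1, holding X2 fixed: b1 + b3*X2."""
    return b1 + b3 * x2


# Hypothetical coefficients chosen for illustration only.
b0, b1, b2, b3 = 10.0, 0.5, 2.0, 1.5

for x2 in (0, 1):
    # A one-unit step in X1 computed from predict() should match the closed form.
    numeric = predict(b0, b1, b2, b3, 1.0, x2) - predict(b0, b1, b2, b3, 0.0, x2)
    print(x2, effect_of_x1(b1, b3, x2), numeric)
# prints:
# 0 0.5 0.5
# 1 2.0 2.0
```

The same reasoning extends to the three-way model, where the effect of X1 becomes b1 + b4*X2 + b5*X3 + b7*X2*X3, which is one reason such terms are harder to interpret.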


As an illustration, consider how drug dose (in mg) affects anxiety in male and female patients. For males, drug dosage has a minimal effect on anxiety, but for females the effect is dramatic. The effect of drug dose cannot be understood without accounting for the gender of the person receiving the medication.
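A minimal sketch of this kind of example, assuming simulated data rather than the data behind the original illustration: gender is coded as a 0/1 dummy (female = 1), and the model anxiety-hat = b0 + b1*dose + b2*female + b3*dose*female is fit by ordinary least squares with NumPy. Under this coding, the dose effect for males is b1 and for females is b1 + b3. The variable names and the "true" coefficients used to generate the data are assumptions chosen only to mimic the pattern described.

```python
import numpy as np

# Simulated data (hypothetical): dose in mg, female = 1 for female patients, 0 for male.
rng = np.random.default_rng(0)
n = 200
dose = rng.uniform(0, 100, size=n)
female = rng.integers(0, 2, size=n)

# Assumed true model: dose slope -0.02 for males, -0.40 for females, plus noise.
anxiety = 50 + 2 * female - 0.02 * dose - 0.38 * dose * female + rng.normal(0, 1, size=n)

# Design matrix for anxiety-hat = b0 + b1*dose + b2*female + b3*dose*female.
X = np.column_stack([np.ones(n), dose, female, dose * female])
b, *_ = np.linalg.lstsq(X, anxiety, rcond=None)

male_slope = b[1]           # effect of dose when female == 0
female_slope = b[1] + b[3]  # effect of dose when female == 1
print(f"dose effect, males:   {male_slope:.3f} per mg")
print(f"dose effect, females: {female_slope:.3f} per mg")
```

With a single dummy, the interaction coefficient b3 is simply the difference between the two groups' dose slopes.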

Typically, when a regression equation includes an interaction term, the first question you ask is: Does the interaction term contribute in a meaningful way to the explanatory power of the equation? You can answer that question by:

  • Assessing the statistical significance of the interaction term.
  • Comparing the coefficient of determination with and without the interaction term.

If the interaction term is statistically significant, it is probably important, and if the coefficient of determination is also much larger with the interaction term, the case for keeping it is strong. If neither of these outcomes is observed, the interaction term can usually be removed from the regression equation.
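Both checks can be sketched with NumPy alone. The helper interaction_f_test below is hypothetical (not a library function): it fits the model with and without the X1*X2 term, reports R² for each, and computes the partial F statistic for the added term. With a single added term, this F equals the square of the t statistic for b3. Because R² for the larger model can never be lower, the size of the increase matters more than its direction. The data in the usage part are simulated.

```python
import numpy as np


def r_squared(X, y):
    """Fit OLS by least squares; return (R^2, residual sum of squares)."""
    b, *_ = np.linalg.lstsq(X, y, rcond=None)
    resid = y - X @ b
    rss = float(resid @ resid)
    tss = float(((y - y.mean()) ** 2).sum())
    return 1 - rss / tss, rss


def interaction_f_test(x1, x2, y):
    """Compare y ~ X1 + X2 with y ~ X1 + X2 + X1*X2 using R^2 and a partial F statistic."""
    n = len(y)
    ones = np.ones(n)
    X_main = np.column_stack([ones, x1, x2])
    X_full = np.column_stack([ones, x1, x2, x1 * x2])
    r2_main, rss_main = r_squared(X_main, y)
    r2_full, rss_full = r_squared(X_full, y)
    # One extra parameter (b3); residual degrees of freedom of the full model.
    df_resid = n - X_full.shape[1]
    f_stat = (rss_main - rss_full) / (rss_full / df_resid)
    return r2_main, r2_full, f_stat, df_resid


# Usage on simulated data with a strong interaction (hypothetical values).
rng = np.random.default_rng(1)
x1 = rng.uniform(0, 100, 200)
x2 = rng.integers(0, 2, 200).astype(float)
y = 50 - 0.02 * x1 - 0.38 * x1 * x2 + rng.normal(0, 1, 200)

r2_main, r2_full, f_stat, df = interaction_f_test(x1, x2, y)
print(f"R^2 without interaction: {r2_main:.3f}")
print(f"R^2 with interaction:    {r2_full:.3f}")
print(f"partial F(1, {df}) = {f_stat:.1f}")
```

To turn F into a p-value, SciPy's scipy.stats.f.sf(f_stat, 1, df_resid) can be used; with about 200 observations, F above roughly 3.9 corresponds to p < 0.05.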


Replies to This Discussion

Is this a question or a statement? The title of the article seems to imply that modelling interactions is passé, but the text of the article seems to advocate doing it. So I'm a little confused.

If it's a question, then the answer from me is "Me. I do it often, as do many of my colleagues, and I see it often in published articles in my field."

If it's a statement, then perhaps you should modify the title, e.g. "Why it's important to model interactions in regression" might be better. Or just delete "still".

Hi Capri,

Forgive me if I've got the wrong end of the stick here, but I am always cautious about using categorical variables (especially those with few categories) in interaction effects.

In the illustration used here, it is clear that the 'Dose, mg' effect is very different depending on the gender of the patient. Your article claims this is an 'interaction effect', but in this case, with only two possible genders (male or female; biologically you cannot be 'both' or even 'proportionally male or female'), it would be much simpler to separate the two regressions. That is, if the patient is male, y-hat = b0 + b1*X1 + b2*X2 + ..., while if the patient is female, y-hat = B0 + B1*X1 + B2*X2 + ...

This would be a far more logical approach for this specific illustration, would it not?
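For a single 0/1 dummy, the two approaches give the same point estimates when gender is interacted with every term, including the intercept. The NumPy sketch that follows (simulated data; the variable names and the helpers ols and fit_group are illustrative) fits one pooled model, anxiety-hat = b0 + b1*dose + b2*female + b3*dose*female, and two separate per-gender regressions, then checks that the male intercept and slope equal b0 and b1, and the female ones equal b0 + b2 and b1 + b3.

```python
import numpy as np


def ols(X, y):
    """Ordinary least squares coefficients via NumPy's least-squares solver."""
    b, *_ = np.linalg.lstsq(X, y, rcond=None)
    return b


# Simulated data (hypothetical): female = 1 for female patients, 0 for male.
rng = np.random.default_rng(2)
n = 200
dose = rng.uniform(0, 100, n)
female = rng.integers(0, 2, n)
anxiety = 50 + 2 * female - 0.02 * dose - 0.38 * dose * female + rng.normal(0, 1, n)

# One pooled model with gender interacted with every term.
X = np.column_stack([np.ones(n), dose, female, dose * female])
b0, b1, b2, b3 = ols(X, anxiety)


def fit_group(mask):
    """Separate regression of anxiety on dose for one gender."""
    Xg = np.column_stack([np.ones(mask.sum()), dose[mask]])
    return ols(Xg, anxiety[mask])


male_int, male_slope = fit_group(female == 0)
fem_int, fem_slope = fit_group(female == 1)

# Both checks print True: the point estimates coincide.
print(np.allclose([male_int, male_slope], [b0, b1]))
print(np.allclose([fem_int, fem_slope], [b0 + b2, b1 + b3]))
```

The standard errors do differ, because the pooled model estimates one residual variance for both groups while separate fits estimate one per group. One practical advantage of the pooled form is that b3 directly tests whether the two dose slopes differ.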

Interaction effects are more relevant where the variable(s) either have a large number of categories or are continuous. In those situations, it becomes impractical or impossible to separate the effects and deal with them independently, so it is vital that the interaction effect is included within the model.

I have used interaction terms extensively in my previous roles and would continue to do so, because interaction effects are very real in chemistry (my original discipline); considering only the main effects, with no consideration of interaction effects, would lead to false conclusions. Most chemical analyses are predicated on linear relationships (correlations), so considering interaction effects is vital in this field.

I hope I have understood and interpreted your question correctly. Thanks for the prompt to join this fascinating conversation.




© 2019 Data Science Central ®
