In statistics we often complain about omitted variable bias. Throughout my academic training, however, I was never warned about included variable bias.
Omitted variable bias is the case where a regression fails to include a right-hand-side variable. As a result, some of the explanatory power that the missing variable properly deserves is improperly allocated to the other variables and to the error term. Included variable bias, then, is the case where a right-hand-side variable is included when it should not be. It steals some of the explanatory power that properly belongs to another variable already in the regression.
Consider an example of included variable bias in action. Suppose we want to measure the relationship between income and caloric intake, and we include a right-hand variable to control for soda consumption. This is a problem because some of the income that contributes to calories is spent on soda. The caloric effect would then be attributed to soda consumption rather than to income, but this is a false dichotomy, because soda consumption is itself attributable to income.
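To make the soda example concrete, here is a minimal simulation sketch in Python with NumPy. Everything in it is an assumption made for illustration: the data-generating process, the coefficients (10 calories per unit of income directly, 20 calories per unit of soda, 0.5 units of soda per unit of income), and the helper name `fit_ols` are all hypothetical and come from no real dataset. Under these assumptions the total effect of income on calories is 10 + 20 × 0.5 = 20, and the question is which regression recovers it.

```python
import numpy as np

def fit_ols(y, *regressors):
    """Return OLS coefficients [intercept, b1, b2, ...] by least squares."""
    X = np.column_stack([np.ones_like(y), *regressors])
    coef, *_ = np.linalg.lstsq(X, y, rcond=None)
    return coef

rng = np.random.default_rng(0)
n = 100_000

# Hypothetical data-generating process (all numbers are made up):
# income drives soda consumption, and both drive calories.
income = rng.normal(50.0, 10.0, n)
soda = 0.5 * income + rng.normal(0.0, 5.0, n)
calories = 10.0 * income + 20.0 * soda + rng.normal(0.0, 50.0, n)

# Total effect of income = direct effect + effect routed through soda.
total_effect = 10.0 + 20.0 * 0.5

# Regression 1: calories on income only.
income_only = fit_ols(calories, income)[1]

# Regression 2: calories on income, "correcting" for soda.
income_with_soda = fit_ols(calories, income, soda)[1]

print(f"assumed total effect of income: {total_effect:.2f}")
print(f"income coefficient, soda left out: {income_only:.2f}")
print(f"income coefficient, soda included: {income_with_soda:.2f}")
```

In this sketch the income-only regression should land near the total effect, while adding soda should pull the income coefficient down toward the direct effect alone. The calories that income buys through soda get credited to soda, which is the included variable bias described above.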
Sometime between 2013 and Spring 2015, I took a Master’s level course where a student did this correction for soda and presented it to the class. The instructor, Sita Slavov, seemed to approve. I can’t find the syllabus for that course, but this was in GMU’s policy school.
Having formulated this term independently, I googled it and, sure enough, other people have already written about it. Here are some results:
- Jung et al., 2018, Omitted and Included Variable Bias in Tests for Disparate Impact
- Ayres, 2010, Testing for Discrimination and the Problem of “Included Variable Bias”
- Ayres, 2005, Three tests for measuring unjustified disparate impacts in organ transplantation: the problem of “included variable” bias