# Omitted Variable Bias

In social science research, control variables are often included out of concern that omitting them would bias the coefficients of interest [1, 2]. However, unless the true data-generating process is known, which is unlikely, the inclusion of even relevant controls may in fact aggravate the problem.

This is shown for linear (OLS) and logit (GLM) models in which the true model includes three covariates. The first misspecified model omits the second and third covariates; the second misspecified model omits only the third covariate. According to the logic of including controls, the bias in the expected value of the coefficient for the first covariate should always be larger in the first misspecified model, unless the covariates are uncorrelated. The exception for uncorrelated covariates does not carry over to many GLM link functions, where coefficients may be biased even if the included and excluded covariates are uncorrelated [3, 4].

At the red contour line, there is no difference in bias between the first and second misspecified models. In regions where dashed contour lines indicate positive values, the inclusion of controls does reduce bias; the lighter the region, the larger the reduction. In regions where solid contour lines indicate negative values, however, the inclusion of controls increases bias; the darker the region, the larger the increase. Hover the mouse over a contour line to see the tooltip. For exact identification of coordinates, drag the cross-hairs locator to the desired position. The notation follows [1].

### DETAILS

For the case of OLS, let be the true model, be the first misspecified model, and be the second misspecified model.
: , with ,
: , and
: .
If for we have , then .
If for we have , then the bias of the expected values of for and for are given by
(1) ,
(2) .
According to the logic of including controls in order to reduce bias, the following weak inequality should always hold.
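
The OLS comparison can be illustrated with a minimal sketch. It uses the standard textbook omitted-variable-bias result for three covariates and does not reproduce the notation of [1]. It assumes that all covariates are standardized to unit variance, that `r12`, `r13`, and `r23` are their pairwise correlations, and that `beta2` and `beta3` are the true coefficients on the second and third covariates. All function and variable names are hypothetical.

```python
def ols_bias_omit_x2_x3(beta2, beta3, r12, r13):
    """Bias of the x1 coefficient when x2 and x3 are both omitted.

    Assumes unit-variance covariates, so the auxiliary regression slope
    of x_j on x1 equals the correlation r1j.
    """
    return beta2 * r12 + beta3 * r13


def ols_bias_omit_x3(beta3, r12, r13, r23):
    """Bias of the x1 coefficient when x2 is controlled and only x3 is omitted.

    The factor is the coefficient on x1 in the population regression of
    x3 on x1 and x2 (unit-variance covariates assumed).
    """
    return beta3 * (r13 - r12 * r23) / (1.0 - r12 ** 2)


def bias_reduction(beta2, beta3, r12, r13, r23):
    """Absolute bias without the control minus absolute bias with it.

    Positive: adding x2 reduces bias. Negative: adding x2 increases bias.
    """
    without_control = abs(ols_bias_omit_x2_x3(beta2, beta3, r12, r13))
    with_control = abs(ols_bias_omit_x3(beta3, r12, r13, r23))
    return without_control - with_control


# Illustrative values (assumed, not taken from the source). The implied
# correlation matrix is positive definite: its determinant is 0.5.
beta2, beta3 = 1.0, -1.0
r12, r13, r23 = 0.5, 0.5, 0.0

bias_m1 = ols_bias_omit_x2_x3(beta2, beta3, r12, r13)
bias_m2 = ols_bias_omit_x3(beta3, r12, r13, r23)
diff = bias_reduction(beta2, beta3, r12, r13, r23)

print("bias without control:", bias_m1)
print("bias with control:   ", bias_m2)
print("reduction in |bias|: ", diff)
```

With these values, the two omitted-variable terms cancel when both covariates are left out, so the bias is 0. Adding the second covariate removes one of the offsetting terms and leaves a bias of about -0.67, so the reduction in absolute bias is negative.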

For the case of GLM, as before let be the true model, let be the first misspecified model, and let be the second misspecified model.
: , with and ,
: , and
: .
The normalized values of and are given by [1] as
and
.
According to the logic of including controls in order to reduce bias, the following weak inequality should always hold.
.
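
The claim that logit coefficients can be biased even when the omitted covariate is uncorrelated with the included one [3, 4] can be checked with a small simulation. The sketch below is not the Demonstration's code. It assumes two independent standard normal covariates, a true logit model with no intercept, and hypothetical names and parameter values throughout. The misspecified model regresses the outcome on the first covariate only, fitted by Newton-Raphson.

```python
import math
import random


def fit_logit_one_covariate(x, y, iterations=25):
    """Maximum likelihood slope for P(y=1|x) = 1 / (1 + exp(-b*x)).

    Single-covariate logit without intercept, fitted by Newton-Raphson
    starting from b = 0.
    """
    b = 0.0
    for _ in range(iterations):
        grad = 0.0  # first derivative of the log-likelihood
        info = 0.0  # negative second derivative (Fisher information)
        for xi, yi in zip(x, y):
            p = 1.0 / (1.0 + math.exp(-b * xi))
            grad += xi * (yi - p)
            info += xi * xi * p * (1.0 - p)
        b += grad / info
    return b


rng = random.Random(0)  # fixed seed so the run is repeatable
n = 20000
beta1, beta2 = 1.0, 2.0  # assumed true coefficients

# Independent draws, so x1 and x2 are uncorrelated in the population.
x1 = [rng.gauss(0.0, 1.0) for _ in range(n)]
x2 = [rng.gauss(0.0, 1.0) for _ in range(n)]

# Outcomes drawn from the true model, which includes both covariates.
y = []
for a, c in zip(x1, x2):
    p_true = 1.0 / (1.0 + math.exp(-(beta1 * a + beta2 * c)))
    y.append(1 if rng.random() < p_true else 0)

# Misspecified model: x2 is omitted.
b1_short = fit_logit_one_covariate(x1, y)
print("true beta1:", beta1)
print("estimate with x2 omitted:", round(b1_short, 3))
```

Under these assumptions the estimate is pulled toward zero, well below the true value of 1, even though the omitted covariate is independent of the included one. In the OLS case, the same design would leave the coefficient unbiased.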

### REFERENCES

[1] K. A. Clarke, "Return of the Phantom Menace: Omitted Variable Bias in Political Research," Conflict Management and Peace Science, 26(1), 2009 pp. 46–66.

[2] K. A. Clarke, "The Phantom Menace: Omitted Variable Bias in Econometric Research," Conflict Management and Peace Science, 22(4), 2005 pp. 341–352.

[3] M. H. Gail, S. Wieand, and S. Piantadosi, "Biased Estimates of Treatment Effect in Randomized Experiments with Nonlinear Regressions and Omitted Covariates," Biometrika, 71(3), 1984 pp. 431–444.

[4] J. S. Cramer, Logit Models from Economics and Other Fields, Cambridge: Cambridge University Press, 2003.
