Omitted Variable Bias

In social science research, control variables are often included out of concern that omitting them would bias the coefficients of interest [1, 2]. However, short of knowing the true data-generating process, which is unlikely, including even relevant controls may in fact aggravate the problem.

This is shown for linear (OLS) and logit (GLM) models in which the true model includes three covariates. The first misspecified model omits the second and third covariates, and the second misspecified model omits only the third covariate. According to the logic of including controls, the bias in the expected value of the coefficient on the first covariate should always be larger in the first misspecified model, unless the covariates are uncorrelated. Even that exception fails for many GLM link functions, where coefficients may be biased even if included and excluded covariates are uncorrelated [3, 4].

At the red contour line, the bias is the same in the first and second misspecified models. In regions where dashed contour lines indicate positive values, including the control does reduce bias; the lighter the region, the larger the reduction. In regions where solid contour lines indicate negative values, however, including the control induces bias; the darker the region, the larger the induced bias. (Hover the mouse over a contour line to see its tooltip.) To read off exact coordinates, drag the cross-hairs locator to the desired position. The notation follows [1].

Contributed by: Alrik Thiem (December 2010)
After work by: Kevin A. Clarke, University of Rochester (USA)
Open content licensed under CC BY-NC-SA




Details

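For the case of OLS, three models are compared: the true model, which includes all three covariates; the first misspecified model, which omits the second and third covariates; and the second misspecified model, which omits only the third covariate.

The quantities of interest are the biases of the expected value of the estimated coefficient on the first covariate under the first and under the second misspecified model.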
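As a reference point, the textbook omitted-variable results for this setup can be written out as follows. This is a sketch in assumed notation, not necessarily the notation of [1]: the symbols M_T, M_1, M_2, the coefficient names, the pairwise correlations r_{jk} and the standard deviations \sigma_j are all introduced here for illustration, and the biases are stated in population (large-sample) terms.

```latex
\begin{align*}
M_T &:\; y_i = \beta_0 + \beta_1 x_{1i} + \beta_2 x_{2i} + \beta_3 x_{3i} + \varepsilon_i,
      \qquad E[\varepsilon_i \mid x_{1i}, x_{2i}, x_{3i}] = 0 \\
M_1 &:\; y_i = \alpha_0 + \alpha_1 x_{1i} + u_i \\
M_2 &:\; y_i = \gamma_0 + \gamma_1 x_{1i} + \gamma_2 x_{2i} + v_i \\[6pt]
E[\hat{\alpha}_1] - \beta_1 &= \beta_2\, r_{12}\,\frac{\sigma_2}{\sigma_1}
      + \beta_3\, r_{13}\,\frac{\sigma_3}{\sigma_1} \\
E[\hat{\gamma}_1] - \beta_1 &= \beta_3\,\frac{r_{13} - r_{12}\, r_{23}}{1 - r_{12}^{2}}\,
      \frac{\sigma_3}{\sigma_1}
\end{align*}
```

The first bias collects the effects of both omitted covariates through their correlations with the first covariate. The second depends on the partial association between the first and third covariates after the second is held fixed, which is why it need not be smaller in absolute value.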

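According to the logic of including controls in order to reduce bias, the following weak inequality should always hold: the absolute bias under the second misspecified model is no larger than the absolute bias under the first.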
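The inequality is easy to check numerically. The sketch below uses the standard population omitted-variable formulas for OLS; the function names, argument names and default unit standard deviations are assumptions made for illustration and are not taken from the Demonstration or from [1]. `bias_reduction` returns the absolute bias under the first misspecified model minus that under the second, so a positive value means the control helps and a negative value means it hurts.

```python
def ols_biases(beta2, beta3, r12, r13, r23, s1=1.0, s2=1.0, s3=1.0):
    """Return (bias_m1, bias_m2) for the coefficient on the first covariate.

    bias_m1: model omitting covariates 2 and 3.
    bias_m2: model omitting covariate 3 only.
    r12, r13, r23 are pairwise correlations; s1, s2, s3 are standard deviations.
    All names are hypothetical and chosen for this sketch.
    """
    # The three correlations must form a valid (positive definite) matrix.
    det = 1 + 2 * r12 * r13 * r23 - r12 ** 2 - r13 ** 2 - r23 ** 2
    if det <= 0:
        raise ValueError("correlations are not jointly admissible")

    # Omitting x2 and x3: each omitted effect leaks in through its
    # simple regression slope on x1.
    bias_m1 = beta2 * r12 * s2 / s1 + beta3 * r13 * s3 / s1

    # Omitting x3 only: the leak runs through the partial slope of x3
    # on x1, holding x2 fixed.
    bias_m2 = beta3 * (r13 - r12 * r23) / (1 - r12 ** 2) * s3 / s1
    return bias_m1, bias_m2


def bias_reduction(beta2, beta3, r12, r13, r23, **sds):
    """|bias_m1| - |bias_m2|; negative means the control induces bias."""
    bias_m1, bias_m2 = ols_biases(beta2, beta3, r12, r13, r23, **sds)
    return abs(bias_m1) - abs(bias_m2)


if __name__ == "__main__":
    # Offsetting omitted effects: the two biases cancel in the first
    # misspecified model, so adding the control makes matters worse.
    print(bias_reduction(beta2=1.0, beta3=-1.0, r12=0.5, r13=0.5, r23=0.0))
    # Reinforcing omitted effects: adding the control helps.
    print(bias_reduction(beta2=1.0, beta3=1.0, r12=0.5, r13=0.5, r23=0.5))
```

In the first call the omitted effects have opposite signs and cancel when both covariates are left out, so including the second covariate removes the cancellation and the inequality is violated.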

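For the case of GLM, the same three models are compared as before: the true model, the first misspecified model (omitting the second and third covariates), and the second misspecified model (omitting only the third covariate).

The normalized values of the coefficient on the first covariate under the two misspecified models are given in [1].

According to the logic of including controls in order to reduce bias, the following weak inequality should always hold: the absolute bias under the second misspecified model is no larger than the absolute bias under the first.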
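To see why uncorrelated covariates do not protect a logit coefficient, consider the commonly used latent-variable approximation: omitting an independent covariate adds its contribution to the error variance, and the remaining coefficients are rescaled accordingly. The sketch below implements only that approximation. It assumes the omitted covariate is independent of the included one and that the combined error can still be treated as logistic; it is not claimed to be the expression used in [1], and the function and argument names are hypothetical.

```python
import math

# Variance of the standard logistic distribution.
LOGISTIC_VAR = math.pi ** 2 / 3


def attenuated_logit_coef(beta1, beta_omitted, sd_omitted=1.0):
    """Approximate logit coefficient on x1 after omitting an independent covariate.

    The omitted term beta_omitted * x adds (beta_omitted * sd_omitted)**2 to the
    latent error variance; the coefficient shrinks by the ratio of the old to
    the new error standard deviation.
    """
    extra_var = (beta_omitted * sd_omitted) ** 2
    scale = math.sqrt(1 + extra_var / LOGISTIC_VAR)
    return beta1 / scale


if __name__ == "__main__":
    for b_om in (0.0, 0.5, 1.0, 2.0):
        approx = attenuated_logit_coef(1.0, b_om)
        print(f"omitted effect {b_om:.1f}: coefficient on x1 is about {approx:.3f}")
```

Under this approximation the coefficient is pulled toward zero whenever the omitted covariate has a nonzero effect, even though it is uncorrelated with the included covariate. In OLS the corresponding bias would be zero.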

References

[1] K. A. Clarke, "Return of the Phantom Menace: Omitted Variable Bias in Political Research," Conflict Management and Peace Science, 26(1), 2009 pp. 46–66.

[2] K. A. Clarke, "The Phantom Menace: Omitted Variable Bias in Econometric Research," Conflict Management and Peace Science, 22(4), 2005 pp. 341–352.

[3] M. H. Gail, S. Wieand, and S. Piantadosi, "Biased Estimates of Treatment Effect in Randomized Experiments with Nonlinear Regressions and Omitted Covariates," Biometrika, 71(3), 1984 pp. 431–444.

[4] J. S. Cramer, Logit Models from Economics and Other Fields, Cambridge: Cambridge University Press, 2003.


