Omitted Variable Bias in 3D

This Demonstration develops geometric intuition for omitted variable bias. In econometric modeling, there is a persistent risk of omitting an important variable (e.g. because the true model structure is not understood or because relevant data is lacking). The resulting endogeneity [1] spoils the model, yielding inconsistent and therefore misleading estimates of the model coefficients. The effect of an omitted variable is usually explained analytically or with numerical examples, but visualization may bring more insight to the issue.


The Demonstration considers a true linear model with two explanatory variables (the index for each observation is dropped for readability), one of which is "omitted" during estimation. Omitting it takes us from a 3D space to a 2D space that can be visualized.


Contributed by: Timur Gareev (May 2018)
Open content licensed under CC BY-NC-SA



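If we try to estimate a true model that contains two explanatory variables with an incorrect model that contains only one of them, the estimate of the coefficient on the included variable is most likely biased. This follows directly from the definition of the least-squares estimator: substituting the true model into it leaves an extra term that depends on the omitted variable.

The error term is a vector whose elements are independent of one another and normally distributed with mean 0 and constant variance.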
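
The argument above can be written out in the standard textbook form. The notation is an assumption chosen for illustration and may differ from the symbols used in the Demonstration: $y$ is the response, $x_1$ the included variable, $x_2$ the omitted variable, $\varepsilon$ the error term with variance $\sigma^2$, and $\tilde\beta_1$ the slope of the incorrect model. Hats denote sample moments.

```latex
\begin{align}
  y &= \beta_0 + \beta_1 x_1 + \beta_2 x_2 + \varepsilon,
      \qquad \varepsilon \sim N(0, \sigma^2), \\
  \tilde\beta_1
    &= \frac{\widehat{\operatorname{Cov}}(x_1, y)}{\widehat{\operatorname{Var}}(x_1)}
     = \beta_1
       + \beta_2 \, \frac{\widehat{\operatorname{Cov}}(x_1, x_2)}{\widehat{\operatorname{Var}}(x_1)}
       + \frac{\widehat{\operatorname{Cov}}(x_1, \varepsilon)}{\widehat{\operatorname{Var}}(x_1)}, \\
  \operatorname{E}\!\left[\tilde\beta_1 \mid x_1, x_2\right]
    &= \beta_1
       + \beta_2 \, \frac{\widehat{\operatorname{Cov}}(x_1, x_2)}{\widehat{\operatorname{Var}}(x_1)} .
\end{align}
```

The last line uses the fact that the error term has mean zero given the regressors, so the third term of the second line vanishes in expectation. The bias disappears only if $\beta_2 = 0$ or if the sample covariance between $x_1$ and $x_2$ is zero.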

The key term in the bias is the covariance between the included and the omitted variable, which we can vary with the corresponding control. If it equals zero, no problems arise: the estimate is unbiased. In practice, however, the two explanatory variables of the true model are usually correlated,

so omitting one of them from the estimation causes bias. This correlation is always a source of confusion, but it should not be too high; otherwise the results will be misleading even without omitted variables (as we approach the other "trap," multicollinearity).
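
The effect of the correlation between the two explanatory variables can be checked numerically. The following is a minimal Monte Carlo sketch, not the Demonstration's own code. The coefficient values (1, 2, 3), the unit-variance regressors, the sample size and the names `short_slope` and `mean_short_slope` are all hypothetical choices made for illustration. Under these assumptions the average slope of the incorrect model should be close to `b1 + b2 * rho`.

```python
import random


def cov(a, b):
    """Sample covariance of two equal-length lists."""
    ma = sum(a) / len(a)
    mb = sum(b) / len(b)
    return sum((u - ma) * (v - mb) for u, v in zip(a, b)) / (len(a) - 1)


def short_slope(rho, n, rng, b0=1.0, b1=2.0, b2=3.0, sigma=1.0):
    """Draw one sample from the (assumed) true model and regress y on x1 only."""
    x1 = [rng.gauss(0.0, 1.0) for _ in range(n)]
    # x2 has unit variance and population correlation rho with x1.
    x2 = [rho * a + (1.0 - rho ** 2) ** 0.5 * rng.gauss(0.0, 1.0) for a in x1]
    y = [b0 + b1 * a + b2 * b + rng.gauss(0.0, sigma) for a, b in zip(x1, x2)]
    # Least-squares slope of the model that omits x2.
    return cov(x1, y) / cov(x1, x1)


def mean_short_slope(rho, n=200, reps=300, seed=0):
    """Average the omitted-variable slope over many simulated samples."""
    rng = random.Random(seed)
    return sum(short_slope(rho, n, rng) for _ in range(reps)) / reps


if __name__ == "__main__":
    for rho in (0.0, 0.5, 0.9):
        est = mean_short_slope(rho)
        print(f"rho={rho:.1f}  mean slope={est:.3f}  b1 + b2*rho={2.0 + 3.0 * rho:.3f}")
```

With zero correlation the average slope stays near the true coefficient of the included variable; as the correlation grows, the average drifts away from it by roughly `b2 * rho`.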

We demonstrate it as follows. The blue cloud is the cloud of observations; each point stands for one observation (the number of observations can be changed with the corresponding control). The black lattice shows the true model. The blue lattice shows the correct estimate, with both explanatory variables included. The red line is the regression that omits one variable, and the red cloud is the projection of the observations from the true 3D space onto 2D. Use the checkboxes to show or hide these elements; doing so does not change the cloud. Be warned, though, that each time a parameter changes, the cloud of observations is redrawn as well.

Most important are the slopes of the black and blue planes in the direction of the included variable and the slope of the red line. When these slopes are approximately equal, there is little or no bias.
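
The comparison of slopes can also be made on a single simulated cloud. The sketch below is an illustration under assumed values, not the Demonstration's implementation. The names `make_cloud` and `plane_and_line_slopes`, the coefficients (1, 2, 3), the correlation 0.7 and the seed are all hypothetical. It fits the plane (both variables included) and the line (one variable omitted) with closed-form least-squares formulas. It then checks the exact sample identity: line slope = plane slope + (coefficient on the omitted variable) × (slope of the omitted variable on the included one).

```python
import random


def cov(a, b):
    """Sample covariance of two equal-length lists."""
    ma = sum(a) / len(a)
    mb = sum(b) / len(b)
    return sum((u - ma) * (v - mb) for u, v in zip(a, b)) / (len(a) - 1)


def make_cloud(n=500, rho=0.7, b0=1.0, b1=2.0, b2=3.0, sigma=1.0, seed=1):
    """Simulate observations from an assumed true model with correlated regressors."""
    rng = random.Random(seed)
    x1 = [rng.gauss(0.0, 1.0) for _ in range(n)]
    x2 = [rho * a + (1.0 - rho ** 2) ** 0.5 * rng.gauss(0.0, 1.0) for a in x1]
    y = [b0 + b1 * a + b2 * b + rng.gauss(0.0, sigma) for a, b in zip(x1, x2)]
    return x1, x2, y


def plane_and_line_slopes(x1, x2, y):
    """Return (plane slope on x1, plane slope on x2, line slope on x1, slope of x2 on x1)."""
    s11, s22, s12 = cov(x1, x1), cov(x2, x2), cov(x1, x2)
    s1y, s2y = cov(x1, y), cov(x2, y)
    # det is zero under perfect multicollinearity, where the plane is not identified.
    det = s11 * s22 - s12 ** 2
    plane_b1 = (s22 * s1y - s12 * s2y) / det
    plane_b2 = (s11 * s2y - s12 * s1y) / det
    line_b1 = s1y / s11  # regression of y on x1 alone
    return plane_b1, plane_b2, line_b1, s12 / s11


if __name__ == "__main__":
    x1, x2, y = make_cloud()
    pb1, pb2, lb1, delta = plane_and_line_slopes(x1, x2, y)
    print(f"plane slope on x1: {pb1:.3f}")
    print(f"line slope on x1:  {lb1:.3f}")
    print(f"plane slope + pb2 * delta: {pb1 + pb2 * delta:.3f}")
```

The closed-form formulas avoid any dependency beyond the standard library; a matrix-based least-squares routine would give the same slopes.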

Increasing and reducing may considerably reduce bias. Use the buttons to study special cases with or ( corresponds to perfect multicollinearity, so we cannot study this case). It is also recommended to check visually.


[1] Wikipedia. "Endogeneity (Econometrics)." (May 7, 2018)
