Lecture 18: Misspecification

Economics 326 — Introduction to Econometrics II

Vadim Marmer, UBC

Strong exogeneity and the CEF

  • Consider the linear regression model Y_{i}=\beta _{0}+\beta _{1}X_{i}+U_{i}.

  • When the errors are strongly exogenous, i.e., \mathrm{E}\left[U_{i} \mid X_{i}\right] = 0, the linear regression model defines the CEF of Y conditional on X: \begin{align*} \mathrm{CEF}_{Y}\left( X_{i}\right) &\equiv \mathrm{E}\left[Y_{i} \mid X_{i}\right] \\ &= \mathrm{E}\left[\beta _{0}+\beta _{1}X_{i}+U_{i} \mid X_{i}\right] \\ &= \beta _{0}+\beta _{1}X_{i}+\mathrm{E}\left[U_{i} \mid X_{i}\right] \\ &= \beta _{0}+\beta _{1}X_{i}. \end{align*}
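As a quick numerical check (not part of the slides), the following sketch simulates a strongly exogenous design and verifies that OLS recovers the CEF coefficients. The DGP values \beta_0 = 1, \beta_1 = 2 and the standard normal distributions for X and U are illustrative assumptions.

```python
import numpy as np

# Illustrative sketch: U is drawn independently of X, so E[U | X] = 0
# (strong exogeneity). The values beta0 = 1, beta1 = 2 are assumptions.
rng = np.random.default_rng(0)
n = 200_000
beta0, beta1 = 1.0, 2.0

X = rng.normal(size=n)
U = rng.normal(size=n)                  # independent of X
Y = beta0 + beta1 * X + U

# OLS of Y on (1, X): under strong exogeneity this estimates the CEF
Z = np.column_stack([np.ones(n), X])
b0_hat, b1_hat = np.linalg.lstsq(Z, Y, rcond=None)[0]
print(b0_hat, b1_hat)                   # close to (1.0, 2.0)
```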

Weak exogeneity

  • Consider the linear regression model with \mathrm{E}\left[U_{i}\right] = 0: \begin{align*} Y_{i} &= \beta _{0}+\beta _{1}X_{i}+U_{i}, \\ \mathrm{E}\left[U_{i}\right] &= 0 \end{align*}

  • Suppose the errors are only weakly exogenous: \mathrm{E}\left[U_{i}X_{i}\right] = 0.

  • In this case, the linear function generally fails to be the CEF: \mathrm{E}\left[U_{i} \mid X_{i}\right] need not be zero, so in general \mathrm{CEF}_{Y}\left( X_{i}\right) \neq \beta _{0}+\beta _{1}X_{i}.

  • Question: What does the econometrician estimate by running a linear regression when the regressors are only weakly, not strongly, exogenous?

Misspecified CEF and the regression error

  • Suppose that \mathrm{E}\left[Y_{i} \mid X_{i}\right] = g\left( X_{i}\right), where g is some unknown nonlinear function. Thus, the true CEF is g\left( X_{i}\right) \neq \beta _{0}+\beta _{1}X_{i}.

  • Define \varepsilon_{i}=Y_{i}-\mathrm{E}\left[Y_{i} \mid X_{i}\right], so the true model is Y_{i} = g\left( X_{i}\right) +\varepsilon_{i} with \mathrm{E}\left[\varepsilon_{i} \mid X_{i}\right] = 0.

  • Adding and subtracting \beta _{0}+\beta _{1}X_{i}: \begin{align*} Y_{i} &= g\left( X_{i}\right) +\varepsilon_{i} \\ &= \beta _{0}+\beta _{1}X_{i} \\ &\quad +\underbrace{g\left( X_{i}\right) -\beta _{0}-\beta _{1}X_{i}}_{\text{specification error}} +\varepsilon_{i} \end{align*}

  • Rearranging, Y_{i}=\beta _{0}+\beta _{1}X_{i}+U_{i}, where the regression error U_{i} combines the true error and the specification error: U_{i}=\varepsilon_{i}+\underbrace{g\left( X_{i}\right) -\beta _{0}-\beta _{1}X_{i}}_{\text{specification error}}.

  • Can we find \beta _{0} and \beta _{1} so that U_{i} has mean zero and is uncorrelated with X_{i}, i.e., \mathrm{E}\left[U_{i}\right] = 0 and \mathrm{E}\left[X_{i}U_{i}\right] = 0?

Conditions for weak exogeneity

  • Denote the specification error by \Delta\left( X_{i}\right) = g\left( X_{i}\right) -\beta _{0}-\beta _{1}X_{i}, so U_{i}=\varepsilon_{i}+\Delta\left( X_{i}\right).

  • Since \mathrm{E}\left[\varepsilon_{i} \mid X_{i}\right] = 0, the law of iterated expectations gives \begin{align*} \mathrm{E}\left[\varepsilon_{i}\right] &= \mathrm{E}\left[\mathrm{E}\left[\varepsilon_{i} \mid X_{i}\right]\right] = 0, \\ \mathrm{E}\left[\varepsilon_{i}X_{i}\right] &= \mathrm{E}\left[X_{i}\,\mathrm{E}\left[\varepsilon_{i} \mid X_{i}\right]\right] = 0. \end{align*}

  • Therefore \begin{align*} \mathrm{E}\left[U_{i}\right] &= \mathrm{E}\left[\varepsilon_{i}\right] + \mathrm{E}\left[\Delta\left( X_{i}\right)\right] = \mathrm{E}\left[\Delta\left( X_{i}\right)\right], \\ \mathrm{E}\left[U_{i}X_{i}\right] &= \mathrm{E}\left[\varepsilon_{i}X_{i}\right] + \mathrm{E}\left[\Delta\left( X_{i}\right) X_{i}\right] = \mathrm{E}\left[\Delta\left( X_{i}\right) X_{i}\right]. \end{align*}

  • The conditions \mathrm{E}\left[U_{i}\right] = 0 and \mathrm{E}\left[U_{i}X_{i}\right] = 0 reduce to conditions on the specification error alone: \begin{align*} \mathrm{E}\left[\Delta\left( X_{i}\right)\right] &= 0, \\ \mathrm{E}\left[\Delta\left( X_{i}\right) X_{i}\right] &= 0. \end{align*} That is, we need \beta _{0} and \beta _{1} such that the specification error has mean zero and is uncorrelated with X_{i}.
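These two moment conditions are easy to check numerically. The sketch below is an illustration, not part of the slides; the choices g(x) = exp(x) and X ~ N(0,1) are assumptions. An arbitrary pair (b_0, b_1) generally violates the conditions, while the least-squares coefficients of g(X) on (1, X) satisfy them by construction.

```python
import numpy as np

# Illustrative sketch: check the two moment conditions on the specification
# error Delta(X) = g(X) - b0 - b1*X. Assumed choices: g(x) = exp(x), X ~ N(0,1).
rng = np.random.default_rng(1)
X = rng.normal(size=500_000)
gX = np.exp(X)

def moment_conditions(b0, b1):
    delta = gX - b0 - b1 * X
    return delta.mean(), (delta * X).mean()

# An arbitrary (b0, b1) generally violates both conditions ...
mc_arbitrary = moment_conditions(0.0, 1.0)

# ... while the least-squares coefficients of g(X) on (1, X) satisfy them
# by construction (they solve the sample normal equations).
Z = np.column_stack([np.ones_like(X), X])
b0_star, b1_star = np.linalg.lstsq(Z, gX, rcond=None)[0]
mc_ols = moment_conditions(b0_star, b1_star)
print(mc_arbitrary, mc_ols)             # first pair far from zero, second pair near zero
```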

Linear approximation of the CEF

  • Consider the following approximation problem: \min_{b_{0},b_{1}}\mathrm{E}\left[\left( g\left( X_{i}\right) -b_{0}-b_{1}X_{i}\right) ^{2}\right].

  • We are approximating the CEF with a linear function.

  • Among all linear functions of X_{i}, we seek the best approximation in the mean squared error (MSE) sense.
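To illustrate, one can minimize the sample analogue of this MSE directly. In the sketch below the choices g(x) = x^3 and X ~ N(0,1) are assumptions; for them the population minimizer works out to (b_0, b_1) = (0, 3), since Cov(X, X^3) = E[X^4] = 3 and Var(X) = 1.

```python
import numpy as np
from scipy.optimize import minimize

# Illustrative sketch: minimize the sample analogue of the MSE criterion
# E[(g(X) - b0 - b1*X)^2] directly. Assumed choices: g(x) = x**3, X ~ N(0,1),
# for which the population minimizer is (b0, b1) = (0, 3).
rng = np.random.default_rng(2)
X = rng.normal(size=200_000)
gX = X ** 3

def sample_mse(b):
    return np.mean((gX - b[0] - b[1] * X) ** 2)

res = minimize(sample_mse, x0=np.zeros(2))
print(res.x)                            # close to (0, 3)
```

Minimizing the sample MSE is of course just OLS of g(X_i) on (1, X_i), so the same coefficients could equally be obtained from a regression.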

Regression as best linear approximation

  • Let \left( \beta _{0},\beta _{1}\right) =\arg \min_{b_{0},b_{1}}\text{MSE}\left( b_{0},b_{1}\right), where \text{MSE}\left( b_{0},b_{1}\right) =\mathrm{E}\left[\left( g\left( X_{i}\right) -b_{0}-b_{1}X_{i}\right) ^{2}\right].

  • The first-order conditions, evaluated at the minimizer \left( \beta _{0},\beta _{1}\right), are: \begin{align*} \frac{\partial \text{MSE}}{\partial b_{0}} &= -2\,\mathrm{E}\bigl[\underbrace{g\left( X_{i}\right) -\beta _{0}-\beta _{1}X_{i}}_{\Delta(X_{i})}\bigr] = 0, \\ \frac{\partial \text{MSE}}{\partial b_{1}} &= -2\,\mathrm{E}\bigl[\underbrace{\left( g\left( X_{i}\right) -\beta _{0}-\beta _{1}X_{i}\right)}_{\Delta(X_{i})} X_{i}\bigr] = 0. \end{align*}

  • The minimization thus picks \beta _{0} and \beta _{1} so that the specification error \Delta\left( X_{i}\right) = g\left( X_{i}\right) -\beta _{0}-\beta _{1}X_{i} has mean zero and is uncorrelated with X_{i}; the resulting line is the best linear approximation of the CEF in the MSE sense, and OLS estimates exactly these coefficients.

  • Since U_{i}=\varepsilon_{i}+\Delta\left( X_{i}\right), this also gives \mathrm{E}\left[U_{i}\right] = 0\text{ and }\mathrm{E}\left[U_{i}X_{i}\right] = 0.
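  • Solving the two first-order conditions explicitly (a step left implicit above) gives the familiar population regression formulas: \begin{align*} \mathrm{E}\left[g\left( X_{i}\right)\right] &= \beta _{0}+\beta _{1}\mathrm{E}\left[X_{i}\right], \\ \mathrm{E}\left[X_{i}\,g\left( X_{i}\right)\right] &= \beta _{0}\mathrm{E}\left[X_{i}\right]+\beta _{1}\mathrm{E}\left[X_{i}^{2}\right], \end{align*} so that \begin{align*} \beta _{1} &= \frac{\mathrm{Cov}\left( X_{i},g\left( X_{i}\right)\right) }{\mathrm{Var}\left( X_{i}\right) } = \frac{\mathrm{Cov}\left( X_{i},Y_{i}\right) }{\mathrm{Var}\left( X_{i}\right) }, \\ \beta _{0} &= \mathrm{E}\left[g\left( X_{i}\right)\right]-\beta _{1}\mathrm{E}\left[X_{i}\right] = \mathrm{E}\left[Y_{i}\right]-\beta _{1}\mathrm{E}\left[X_{i}\right], \end{align*} where the second equalities use Y_{i}=g\left( X_{i}\right) +\varepsilon_{i} with \mathrm{E}\left[\varepsilon_{i}\right] = 0 and \mathrm{Cov}\left( X_{i},\varepsilon_{i}\right) = 0.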

Misspecification and heteroskedasticity

  • Recall U_{i} = \varepsilon_{i} + \Delta\left( X_{i}\right), where \Delta\left( X_{i}\right) = g\left( X_{i}\right) -\beta _{0}-\beta _{1}X_{i} is the specification error.

  • Suppose the true error \varepsilon_{i} is homoskedastic: \mathrm{E}\left[\varepsilon_{i}^{2} \mid X_{i}\right] = \sigma _{\varepsilon}^{2} for all X_{i}.

  • When the specification error is nonzero, U_{i} is heteroskedastic: \begin{align*} \mathrm{E}\left[U_{i}^{2} \mid X_{i}\right] &= \mathrm{E}\left[\left( \varepsilon_{i}+\Delta\left( X_{i}\right) \right) ^{2} \mid X_{i}\right] \\ &= \mathrm{E}\left[\varepsilon_{i}^{2} \mid X_{i}\right]+\Delta\left( X_{i}\right) ^{2}+2\,\Delta\left( X_{i}\right) \mathrm{E}\left[\varepsilon_{i} \mid X_{i}\right] \\ &= \sigma _{\varepsilon}^{2}+\Delta\left( X_{i}\right) ^{2}. \end{align*}
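A simulation makes this pattern visible. The sketch below rests on illustrative assumptions: g(x) = x^2, X ~ N(0,1), \varepsilon ~ N(0,1). For this DGP the best linear approximation has \beta_0 = \mathrm{E}[X^2] = 1 and \beta_1 = \mathrm{Cov}(X, X^2) = 0, so \Delta(x) = x^2 - 1 and \mathrm{E}[U^2 \mid X = x] = 1 + (x^2 - 1)^2.

```python
import numpy as np

# Illustrative sketch: a homoskedastic true error plus a nonzero specification
# error makes the regression error heteroskedastic. Assumed DGP: g(x) = x**2,
# X ~ N(0,1), eps ~ N(0,1). Then beta0 = 1, beta1 = 0, Delta(x) = x**2 - 1,
# and E[U^2 | X = x] = 1 + (x**2 - 1)**2.
rng = np.random.default_rng(3)
n = 400_000
X = rng.normal(size=n)
eps = rng.normal(size=n)                 # homoskedastic: Var(eps | X) = 1
Y = X**2 + eps

U = Y - 1.0 - 0.0 * X                    # U = eps + Delta(X)

# Empirical conditional second moment E[U^2 | X] in two bins of |X|
near = np.abs(np.abs(X) - 1.0) < 0.1     # theory: 1 + 0^2 = 1
far = np.abs(np.abs(X) - 2.0) < 0.1      # theory: 1 + 3^2 = 10
m_near = np.mean(U[near] ** 2)
m_far = np.mean(U[far] ** 2)
print(m_near, m_far)                     # rises sharply where Delta(x) is large
```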