Economics 326 — Introduction to Econometrics II
The OLS estimator is not the only estimator we can construct; there are alternative estimators with desirable properties such as linearity and unbiasedness.
Example: suppose that \(X_2 \neq X_1\) and consider the estimator that uses only the first two observations: \[ \tilde{\beta} = \frac{Y_2 - Y_1}{X_2 - X_1}. \]
\(\tilde{\beta}\) is linear: \[ \tilde{\beta} = c_1 Y_1 + c_2 Y_2, \] where \[ c_1 = -\frac{1}{X_2 - X_1} \text{ and } c_2 = \frac{1}{X_2 - X_1}. \]
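As a quick numerical illustration (a minimal sketch; the values of \(\alpha\), \(\beta\), \(\sigma\), \(n\), and the number of replications below are arbitrary choices, not from the notes), we can check that this estimator coincides with the linear combination \(c_1 Y_1 + c_2 Y_2\) and is unbiased conditionally on \(X\)’s:

```python
import numpy as np

# Minimal sketch (alpha, beta, sigma, n, reps are arbitrary illustrative choices).
rng = np.random.default_rng(0)
alpha, beta, sigma, n, reps = 1.0, 2.0, 1.0, 50, 10_000

X = rng.uniform(0, 10, size=n)        # drawn once and held fixed: we condition on the X's
c1 = -1.0 / (X[1] - X[0])             # coefficients from the linear representation above
c2 = 1.0 / (X[1] - X[0])

draws = np.empty(reps)
for r in range(reps):
    U = rng.normal(0, sigma, size=n)
    Y = alpha + beta * X + U
    direct = (Y[1] - Y[0]) / (X[1] - X[0])
    linear = c1 * Y[0] + c2 * Y[1]    # identical to `direct`: the estimator is linear in the Y's
    assert np.isclose(direct, linear)
    draws[r] = direct

print(draws.mean())                   # approximately beta = 2: unbiased given the X's
```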
Among all linear and unbiased estimators, an estimator with the smallest variance is called the Best Linear Unbiased Estimator (BLUE).
Note that the statement is conditional on \(X\)’s:
The estimators are unbiased conditionally on \(X\)’s.
The variance is conditional on \(X\)’s.
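In symbols: among all estimators of the form \[ \tilde{\beta} = \sum_{i=1}^{n} c_i Y_i, \quad c_i \text{ depending only on } X_1, \ldots, X_n, \quad E(\tilde{\beta} \mid X_1, \ldots, X_n) = \beta, \] the BLUE is the one with the smallest \(Var(\tilde{\beta} \mid X_1, \ldots, X_n)\).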
Suppose that
\(Y_i = \alpha + \beta X_i + U_i\).
\(E(U_i | X_1, \ldots, X_n) = 0\).
\(E(U_i^2 | X_1, \ldots, X_n) = \sigma^2\) for all \(i = 1, \ldots, n\) (homoskedasticity).
For all \(i \neq j\), \(E(U_i U_j | X_1, \ldots, X_n) = 0\).
Then, conditionally on \(X\)’s, the OLS estimators are BLUE.
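A small Monte Carlo sketch of this claim (the data-generating values, sample size, and number of replications are arbitrary choices): under the three assumptions above, the OLS slope and the two-observation estimator from the earlier example are both unbiased, but the OLS variance is the smaller one.

```python
import numpy as np

# Monte Carlo sketch of the BLUE property (alpha, beta, sigma, n, reps are
# arbitrary illustrative choices).  The X's are drawn once and then held fixed,
# so everything below is conditional on the X's.
rng = np.random.default_rng(1)
alpha, beta, sigma, n, reps = 1.0, 2.0, 1.0, 50, 20_000

X = rng.uniform(0, 10, size=n)
w = (X - X.mean()) / ((X - X.mean()) ** 2).sum()   # OLS weights: beta_hat = sum_i w_i * Y_i

beta_hat = np.empty(reps)      # OLS estimator
beta_tilde = np.empty(reps)    # two-observation estimator from the earlier example
for r in range(reps):
    U = rng.normal(0, sigma, size=n)               # homoskedastic, uncorrelated errors
    Y = alpha + beta * X + U
    beta_hat[r] = (w * Y).sum()
    beta_tilde[r] = (Y[1] - Y[0]) / (X[1] - X[0])

print(beta_hat.mean(), beta_tilde.mean())   # both approximately beta = 2 (unbiased)
print(beta_hat.var(), beta_tilde.var())     # the OLS variance is the smaller one
```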
We already know that the OLS estimator \(\hat{\beta}\) is linear and unbiased (conditionally on \(X\)’s).
Let \(\tilde{\beta}\) be any other estimator of \(\beta\) such that
\(\tilde{\beta}\) is linear: \[ \tilde{\beta} = \sum_{i=1}^{n} c_i Y_i, \] where \(c\)’s depend only on \(X\)’s.
\(\tilde{\beta}\) is unbiased: \[ E\tilde{\beta} = \beta, \] where expectation is conditional on \(X\)’s.
We need to show that for any such \(\tilde{\beta} \neq \hat{\beta}\), \[ Var(\tilde{\beta}) > Var(\hat{\beta}), \] where the variance is conditional on \(X\)’s.
Step 1: We will show that the \(c\)’s in \(\tilde{\beta} = \sum_{i=1}^{n} c_i Y_i\) satisfy \(\sum_{i=1}^{n} c_i = 0\) and \(\sum_{i=1}^{n} c_i X_i = 1\).
Step 2: Using the results of Step 1, we will show that conditionally on \(X\)’s, \(Cov(\tilde{\beta}, \hat{\beta}) = Var(\hat{\beta})\).
Step 3: Using the results of Step 2, we will show that conditionally on \(X\)’s, \(Var(\tilde{\beta}) \geq Var(\hat{\beta})\).
Step 4: Lastly, we will show that \(Var(\tilde{\beta}) = Var(\hat{\beta})\) if and only if \(\tilde{\beta} = \hat{\beta}\).
Since \(\tilde{\beta} = \sum_{i=1}^{n} c_i Y_i\), \[\begin{align*} \tilde{\beta} &= \sum_{i=1}^{n} c_i (\alpha + \beta X_i + U_i) \\ &= \alpha \sum_{i=1}^{n} c_i + \beta \sum_{i=1}^{n} c_i X_i + \sum_{i=1}^{n} c_i U_i. \end{align*}\]
Conditionally on \(X\)’s, \[\begin{align*} E\tilde{\beta} &= E\left(\alpha \sum_{i=1}^{n} c_i + \beta \sum_{i=1}^{n} c_i X_i + \sum_{i=1}^{n} c_i U_i\right) \\ &= \alpha \sum_{i=1}^{n} c_i + \beta \sum_{i=1}^{n} c_i X_i + \sum_{i=1}^{n} c_i E U_i \\ &= \alpha \sum_{i=1}^{n} c_i + \beta \sum_{i=1}^{n} c_i X_i. \end{align*}\]
From the linearity we have that, conditionally on \(X\)’s, \[ E\tilde{\beta} = \alpha \sum_{i=1}^{n} c_i + \beta \sum_{i=1}^{n} c_i X_i. \]
From the unbiasedness we have that conditionally on \(X\)’s, \[ \beta = E\tilde{\beta} = \alpha \sum_{i=1}^{n} c_i + \beta \sum_{i=1}^{n} c_i X_i. \]
Since this has to hold for any values of \(\alpha\) and \(\beta\), the coefficient on \(\alpha\) must equal zero and the coefficient on \(\beta\) must equal one: \[ \sum_{i=1}^{n} c_i = 0, \quad \sum_{i=1}^{n} c_i X_i = 1. \]
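As a sanity check (again only a sketch, with an arbitrary random draw of \(X\)’s), both the OLS weights \(w_i\) and the two-observation weights \(c_1, c_2\) from the earlier example satisfy these two constraints:

```python
import numpy as np

# Sketch: both sets of weights seen so far satisfy the Step 1 constraints
# sum(c_i) = 0 and sum(c_i * X_i) = 1.  The X's are an arbitrary random draw.
rng = np.random.default_rng(2)
X = rng.uniform(0, 10, size=50)

w = (X - X.mean()) / ((X - X.mean()) ** 2).sum()   # OLS weights
c = np.zeros_like(X)                               # weights of the two-observation estimator
c[0], c[1] = -1.0 / (X[1] - X[0]), 1.0 / (X[1] - X[0])

for coef in (w, c):
    print(coef.sum(), (coef * X).sum())            # approximately 0 and 1 in both cases
```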
We have \[ \tilde{\beta} = \beta + \sum_{i=1}^{n} c_i U_i, \text{ with } \sum_{i=1}^{n} c_i = 0, \sum_{i=1}^{n} c_i X_i = 1. \] \[ \hat{\beta} = \beta + \sum_{i=1}^{n} w_i U_i, \text{ with } w_i = \frac{X_i - \bar{X}}{\sum_{j=1}^{n} (X_j - \bar{X})^2}. \]
Conditionally on \(X\)’s, \[\begin{align*} Cov(\tilde{\beta}, \hat{\beta}) &= E[(\tilde{\beta} - \beta)(\hat{\beta} - \beta)] \\ &= E\left[\left(\sum_{i=1}^{n} c_i U_i\right)\left(\sum_{i=1}^{n} w_i U_i\right)\right] \\ &= \sum_{i=1}^{n} c_i w_i E(U_i^2) + \sum_{i=1}^{n} \sum_{j \neq i} c_i w_j E(U_i U_j). \end{align*}\]
Since \(E(U_i^2) = \sigma^2\) for all \(i\)’s: \[ \sum_{i=1}^{n} c_i w_i E(U_i^2) = \sigma^2 \sum_{i=1}^{n} c_i w_i. \]
Since \(E(U_i U_j) = 0\) for all \(i \neq j\), \[ \sum_{i=1}^{n} \sum_{j \neq i} c_i w_j E(U_i U_j) = 0. \]
Thus, \[ Cov(\tilde{\beta}, \hat{\beta}) = \sigma^2 \sum_{i=1}^{n} c_i w_i. \]
Conditionally on \(X\)’s: \[ Cov(\tilde{\beta}, \hat{\beta}) = \sigma^2 \sum_{i=1}^{n} c_i w_i \text{ and } w_i = \frac{X_i - \bar{X}}{\sum_{j=1}^{n} (X_j - \bar{X})^2}. \] \[\begin{align*} Cov(\tilde{\beta}, \hat{\beta}) &= \sigma^2 \sum_{i=1}^{n} c_i \frac{X_i - \bar{X}}{\sum_{j=1}^{n} (X_j - \bar{X})^2} \\ &= \frac{\sigma^2}{\sum_{j=1}^{n} (X_j - \bar{X})^2} \sum_{i=1}^{n} c_i (X_i - \bar{X}) \\ &= \frac{\sigma^2}{\sum_{j=1}^{n} (X_j - \bar{X})^2} \left(\sum_{i=1}^{n} c_i X_i - \bar{X} \sum_{i=1}^{n} c_i\right) \\ &= \frac{\sigma^2}{\sum_{j=1}^{n} (X_j - \bar{X})^2} (1 - \bar{X} \cdot 0) \\ &= \frac{\sigma^2}{\sum_{j=1}^{n} (X_j - \bar{X})^2} \\ &= Var(\hat{\beta}). \end{align*}\]
We know now that for any linear and unbiased \(\tilde{\beta}\), \[ Cov(\tilde{\beta}, \hat{\beta}) = Var(\hat{\beta}). \]
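A Monte Carlo sketch of this equality (same illustrative design and arbitrary parameter values as before, with the two-observation estimator playing the role of \(\tilde{\beta}\)):

```python
import numpy as np

# Monte Carlo sketch of Step 2: Cov(beta_tilde, beta_hat) = Var(beta_hat),
# conditionally on the X's.  beta_tilde is the two-observation estimator.
rng = np.random.default_rng(3)
alpha, beta, sigma, n, reps = 1.0, 2.0, 1.0, 50, 50_000

X = rng.uniform(0, 10, size=n)
w = (X - X.mean()) / ((X - X.mean()) ** 2).sum()

beta_hat = np.empty(reps)
beta_tilde = np.empty(reps)
for r in range(reps):
    U = rng.normal(0, sigma, size=n)
    Y = alpha + beta * X + U
    beta_hat[r] = (w * Y).sum()
    beta_tilde[r] = (Y[1] - Y[0]) / (X[1] - X[0])

print(np.cov(beta_tilde, beta_hat)[0, 1])   # sample covariance ...
print(beta_hat.var(ddof=1))                 # ... approximately equals Var(beta_hat)
```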
Let’s consider \(Var(\tilde{\beta} - \hat{\beta})\): \[\begin{align*} Var(\tilde{\beta} - \hat{\beta}) &= Var(\tilde{\beta}) + Var(\hat{\beta}) - 2 Cov(\tilde{\beta}, \hat{\beta}) \\ &= Var(\tilde{\beta}) + Var(\hat{\beta}) - 2 Var(\hat{\beta}) \\ &= Var(\tilde{\beta}) - Var(\hat{\beta}). \end{align*}\]
But since \(Var(\tilde{\beta} - \hat{\beta}) \geq 0\), \[ Var(\tilde{\beta}) - Var(\hat{\beta}) \geq 0 \] or \[ Var(\tilde{\beta}) \geq Var(\hat{\beta}). \]
Suppose that \(Var(\tilde{\beta}) = Var(\hat{\beta})\).
Then, \[ Var(\tilde{\beta} - \hat{\beta}) = Var(\tilde{\beta}) - Var(\hat{\beta}) = 0. \]
Thus, conditionally on \(X\)’s, \(\tilde{\beta} - \hat{\beta}\) is not random, i.e. \[ \tilde{\beta} - \hat{\beta} = \text{constant}. \]
This constant also has to be zero, because \[\begin{align*} E\tilde{\beta} &= E\hat{\beta} + \text{constant} \\ &= \beta + \text{constant}, \end{align*}\] so in order for \(\tilde{\beta}\) to be unbiased we need \[ \text{constant} = 0, \text{ i.e. } \tilde{\beta} = \hat{\beta}. \]
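An equivalent way to see Steps 3 and 4 together (not stated in the notes above, but it follows directly from the same assumptions): conditionally on \(X\)’s, \[ Var(\tilde{\beta} - \hat{\beta}) = Var\left(\sum_{i=1}^{n} (c_i - w_i) U_i\right) = \sigma^2 \sum_{i=1}^{n} (c_i - w_i)^2, \] which is strictly positive unless \(c_i = w_i\) for every \(i\), that is, unless \(\tilde{\beta} = \hat{\beta}\).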