Lecture 4: Properties of OLS
Economics 326 — Introduction to Econometrics II
Properties of Estimators
- Randomness
- Mean (unbiasedness)
- Variance
- Distribution
\gdef\E#1{\mathrm{E}\left[#1\right]} \gdef\Var#1{\mathrm{Var}\left(#1\right)}
OLS Estimators as Random Variables
- The model \begin{aligned} &Y_{i} = \alpha + \beta X_{i} + U_{i}, \\ &\E{U_{i} \mid X_1, \ldots, X_n} = 0. \end{aligned} Conditioning on X_1, \ldots, X_n allows us to treat all the X_i’s as fixed, but Y_i is still random.
- To save on writing, we will use the notation \E{\cdot \mid \mathbf{X}} = \E{\cdot \mid X_1, \ldots, X_n}. That is, \mathbf{X}=(X_1, \ldots, X_n).
- The estimators \hat{\beta} = \frac{\sum_{i=1}^{n}\left( X_{i}-\bar{X}\right) Y_{i}}{ \sum_{i=1}^{n}\left( X_{i}-\bar{X}\right) ^{2}} \text{ and } \hat{\alpha} = \bar{Y}-\hat{\beta}\bar{X} are random because they are functions of the random Y_i’s even after conditioning on the X_i’s.
Linearity of Estimators
- Since \hat{\beta} = \frac{\sum_{i=1}^{n}\left( X_{i}-\bar{X}\right) Y_{i}}{ \sum_{i=1}^{n}\left( X_{i}-\bar{X}\right) ^{2}}, we can write \hat{\beta} = \sum_{i=1}^{n}w_{i}Y_{i}, where w_{i} = \frac{X_{i}-\bar{X}}{\sum_{l=1}^{n}\left( X_{l}-\bar{X}\right) ^{2}}. After conditioning on the X_i’s, the w_{i}’s are not random (a numerical check of these weight formulas appears after this list).
- For \hat{\alpha}, \begin{aligned} \hat{\alpha} &= \bar{Y}-\hat{\beta}\bar{X} \\ &= \frac{1}{n}\sum_{i=1}^{n}Y_{i}-\left( \sum_{i=1}^{n}w_{i}Y_{i}\right) \bar{X} \\ &= \sum_{i=1}^{n}\left( \frac{1}{n}-\bar{X}w_{i}\right) Y_{i} \\ &= \sum_{i=1}^{n}\left( \frac{1}{n}-\bar{X}\frac{X_{i}-\bar{X}}{ \sum_{l=1}^{n}\left( X_{l}-\bar{X}\right) ^{2}}\right) Y_{i}. \end{aligned}
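The linearity above is easy to verify numerically. The following minimal sketch (assuming NumPy; the parameter values \alpha = 1, \beta = 2 and the simulated data are purely illustrative) computes \hat{\beta} and \hat{\alpha} both as weighted sums of the Y_i’s and from the usual formulas, and confirms that the two agree:

```python
# Check that beta_hat = sum_i w_i * Y_i and alpha_hat = sum_i (1/n - Xbar * w_i) * Y_i
# reproduce the textbook OLS formulas on simulated data.
import numpy as np

rng = np.random.default_rng(0)              # illustrative seed
n = 50
X = rng.normal(size=n)
Y = 1.0 + 2.0 * X + rng.normal(size=n)      # alpha = 1, beta = 2 (illustrative values)

w = (X - X.mean()) / np.sum((X - X.mean()) ** 2)    # weights: fixed once we condition on X
beta_hat_w = np.sum(w * Y)                          # slope as a linear function of the Y_i's
alpha_hat_w = np.sum((1.0 / n - X.mean() * w) * Y)  # intercept as a linear function of the Y_i's

# The usual OLS formulas.
beta_hat = np.sum((X - X.mean()) * Y) / np.sum((X - X.mean()) ** 2)
alpha_hat = Y.mean() - beta_hat * X.mean()

print(beta_hat_w, beta_hat)    # identical up to floating-point rounding
print(alpha_hat_w, alpha_hat)  # identical up to floating-point rounding
```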
Unbiasedness
Definition and Claim
\hat{\beta} is called an unbiased estimator if \E{\hat{\beta}} = \beta.
Claim: Suppose that
- Y_{i}=\alpha +\beta X_{i}+U_{i},
- \E{U_{i} \mid \mathbf{X}} = 0.
Then \E{\hat{\beta}}=\beta.
Proof Step 1: Decomposition into signal and noise
\begin{aligned} \hat{\beta} &= \frac{\sum_{i=1}^{n}\left( X_{i}-\bar{X}\right) Y_{i}}{ \sum_{i=1}^{n}\left( X_{i}-\bar{X}\right) ^{2}} \\ &= \frac{\sum_{i=1}^{n}\left( X_{i}-\bar{X}\right) \left( \alpha +\beta X_{i}+U_{i}\right) }{ \sum_{i=1}^{n}\left( X_{i}-\bar{X}\right) ^{2}} \\ &= \alpha \frac{\sum_{i=1}^{n}\left( X_{i}-\bar{X}\right) }{ \sum_{i=1}^{n}\left( X_{i}-\bar{X}\right) ^{2}} + \beta \frac{\sum_{i=1}^{n}\left( X_{i}-\bar{X}\right) X_{i}}{ \sum_{i=1}^{n}\left( X_{i}-\bar{X}\right) ^{2}} + \frac{\sum_{i=1}^{n}\left( X_{i}-\bar{X}\right) U_{i}}{ \sum_{i=1}^{n}\left( X_{i}-\bar{X}\right) ^{2}} \\ &= \alpha \frac{0}{\sum_{i=1}^{n}\left( X_{i}-\bar{X}\right) ^{2}} + \beta \frac{\sum_{i=1}^{n}\left( X_{i}-\bar{X}\right) ^{2}}{ \sum_{i=1}^{n}\left(X_{i}-\bar{X}\right) ^{2}} + \frac{\sum_{i=1}^{n}\left( X_{i}-\bar{X}\right) U_{i}}{ \sum_{i=1}^{n}\left( X_{i}-\bar{X}\right) ^{2}}, \end{aligned}
where the last step uses \sum_{i=1}^{n}\left( X_{i}-\bar{X}\right) = 0 and \sum_{i=1}^{n}\left( X_{i}-\bar{X}\right) X_{i} = \sum_{i=1}^{n}\left( X_{i}-\bar{X}\right) ^{2}. Hence
\hat{\beta}={\color{blue}\underbrace{\beta}_{\text{signal}}} +{\color{red}\underbrace{\frac{\sum_{i=1}^{n}\left( X_{i}-\bar{X}\right) U_{i}}{ \sum_{i=1}^{n}\left( X_{i}-\bar{X}\right) ^{2}}}_{\text{noise}}}
Proof Step 2: Conditioning on Regressors
- Once we condition on \mathbf{X}, all the X_i’s in \hat{\beta}=\beta +\frac{\sum_{i=1}^{n}\left( X_{i}-\bar{X}\right) U_{i}}{ \sum_{i=1}^{n}\left( X_{i}-\bar{X}\right) ^{2}} can be treated as fixed.
- Thus, \begin{aligned} \E{\hat{\beta} \mid \mathbf{X}} & = \E{\beta +\frac{\sum_{i=1}^{n}\left( X_{i}-\bar{X}\right) U_{i}}{ \sum_{i=1}^{n}\left( X_{i}-\bar{X}\right) ^{2}} \mid \mathbf{X}} \\ &= \beta + \E{\frac{\sum_{i=1}^{n}\left( X_{i}-\bar{X}\right) U_{i}}{ \sum_{i=1}^{n}\left( X_{i}-\bar{X}\right) ^{2}} \mid \mathbf{X}} \\ &= \beta + \frac{\sum_{i=1}^{n}\left( X_{i}-\bar{X}\right) \E{U_{i} \mid \mathbf{X}} }{ \sum_{i=1}^{n}\left( X_{i}-\bar{X}\right) ^{2}}. \end{aligned}
Proof Step 3: Applying the Exogeneity Assumption
- Thus, with \E{U_{i} \mid \mathbf{X}} = 0, we have \begin{aligned} \E{\hat{\beta} \mid \mathbf{X}} &= \beta +\frac{\sum_{i=1}^{n}\left( X_{i}-\bar{X}\right) \E{U_{i} \mid \mathbf{X}}}{ \sum_{i=1}^{n}\left( X_{i}-\bar{X}\right) ^{2}} \\ &= \beta +\frac{\sum_{i=1}^{n}\left( X_{i}-\bar{X}\right) \cdot 0}{ \sum_{i=1}^{n}\left( X_{i}-\bar{X}\right) ^{2}} = \beta. \end{aligned}
- By the law of iterated expectations (LIE), \E{\hat{\beta}} = \E{\E{\hat{\beta} \mid \mathbf{X}}} = \E{\beta} = \beta. A simulation check of this result follows below.
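Unbiasedness can also be illustrated by simulation: hold the X_i’s fixed (conditioning on \mathbf{X}), redraw the errors many times, and average the resulting \hat{\beta}’s. A minimal sketch (assuming NumPy; all parameter values are illustrative):

```python
# Monte Carlo illustration of E[beta_hat | X] = beta: X is fixed across replications,
# only the errors U (with E[U | X] = 0) are redrawn.
import numpy as np

rng = np.random.default_rng(1)
n, reps = 30, 20_000
alpha, beta = 1.0, 2.0                      # illustrative true parameters
X = rng.uniform(0, 10, size=n)              # drawn once and then held fixed
denom = np.sum((X - X.mean()) ** 2)

beta_hats = np.empty(reps)
for r in range(reps):
    U = rng.normal(scale=1.5, size=n)       # satisfies E[U | X] = 0
    Y = alpha + beta * X + U
    beta_hats[r] = np.sum((X - X.mean()) * Y) / denom

print(beta_hats.mean())                     # close to beta = 2, up to simulation noise
```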
Strong Exogeneity of Regressors
- \mathbf{X}=(X_1,\ldots,X_n) are strongly exogenous if \E{U_{i} \mid \mathbf{X}} = 0.
- Alternatively, we can assume that \E{U_{i} \mid X_{i}} = 0 and that all observations are independent, so that \begin{aligned} \E{U_{1} \mid \mathbf{X}} &= \E{U_{1} \mid X_{1}}, \\ \E{U_{2} \mid \mathbf{X}} &= \E{U_{2} \mid X_{2}}, \text{ and so on.} \end{aligned}
- The OLS estimator is in general biased if the strong exogeneity assumption is violated.
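To see the bias, consider a hypothetical design in which \E{U_{i} \mid X_{i}} = 0.5 X_{i}, so strong exogeneity fails. In the sketch below (assuming NumPy; all values are illustrative), \hat{\beta} centers on roughly \beta + 0.5 rather than \beta:

```python
# When E[U | X] != 0, the "noise" term no longer averages to zero and OLS is biased.
import numpy as np

rng = np.random.default_rng(2)
n, reps = 200, 5_000
alpha, beta = 1.0, 2.0                      # illustrative true parameters
beta_hats = np.empty(reps)
for r in range(reps):
    X = rng.normal(size=n)
    U = 0.5 * X + rng.normal(size=n)        # violates E[U | X] = 0
    Y = alpha + beta * X + U
    beta_hats[r] = np.sum((X - X.mean()) * Y) / np.sum((X - X.mean()) ** 2)

print(beta_hats.mean())                     # about 2.5, not the true beta = 2
```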
Variance of the Slope Estimator
Variance Formula and Homoskedasticity
- If Y_{i}=\alpha +\beta X_{i}+U_{i}, \E{U_{i} \mid \mathbf{X}} = 0, \E{U_{i}^{2} \mid \mathbf{X}} = \sigma^{2} = \text{constant}, and \E{U_{i}U_{j} \mid \mathbf{X}} = 0 for i \neq j, then \Var{\hat{\beta} \mid \mathbf{X}} = \frac{\sigma^{2}}{\sum_{i=1}^{n}\left( X_{i}-\bar{X}\right) ^{2}}.
- The assumption \E{U_{i}^{2} \mid \mathbf{X}} = \sigma^{2} = \text{constant} is called (conditional) homoskedasticity.
- The assumption \E{U_{i}U_{j} \mid \mathbf{X}} = 0 for i \neq j can be replaced by the assumption that the observations are independent.
Determinants of Variance
\Var{\hat{\beta} \mid \mathbf{X}} = \frac{\sigma^{2}}{\sum_{i=1}^{n}\left( X_{i}-\bar{X}\right) ^{2}}.
- The variance of \hat{\beta} is positively related to the variance of the errors \sigma^{2} = \Var{U_{i}}.
- The variance of \hat{\beta} is smaller when the X_i’s are more dispersed: greater dispersion increases the denominator \sum_{i=1}^{n}\left( X_{i}-\bar{X}\right) ^{2}, as the sketch below illustrates.
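For a concrete sense of the second point, plug the same \sigma^{2} into the formula for a tightly clustered design and a widely spread design (both hypothetical); the larger denominator of the spread design yields the smaller variance:

```python
# Same error variance, two designs: the more dispersed X's give a larger
# sum_i (X_i - Xbar)^2 and hence a smaller Var(beta_hat | X).
import numpy as np

sigma2 = 4.0                                 # illustrative error variance
X_tight = np.linspace(4.5, 5.5, 20)          # X's clustered around 5
X_spread = np.linspace(0.0, 10.0, 20)        # X's spread over [0, 10]

for X in (X_tight, X_spread):
    denom = np.sum((X - X.mean()) ** 2)
    print(denom, sigma2 / denom)             # larger denominator -> smaller variance
```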
Derivation of Variance
We have \hat{\beta}=\beta +\frac{\sum_{i=1}^{n}\left( X_{i}-\bar{X} \right) U_{i}}{\sum_{i=1}^{n}\left( X_{i}-\bar{X}\right) ^{2}} and \E{\hat{\beta} \mid \mathbf{X}}=\beta. \begin{aligned} \Var{\hat{\beta} \mid \mathbf{X}} & = \E{\left( \hat{\beta}-\E{\hat{\beta} \mid \mathbf{X}}\right) ^{2} \mid \mathbf{X}} \\ &= \E{\left( \frac{\sum_{i=1}^{n}\left( X_{i}-\bar{X}\right) U_{i}}{ \sum_{i=1}^{n}\left( X_{i}-\bar{X}\right) ^{2}}\right) ^{2} \mid \mathbf{X}} \\ &= \left( \frac{1}{\sum_{i=1}^{n}\left( X_{i}-\bar{X}\right) ^{2}}\right) ^{2} \E{\left( \sum_{i=1}^{n}\left( X_{i}-\bar{X}\right) U_{i}\right) ^{2} \mid \mathbf{X}}. \end{aligned}
Expanding the square, \begin{aligned} \left( \sum_{i=1}^{n}\left( X_{i}-\bar{X}\right) U_{i}\right) ^{2} &= \sum_{i=1}^{n}\sum_{j=1}^{n}\left( X_{i}-\bar{X}\right) \left( X_{j}-\bar{X}\right) U_{i}U_{j} \\ &= \sum_{i=1}^{n}\left( X_{i}-\bar{X}\right) ^{2}U_{i}^{2}\\ &\quad + \sum_{i=1}^{n}\sum_{j\neq i}\left( X_{i}-\bar{X}\right) \left( X_{j}-\bar{X}\right) U_{i}U_{j}. \end{aligned}
Since \E{U_{i}U_{j} \mid \mathbf{X}} = 0 for i \neq j, \begin{aligned} \E{\left( \sum_{i=1}^{n}\left( X_{i}-\bar{X}\right) U_{i}\right) ^{2} \mid \mathbf{X}} &= \sum_{i=1}^{n}\left( X_{i}-\bar{X}\right) ^{2}\E{U_{i}^{2} \mid \mathbf{X}} + 0 \\ &= \sum_{i=1}^{n}\left( X_{i}-\bar{X}\right) ^{2}\sigma^{2}. \end{aligned}
We have \begin{aligned} &\Var{\hat{\beta} \mid \mathbf{X}} = \left( \frac{1}{\sum_{i=1}^{n}\left( X_{i}-\bar{X}\right) ^{2}}\right) ^{2} \E{\left( \sum_{i=1}^{n}\left( X_{i}-\bar{X}\right) U_{i}\right) ^{2} \mid \mathbf{X}}, \\ &\E{\left( \sum_{i=1}^{n}\left( X_{i}-\bar{X}\right) U_{i}\right) ^{2} \mid \mathbf{X}} = \sigma^{2}\sum_{i=1}^{n}\left( X_{i}-\bar{X}\right) ^{2}, \end{aligned} and therefore, \begin{aligned} \Var{\hat{\beta} \mid \mathbf{X}} &= \left( \frac{1}{\sum_{i=1}^{n}\left( X_{i}-\bar{X}\right) ^{2}}\right) ^{2} \sigma^{2}\sum_{i=1}^{n}\left( X_{i}-\bar{X}\right) ^{2} \\ &= \left( \frac{1}{\sum_{i=1}^{n}\left( X_{i}-\bar{X}\right) ^{2}}\right) \sigma^{2}. \end{aligned}
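The derivation can be checked by simulation: fix the X_i’s, redraw homoskedastic and uncorrelated errors many times, and compare the sample variance of the simulated \hat{\beta}’s with \sigma^{2}/\sum_{i=1}^{n}\left( X_{i}-\bar{X}\right) ^{2}. A minimal sketch (assuming NumPy; \sigma and the design are illustrative):

```python
# Monte Carlo check of Var(beta_hat | X) = sigma^2 / sum_i (X_i - Xbar)^2.
import numpy as np

rng = np.random.default_rng(3)
n, reps, sigma = 40, 50_000, 2.0
alpha, beta = 1.0, 2.0                      # illustrative true parameters
X = rng.uniform(0, 5, size=n)               # fixed design
denom = np.sum((X - X.mean()) ** 2)

beta_hats = np.empty(reps)
for r in range(reps):
    U = rng.normal(scale=sigma, size=n)     # homoskedastic, independent errors
    Y = alpha + beta * X + U
    beta_hats[r] = np.sum((X - X.mean()) * Y) / denom

print(beta_hats.var())                      # simulated conditional variance of beta_hat
print(sigma ** 2 / denom)                   # theoretical value; the two should be close
```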
Distribution of the Slope Estimator
Normality of the OLS Estimator
- Assume that the U_{i}’s are jointly normally distributed conditional on the X_i’s.
- Then Y_{i}=\alpha +\beta X_{i}+U_{i}, i = 1, \ldots, n, are also jointly normally distributed conditional on \mathbf{X}.
- Since \hat{\beta}=\sum_{i=1}^{n}w_{i}Y_{i}, where the weights w_{i}=\frac{X_{i}-\bar{X}}{\sum_{l=1}^{n}\left( X_{l}-\bar{X}\right) ^{2}} depend only on the X_i’s, \hat{\beta} is also normally distributed conditional on the X_i’s.
- Conditional on \mathbf{X}, \begin{aligned} \hat{\beta} \mid \mathbf{X}&\sim N\left( \E{\hat{\beta} \mid \mathbf{X}}, \Var{\hat{\beta} \mid \mathbf{X}} \right) \\ &= N\left( \beta, \frac{\sigma^{2}}{\sum_{i=1}^{n}\left( X_{i}-\bar{X}\right) ^{2}}\right). \end{aligned}
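As a final sanity check (again a simulation sketch with NumPy and illustrative parameter values), draw normal errors with \mathbf{X} held fixed and standardize \hat{\beta} by its conditional mean and standard deviation; the standardized draws should look approximately standard normal:

```python
# With normal errors, (beta_hat - beta) / sqrt(sigma^2 / sum_i (X_i - Xbar)^2)
# should be approximately N(0, 1) conditional on X.
import numpy as np

rng = np.random.default_rng(4)
n, reps, sigma = 25, 50_000, 1.0
alpha, beta = 1.0, 2.0                      # illustrative true parameters
X = rng.uniform(-2, 2, size=n)              # fixed design
denom = np.sum((X - X.mean()) ** 2)
sd_beta_hat = sigma / np.sqrt(denom)        # theoretical conditional standard deviation

beta_hats = np.empty(reps)
for r in range(reps):
    U = rng.normal(scale=sigma, size=n)     # jointly normal errors given X
    Y = alpha + beta * X + U
    beta_hats[r] = np.sum((X - X.mean()) * Y) / denom

Z = (beta_hats - beta) / sd_beta_hat
print(np.mean(np.abs(Z) <= 1.96))           # close to 0.95, as for a standard normal
```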