library(wooldridge)
library(lmtest)
library(sandwich)
data("wage1")
reg <- lm(wage ~ educ + exper + tenure, data = wage1)

Lecture 17: Asymptotics
Economics 326 — Introduction to Econometrics II
Part I: Consistency
Why we need large-sample theory
The OLS estimator \hat{\beta} has desirable properties:
\hat{\beta} is unbiased if the errors are strongly exogenous: \mathrm{E}\left[U \mid X\right] =0.
If in addition the errors are homoskedastic, then \widehat{\mathrm{Var}}\left(\hat{\beta}\right)=s^{2}/\sum_{i=1}^{n}\left( X_{i}-\bar{X}\right) ^{2} is an unbiased estimator of the conditional variance of \hat{\beta}.
If in addition the errors are normally distributed (given X), then T=\left( \hat{\beta}-\beta \right) /\sqrt{\widehat{\mathrm{Var}}\left(\hat{\beta}\right)} has a t distribution which can be used for hypothesis testing.
Limitations of finite-sample theory
If the errors are only weakly exogenous: \mathrm{E}\left[X_{i}U_{i}\right] =0, the OLS estimator is in general biased.
If the errors are heteroskedastic: \mathrm{E}\left[U_{i}^{2} \mid X_{i}\right] =h\left( X_{i}\right), the “usual” variance formula is invalid; we also do not have an unbiased estimator for the variance in this case.
If the errors are not normally distributed conditional on X, then T- and F-statistics do not have t and F distributions under the null hypothesis.
Asymptotic (large-sample) theory allows us to derive approximate properties and distributions of estimators and test statistics by assuming that the sample size n is very large.
Convergence in probability and LLN
Let \theta _{n} be a sequence of random variables indexed by the sample size n. We say that \theta _{n} converges in probability to \theta if \lim_{n\rightarrow \infty }P\left( \left\vert \theta _{n}-\theta \right\vert \geq \varepsilon \right) =0\text{ for all }\varepsilon >0.
We denote this as \theta _{n}\rightarrow _{p}\theta or p\lim \theta _{n}=\theta.
An example of convergence in probability is a Law of Large Numbers (LLN):
Let X_{1},X_{2},\ldots ,X_{n} be a random sample such that \mathrm{E}\left[X_{i}\right] =\mu for all i=1,\ldots ,n, and define \bar{X}_{n}=\frac{1}{n}\sum_{i=1}^{n}X_{i}. Then, under certain conditions, \bar{X}_{n}\rightarrow _{p}\mu.
LLN
Let X_{1},\ldots ,X_{n} be a sample of independent identically distributed (iid) random variables. Let \mathrm{E}\left[X_{i}\right]=\mu. If \mathrm{Var}\left(X_{i}\right)=\sigma ^{2}<\infty, then \bar{X}_{n}\rightarrow _{p}\mu.
In fact, when the data are iid, the LLN holds as long as \mathrm{E}\left[\left\vert X_{i}\right\vert\right] <\infty; we prove the result under the stronger assumption that \mathrm{Var}\left(X_{i}\right)<\infty.
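The LLN can be illustrated numerically. The following sketch (in Python with NumPy; the simulation design and the helper name `max_abs_dev` are our own, not part of the lecture) draws many iid Exponential(1) samples, whose mean is \mu = 1, and shows that even the largest deviation of \bar{X}_{n} from \mu across replications shrinks as n grows:

```python
import numpy as np

def max_abs_dev(n, reps=500, seed=0):
    # Largest |X_bar_n - mu| across `reps` independent samples of size n,
    # where X_i ~ Exponential(1), so mu = E[X_i] = 1.
    rng = np.random.default_rng(seed)
    draws = rng.exponential(scale=1.0, size=(reps, n))
    return float(np.abs(draws.mean(axis=1) - 1.0).max())

for n in [10, 100, 10000]:
    print(n, max_abs_dev(n))  # deviations shrink as n grows
```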
Markov’s inequality
Markov’s inequality. Let W be a random variable. For \varepsilon >0 and r>0, P\left( \left\vert W\right\vert \geq \varepsilon \right) \leq \frac{\mathrm{E}\left[\left\vert W\right\vert ^{r}\right]}{\varepsilon ^{r}}.
With r=2, we have Chebyshev’s inequality. Suppose that \mathrm{E}\left[X\right]=\mu. Take W\equiv X-\mu and apply Markov’s inequality with r=2. For \varepsilon >0,
\begin{aligned} P\left( \left\vert X-\mu \right\vert \geq \varepsilon \right) &\leq \frac{\mathrm{E}\left[\left\vert X-\mu \right\vert ^{2}\right]}{\varepsilon ^{2}} \\ &= \frac{\mathrm{Var}\left(X\right)}{\varepsilon ^{2}}. \end{aligned}
The probability of observing an outlier (a large deviation of X from its mean \mu) can be bounded by the variance.
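As a quick numerical sanity check (our own illustration, not from the lecture), one can compare the empirical frequency of large deviations with the Chebyshev bound \mathrm{Var}\left(X\right)/\varepsilon^{2}:

```python
import numpy as np

# X ~ N(mu, sigma^2) with mu = 2, sigma = 1.5; Chebyshev bounds P(|X - mu| >= eps).
rng = np.random.default_rng(1)
mu, sigma = 2.0, 1.5
x = rng.normal(mu, sigma, size=1_000_000)
for eps in [1.0, 2.0, 3.0]:
    freq = np.mean(np.abs(x - mu) >= eps)
    bound = sigma**2 / eps**2
    print(eps, freq, bound)  # the frequency never exceeds the bound
```

The bound is loose for a normal distribution, but it holds for any distribution with finite variance, which is what the proof of the LLN exploits.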
Proof of the LLN
Fix \varepsilon >0 and apply Markov’s inequality with r=2:
\begin{aligned} P\left( \left\vert \bar{X}_{n}-\mu \right\vert \geq \varepsilon \right) &= P\left( \left\vert \frac{1}{n}\sum_{i=1}^{n}X_{i}-\mu \right\vert \geq \varepsilon \right) \\ &= P\left( \left\vert \frac{1}{n}\sum_{i=1}^{n}\left( X_{i}-\mu \right) \right\vert \geq \varepsilon \right) \\ &\leq \frac{\mathrm{E}\left[\left( \frac{1}{n}\sum_{i=1}^{n}\left( X_{i}-\mu \right) \right) ^{2}\right]}{\varepsilon ^{2}} \\ &= \frac{1}{n^{2}\varepsilon ^{2}}\left( \sum_{i=1}^{n}\mathrm{E}\left[\left( X_{i}-\mu \right) ^{2}\right]+\sum_{i=1}^{n}\sum_{j\neq i}\mathrm{E}\left[\left( X_{i}-\mu \right) \left( X_{j}-\mu \right)\right] \right) \\ &= \frac{1}{n^{2}\varepsilon ^{2}}\left( \sum_{i=1}^{n}\mathrm{Var}\left(X_{i}\right)+\sum_{i=1}^{n}\sum_{j\neq i}\mathrm{Cov}\left(X_{i},X_{j}\right)\right) \\ &= \frac{n\sigma ^{2}+0}{n^{2}\varepsilon ^{2}} \quad \text{(by independence, all covariances are zero)} \\ &= \frac{\sigma ^{2}}{n\varepsilon ^{2}} \rightarrow 0 \text{ as }n\rightarrow \infty \text{ for all }\varepsilon >0. \end{aligned}
Averaging and variance reduction
Let X_{1},\ldots ,X_{n} be a sample and suppose that
\begin{aligned} \mathrm{E}\left[X_{i}\right] &= \mu \text{ for all }i=1,\ldots ,n, \\ \mathrm{Var}\left(X_{i}\right) &= \sigma ^{2}\text{ for all }i=1,\ldots ,n, \\ \mathrm{Cov}\left(X_{i},X_{j}\right) &= 0\text{ for all }j\neq i. \end{aligned}
The mean of the sample average:
\begin{aligned} \mathrm{E}\left[\bar{X}_{n}\right] &= \mathrm{E}\left[\frac{1}{n}\sum_{i=1}^{n}X_{i}\right] \\ &= \frac{1}{n}\sum_{i=1}^{n}\mathrm{E}\left[X_{i}\right] \\ &= \frac{1}{n}\sum_{i=1}^{n}\mu = \frac{1}{n}n\mu =\mu. \end{aligned}
Variance of the sample average
The variance of the sample average:
\begin{aligned} \mathrm{Var}\left(\bar{X}_{n}\right) &= \mathrm{Var}\left(\frac{1}{n}\sum_{i=1}^{n}X_{i}\right) \\ &= \frac{1}{n^{2}}\mathrm{Var}\left(\sum_{i=1}^{n}X_{i}\right) \\ &= \frac{1}{n^{2}}\left( \sum_{i=1}^{n}\mathrm{Var}\left(X_{i}\right)+\sum_{i=1}^{n}\sum_{j\neq i}\mathrm{Cov}\left(X_{i},X_{j}\right)\right) \\ &= \frac{1}{n^{2}}\left( \sum_{i=1}^{n}\sigma ^{2}+\sum_{i=1}^{n}\sum_{j\neq i}0\right) \\ &= \frac{1}{n^{2}}n\sigma ^{2}=\frac{\sigma ^{2}}{n}. \end{aligned}
The variance of the average approaches zero as n\rightarrow \infty if the observations are uncorrelated.
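The \sigma^{2}/n rate can be checked by simulation (a sketch under our own choice of distribution, with \sigma^{2}=4):

```python
import numpy as np

# Empirical variance of X_bar_n across 20,000 replications vs. the formula sigma^2 / n.
rng = np.random.default_rng(2)
sigma2 = 4.0
for n in [5, 50, 500]:
    means = rng.normal(0.0, 2.0, size=(20000, n)).mean(axis=1)
    print(n, means.var(), sigma2 / n)  # the two columns agree closely
```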
Convergence in probability: properties
Slutsky’s Lemma. Suppose that \theta _{n}\rightarrow _{p}\theta, and let g be a function continuous at \theta. Then, g\left( \theta _{n}\right) \rightarrow _{p}g\left( \theta \right).
If \theta _{n}\rightarrow _{p}\theta, then \theta _{n}^{2}\rightarrow _{p}\theta ^{2}.
If \theta _{n}\rightarrow _{p}\theta and \theta \neq 0, then 1/\theta _{n}\rightarrow _{p}1/\theta.
Suppose that \theta _{n}\rightarrow _{p}\theta and \lambda _{n}\rightarrow _{p}\lambda. Then,
\theta _{n}+\lambda _{n}\rightarrow _{p}\theta +\lambda.
\theta _{n}\lambda _{n}\rightarrow _{p}\theta \lambda.
\theta _{n}/\lambda _{n}\rightarrow _{p}\theta /\lambda provided that \lambda \neq 0.
Consistency
Let \hat{\beta}_{n} be an estimator of \beta based on a sample of size n.
We say that \hat{\beta}_{n} is a consistent estimator of \beta if as n\rightarrow \infty, \hat{\beta}_{n}\rightarrow _{p}\beta.
Consistency means that the probability of the event that the distance between \hat{\beta}_{n} and \beta exceeds \varepsilon >0 can be made arbitrarily small by increasing the sample size.
Consistency of OLS
Suppose that:
1. The data \left\{ \left( Y_{i},X_{i}\right) :i=1,\ldots ,n\right\} are iid.
2. Y_{i}=\beta _{0}+\beta _{1}X_{i}+U_{i}, where \mathrm{E}\left[U_{i}\right] =0.
3. \mathrm{E}\left[X_{i}U_{i}\right] =0.
4. 0<\mathrm{Var}\left(X_{i}\right)<\infty.
Let \hat{\beta}_{0,n} and \hat{\beta}_{1,n} be the OLS estimators of \beta _{0} and \beta _{1} based on a sample of size n. Under Assumptions 1–4, \begin{aligned} \hat{\beta}_{0,n} &\rightarrow _{p}\beta _{0}, \\ \hat{\beta}_{1,n} &\rightarrow _{p}\beta _{1}. \end{aligned}
The key identifying assumption is Assumption 3: \mathrm{Cov}\left(X_{i},U_{i}\right)=0.
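A simulation sketch (our own data-generating process with non-normal errors; the helper `ols_slope` is hypothetical, not from the lecture) shows \hat{\beta}_{1,n} settling on the true slope as n grows:

```python
import numpy as np

def ols_slope(n, seed=0):
    # Simulate Y = 1 + 2X + U with uniform (non-normal) errors satisfying
    # E[U] = 0 and E[XU] = 0, then return the OLS slope estimate.
    rng = np.random.default_rng(seed)
    x = rng.normal(0.0, 1.0, n)
    u = rng.uniform(-1.0, 1.0, n)
    y = 1.0 + 2.0 * x + u
    return float(np.cov(x, y, bias=True)[0, 1] / x.var())

for n in [50, 5000, 500000]:
    print(n, ols_slope(n))  # approaches the true slope 2
```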
Proof of consistency
Write
\begin{aligned} \hat{\beta}_{1,n} = \frac{\sum_{i=1}^{n}\left( X_{i}-\bar{X}_{n}\right) Y_{i}}{\sum_{i=1}^{n}\left( X_{i}-\bar{X}_{n}\right) ^{2}} &= \beta _{1}+\frac{\sum_{i=1}^{n}\left( X_{i}-\bar{X}_{n}\right) U_{i}}{\sum_{i=1}^{n}\left( X_{i}-\bar{X}_{n}\right) ^{2}} \\ &= \beta _{1}+\frac{\frac{1}{n}\sum_{i=1}^{n}\left( X_{i}-\bar{X}_{n}\right) U_{i}}{\frac{1}{n}\sum_{i=1}^{n}\left( X_{i}-\bar{X}_{n}\right) ^{2}}. \end{aligned}
We will show that \begin{aligned} \frac{1}{n}\sum_{i=1}^{n}\left( X_{i}-\bar{X}_{n}\right) U_{i} &\rightarrow _{p}0, \\ \frac{1}{n}\sum_{i=1}^{n}\left( X_{i}-\bar{X}_{n}\right) ^{2} &\rightarrow _{p}\mathrm{Var}\left(X_{i}\right), \end{aligned}
Since \mathrm{Var}\left(X_{i}\right)\neq 0, \hat{\beta}_{1,n} = \beta _{1}+\frac{\frac{1}{n}\sum_{i=1}^{n}\left( X_{i}-\bar{X}_{n}\right) U_{i}}{\frac{1}{n}\sum_{i=1}^{n}\left( X_{i}-\bar{X}_{n}\right) ^{2}} \rightarrow _{p} \beta _{1}+\frac{0}{\mathrm{Var}\left(X_{i}\right)}= \beta _{1}.
Numerator converges to zero
\frac{1}{n}\sum_{i=1}^{n}\left( X_{i}-\bar{X}_{n}\right) U_{i} = \frac{1}{n}\sum_{i=1}^{n}X_{i}U_{i}-\bar{X}_{n}\left( \frac{1}{n}\sum_{i=1}^{n}U_{i}\right).
By the LLN,
\begin{aligned} \frac{1}{n}\sum_{i=1}^{n}X_{i}U_{i} &\rightarrow _{p}\mathrm{E}\left[X_{i}U_{i}\right] =0, \\ \bar{X}_{n} &\rightarrow _{p}\mathrm{E}\left[X_{i}\right], \\ \frac{1}{n}\sum_{i=1}^{n}U_{i} &\rightarrow _{p}\mathrm{E}\left[U_{i}\right] =0. \end{aligned}
Hence,
\begin{aligned} \frac{1}{n}\sum_{i=1}^{n}\left( X_{i}-\bar{X}_{n}\right) U_{i} &= \frac{1}{n}\sum_{i=1}^{n}X_{i}U_{i}-\bar{X}_{n}\left( \frac{1}{n}\sum_{i=1}^{n}U_{i}\right) \\ &\rightarrow _{p}0-\mathrm{E}\left[X_{i}\right] \cdot 0 = 0. \end{aligned}
Denominator converges to \mathrm{Var}\left(X_i\right)
First,
\begin{aligned} \frac{1}{n}\sum_{i=1}^{n}\left( X_{i}-\bar{X}_{n}\right) ^{2} &= \frac{1}{n}\sum_{i=1}^{n}\left( X_{i}^{2}-2\bar{X}_{n}X_{i}+\bar{X}_{n}^{2}\right) \\ &= \frac{1}{n}\sum_{i=1}^{n}X_{i}^{2}-2\bar{X}_{n}\frac{1}{n}\sum_{i=1}^{n}X_{i}+\bar{X}_{n}^{2} \\ &= \frac{1}{n}\sum_{i=1}^{n}X_{i}^{2}-2\bar{X}_{n}\bar{X}_{n}+\bar{X}_{n}^{2} = \frac{1}{n}\sum_{i=1}^{n}X_{i}^{2}-\bar{X}_{n}^{2}. \end{aligned}
By the LLN, \frac{1}{n}\sum_{i=1}^{n}X_{i}^{2}\rightarrow _{p}\mathrm{E}\left[X_{i}^{2}\right] and \bar{X}_{n}\rightarrow _{p}\mathrm{E}\left[X_{i}\right].
By Slutsky’s Lemma, \bar{X}_{n}^{2}\rightarrow _{p}\left( \mathrm{E}\left[X_{i}\right]\right) ^{2}.
Thus, \frac{1}{n}\sum_{i=1}^{n}\left( X_{i}-\bar{X}_{n}\right) ^{2}=\frac{1}{n}\sum_{i=1}^{n}X_{i}^{2}-\bar{X}_{n}^{2}\rightarrow _{p}\mathrm{E}\left[X_{i}^{2}\right] -\left( \mathrm{E}\left[X_{i}\right]\right) ^{2}=\mathrm{Var}\left(X_{i}\right).
Multiple regression
Under similar conditions to 1–4, one can establish consistency of OLS for the multiple linear regression model: Y_{i}=\beta _{0}+\beta _{1}X_{1,i}+\ldots +\beta _{k}X_{k,i}+U_{i}, where \mathrm{E}\left[U_{i}\right]=0.
The key assumption is that the errors and regressors are uncorrelated: \mathrm{E}\left[X_{1,i}U_{i}\right] =\ldots =\mathrm{E}\left[X_{k,i}U_{i}\right] =0.
Omitted variables and OLS inconsistency
Suppose that the true model has two regressors: \begin{aligned} &Y_{i}=\beta _{0}+\beta _{1}X_{1,i}+\beta _{2}X_{2,i}+U_{i}, \\ &\mathrm{E}\left[X_{1,i}U_{i}\right] =\mathrm{E}\left[X_{2,i}U_{i}\right] =0. \end{aligned}
Suppose that the econometrician includes only X_{1} in the regression when estimating \beta _{1}:
\begin{aligned} \tilde{\beta}_{1,n} &= \frac{\sum_{i=1}^{n}\left( X_{1,i}-\bar{X}_{1,n}\right) Y_{i}}{\sum_{i=1}^{n}\left( X_{1,i}-\bar{X}_{1,n}\right) ^{2}} \\ &= \frac{\sum_{i=1}^{n}\left( X_{1,i}-\bar{X}_{1,n}\right) \left( \beta _{0}+\beta _{1}X_{1,i}+\beta _{2}X_{2,i}+U_{i}\right) }{\sum_{i=1}^{n}\left( X_{1,i}-\bar{X}_{1,n}\right) ^{2}} \\ &= \beta _{1}+\beta _{2}\frac{\sum_{i=1}^{n}\left( X_{1,i}-\bar{X}_{1,n}\right) X_{2,i}}{\sum_{i=1}^{n}\left( X_{1,i}-\bar{X}_{1,n}\right) ^{2}}+\frac{\sum_{i=1}^{n}\left( X_{1,i}-\bar{X}_{1,n}\right) U_{i}}{\sum_{i=1}^{n}\left( X_{1,i}-\bar{X}_{1,n}\right) ^{2}}. \end{aligned}
Inconsistency proof: noise term
From the previous slide: \tilde{\beta}_{1,n}=\beta _{1}+\beta _{2}\frac{\frac{1}{n}\sum_{i=1}^{n}\left( X_{1,i}-\bar{X}_{1,n}\right) X_{2,i}}{\frac{1}{n}\sum_{i=1}^{n}\left( X_{1,i}-\bar{X}_{1,n}\right) ^{2}}+\frac{\frac{1}{n}\sum_{i=1}^{n}\left( X_{1,i}-\bar{X}_{1,n}\right) U_{i}}{\frac{1}{n}\sum_{i=1}^{n}\left( X_{1,i}-\bar{X}_{1,n}\right) ^{2}}.
As before,
\begin{aligned} \frac{\frac{1}{n}\sum_{i=1}^{n}\left( X_{1,i}-\bar{X}_{1,n}\right) U_{i}}{\frac{1}{n}\sum_{i=1}^{n}\left( X_{1,i}-\bar{X}_{1,n}\right) ^{2}} &= \frac{\frac{1}{n}\sum_{i=1}^{n}X_{1,i}U_{i}-\bar{X}_{1,n}\bar{U}_{n}}{\frac{1}{n}\sum_{i=1}^{n}X_{1,i}^{2}-\bar{X}_{1,n}^{2}} \\ &\rightarrow _{p}\frac{0}{\mathrm{E}\left[X_{1,i}^{2}\right]-\left( \mathrm{E}\left[X_{1,i}\right]\right) ^{2}} \\ &= \frac{0}{\mathrm{Var}\left(X_{1,i}\right)}=0. \end{aligned}
Inconsistency proof: bias term
From the previous slide: \tilde{\beta}_{1,n}=\beta _{1}+\beta _{2}\frac{\frac{1}{n}\sum_{i=1}^{n}\left( X_{1,i}-\bar{X}_{1,n}\right) X_{2,i}}{\frac{1}{n}\sum_{i=1}^{n}\left( X_{1,i}-\bar{X}_{1,n}\right) ^{2}}+\frac{\frac{1}{n}\sum_{i=1}^{n}\left( X_{1,i}-\bar{X}_{1,n}\right) U_{i}}{\frac{1}{n}\sum_{i=1}^{n}\left( X_{1,i}-\bar{X}_{1,n}\right) ^{2}}.
However,
\begin{aligned} \frac{\frac{1}{n}\sum_{i=1}^{n}\left( X_{1,i}-\bar{X}_{1,n}\right) X_{2,i}}{\frac{1}{n}\sum_{i=1}^{n}\left( X_{1,i}-\bar{X}_{1,n}\right) ^{2}} &= \frac{\frac{1}{n}\sum_{i=1}^{n}X_{1,i}X_{2,i}-\bar{X}_{1,n}\bar{X}_{2,n}}{\frac{1}{n}\sum_{i=1}^{n}X_{1,i}^{2}-\bar{X}_{1,n}^{2}} \\ &\rightarrow _{p}\frac{\mathrm{E}\left[X_{1,i}X_{2,i}\right] -\mathrm{E}\left[X_{1,i}\right] \mathrm{E}\left[X_{2,i}\right]}{\mathrm{E}\left[X_{1,i}^{2}\right]-\left( \mathrm{E}\left[X_{1,i}\right]\right) ^{2}} \\ &= \frac{\mathrm{Cov}\left(X_{1,i},X_{2,i}\right)}{\mathrm{Var}\left(X_{1,i}\right)}. \end{aligned}
Omitted variable bias formula
We have
\begin{aligned} \tilde{\beta}_{1,n} &= \beta _{1}+\beta _{2}\frac{\frac{1}{n}\sum_{i=1}^{n}\left( X_{1,i}-\bar{X}_{1,n}\right) X_{2,i}}{\frac{1}{n}\sum_{i=1}^{n}\left( X_{1,i}-\bar{X}_{1,n}\right) ^{2}}+\frac{\frac{1}{n}\sum_{i=1}^{n}\left( X_{1,i}-\bar{X}_{1,n}\right) U_{i}}{\frac{1}{n}\sum_{i=1}^{n}\left( X_{1,i}-\bar{X}_{1,n}\right) ^{2}} \\ &\rightarrow _{p}\beta _{1}+\beta _{2}\frac{\mathrm{Cov}\left(X_{1,i},X_{2,i}\right)}{\mathrm{Var}\left(X_{1,i}\right)}+\frac{0}{\mathrm{Var}\left(X_{1,i}\right)} \\ &= \beta _{1}+\beta _{2}\frac{\mathrm{Cov}\left(X_{1,i},X_{2,i}\right)}{\mathrm{Var}\left(X_{1,i}\right)}. \end{aligned}
Thus, \tilde{\beta}_{1,n} is inconsistent unless:
\beta _{2}=0 (the model is correctly specified).
\mathrm{Cov}\left(X_{1,i},X_{2,i}\right)=0 (the omitted variable is uncorrelated with the included regressor).
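The OVB formula can be verified numerically. In this sketch (a hypothetical DGP of our own), \mathrm{Cov}\left(X_{1},X_{2}\right)=0.5 and \mathrm{Var}\left(X_{1}\right)=1, so the short-regression slope should converge to \beta_{1}+\beta_{2}\cdot 0.5 = 2 + 3\cdot 0.5 = 3.5 rather than to \beta_{1}=2:

```python
import numpy as np

rng = np.random.default_rng(3)
n = 1_000_000
x1 = rng.normal(0.0, 1.0, n)
x2 = 0.5 * x1 + rng.normal(0.0, 1.0, n)   # Cov(X1, X2) = 0.5, Var(X1) = 1
u = rng.normal(0.0, 1.0, n)
y = 1.0 + 2.0 * x1 + 3.0 * x2 + u          # beta1 = 2, beta2 = 3
# Short regression of Y on X1 only:
short_slope = np.cov(x1, y, bias=True)[0, 1] / x1.var()
print(short_slope)  # close to 3.5, not 2
```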
OVB through the composite error
In this example, the model contains two regressors: \begin{aligned} &Y_{i}=\beta _{0}+\beta _{1}X_{1,i}+\beta _{2}X_{2,i}+U_{i}, \\ &\mathrm{E}\left[X_{1,i}U_{i}\right] =\mathrm{E}\left[X_{2,i}U_{i}\right] =0. \end{aligned}
However, since X_{2} is not controlled for, it goes into the error term: \begin{aligned} Y_{i} &= \beta _{0}+\beta _{1}X_{1,i}+V_{i},\text{ where} \\ V_{i} &= \beta _{2}X_{2,i}+U_{i}. \end{aligned}
For consistency of \tilde{\beta}_{1,n} we need \mathrm{Cov}\left(X_{1,i},V_{i}\right) = 0; however,
\begin{aligned} \mathrm{Cov}\left(X_{1,i},V_{i}\right) &= \mathrm{Cov}\left(X_{1,i},\beta _{2}X_{2,i}+U_{i}\right) \\ &= \mathrm{Cov}\left(X_{1,i},\beta _{2}X_{2,i}\right)+\mathrm{Cov}\left(X_{1,i},U_{i}\right) \\ &= \beta _{2}\mathrm{Cov}\left(X_{1,i},X_{2,i}\right)+0 \\ &\neq 0\text{, unless }\beta _{2}=0\text{ or }\mathrm{Cov}\left(X_{1,i},X_{2,i}\right)=0. \end{aligned}
Part II: Asymptotic Normality
Why do we need asymptotic normality?
In the previous lectures, we showed that the OLS estimator has an exact normal distribution when the errors are normally distributed.
- The same assumption is needed to show that the T statistic has a t-distribution and the F statistic has an F-distribution.
In this lecture, we argue that even when the errors are not normally distributed, the OLS estimator has an approximately normal distribution in large samples, provided that some additional conditions hold.
- This property is used for hypothesis testing: in large samples, the T statistic has a standard normal distribution and the F statistic has a \chi^{2} distribution (approximately).
Asymptotic normality
Let W_{n} be a sequence of random variables indexed by the sample size n.
- Typically, W_{n} will be a function of some estimator, such as W_{n}=\sqrt{n}\left( \hat{\beta}_{n}-\beta \right).
We say that W_{n} has an asymptotically normal distribution if its CDF converges to a normal CDF.
Let W be any random variable with a normal N\left( 0,\sigma^{2}\right) distribution. We say that W_{n} has an asymptotically normal distribution if for all x\in \mathbb{R}:
F_{n}\left( x\right) =P\left( W_{n}\leq x\right) \rightarrow P\left( W\leq x\right) =F\left( x\right) \text{ as }n\rightarrow \infty .
- We denote this as W_{n}\rightarrow _{d}W or W_{n}\rightarrow _{d}N\left( 0,\sigma ^{2}\right).
Convergence in distribution
Asymptotic normality is an example of convergence in distribution.
We say that a sequence of random variables W_{n} converges in distribution to W (denoted as W_{n}\rightarrow _{d}W) if the CDF of W_{n} converges to the CDF of W at all points where the CDF of W is continuous.
Convergence in distribution is convergence of the CDFs.
Central Limit Theorem (CLT)
An example of convergence in distribution is a CLT.
Let X_{1},\ldots ,X_{n} be a sample of iid random variables such that \mathrm{E}\left[X_{i}\right] =0 and \mathrm{Var}\left(X_{i}\right) =\sigma ^{2}>0 (finite). Then, as n\rightarrow \infty,
\frac{1}{\sqrt{n}}\sum_{i=1}^{n}X_{i}\rightarrow _{d}N\left( 0,\sigma^{2}\right) .
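The CLT can be illustrated with centered Bernoulli draws (a sketch of our own: X_{i} takes values \pm 0.5 with equal probability, so \mathrm{E}\left[X_{i}\right]=0 and \sigma^{2}=0.25). We simulate W_{n}=n^{-1/2}\sum_{i=1}^{n}X_{i} directly via the Binomial distribution of the sum:

```python
import numpy as np

rng = np.random.default_rng(4)
n, reps = 2000, 100_000
# Sum of n centered Bernoulli(0.5) draws, via the Binomial(n, 0.5) total:
s = rng.binomial(n, 0.5, size=reps)
w = (s - n * 0.5) / np.sqrt(n)            # W_n = n^{-1/2} * sum of X_i
print(w.mean(), w.var())                   # approx 0 and sigma^2 = 0.25
# Tail frequency vs. the N(0, 0.25) tail probability P(W > 0.5 * 1.645) = 0.05:
print(np.mean(w > 0.5 * 1.645))            # approx 0.05
```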
CLT with non-zero mean
For the CLT we impose three assumptions: (1) the observations are iid; (2) the mean is zero; (3) the variance is finite and nonzero.
If X_{1},\ldots ,X_{n} are iid but \mathrm{E}\left[X_{i}\right] =\mu \neq 0, then consider X_{i}-\mu. Since \mathrm{E}\left[X_{i}-\mu\right] =0, we have
\frac{1}{\sqrt{n}}\sum_{i=1}^{n}\left( X_{i}-\mu \right) \rightarrow_{d}N\left( 0,\mathrm{Var}\left(X_{i}\right) \right) .
Then
\begin{aligned} \frac{1}{\sqrt{n}}\sum_{i=1}^{n}\left( X_{i}-\mu \right) &= \sqrt{n}\frac{1}{n}\sum_{i=1}^{n}\left( X_{i}-\mu \right) \\ &= \sqrt{n}\left( \frac{1}{n}\sum_{i=1}^{n}X_{i}-\frac{1}{n}\sum_{i=1}^{n}\mu \right) \\ &= \sqrt{n}\left( \bar{X}_{n}-\mu \right) . \end{aligned}
CLT for the sample average
From the previous slide: \frac{1}{\sqrt{n}}\sum_{i=1}^{n}\left( X_{i}-\mu \right) = \sqrt{n}\left( \bar{X}_{n}-\mu \right) .
Thus, the CLT can be stated as
\sqrt{n}\left( \bar{X}_{n}-\mu \right) \rightarrow _{d}N\left( 0,\mathrm{Var}\left(X_{i}\right) \right) .
By the LLN,
\bar{X}_{n}-\mu \rightarrow _{p}0,
and
\mathrm{Var}\left(\sqrt{n}\left( \bar{X}_{n}-\mu \right)\right) = n\mathrm{Var}\left(\bar{X}_{n}\right) = n\frac{\mathrm{Var}\left(X_{i}\right)}{n} = \mathrm{Var}\left(X_{i}\right).
Properties
Suppose that W_{n}\rightarrow _{d}N\left( 0,\sigma ^{2}\right) and \theta _{n}\rightarrow _{p}\theta. Then,
\theta _{n}W_{n}\rightarrow _{d}\theta N\left( 0,\sigma ^{2}\right) \stackrel{d}{=} N\left( 0,\theta ^{2}\sigma ^{2}\right) ,
and
\theta _{n}+W_{n}\rightarrow _{d}\theta +N\left( 0,\sigma ^{2}\right) \stackrel{d}{=} N\left( \theta ,\sigma ^{2}\right) .
Suppose that Z_{n}\rightarrow _{d}Z\sim N\left( 0,1\right). Then,
Z_{n}^{2}\rightarrow _{d}Z^{2}\equiv \chi _{1}^{2}.
If W_{n}\rightarrow _{d}c, where c is a constant, then W_{n}\rightarrow _{p}c.
Asymptotic normality of OLS
Suppose that:
- The data \left\{ \left( Y_{i},X_{i}\right) :i=1,\ldots ,n\right\} are iid.
- Y_{i}=\beta _{0}+\beta _{1}X_{i}+U_{i}, where \mathrm{E}\left[U_{i}\right] =0.
- \mathrm{E}\left[X_{i}U_{i}\right] =0.
- 0<\mathrm{Var}\left(X_{i}\right) <\infty.
- 0<\mathrm{E}\left[\left( X_{i}-\mathrm{E}\left[X_{i}\right]\right) ^{2}U_{i}^{2}\right] <\infty and 0<\mathrm{E}\left[U_{i}^{2}\right] <\infty.
Let \hat{\beta}_{1,n} be the OLS estimator of \beta _{1}. Then,
\sqrt{n}\left( \hat{\beta}_{1,n}-\beta _{1}\right) \rightarrow _{d}N\left( 0,\frac{\mathrm{E}\left[\left( X_{i}-\mathrm{E}\left[X_{i}\right]\right) ^{2}U_{i}^{2}\right]}{\left(\mathrm{Var}\left(X_{i}\right) \right) ^{2}}\right).
V=\dfrac{\mathrm{E}\left[\left( X_{i}-\mathrm{E}\left[X_{i}\right]\right) ^{2}U_{i}^{2}\right]}{\left(\mathrm{Var}\left(X_{i}\right) \right) ^{2}} is called the asymptotic variance of \hat{\beta}_{1,n}.
Large-sample approximation for OLS
Let \overset{a}{\sim} denote “approximately in large samples.”
The asymptotic normality
\sqrt{n}\left( \hat{\beta}_{1,n}-\beta _{1}\right) \rightarrow _{d}N\left(0,V\right)
can be viewed as the following large-sample approximation:
\sqrt{n}\left( \hat{\beta}_{1,n}-\beta _{1}\right) \overset{a}{\sim} N\left(0,V\right) ,
or
\hat{\beta}_{1,n}\overset{a}{\sim} N\left( \beta _{1},V/n\right) .
Proof: decomposition
Write
\hat{\beta}_{1,n}=\beta _{1}+\frac{\sum_{i=1}^{n}\left( X_{i}-\bar{X}_{n}\right) U_{i}}{\sum_{i=1}^{n}\left( X_{i}-\bar{X}_{n}\right) ^{2}}.
Now
\hat{\beta}_{1,n}-\beta _{1}=\frac{\frac{1}{n}\sum_{i=1}^{n}\left( X_{i}-\bar{X}_{n}\right) U_{i}}{\frac{1}{n}\sum_{i=1}^{n}\left( X_{i}-\bar{X}_{n}\right) ^{2}},
and
\sqrt{n}\left( \hat{\beta}_{1,n}-\beta _{1}\right) =\frac{\frac{1}{\sqrt{n}}\sum_{i=1}^{n}\left( X_{i}-\bar{X}_{n}\right) U_{i}}{\frac{1}{n}\sum_{i=1}^{n}\left( X_{i}-\bar{X}_{n}\right) ^{2}}.
Proof: combining the limits
\sqrt{n}\left( \hat{\beta}_{1,n}-\beta _{1}\right) =\frac{\frac{1}{\sqrt{n}}\sum_{i=1}^{n}\left( X_{i}-\bar{X}_{n}\right) U_{i}}{\frac{1}{n}\sum_{i=1}^{n}\left( X_{i}-\bar{X}_{n}\right) ^{2}}.
In Part I, we established
\frac{1}{n}\sum_{i=1}^{n}\left( X_{i}-\bar{X}_{n}\right) ^{2}\rightarrow_{p}\mathrm{Var}\left(X_{i}\right).
We will show that
\frac{1}{\sqrt{n}}\sum_{i=1}^{n}\left( X_{i}-\bar{X}_{n}\right) U_{i}\rightarrow _{d}N\left( 0,\mathrm{E}\left[\left( X_{i}-\mathrm{E}\left[X_{i}\right]\right)^{2}U_{i}^{2}\right] \right),
so that
\begin{align*} \sqrt{n}\left( \hat{\beta}_{1,n}-\beta _{1}\right) &= \frac{\frac{1}{\sqrt{n}}\sum_{i=1}^{n}\left( X_{i}-\bar{X}_{n}\right) U_{i}}{\frac{1}{n}\sum_{i=1}^{n}\left( X_{i}-\bar{X}_{n}\right) ^{2}} \\ &\rightarrow _{d}\frac{N\left( 0,\mathrm{E}\left[\left( X_{i}-\mathrm{E}\left[X_{i}\right]\right) ^{2}U_{i}^{2}\right] \right)}{\mathrm{Var}\left(X_{i}\right)} \\ &\stackrel{d}{=} N\left( 0,\frac{\mathrm{E}\left[\left( X_{i}-\mathrm{E}\left[X_{i}\right]\right)^{2}U_{i}^{2}\right]}{\left(\mathrm{Var}\left(X_{i}\right) \right) ^{2}}\right). \end{align*}
Proof: numerator CLT
\begin{aligned} \frac{1}{\sqrt{n}}\sum_{i=1}^{n}\left( X_{i}-\bar{X}_{n}\right) U_{i} &= \frac{1}{\sqrt{n}}\sum_{i=1}^{n}\left( X_{i}-\mathrm{E}\left[X_{i}\right]+\mathrm{E}\left[X_{i}\right]-\bar{X}_{n}\right) U_{i} \\ &= \frac{1}{\sqrt{n}}\sum_{i=1}^{n}\left( X_{i}-\mathrm{E}\left[X_{i}\right]\right) U_{i}+\left( \mathrm{E}\left[X_{i}\right]-\bar{X}_{n}\right) \frac{1}{\sqrt{n}}\sum_{i=1}^{n}U_{i}. \end{aligned}
We have
\mathrm{E}\left[\left( X_{i}-\mathrm{E}\left[X_{i}\right]\right) U_{i}\right] = \mathrm{E}\left[X_{i}U_{i}\right]-\mathrm{E}\left[X_{i}\right]\mathrm{E}\left[U_{i}\right]=0,
and 0<\mathrm{E}\left[\left( X_{i}-\mathrm{E}\left[X_{i}\right]\right) ^{2}U_{i}^{2}\right] <\infty, so by the CLT,
\frac{1}{\sqrt{n}}\sum_{i=1}^{n}\left( X_{i}-\mathrm{E}\left[X_{i}\right]\right) U_{i}\rightarrow_{d}N\left( 0,\mathrm{E}\left[\left( X_{i}-\mathrm{E}\left[X_{i}\right]\right) ^{2}U_{i}^{2}\right]\right).
Proof: second term vanishes
It is left to show that
\left( \mathrm{E}\left[X_{i}\right]-\bar{X}_{n}\right) \frac{1}{\sqrt{n}}\sum_{i=1}^{n}U_{i}\rightarrow _{p}0.
We have \mathrm{E}\left[U_{i}\right]=0 and 0<\mathrm{E}\left[U_{i}^{2}\right]<\infty. By the CLT,
\frac{1}{\sqrt{n}}\sum_{i=1}^{n}U_{i}\rightarrow _{d}N\left(0,\mathrm{E}\left[U_{i}^{2}\right]\right).
By the LLN,
\mathrm{E}\left[X_{i}\right]-\bar{X}_{n}\rightarrow _{p}0.
Hence, \left( \mathrm{E}\left[X_{i}\right]-\bar{X}_{n}\right) \frac{1}{\sqrt{n}}\sum_{i=1}^{n}U_{i} is the product of a term converging in probability to zero and a term converging in distribution (and hence bounded in probability), so the product converges in probability to zero, and the result follows.
Part III: Asymptotic Variance
Asymptotic variance
In Part II, we showed that when the data are iid and the regressors are exogenous, \begin{aligned} Y_{i} &= \beta_{0} + \beta_{1}X_{i} + U_{i}, \\ \mathrm{E}\left[U_{i}\right] &= \mathrm{E}\left[X_{i}U_{i}\right] = 0, \end{aligned} the OLS estimator of \beta_{1} is asymptotically normal: \begin{aligned} \sqrt{n}\left(\hat{\beta}_{1,n} - \beta_{1}\right) &\rightarrow_{d} N\left(0, V\right), \\ V &= \frac{\mathrm{E}\left[\left(X_{i} - \mathrm{E}\left[X_{i}\right]\right)^{2}U_{i}^{2}\right]}{\left(\mathrm{Var}\left(X_{i}\right)\right)^{2}}. \end{aligned}
For hypothesis testing, we need a consistent estimator of the asymptotic variance V: \hat{V}_{n} \rightarrow_{p} V.
Simplifying V under homoskedasticity
Assume that the errors are homoskedastic: \mathrm{E}\left[U_{i}^{2} \mid X_{i}\right] = \sigma^{2} regardless of the value of X_{i}.
In this case, the asymptotic variance can be simplified using the Law of Iterated Expectation: \begin{aligned} \mathrm{E}\left[\left(X_{i} - \mathrm{E}\left[X_{i}\right]\right)^{2}U_{i}^{2}\right] &= \mathrm{E}\left[\mathrm{E}\left[\left(X_{i} - \mathrm{E}\left[X_{i}\right]\right)^{2}U_{i}^{2} \mid X_{i}\right]\right] \\ &= \mathrm{E}\left[\left(X_{i} - \mathrm{E}\left[X_{i}\right]\right)^{2} \mathrm{E}\left[U_{i}^{2} \mid X_{i}\right]\right] \\ &= \mathrm{E}\left[\left(X_{i} - \mathrm{E}\left[X_{i}\right]\right)^{2} \sigma^{2}\right] \\ &= \sigma^{2}\,\mathrm{E}\left[\left(X_{i} - \mathrm{E}\left[X_{i}\right]\right)^{2}\right] = \sigma^{2}\mathrm{Var}\left(X_{i}\right). \end{aligned}
Estimating V: method of moments
Thus, when the errors are homoskedastic with \mathrm{E}\left[U_{i}^{2}\right] = \sigma^{2}, V = \frac{\mathrm{E}\left[\left(X_{i} - \mathrm{E}\left[X_{i}\right]\right)^{2}U_{i}^{2}\right]}{\left(\mathrm{Var}\left(X_{i}\right)\right)^{2}} = \frac{\sigma^{2}\mathrm{Var}\left(X_{i}\right)}{\left(\mathrm{Var}\left(X_{i}\right)\right)^{2}} = \frac{\sigma^{2}}{\mathrm{Var}\left(X_{i}\right)}.
Let \hat{U}_{i} = Y_{i} - \hat{\beta}_{0,n} - \hat{\beta}_{1,n}X_{i}, where \hat{\beta}_{0,n} and \hat{\beta}_{1,n} are the OLS estimators of \beta_{0} and \beta_{1}.
A consistent estimator for the asymptotic variance can be constructed using the Method of Moments: \begin{aligned} \hat{\sigma}_{n}^{2} &= \frac{1}{n}\sum_{i=1}^{n}\hat{U}_{i}^{2}, \\ \widehat{\mathrm{Var}}\left(X_{i}\right) &= \frac{1}{n}\sum_{i=1}^{n}\left(X_{i} - \bar{X}_{n}\right)^{2}, \text{ and} \\ \hat{V}_{n} &= \frac{\hat{\sigma}_{n}^{2}}{\frac{1}{n}\sum_{i=1}^{n}\left(X_{i} - \bar{X}_{n}\right)^{2}}. \end{aligned}
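A sketch of this method-of-moments estimator on simulated homoskedastic data (our own DGP with \sigma^{2}=9 and \mathrm{Var}\left(X_{i}\right)=4, so V=\sigma^{2}/\mathrm{Var}\left(X_{i}\right)=2.25):

```python
import numpy as np

rng = np.random.default_rng(5)
n = 100_000
x = rng.normal(1.0, 2.0, n)               # Var(X) = 4
u = rng.normal(0.0, 3.0, n)               # homoskedastic, sigma^2 = 9
y = 0.5 + 1.5 * x + u
# OLS estimates and residuals:
b1 = np.cov(x, y, bias=True)[0, 1] / x.var()
b0 = y.mean() - b1 * x.mean()
u_hat = y - b0 - b1 * x
# Method-of-moments pieces:
sigma2_hat = np.mean(u_hat**2)            # estimates sigma^2
v_hat = sigma2_hat / x.var()              # estimates V = sigma^2 / Var(X)
print(sigma2_hat, v_hat)                  # close to 9 and 2.25
```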
Why the LLN does not apply directly
From the previous slide: \hat{V}_{n} = \frac{\hat{\sigma}_{n}^{2}}{\frac{1}{n}\sum_{i=1}^{n}\left(X_{i} - \bar{X}_{n}\right)^{2}},\quad \hat{\sigma}_{n}^{2} = \frac{1}{n}\sum_{i=1}^{n}\hat{U}_{i}^{2},\quad \hat{U}_{i} = Y_{i} - \hat{\beta}_{0,n} - \hat{\beta}_{1,n}X_{i}.
When proving the consistency of OLS (Part I), we showed that \frac{1}{n}\sum_{i=1}^{n}\left(X_{i} - \bar{X}_{n}\right)^{2} \rightarrow_{p} \mathrm{Var}\left(X_{i}\right), and to establish \hat{V}_{n} \rightarrow_{p} V, we need to show that \hat{\sigma}_{n}^{2} \rightarrow_{p} \sigma^{2}.
The LLN cannot be applied directly to \frac{1}{n}\sum_{i=1}^{n}\hat{U}_{i}^{2} because the \hat{U}_{i}’s are not iid: they are dependent through \hat{\beta}_{0,n} and \hat{\beta}_{1,n}.
Proof of \hat{\sigma}^{2}_{n} \rightarrow_{p} \sigma^{2}
First, write \begin{aligned} \hat{U}_{i} &= Y_{i} - \hat{\beta}_{0,n} - \hat{\beta}_{1,n}X_{i} \\ &= \left(\beta_{0} + \beta_{1}X_{i} + U_{i}\right) - \hat{\beta}_{0,n} - \hat{\beta}_{1,n}X_{i} \\ &= U_{i} - \left(\hat{\beta}_{0,n} - \beta_{0}\right) - \left(\hat{\beta}_{1,n} - \beta_{1}\right)X_{i}. \end{aligned}
Now, \hat{\sigma}_{n}^{2} = \frac{1}{n}\sum_{i=1}^{n}\hat{U}_{i}^{2} = \frac{1}{n}\sum_{i=1}^{n}\left(U_{i} - \left(\hat{\beta}_{0,n} - \beta_{0}\right) - \left(\hat{\beta}_{1,n} - \beta_{1}\right)X_{i}\right)^{2}.
Completing the consistency proof
We have \begin{aligned} \hat{\sigma}_{n}^{2} &= \frac{1}{n}\sum_{i=1}^{n}\left(U_{i} - \left(\hat{\beta}_{0,n} - \beta_{0}\right) - \left(\hat{\beta}_{1,n} - \beta_{1}\right)X_{i}\right)^{2} \\ &= \frac{1}{n}\sum_{i=1}^{n}U_{i}^{2} + \left(\hat{\beta}_{0,n} - \beta_{0}\right)^{2} + \left(\hat{\beta}_{1,n} - \beta_{1}\right)^{2}\frac{1}{n}\sum_{i=1}^{n}X_{i}^{2} \\ &\quad -2\left(\hat{\beta}_{0,n} - \beta_{0}\right)\frac{1}{n}\sum_{i=1}^{n}U_{i} - 2\left(\hat{\beta}_{1,n} - \beta_{1}\right)\frac{1}{n}\sum_{i=1}^{n}U_{i}X_{i} \\ &\quad +2\left(\hat{\beta}_{0,n} - \beta_{0}\right)\left(\hat{\beta}_{1,n} - \beta_{1}\right)\frac{1}{n}\sum_{i=1}^{n}X_{i}. \end{aligned}
By the LLN, \frac{1}{n}\sum_{i=1}^{n}U_{i}^{2} \rightarrow_{p} \mathrm{E}\left[U_{i}^{2}\right] = \sigma^{2}.
Because \hat{\beta}_{0,n} and \hat{\beta}_{1,n} are consistent, \hat{\beta}_{0,n} - \beta_{0} \rightarrow_{p} 0 \text{ and } \hat{\beta}_{1,n} - \beta_{1} \rightarrow_{p} 0. Since \frac{1}{n}\sum_{i=1}^{n}U_{i}, \frac{1}{n}\sum_{i=1}^{n}U_{i}X_{i}, \frac{1}{n}\sum_{i=1}^{n}X_{i}, and \frac{1}{n}\sum_{i=1}^{n}X_{i}^{2} converge in probability to finite limits by the LLN, every term except \frac{1}{n}\sum_{i=1}^{n}U_{i}^{2} vanishes, and therefore \hat{\sigma}_{n}^{2} \rightarrow_{p} \sigma^{2}.
Using s^2 instead of \hat{\sigma}^2_n
Thus, when the errors are homoskedastic, \hat{V}_{n} = \frac{\hat{\sigma}_{n}^{2}}{\frac{1}{n}\sum_{i=1}^{n}\left(X_{i} - \bar{X}_{n}\right)^{2}}, \text{ with } \hat{\sigma}_{n}^{2} = \frac{1}{n}\sum_{i=1}^{n}\hat{U}_{i}^{2}, is a consistent estimator of V = \frac{\sigma^{2}}{\mathrm{Var}\left(X_{i}\right)}.
Similarly, since \frac{n}{n-2} \rightarrow 1, s^{2} = \frac{1}{n-2}\sum_{i=1}^{n}\hat{U}_{i}^{2} = \frac{n}{n-2}\,\hat{\sigma}_{n}^{2} \rightarrow_{p} \sigma^{2}, and therefore \hat{V}_{n} = \frac{s^{2}}{\frac{1}{n}\sum_{i=1}^{n}\left(X_{i} - \bar{X}_{n}\right)^{2}} is also a consistent estimator of V = \frac{\sigma^{2}}{\mathrm{Var}\left(X_{i}\right)}.
This version has an advantage over the one with \hat{\sigma}_{n}^{2}: in addition to being consistent, s^{2} is also an unbiased estimator of \sigma^{2} if the regressors are strongly exogenous.
Asymptotic approximation
The result \sqrt{n}\left(\hat{\beta}_{1,n} - \beta_{1}\right) \rightarrow_{d} N\left(0, V\right) is used as the following approximation: \hat{\beta}_{1,n} \overset{a}{\sim} N\left(\beta_{1}, \frac{V}{n}\right), where \overset{a}{\sim} denotes approximately in large samples. Thus, the variance of \hat{\beta}_{1,n} can be taken as approximately V/n.
With \hat{V}_{n} = \frac{s^{2}}{\frac{1}{n}\sum_{i=1}^{n}\left(X_{i} - \bar{X}_{n}\right)^{2}} we have \frac{\hat{V}_{n}}{n} = \frac{s^{2}}{\frac{1}{n}\sum_{i=1}^{n}\left(X_{i} - \bar{X}_{n}\right)^{2}} \cdot \frac{1}{n} = \frac{s^{2}}{\sum_{i=1}^{n}\left(X_{i} - \bar{X}_{n}\right)^{2}}.
Connection to the exact result
From the previous slide: \frac{\hat{V}_{n}}{n} = \frac{s^{2}}{\sum_{i=1}^{n}\left(X_{i} - \bar{X}_{n}\right)^{2}}
Thus, in the case of homoskedastic errors we have the following asymptotic approximation: \hat{\beta}_{1,n} \overset{a}{\sim} N\left(\beta_{1}, \frac{s^{2}}{\sum_{i=1}^{n}\left(X_{i} - \bar{X}_{n}\right)^{2}}\right).
The same result holds exactly in finite samples when the regressors are strongly exogenous and the errors are normally distributed.
Asymptotic T-test
Consider testing H_{0}: \beta_{1} = \beta_{1,0} vs H_{1}: \beta_{1} \neq \beta_{1,0}.
Consider the behavior of the T statistic under H_{0}: \beta_{1} = \beta_{1,0}. Since \sqrt{n}\left(\hat{\beta}_{1,n} - \beta_{1}\right) \rightarrow_{d} N\left(0, V\right) \text{ and } \hat{V}_{n} \rightarrow_{p} V, we have \begin{aligned} T = \frac{\hat{\beta}_{1,n} - \beta_{1,0}}{\sqrt{\hat{V}_{n}/n}} &= \frac{\sqrt{n}\left(\hat{\beta}_{1,n} - \beta_{1,0}\right)}{\sqrt{\hat{V}_{n}}} \\ &\overset{H_{0}}{=} \frac{\sqrt{n}\left(\hat{\beta}_{1,n} - \beta_{1}\right)}{\sqrt{\hat{V}_{n}}} \\ &\rightarrow_{d} \frac{N\left(0, V\right)}{\sqrt{V}} \stackrel{d}{=} N\left(0, 1\right). \end{aligned}
Asymptotic T-test: rejection rule
Under H_{0}: \beta_{1} = \beta_{1,0}, T = \frac{\hat{\beta}_{1,n} - \beta_{1,0}}{\sqrt{\hat{V}_{n}/n}} \rightarrow_{d} N\left(0, 1\right), provided that \hat{V}_{n} \rightarrow_{p} V (the asymptotic variance of \hat{\beta}_{1,n}).
An asymptotic size \alpha test rejects H_{0}: \beta_{1} = \beta_{1,0} against H_{1}: \beta_{1} \neq \beta_{1,0} when \left|T\right| > z_{1-\alpha/2}, where z_{1-\alpha/2} is a standard normal critical value.
Asymptotically, estimating the variance is costless: since \hat{V}_{n} \rightarrow_{p} V, we can behave as if the variance of the OLS estimator were known.
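A simulation sketch of the asymptotic T-test (our own DGP with non-normal, uniform errors): under H_{0} the rejection frequency of \left|T\right| > 1.96 should be close to the nominal 5%:

```python
import numpy as np

rng = np.random.default_rng(6)
reps, n = 20_000, 200
x = rng.normal(0.0, 1.0, size=(reps, n))
u = rng.uniform(-1.0, 1.0, size=(reps, n))    # non-normal errors, E[U] = 0
y = 2.0 * x + u                                # H0: beta1 = 2 is true
# OLS slope, intercept, and residuals in each replication:
xc = x - x.mean(axis=1, keepdims=True)
b1 = (xc * y).sum(axis=1) / (xc**2).sum(axis=1)
b0 = y.mean(axis=1) - b1 * x.mean(axis=1)
u_hat = y - b0[:, None] - b1[:, None] * x
s2 = (u_hat**2).sum(axis=1) / (n - 2)
# T statistic with the homoskedastic standard error:
t = (b1 - 2.0) / np.sqrt(s2 / (xc**2).sum(axis=1))
print(np.mean(np.abs(t) > 1.96))               # close to the nominal 0.05
```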
Heteroskedastic errors
In general, the errors are heteroskedastic: \mathrm{E}\left[U_{i}^{2} \mid X_{i}\right] is not constant and changes with X_{i}.
In this case, \hat{V}_{n} = \frac{s^{2}}{\frac{1}{n}\sum_{i=1}^{n}\left(X_{i} - \bar{X}_{n}\right)^{2}} is not a consistent estimator of the asymptotic variance V = \frac{\mathrm{E}\left[\left(X_{i} - \mathrm{E}\left[X_{i}\right]\right)^{2}U_{i}^{2}\right]}{\left(\mathrm{Var}\left(X_{i}\right)\right)^{2}}: \begin{aligned} \frac{s^{2}}{\frac{1}{n}\sum_{i=1}^{n}\left(X_{i} - \bar{X}_{n}\right)^{2}} &\rightarrow_{p} \frac{\mathrm{E}\left[U_{i}^{2}\right]}{\mathrm{Var}\left(X_{i}\right)} \\ &= \frac{\mathrm{Var}\left(X_{i}\right)\cdot\mathrm{E}\left[U_{i}^{2}\right]}{\left(\mathrm{Var}\left(X_{i}\right)\right)^{2}} \\ &\neq \frac{\mathrm{E}\left[\left(X_{i} - \mathrm{E}\left[X_{i}\right]\right)^{2}U_{i}^{2}\right]}{\left(\mathrm{Var}\left(X_{i}\right)\right)^{2}}. \end{aligned}
HC estimator of asymptotic variance
In the case of heteroskedastic errors, a consistent estimator of V = \frac{\mathrm{E}\left[\left(X_{i} - \mathrm{E}\left[X_{i}\right]\right)^{2}U_{i}^{2}\right]}{\left(\mathrm{Var}\left(X_{i}\right)\right)^{2}} can be constructed as follows: \hat{V}_{n}^{HC} = \frac{\frac{1}{n}\sum_{i=1}^{n}\left(X_{i} - \bar{X}_{n}\right)^{2}\hat{U}_{i}^{2}}{\left(\frac{1}{n}\sum_{i=1}^{n}\left(X_{i} - \bar{X}_{n}\right)^{2}\right)^{2}}.
One can show that \hat{V}_{n}^{HC} \rightarrow_{p} V whether the errors are heteroskedastic or homoskedastic.
We have the following asymptotic approximation: \hat{\beta}_{1,n} \overset{a}{\sim} N\left(\beta_{1}, \frac{\hat{V}_{n}^{HC}}{n}\right), and the standard errors can be computed as \mathrm{se}\left(\hat{\beta}_{1,n}\right) = \sqrt{\hat{V}_{n}^{HC}/n}.
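A sketch of the HC estimator on simulated data (our own heteroskedastic DGP with X\sim N\left(0,1\right) and U = \left|X\right| Z, Z\sim N\left(0,1\right), so \mathrm{E}\left[U^{2}\mid X\right]=X^{2} and the true asymptotic variance is V = \mathrm{E}\left[X^{4}\right] = 3, while the homoskedastic formula converges to \mathrm{E}\left[U^{2}\right]/\mathrm{Var}\left(X\right) = 1):

```python
import numpy as np

rng = np.random.default_rng(7)
n = 200_000
x = rng.normal(0.0, 1.0, n)                 # Var(X) = 1
u = np.abs(x) * rng.normal(0.0, 1.0, n)     # E[U^2 | X] = X^2: heteroskedastic
y = 1.0 + 2.0 * x + u
# OLS fit and residuals:
xc = x - x.mean()
b1 = (xc * y).sum() / (xc**2).sum()
b0 = y.mean() - b1 * x.mean()
u_hat = y - b0 - b1 * x
# HC estimator vs. the homoskedastic-only formula:
v_hc = np.mean(xc**2 * u_hat**2) / np.mean(xc**2) ** 2
v_homo = np.mean(u_hat**2) / np.mean(xc**2)
print(v_hc, v_homo)  # v_hc near 3; the homoskedastic formula gives about 1
```

The HC estimator recovers the correct asymptotic variance, while the homoskedastic formula understates it by roughly a factor of three in this design.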
HC standard errors in R
In R, the HC estimator of standard errors can be obtained using the sandwich package.

Standard (homoskedastic) standard errors:

coeftest(reg)

t test of coefficients:

             Estimate Std. Error t value  Pr(>|t|)    
(Intercept) -2.872735   0.728964 -3.9408 9.225e-05 ***
educ         0.598965   0.051284 11.6795 < 2.2e-16 ***
exper        0.022340   0.012057  1.8528   0.06447 .  
tenure       0.169269   0.021645  7.8204 2.935e-14 ***
---
Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1

HC (robust) standard errors:

coeftest(reg, vcov = vcovHC(reg, type = "HC1"))

t test of coefficients:

             Estimate Std. Error t value  Pr(>|t|)    
(Intercept) -2.872735   0.807415 -3.5579 0.0004078 ***
educ         0.598965   0.061014  9.8169 < 2.2e-16 ***
exper        0.022340   0.010555  2.1165 0.0347731 *  
tenure       0.169269   0.029278  5.7814 1.277e-08 ***
---
Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1