Lecture 17: Asymptotics
Economics 326 — Introduction to Econometrics II
Why we need large-sample theory
The OLS estimator \(\hat{\beta}\) has desirable properties:
\(\hat{\beta}\) is unbiased if the errors are strongly exogenous: \(\mathrm{E}\left[U_i \mid \mathbf{X}\right] =0.\)
If in addition the errors are homoskedastic, then \(\widehat{\mathrm{Var}}\left(\hat{\beta}\right)=s^{2}/\sum_{i=1}^{n}\left( X_{i}-\bar{X}\right) ^{2}\) is an unbiased estimator of the conditional variance of \(\hat{\beta}\).
If in addition the errors are normally distributed (given \(\mathbf{X}\)), then \(T=\left( \hat{\beta}-\beta \right) /\sqrt{\widehat{\mathrm{Var}}\left(\hat{\beta}\right)}\) has a \(t\) distribution which can be used for hypothesis testing.
Limitations of finite-sample theory
If the errors are only weakly exogenous: \[ \mathrm{E}\left[X_{i}U_{i}\right] =0, \] the OLS estimator is in general biased.
If the errors are heteroskedastic: \[ \mathrm{E}\left[U_{i}^{2} \mid X_{i}\right] =h\left( X_{i}\right), \] the “usual” variance formula is invalid; we also do not have an unbiased estimator for the variance in this case.
If the errors are not normally distributed conditional on \(\mathbf{X}\), then \(T\)- and \(F\)-statistics do not have \(t\) and \(F\) distributions under the null hypothesis.
Asymptotic (large-sample) theory allows us to derive approximate properties and distributions of estimators and test statistics by assuming that the sample size \(n\) is very large.
Part I: Consistency
Convergence of a sequence
A sequence of real numbers \(a_{1}, a_{2}, \ldots\) converges to \(a\) if for every \(\varepsilon > 0\) there exists \(N\) such that \(|a_{n} - a| < \varepsilon\) for all \(n \geq N\). We write \(a_{n} \to a\).
Consider two tolerances \(\varepsilon_{1} > \varepsilon_{2}\). The \(\varepsilon_{2}\)-band is narrower, so it takes more terms for the sequence to stay inside it: the corresponding threshold satisfies \(N_{2} > N_{1}\). Smaller \(\varepsilon\) requires larger \(N\).
A sequence that does not converge: \(a_{n} = a + c\sin(n)\) oscillates indefinitely around \(a\).
For \(\varepsilon_{1} > c\), all terms lie within the \(\varepsilon_{1}\)-band. But for \(\varepsilon_{2} < c\), terms keep falling outside the \(\varepsilon_{2}\)-band no matter how far along the sequence we go. Convergence requires the condition to hold for every \(\varepsilon > 0\), so the sequence does not converge.
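The two examples above can be checked numerically. The sketch below (my own, with arbitrary values \(a = 2\), \(c = 0.5\)) verifies that \(a + 1/n\) eventually stays inside any \(\varepsilon\)-band, while \(a + c\sin(n)\) keeps escaping bands narrower than \(c\):

```python
import numpy as np

a, c = 2.0, 0.5
n = np.arange(1, 10_001)
a_n = a + 1.0 / n          # convergent sequence: a_n -> a
b_n = a + c * np.sin(n)    # oscillates around a forever

eps = 0.01
# For a_n: beyond N = 1/eps, every term stays inside the eps-band.
tail_a = np.abs(a_n[n > int(1 / eps)] - a)
# For b_n: arbitrarily far along, some terms still violate the band.
tail_b = np.abs(b_n[n > int(1 / eps)] - a)

print(tail_a.max() < eps)      # True: a_n eventually stays within eps of a
print((tail_b >= eps).any())   # True: b_n keeps leaving the eps-band
```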
Our estimator \(\hat{\beta}_{n}\) is random: its value changes with each sample. To apply the notion of convergence, we associate with it a non-random sequence indexed by \(n\).
We take \(a_{n} = P\left(\left\vert \hat{\beta}_{n}-\beta \right\vert \geq \varepsilon \right)\), which is a non-random number for each \(n\). We say \(\hat{\beta}_{n}\) converges in probability to \(\beta\) if \(a_{n} \to 0\) for all \(\varepsilon > 0\).
Convergence in probability and LLN
More generally, let \(\theta _{n}\) be a sequence of random variables indexed by the sample size \(n.\) We say that \(\theta _{n}\) converges in probability to \(\theta\) if \[ \lim_{n\rightarrow \infty }P\left( \left\vert \theta _{n}-\theta \right\vert \geq \varepsilon \right) =0\text{ for all }\varepsilon >0. \]
We denote this as \(\theta _{n}\rightarrow _{p}\theta\) or \(p\lim \theta _{n}=\theta.\)
An example of convergence in probability is a Law of Large Numbers (LLN):
Let \(X_{1},X_{2},\ldots ,X_{n}\) be a random sample such that \(\mathrm{E}\left[X_{i}\right] =\mu\) for all \(i=1,\ldots ,n,\) and define \(\bar{X}_{n}=\frac{1}{n}\sum_{i=1}^{n}X_{i}.\) Then, under certain conditions, \[ \bar{X}_{n}\rightarrow _{p}\mu. \]
LLN
Let \(X_{1},\ldots ,X_{n}\) be a sample of independent identically distributed (iid) random variables. Let \(\mathrm{E}\left[X_{i}\right]=\mu\). If \(\mathrm{Var}\left(X_{i}\right)=\sigma ^{2}<\infty\), then \[ \bar{X}_{n}\rightarrow _{p}\mu. \]
In fact when the data are iid, the LLN holds if \[ \mathrm{E}\left[\left\vert X_{i}\right\vert\right] <\infty, \] but we prove the result under a stronger assumption that \(\mathrm{Var}\left(X_{i}\right)<\infty.\)
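A quick simulation (not part of the lecture; the exponential distribution and the tolerance \(\varepsilon = 0.15\) are my own choices) illustrates the LLN: the estimated probability \(P(|\bar{X}_n - \mu| \geq \varepsilon)\) shrinks toward zero as \(n\) grows.

```python
import numpy as np

rng = np.random.default_rng(0)
mu, eps, reps = 3.0, 0.15, 1_000

probs = []
for n in (10, 100, 10_000):
    # estimate P(|Xbar_n - mu| >= eps) across many simulated samples
    xbars = rng.exponential(scale=mu, size=(reps, n)).mean(axis=1)
    probs.append(np.mean(np.abs(xbars - mu) >= eps))
print(probs)  # probabilities fall toward 0 as n grows
```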
Markov’s inequality
Markov’s inequality. Let \(W\) be a random variable. For \(\varepsilon >0\) and \(r>0\), \[ P\left( \left\vert W\right\vert \geq \varepsilon \right) \leq \frac{\mathrm{E}\left[\left\vert W\right\vert ^{r}\right]}{\varepsilon ^{r}}. \]
With \(r=2,\) we have Chebyshev’s inequality. Suppose that \(\mathrm{E}\left[X\right]=\mu.\) Take \(W\equiv X-\mu\) and apply Markov’s inequality with \(r=2\). For \(\varepsilon >0,\)
\[ \begin{aligned} P\left( \left\vert X-\mu \right\vert \geq \varepsilon \right) &\leq \frac{\mathrm{E}\left[\left\vert X-\mu \right\vert ^{2}\right]}{\varepsilon ^{2}} \\ &= \frac{\mathrm{Var}\left(X\right)}{\varepsilon ^{2}}. \end{aligned} \]
The probability of observing an outlier (a large deviation of \(X\) from its mean \(\mu\)) can be bounded by the variance.
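As a numerical sanity check (my own, using an exponential distribution with mean 2 and variance 4), the simulated tail probabilities indeed sit below the Chebyshev bound \(\mathrm{Var}(X)/\varepsilon^2\):

```python
import numpy as np

rng = np.random.default_rng(1)
x = rng.exponential(scale=2.0, size=200_000)  # mean 2, variance 4
mu, var = 2.0, 4.0

for eps in (1.0, 2.0, 4.0):
    p = np.mean(np.abs(x - mu) >= eps)   # simulated tail probability
    bound = var / eps**2                 # Chebyshev bound
    print(eps, round(p, 3), round(bound, 3))
```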
Proof of Markov’s inequality
For any event \(A\), the expectation of its indicator equals the probability of the event: \[ \mathrm{E}\left[\mathbf{1}(A)\right] = 1 \cdot P(A) + 0 \cdot P(A^c) = P(A). \]
Define the indicator \(\mathbf{1}\left(\left\vert W \right\vert \geq \varepsilon\right)\), which equals \(1\) when \(\left\vert W \right\vert \geq \varepsilon\) and \(0\) otherwise. Then:
\[ \begin{aligned} \mathbf{1}\left(\left\vert W \right\vert \geq \varepsilon\right) &= \mathbf{1}\left(\left\vert W \right\vert^r \geq \varepsilon^r\right) \\ &= \mathbf{1}\!\left(\frac{\left\vert W \right\vert^r}{\varepsilon^r} \geq 1\right) \\ &\leq \frac{\left\vert W \right\vert^r}{\varepsilon^r}. \end{aligned} \] Taking expectations of both sides yields \[ P\left(\left\vert W \right\vert \geq \varepsilon\right) = \mathrm{E}\left[\mathbf{1}\left(\left\vert W \right\vert \geq \varepsilon\right)\right] \leq \frac{\mathrm{E}\left[\left\vert W \right\vert^r\right]}{\varepsilon^r}. \]
Proof of the LLN
- Fix \(\varepsilon >0\) and apply Markov’s inequality with \(r=2:\)
\[ \begin{aligned} P\left( \left\vert \bar{X}_{n}-\mu \right\vert \geq \varepsilon \right) &= P\left( \left\vert \frac{1}{n}\sum_{i=1}^{n}X_{i}-\mu \right\vert \geq \varepsilon \right) \\ &= P\left( \left\vert \frac{1}{n}\sum_{i=1}^{n}\left( X_{i}-\mu \right) \right\vert \geq \varepsilon \right) \\ &\leq \frac{\mathrm{E}\left[\left( \frac{1}{n}\sum_{i=1}^{n}\left( X_{i}-\mu \right) \right) ^{2}\right]}{\varepsilon ^{2}} \\ &= \frac{1}{n^{2}\varepsilon ^{2}}\left( \sum_{i=1}^{n}\mathrm{E}\left[\left( X_{i}-\mu \right) ^{2}\right]+\sum_{i=1}^{n}\sum_{j\neq i}\mathrm{E}\left[\left( X_{i}-\mu \right) \left( X_{j}-\mu \right)\right] \right) \\ &= \frac{1}{n^{2}\varepsilon ^{2}}\left( \sum_{i=1}^{n}\mathrm{Var}\left(X_{i}\right)+\sum_{i=1}^{n}\sum_{j\neq i}\mathrm{Cov}\left(X_{i},X_{j}\right)\right) \\ &= \frac{n\sigma ^{2}}{n^{2}\varepsilon ^{2}} = \frac{\sigma ^{2}}{n\varepsilon ^{2}} \rightarrow 0 \text{ as }n\rightarrow \infty \text{ for all }\varepsilon >0. \end{aligned} \]
Averaging and variance reduction
Let \(X_{1},\ldots ,X_{n}\) be a sample and suppose that
\[ \begin{aligned} \mathrm{E}\left[X_{i}\right] &= \mu \text{ for all }i=1,\ldots ,n, \\ \mathrm{Var}\left(X_{i}\right) &= \sigma ^{2}\text{ for all }i=1,\ldots ,n, \\ \mathrm{Cov}\left(X_{i},X_{j}\right) &= 0\text{ for all }j\neq i. \end{aligned} \]
The mean of the sample average:
\[ \begin{aligned} \mathrm{E}\left[\bar{X}_{n}\right] &= \mathrm{E}\left[\frac{1}{n}\sum_{i=1}^{n}X_{i}\right] \\ &= \frac{1}{n}\sum_{i=1}^{n}\mathrm{E}\left[X_{i}\right] \\ &= \frac{1}{n}\sum_{i=1}^{n}\mu = \frac{1}{n}n\mu =\mu. \end{aligned} \]
Variance of the sample average
The variance of the sample average:
\[ \begin{aligned} \mathrm{Var}\left(\bar{X}_{n}\right) &= \mathrm{Var}\left(\frac{1}{n}\sum_{i=1}^{n}X_{i}\right) \\ &= \frac{1}{n^{2}}\mathrm{Var}\left(\sum_{i=1}^{n}X_{i}\right) \\ &= \frac{1}{n^{2}}\left( \sum_{i=1}^{n}\mathrm{Var}\left(X_{i}\right)+\sum_{i=1}^{n}\sum_{j\neq i}\mathrm{Cov}\left(X_{i},X_{j}\right)\right) \\ &= \frac{1}{n^{2}}\left( \sum_{i=1}^{n}\sigma ^{2}+\sum_{i=1}^{n}\sum_{j\neq i}0\right) \\ &= \frac{1}{n^{2}}n\sigma ^{2}=\frac{\sigma ^{2}}{n}. \end{aligned} \]
The variance of the average approaches zero as \(n\rightarrow \infty\) if the observations are uncorrelated.
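A short simulation (mine; normal draws with \(\sigma^2 = 4\) are an arbitrary choice) confirms that the sampling variance of \(\bar{X}_n\) tracks \(\sigma^2/n\):

```python
import numpy as np

rng = np.random.default_rng(2)
sigma2, reps = 4.0, 5_000

for n in (10, 100, 1_000):
    # reps independent sample averages, each based on n observations
    xbars = rng.normal(0.0, np.sqrt(sigma2), size=(reps, n)).mean(axis=1)
    print(n, round(xbars.var(), 4), sigma2 / n)  # simulated vs. theoretical
```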
Convergence in probability: properties
Slutsky’s Lemma. Suppose that \(\theta _{n}\rightarrow _{p}\theta,\) and let \(g\) be a function continuous at \(\theta.\) Then, \[ g\left( \theta _{n}\right) \rightarrow _{p}g\left( \theta \right). \]
If \(\theta _{n}\rightarrow _{p}\theta,\) then \(\theta _{n}^{2}\rightarrow _{p}\theta ^{2}.\)
If \(\theta _{n}\rightarrow _{p}\theta\) and \(\theta \neq 0,\) then \(1/\theta _{n}\rightarrow _{p}1/\theta.\)
Suppose that \(\theta _{n}\rightarrow _{p}\theta\) and \(\lambda _{n}\rightarrow _{p}\lambda.\) Then,
\(\theta _{n}+\lambda _{n}\rightarrow _{p}\theta +\lambda.\)
\(\theta _{n}\lambda _{n}\rightarrow _{p}\theta \lambda.\)
\(\theta _{n}/\lambda _{n}\rightarrow _{p}\theta /\lambda\) provided that \(\lambda \neq 0.\)
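These properties can be illustrated numerically. In the sketch below (my own construction), \(\theta_n\) is a sample mean converging in probability to \(\mu = 2\), and the continuous transforms \(\theta_n^2\) and \(1/\theta_n\) converge to \(\mu^2\) and \(1/\mu\), as Slutsky's Lemma asserts:

```python
import numpy as np

rng = np.random.default_rng(3)
mu, eps = 2.0, 0.3
# 2,000 replications of a sample mean with n = 2,000, so theta_n is near mu
theta_n = rng.normal(mu, 1.0, size=(2_000, 2_000)).mean(axis=1)

# continuous transforms inherit convergence in probability
print(np.mean(np.abs(theta_n ** 2 - mu ** 2) >= eps))   # close to 0
print(np.mean(np.abs(1.0 / theta_n - 1.0 / mu) >= eps))  # close to 0
```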
Consistency
Let \(\hat{\beta}_{n}\) be an estimator of \(\beta\) based on a sample of size \(n.\)
We say that \(\hat{\beta}_{n}\) is a consistent estimator of \(\beta\) if as \(n\rightarrow \infty,\) \[ \hat{\beta}_{n}\rightarrow _{p}\beta. \]
Consistency means that the probability of the event that the distance between \(\hat{\beta}_{n}\) and \(\beta\) exceeds \(\varepsilon >0\) can be made arbitrarily small by increasing the sample size.
Consistency of OLS
Suppose that:
The data \(\left\{ \left( Y_{i},X_{i}\right) :i=1,\ldots ,n\right\}\) are iid.
\(Y_{i}=\beta _{0}+\beta _{1}X_{i}+U_{i},\) where \(\mathrm{E}\left[U_{i}\right] =0.\)
\(\mathrm{E}\left[X_{i}U_{i}\right] =0.\)
\(0<\mathrm{Var}\left(X_{i}\right)<\infty.\)
Let \(\hat{\beta}_{0,n}\) and \(\hat{\beta}_{1,n}\) be the OLS estimators of \(\beta _{0}\) and \(\beta _{1}\) based on a sample of size \(n\). Under Assumptions 1–4, \[ \begin{aligned} \hat{\beta}_{0,n} &\rightarrow _{p}\beta _{0}, \\ \hat{\beta}_{1,n} &\rightarrow _{p}\beta _{1}. \end{aligned} \]
The key identifying assumption is Assumption 3: \(\mathrm{E}\left[X_{i}U_{i}\right]=0\), which together with \(\mathrm{E}\left[U_{i}\right]=0\) is equivalent to \(\mathrm{Cov}\left(X_{i},U_{i}\right)=0.\)
Proof of consistency
Write
\[ \begin{aligned} \hat{\beta}_{1,n} = \frac{\sum_{i=1}^{n}\left( X_{i}-\bar{X}_{n}\right) Y_{i}}{\sum_{i=1}^{n}\left( X_{i}-\bar{X}_{n}\right) ^{2}} &= \beta _{1}+\frac{\sum_{i=1}^{n}\left( X_{i}-\bar{X}_{n}\right) U_{i}}{\sum_{i=1}^{n}\left( X_{i}-\bar{X}_{n}\right) ^{2}} \\ &= \beta _{1}+\frac{\frac{1}{n}\sum_{i=1}^{n}\left( X_{i}-\bar{X}_{n}\right) U_{i}}{\frac{1}{n}\sum_{i=1}^{n}\left( X_{i}-\bar{X}_{n}\right) ^{2}}. \end{aligned} \]
We will show that \[ \begin{aligned} \frac{1}{n}\sum_{i=1}^{n}\left( X_{i}-\bar{X}_{n}\right) U_{i} &\rightarrow _{p}0, \\ \frac{1}{n}\sum_{i=1}^{n}\left( X_{i}-\bar{X}_{n}\right) ^{2} &\rightarrow _{p}\mathrm{Var}\left(X_{i}\right), \end{aligned} \]
Since \(\mathrm{Var}\left(X_{i}\right)\neq 0,\) \[ \hat{\beta}_{1,n} = \beta _{1}+\frac{\frac{1}{n}\sum_{i=1}^{n}\left( X_{i}-\bar{X}_{n}\right) U_{i}}{\frac{1}{n}\sum_{i=1}^{n}\left( X_{i}-\bar{X}_{n}\right) ^{2}} \rightarrow _{p} \beta _{1}+\frac{0}{\mathrm{Var}\left(X_{i}\right)}= \beta _{1}. \]
Numerator converges to zero
\[ \frac{1}{n}\sum_{i=1}^{n}\left( X_{i}-\bar{X}_{n}\right) U_{i} = \frac{1}{n}\sum_{i=1}^{n}X_{i}U_{i}-\bar{X}_{n}\left( \frac{1}{n}\sum_{i=1}^{n}U_{i}\right). \]
By the LLN,
\[ \begin{aligned} \frac{1}{n}\sum_{i=1}^{n}X_{i}U_{i} &\rightarrow _{p}\mathrm{E}\left[X_{i}U_{i}\right] =0, \\ \bar{X}_{n} &\rightarrow _{p}\mathrm{E}\left[X_{i}\right], \\ \frac{1}{n}\sum_{i=1}^{n}U_{i} &\rightarrow _{p}\mathrm{E}\left[U_{i}\right] =0. \end{aligned} \]
Hence,
\[ \begin{aligned} \frac{1}{n}\sum_{i=1}^{n}\left( X_{i}-\bar{X}_{n}\right) U_{i} &= \frac{1}{n}\sum_{i=1}^{n}X_{i}U_{i}-\bar{X}_{n}\left( \frac{1}{n}\sum_{i=1}^{n}U_{i}\right) \\ &\rightarrow _{p}0-\mathrm{E}\left[X_{i}\right] \cdot 0 = 0. \end{aligned} \]
Denominator converges to \(\mathrm{Var}\left(X_i\right)\)
The sample variance can be written as \[ \frac{1}{n}\sum_{i=1}^{n}\left( X_{i}-\bar{X}_{n}\right) ^{2} = \frac{1}{n}\sum_{i=1}^{n}X_{i}^{2}-\bar{X}_{n}^{2}. \]
By the LLN, \(\frac{1}{n}\sum_{i=1}^{n}X_{i}^{2}\rightarrow _{p}\mathrm{E}\left[X_{i}^{2}\right]\) and \(\bar{X}_{n}\rightarrow _{p}\mathrm{E}\left[X_{i}\right].\)
By Slutsky’s Lemma, \(\bar{X}_{n}^{2}\rightarrow _{p}\left( \mathrm{E}\left[X_{i}\right]\right) ^{2}.\)
Thus, \[ \frac{1}{n}\sum_{i=1}^{n}\left( X_{i}-\bar{X}_{n}\right) ^{2}=\frac{1}{n}\sum_{i=1}^{n}X_{i}^{2}-\bar{X}_{n}^{2}\rightarrow _{p}\mathrm{E}\left[X_{i}^{2}\right] -\left( \mathrm{E}\left[X_{i}\right]\right) ^{2}=\mathrm{Var}\left(X_{i}\right). \]
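The consistency result can be illustrated by simulation. The data-generating process below (uniform regressor, \(t\)-distributed errors) is my own choice; the OLS slope settles on the true \(\beta_1 = 2\) as \(n\) grows, even though the errors are non-normal:

```python
import numpy as np

rng = np.random.default_rng(4)
beta0, beta1 = 1.0, 2.0

def ols_slope(x, y):
    """OLS slope from a simple regression of y on x (with intercept)."""
    xd = x - x.mean()
    return np.sum(xd * y) / np.sum(xd ** 2)

for n in (50, 500, 50_000):
    x = rng.uniform(0, 10, size=n)
    u = rng.standard_t(df=5, size=n)   # non-normal, mean-zero errors
    y = beta0 + beta1 * x + u
    print(n, round(ols_slope(x, y), 4))  # approaches beta1 = 2
```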
Multiple regression
Under similar conditions to 1–4, one can establish consistency of OLS for the multiple linear regression model: \[ Y_{i}=\beta _{0}+\beta _{1}X_{1,i}+\ldots +\beta _{k}X_{k,i}+U_{i}, \] where \(\mathrm{E}\left[U_{i}\right]=0.\)
The key assumption is that the errors and regressors are uncorrelated: \[ \mathrm{E}\left[X_{1,i}U_{i}\right] =\ldots =\mathrm{E}\left[X_{k,i}U_{i}\right] =0. \]
Omitted variables and OLS inconsistency
Suppose that the true model has two regressors: \[ \begin{aligned} &Y_{i}=\beta _{0}+\beta _{1}X_{1,i}+\beta _{2}X_{2,i}+U_{i}, \\ &\mathrm{E}\left[X_{1,i}U_{i}\right] =\mathrm{E}\left[X_{2,i}U_{i}\right] =0. \end{aligned} \]
Suppose that the econometrician includes only \(X_{1}\) in the regression when estimating \(\beta _{1}\):
\[ \begin{aligned} \tilde{\beta}_{1,n} &= \frac{\sum_{i=1}^{n}\left( X_{1,i}-\bar{X}_{1,n}\right) Y_{i}}{\sum_{i=1}^{n}\left( X_{1,i}-\bar{X}_{1,n}\right) ^{2}} \\ &= \frac{\sum_{i=1}^{n}\left( X_{1,i}-\bar{X}_{1,n}\right) \left( \beta _{0}+\beta _{1}X_{1,i}+\beta _{2}X_{2,i}+U_{i}\right) }{\sum_{i=1}^{n}\left( X_{1,i}-\bar{X}_{1,n}\right) ^{2}} \\ &= \beta _{1}+\beta _{2}\frac{\sum_{i=1}^{n}\left( X_{1,i}-\bar{X}_{1,n}\right) X_{2,i}}{\sum_{i=1}^{n}\left( X_{1,i}-\bar{X}_{1,n}\right) ^{2}} \\ &\quad +\frac{\sum_{i=1}^{n}\left( X_{1,i}-\bar{X}_{1,n}\right) U_{i}}{\sum_{i=1}^{n}\left( X_{1,i}-\bar{X}_{1,n}\right) ^{2}}. \end{aligned} \]
Dividing numerator and denominator by \(n\) and applying the LLN as before:
The noise term vanishes: \[ \frac{\frac{1}{n}\sum_{i=1}^{n}\left( X_{1,i}-\bar{X}_{1,n}\right) U_{i}}{\frac{1}{n}\sum_{i=1}^{n}\left( X_{1,i}-\bar{X}_{1,n}\right) ^{2}} \rightarrow _{p} \frac{\mathrm{Cov}\left(X_{1,i},U_{i}\right)}{\mathrm{Var}\left(X_{1,i}\right)} = 0. \]
The bias term converges: \[ \frac{\frac{1}{n}\sum_{i=1}^{n}\left( X_{1,i}-\bar{X}_{1,n}\right) X_{2,i}}{\frac{1}{n}\sum_{i=1}^{n}\left( X_{1,i}-\bar{X}_{1,n}\right) ^{2}} \rightarrow _{p} \frac{\mathrm{Cov}\left(X_{1,i},X_{2,i}\right)}{\mathrm{Var}\left(X_{1,i}\right)}. \]
Therefore, \[ \tilde{\beta}_{1,n} \rightarrow _{p} \beta _{1}+\beta _{2}\frac{\mathrm{Cov}\left(X_{1,i},X_{2,i}\right)}{\mathrm{Var}\left(X_{1,i}\right)}. \]
\(\tilde{\beta}_{1,n}\) is inconsistent unless:
\(\beta _{2}=0\) (the model is correctly specified).
\(\mathrm{Cov}\left(X_{1,i},X_{2,i}\right)=0\) (the omitted variable is uncorrelated with the included regressor).
OVB through the composite error
In this example, the model contains two regressors: \[ \begin{aligned} &Y_{i}=\beta _{0}+\beta _{1}X_{1,i}+\beta _{2}X_{2,i}+U_{i}, \\ &\mathrm{E}\left[X_{1,i}U_{i}\right] =\mathrm{E}\left[X_{2,i}U_{i}\right] =0. \end{aligned} \]
However, since \(X_{2}\) is not controlled for, it goes into the error term: \[ \begin{aligned} Y_{i} &= \beta _{0}+\beta _{1}X_{1,i}+V_{i},\text{ where} \\ V_{i} &= \beta _{2}X_{2,i}+U_{i}. \end{aligned} \]
For consistency of \(\tilde{\beta}_{1,n}\) we need \(\mathrm{Cov}\left(X_{1,i},V_{i}\right) = 0\); however,
\[ \begin{aligned} \mathrm{Cov}\left(X_{1,i},V_{i}\right) &= \mathrm{Cov}\left(X_{1,i},\beta _{2}X_{2,i}+U_{i}\right) \\ &= \mathrm{Cov}\left(X_{1,i},\beta _{2}X_{2,i}\right)+\mathrm{Cov}\left(X_{1,i},U_{i}\right) \\ &= \beta _{2}\mathrm{Cov}\left(X_{1,i},X_{2,i}\right)+0 \\ &\neq 0\text{, unless }\beta _{2}=0\text{ or }\mathrm{Cov}\left(X_{1,i},X_{2,i}\right)=0. \end{aligned} \]
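A simulation sketch (assumed design: \(X_{2,i} = 0.5X_{1,i} + \text{noise}\), so \(\mathrm{Cov}(X_{1,i},X_{2,i}) = 0.5\) and \(\mathrm{Var}(X_{1,i}) = 1\)) shows the short-regression slope converging to \(\beta_1 + \beta_2 \cdot 0.5\) rather than to \(\beta_1\):

```python
import numpy as np

rng = np.random.default_rng(5)
n = 200_000
beta0, beta1, beta2 = 0.0, 1.0, 3.0

x1 = rng.normal(size=n)
x2 = 0.5 * x1 + rng.normal(size=n)   # Cov(X1, X2) = 0.5, Var(X1) = 1
u = rng.normal(size=n)
y = beta0 + beta1 * x1 + beta2 * x2 + u

# short regression: omit x2
x1d = x1 - x1.mean()
slope = np.sum(x1d * y) / np.sum(x1d ** 2)
plim = beta1 + beta2 * 0.5 / 1.0     # probability limit = 1 + 3*0.5 = 2.5
print(round(slope, 3), plim)
```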
Part II: Asymptotic Normality
Why do we need asymptotic normality?
In the previous lectures, we showed that the OLS estimator has an exact normal distribution when the errors are normally distributed.
- The same assumption is needed to show that the \(T\) statistic has a \(t\)-distribution and the \(F\) statistic has an \(F\)-distribution.
In this lecture, we argue that even when the errors are not normally distributed, the OLS estimator has an approximately normal distribution in large samples, provided that some additional conditions hold.
- This property is used for hypothesis testing: in large samples, the \(T\) statistic has a standard normal distribution and the \(F\) statistic has a \(\chi^{2}\) distribution (approximately).
Asymptotic normality
Let \(W_{n}\) be a sequence of random variables indexed by the sample size \(n.\)
- Typically, \(W_{n}\) will be a function of some estimator, such as \(W_{n}=\sqrt{n}\left( \hat{\beta}_{n}-\beta \right)\).
We say that \(W_{n}\) has an asymptotically normal distribution if its CDF converges to a normal CDF.
Let \(W\) be any random variable with a normal \(N\left( 0,\sigma^{2}\right)\) distribution and let \(F\) denote its CDF. We say that \(W_{n}\) has an asymptotically normal distribution if for all \(x\in \mathbb{R}\):
\[ F_{n}\left( x\right) =P\left( W_{n}\leq x\right) \rightarrow P\left( W\leq x\right) =F\left( x\right) \text{ as }n\rightarrow \infty . \]
- We denote this as \(W_{n}\rightarrow _{d}W\) or \(W_{n}\rightarrow _{d}N\left( 0,\sigma ^{2}\right).\)
Convergence in distribution
Asymptotic normality is an example of convergence in distribution.
We say that a sequence of random variables \(W_{n}\) converges in distribution to \(W\) (denoted as \(W_{n}\rightarrow _{d}W\)) if the CDF of \(W_{n}\) converges to the CDF of \(W\) at all points where the CDF of \(W\) is continuous.
Convergence in distribution is convergence of the CDFs.
Central Limit Theorem (CLT)
An example of convergence in distribution is a CLT.
Let \(X_{1},\ldots ,X_{n}\) be a sample of iid random variables such that \(\mathrm{E}\left[X_{i}\right] =0\) and \(\mathrm{Var}\left(X_{i}\right) =\sigma ^{2}>0\) (finite). Then, as \(n\rightarrow \infty,\)
\[ \frac{1}{\sqrt{n}}\sum_{i=1}^{n}X_{i}\rightarrow _{d}N\left( 0,\sigma^{2}\right) . \]
\(\rightarrow_{d}\) means that the CDF of the scaled sum converges to the normal CDF: for every \(x\), \[ P\left(\frac{1}{\sigma\sqrt{n}}\sum_{i=1}^{n}X_{i} \leq x\right) \rightarrow \Phi(x) \text{ as } n \rightarrow \infty, \] where \(\Phi\) is the standard normal CDF. For large \(n\), the distribution of the scaled sum is approximately normal.
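The CLT can be checked by simulation (my own sketch, using skewed exponential draws): the empirical CDF of the standardized sum is close to \(\Phi\) at a few reference points.

```python
import numpy as np

rng = np.random.default_rng(6)
reps, n = 20_000, 500
mu, sigma = 1.0, 1.0                 # exponential(1): mean 1, sd 1

sums = rng.exponential(scale=1.0, size=(reps, n)).sum(axis=1)
z = (sums - n * mu) / (sigma * np.sqrt(n))   # standardized sums

# empirical CDF of z vs. the standard normal CDF at reference points
for q, phi in ((-1.96, 0.025), (0.0, 0.5), (1.96, 0.975)):
    print(q, round(np.mean(z <= q), 3), phi)
```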
CLT with non-zero mean
For the CLT we impose three assumptions: (1) the data are iid; (2) the mean is zero; (3) the variance is finite and non-zero.
If \(X_{1},\ldots ,X_{n}\) are iid but \(\mathrm{E}\left[X_{i}\right] =\mu \neq 0,\) then consider \(X_{i}-\mu.\) Since \(\mathrm{E}\left[X_{i}-\mu\right] =0,\) we have
\[ \frac{1}{\sqrt{n}}\sum_{i=1}^{n}\left( X_{i}-\mu \right) \rightarrow_{d}N\left( 0,\mathrm{Var}\left(X_{i}\right) \right) . \]
Then
\[ \begin{aligned} \frac{1}{\sqrt{n}}\sum_{i=1}^{n}\left( X_{i}-\mu \right) &= \sqrt{n}\frac{1}{n}\sum_{i=1}^{n}\left( X_{i}-\mu \right) \\ &= \sqrt{n}\left( \frac{1}{n}\sum_{i=1}^{n}X_{i}-\frac{1}{n}\sum_{i=1}^{n}\mu \right) \\ &= \sqrt{n}\left( \bar{X}_{n}-\mu \right) . \end{aligned} \]
CLT for the sample average
From the previous slide: \[ \frac{1}{\sqrt{n}}\sum_{i=1}^{n}\left( X_{i}-\mu \right) = \sqrt{n}\left( \bar{X}_{n}-\mu \right) . \]
Thus, the CLT can be stated as
\[ \sqrt{n}\left( \bar{X}_{n}-\mu \right) \rightarrow _{d}N\left( 0,\mathrm{Var}\left(X_{i}\right) \right) . \]
By the LLN,
\[ \bar{X}_{n}-\mu \rightarrow _{p}0, \]
and
\[ \mathrm{Var}\left(\sqrt{n}\left( \bar{X}_{n}-\mu \right)\right) = n\mathrm{Var}\left(\bar{X}_{n}\right) = n\frac{\mathrm{Var}\left(X_{i}\right)}{n} = \mathrm{Var}\left(X_{i}\right). \]
Properties
Suppose that \(W_{n}\rightarrow _{d}N\left( 0,\sigma ^{2}\right)\) and \(\theta _{n}\rightarrow _{p}\theta.\) Then,
\[ \theta _{n}W_{n}\rightarrow _{d}\theta N\left( 0,\sigma ^{2}\right) \stackrel{d}{=} N\left( 0,\theta ^{2}\sigma ^{2}\right) , \]
and
\[ \theta _{n}+W_{n}\rightarrow _{d}\theta +N\left( 0,\sigma ^{2}\right) \stackrel{d}{=} N\left( \theta ,\sigma ^{2}\right) . \]
Suppose that \(Z_{n}\rightarrow _{d}Z\sim N\left( 0,1\right).\) Then,
\[ Z_{n}^{2}\rightarrow _{d}Z^{2}\equiv \chi _{1}^{2}. \]
If \(W_{n}\rightarrow _{d}c\), where \(c\) is a constant, then \(W_{n}\rightarrow _{p}c.\)
Asymptotic normality of OLS
Suppose that:
- The data \(\left\{ \left( Y_{i},X_{i}\right) :i=1,\ldots ,n\right\}\) are iid.
- \(Y_{i}=\beta _{0}+\beta _{1}X_{i}+U_{i},\) where \(\mathrm{E}\left[U_{i}\right] =0.\)
- \(\mathrm{E}\left[X_{i}U_{i}\right] =0.\)
- \(0<\mathrm{Var}\left(X_{i}\right) <\infty.\)
- \(0<\mathrm{E}\left[\left( X_{i}-\mathrm{E}\left[X_{i}\right]\right) ^{2}U_{i}^{2}\right] <\infty\) and \(0<\mathrm{E}\left[U_{i}^{2}\right] <\infty.\)
Let \(\hat{\beta}_{1,n}\) be the OLS estimator of \(\beta _{1}.\) Then,
\[ \sqrt{n}\left( \hat{\beta}_{1,n}-\beta _{1}\right) \rightarrow _{d}N\left( 0,\frac{\mathrm{E}\left[\left( X_{i}-\mathrm{E}\left[X_{i}\right]\right) ^{2}U_{i}^{2}\right]}{\left(\mathrm{Var}\left(X_{i}\right) \right) ^{2}}\right). \]
\(V=\dfrac{\mathrm{E}\left[\left( X_{i}-\mathrm{E}\left[X_{i}\right]\right) ^{2}U_{i}^{2}\right]}{\left(\mathrm{Var}\left(X_{i}\right) \right) ^{2}}\) is called the asymptotic variance of \(\hat{\beta}_{1,n}.\)
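The theorem can be verified numerically. In the sketch below (assumed heteroskedastic design with \(X_i \sim U(0,2)\) and \(U_i = X_i Z_i\), \(Z_i\) standard normal), the sampling variance of \(\sqrt{n}(\hat{\beta}_{1,n} - \beta_1)\) matches the asymptotic variance \(V\), itself approximated by a large Monte Carlo draw rather than analytically:

```python
import numpy as np

rng = np.random.default_rng(7)
n, reps = 2_000, 2_000
beta1 = 1.0

def slope(x, y):
    xd = x - x.mean()
    return np.sum(xd * y) / np.sum(xd ** 2)

draws = np.empty(reps)
for r in range(reps):
    x = rng.uniform(0, 2, size=n)
    u = x * rng.normal(size=n)       # E[U|X] = 0 but Var(U|X) = X^2
    y = beta1 * x + u
    draws[r] = np.sqrt(n) * (slope(x, y) - beta1)

# sandwich variance V = E[(X-EX)^2 U^2] / Var(X)^2, by Monte Carlo
xs = rng.uniform(0, 2, size=1_000_000)
us = xs * rng.normal(size=1_000_000)
V = np.mean((xs - xs.mean()) ** 2 * us ** 2) / np.var(xs) ** 2

print(round(draws.var(), 3), round(V, 3))  # the two should be close
```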
Large-sample approximation for OLS
Let \(\overset{a}{\sim}\) denote “approximately distributed as, in large samples.”
The asymptotic normality
\[ \sqrt{n}\left( \hat{\beta}_{1,n}-\beta _{1}\right) \rightarrow _{d}N\left(0,V\right) \]
can be viewed as the following large-sample approximation:
\[ \sqrt{n}\left( \hat{\beta}_{1,n}-\beta _{1}\right) \overset{a}{\sim} N\left(0,V\right) , \]
or
\[ \hat{\beta}_{1,n}\overset{a}{\sim} N\left( \beta _{1},V/n\right) . \]
Proof: decomposition
Write
\[ \hat{\beta}_{1,n}=\beta _{1}+\frac{\sum_{i=1}^{n}\left( X_{i}-\bar{X}_{n}\right) U_{i}}{\sum_{i=1}^{n}\left( X_{i}-\bar{X}_{n}\right) ^{2}}. \]
Now
\[ \hat{\beta}_{1,n}-\beta _{1}=\frac{\frac{1}{n}\sum_{i=1}^{n}\left( X_{i}-\bar{X}_{n}\right) U_{i}}{\frac{1}{n}\sum_{i=1}^{n}\left( X_{i}-\bar{X}_{n}\right) ^{2}}, \]
and
\[ \sqrt{n}\left( \hat{\beta}_{1,n}-\beta _{1}\right) =\frac{\frac{1}{\sqrt{n}}\sum_{i=1}^{n}\left( X_{i}-\bar{X}_{n}\right) U_{i}}{\frac{1}{n}\sum_{i=1}^{n}\left( X_{i}-\bar{X}_{n}\right) ^{2}}. \]
Proof: combining the limits
\[ \sqrt{n}\left( \hat{\beta}_{1,n}-\beta _{1}\right) =\frac{\frac{1}{\sqrt{n}}\sum_{i=1}^{n}\left( X_{i}-\bar{X}_{n}\right) U_{i}}{\frac{1}{n}\sum_{i=1}^{n}\left( X_{i}-\bar{X}_{n}\right) ^{2}}. \]
In Part I, we established
\[ \frac{1}{n}\sum_{i=1}^{n}\left( X_{i}-\bar{X}_{n}\right) ^{2}\rightarrow_{p}\mathrm{Var}\left(X_{i}\right). \]
We will show that
\[ \frac{1}{\sqrt{n}}\sum_{i=1}^{n}\left( X_{i}-\bar{X}_{n}\right) U_{i}\rightarrow _{d}N\left( 0,\mathrm{E}\left[\left( X_{i}-\mathrm{E}\left[X_{i}\right]\right)^{2}U_{i}^{2}\right] \right), \]
so that
\[\begin{align*} \sqrt{n}\left( \hat{\beta}_{1,n}-\beta _{1}\right) &= \frac{\frac{1}{\sqrt{n}}\sum_{i=1}^{n}\left( X_{i}-\bar{X}_{n}\right) U_{i}}{\frac{1}{n}\sum_{i=1}^{n}\left( X_{i}-\bar{X}_{n}\right) ^{2}} \\ &\rightarrow _{d}\frac{N\left( 0,\mathrm{E}\left[\left( X_{i}-\mathrm{E}\left[X_{i}\right]\right) ^{2}U_{i}^{2}\right] \right)}{\mathrm{Var}\left(X_{i}\right)} \\ &\stackrel{d}{=} N\left( 0,\frac{\mathrm{E}\left[\left( X_{i}-\mathrm{E}\left[X_{i}\right]\right)^{2}U_{i}^{2}\right]}{\left(\mathrm{Var}\left(X_{i}\right) \right) ^{2}}\right). \end{align*}\]
Proof: numerator CLT
\[ \begin{aligned} \frac{1}{\sqrt{n}}\sum_{i=1}^{n}\left( X_{i}-\bar{X}_{n}\right) U_{i} &= \frac{1}{\sqrt{n}}\sum_{i=1}^{n}\left( X_{i}-\mathrm{E}\left[X_{i}\right]+\mathrm{E}\left[X_{i}\right]-\bar{X}_{n}\right) U_{i} \\ &= \frac{1}{\sqrt{n}}\sum_{i=1}^{n}\left( X_{i}-\mathrm{E}\left[X_{i}\right]\right) U_{i}+\left( \mathrm{E}\left[X_{i}\right]-\bar{X}_{n}\right) \frac{1}{\sqrt{n}}\sum_{i=1}^{n}U_{i}. \end{aligned} \]
We have
\[ \mathrm{E}\left[\left( X_{i}-\mathrm{E}\left[X_{i}\right]\right) U_{i}\right] = \mathrm{E}\left[X_{i}U_{i}\right]-\mathrm{E}\left[X_{i}\right]\mathrm{E}\left[U_{i}\right]=0, \]
and \(0<\mathrm{E}\left[\left( X_{i}-\mathrm{E}\left[X_{i}\right]\right) ^{2}U_{i}^{2}\right] <\infty,\) so by the CLT,
\[ \frac{1}{\sqrt{n}}\sum_{i=1}^{n}\left( X_{i}-\mathrm{E}\left[X_{i}\right]\right) U_{i}\rightarrow_{d}N\left( 0,\mathrm{E}\left[\left( X_{i}-\mathrm{E}\left[X_{i}\right]\right) ^{2}U_{i}^{2}\right]\right). \]
Proof: second term vanishes
It is left to show that
\[ \left( \mathrm{E}\left[X_{i}\right]-\bar{X}_{n}\right) \frac{1}{\sqrt{n}}\sum_{i=1}^{n}U_{i}\rightarrow _{p}0. \]
We have \(\mathrm{E}\left[U_{i}\right]=0\) and \(0<\mathrm{E}\left[U_{i}^{2}\right]<\infty.\) By the CLT,
\[ \frac{1}{\sqrt{n}}\sum_{i=1}^{n}U_{i}\rightarrow _{d}N\left(0,\mathrm{E}\left[U_{i}^{2}\right]\right). \]
By the LLN,
\[ \mathrm{E}\left[X_{i}\right]-\bar{X}_{n}\rightarrow _{p}0. \]
Hence, the result follows.
Part III: Asymptotic Variance
Asymptotic variance
In Part II, we showed that when the data are iid and the regressors are exogenous, \[ \begin{aligned} Y_{i} &= \beta_{0} + \beta_{1}X_{i} + U_{i}, \\ \mathrm{E}\left[U_{i}\right] &= \mathrm{E}\left[X_{i}U_{i}\right] = 0, \end{aligned} \] the OLS estimator of \(\beta_{1}\) is asymptotically normal: \[ \begin{aligned} \sqrt{n}\left(\hat{\beta}_{1,n} - \beta_{1}\right) &\rightarrow_{d} N\left(0, V\right), \\ V &= \frac{\mathrm{E}\left[\left(X_{i} - \mathrm{E}\left[X_{i}\right]\right)^{2}U_{i}^{2}\right]}{\left(\mathrm{Var}\left(X_{i}\right)\right)^{2}}. \end{aligned} \]
For hypothesis testing, we need a consistent estimator of the asymptotic variance \(V\): \[ \hat{V}_{n} \rightarrow_{p} V. \]
Simplifying \(V\) under homoskedasticity
Assume that the errors are homoskedastic: \[ \mathrm{E}\left[U_{i}^{2} \mid X_{i}\right] = \sigma^{2}, \] where \(\sigma^{2}\) is a constant that does not depend on \(X_{i}\).
In this case, the asymptotic variance can be simplified using the Law of Iterated Expectation: \[ \begin{aligned} \mathrm{E}\left[\left(X_{i} - \mathrm{E}\left[X_{i}\right]\right)^{2}U_{i}^{2}\right] &= \mathrm{E}\left[\mathrm{E}\left[\left(X_{i} - \mathrm{E}\left[X_{i}\right]\right)^{2}U_{i}^{2} \mid X_{i}\right]\right] \\ &= \mathrm{E}\left[\left(X_{i} - \mathrm{E}\left[X_{i}\right]\right)^{2} \mathrm{E}\left[U_{i}^{2} \mid X_{i}\right]\right] \\ &= \mathrm{E}\left[\left(X_{i} - \mathrm{E}\left[X_{i}\right]\right)^{2} \sigma^{2}\right] \\ &= \sigma^{2}\,\mathrm{E}\left[\left(X_{i} - \mathrm{E}\left[X_{i}\right]\right)^{2}\right] = \sigma^{2}\mathrm{Var}\left(X_{i}\right). \end{aligned} \]
Estimating \(V\): method of moments
Thus, when the errors are homoskedastic with \(\mathrm{E}\left[U_{i}^{2}\right] = \sigma^{2},\) \[ V = \frac{\mathrm{E}\left[\left(X_{i} - \mathrm{E}\left[X_{i}\right]\right)^{2}U_{i}^{2}\right]}{\left(\mathrm{Var}\left(X_{i}\right)\right)^{2}} = \frac{\sigma^{2}\mathrm{Var}\left(X_{i}\right)}{\left(\mathrm{Var}\left(X_{i}\right)\right)^{2}} = \frac{\sigma^{2}}{\mathrm{Var}\left(X_{i}\right)}. \]
Let \(\hat{U}_{i} = Y_{i} - \hat{\beta}_{0,n} - \hat{\beta}_{1,n}X_{i}\), where \(\hat{\beta}_{0,n}\) and \(\hat{\beta}_{1,n}\) are the OLS estimators of \(\beta_{0}\) and \(\beta_{1}.\)
A consistent estimator for the asymptotic variance can be constructed using the Method of Moments: \[ \begin{aligned} \hat{\sigma}_{n}^{2} &= \frac{1}{n}\sum_{i=1}^{n}\hat{U}_{i}^{2}, \\ \widehat{\mathrm{Var}}\left(X_{i}\right) &= \frac{1}{n}\sum_{i=1}^{n}\left(X_{i} - \bar{X}_{n}\right)^{2}, \text{ and} \\ \hat{V}_{n} &= \frac{\hat{\sigma}_{n}^{2}}{\frac{1}{n}\sum_{i=1}^{n}\left(X_{i} - \bar{X}_{n}\right)^{2}}. \end{aligned} \]
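A minimal implementation sketch of this method-of-moments estimator, on simulated homoskedastic data (all parameter values below are my own choices):

```python
import numpy as np

rng = np.random.default_rng(8)
n = 100_000
beta0, beta1, sigma2 = 1.0, 2.0, 4.0

x = rng.normal(0.0, 3.0, size=n)          # Var(X) = 9
u = rng.normal(0.0, np.sqrt(sigma2), size=n)
y = beta0 + beta1 * x + u

# OLS estimates and residuals
xd = x - x.mean()
b1 = np.sum(xd * y) / np.sum(xd ** 2)
b0 = y.mean() - b1 * x.mean()
uhat = y - b0 - b1 * x

# method-of-moments pieces
sigma2_hat = np.mean(uhat ** 2)
varx_hat = np.mean(xd ** 2)
V_hat = sigma2_hat / varx_hat
print(round(V_hat, 3), sigma2 / 9.0)  # V_hat should be near sigma^2/Var(X)
```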
Why the LLN does not apply directly
From the previous slide: \[ \begin{aligned} \hat{V}_{n} &= \frac{\hat{\sigma}_{n}^{2}}{\frac{1}{n}\sum_{i=1}^{n}\left(X_{i} - \bar{X}_{n}\right)^{2}}, \\ \hat{\sigma}_{n}^{2} &= \frac{1}{n}\sum_{i=1}^{n}\hat{U}_{i}^{2}, \\ \hat{U}_{i} &= Y_{i} - \hat{\beta}_{0,n} - \hat{\beta}_{1,n}X_{i}. \end{aligned} \]
When proving the consistency of OLS (Part I), we showed that \[ \frac{1}{n}\sum_{i=1}^{n}\left(X_{i} - \bar{X}_{n}\right)^{2} \rightarrow_{p} \mathrm{Var}\left(X_{i}\right), \] and to establish \(\hat{V}_{n} \rightarrow_{p} V,\) we need to show that \(\hat{\sigma}_{n}^{2} \rightarrow_{p} \sigma^{2}.\)
The LLN cannot be applied directly to \[ \frac{1}{n}\sum_{i=1}^{n}\hat{U}_{i}^{2} \] because the \(\hat{U}_{i}\)’s are not iid: they are dependent through \(\hat{\beta}_{0,n}\) and \(\hat{\beta}_{1,n}.\)
Proof: \(\hat{\sigma}^{2}_{n} \rightarrow_{p} \sigma^{2}\)
First, write \[ \begin{aligned} \hat{U}_{i} &= Y_{i} - \hat{\beta}_{0,n} - \hat{\beta}_{1,n}X_{i} \\ &= \left(\beta_{0} + \beta_{1}X_{i} + U_{i}\right) - \hat{\beta}_{0,n} - \hat{\beta}_{1,n}X_{i} \\ &= U_{i} - \left(\hat{\beta}_{0,n} - \beta_{0}\right) - \left(\hat{\beta}_{1,n} - \beta_{1}\right)X_{i}. \end{aligned} \]
Now, \[ \hat{\sigma}_{n}^{2} = \frac{1}{n}\sum_{i=1}^{n}\hat{U}_{i}^{2} = \frac{1}{n}\sum_{i=1}^{n}\left(U_{i} - \left(\hat{\beta}_{0,n} - \beta_{0}\right) - \left(\hat{\beta}_{1,n} - \beta_{1}\right)X_{i}\right)^{2}. \]
Completing the consistency proof
We have \[ \begin{aligned} \hat{\sigma}_{n}^{2} &= \frac{1}{n}\sum_{i=1}^{n}\left(U_{i} - \left(\hat{\beta}_{0,n} - \beta_{0}\right) - \left(\hat{\beta}_{1,n} - \beta_{1}\right)X_{i}\right)^{2} \\ &= \frac{1}{n}\sum_{i=1}^{n}U_{i}^{2} + \left(\hat{\beta}_{0,n} - \beta_{0}\right)^{2} + \left(\hat{\beta}_{1,n} - \beta_{1}\right)^{2}\frac{1}{n}\sum_{i=1}^{n}X_{i}^{2} \\ &\quad -2\left(\hat{\beta}_{0,n} - \beta_{0}\right)\frac{1}{n}\sum_{i=1}^{n}U_{i} - 2\left(\hat{\beta}_{1,n} - \beta_{1}\right)\frac{1}{n}\sum_{i=1}^{n}U_{i}X_{i} \\ &\quad +2\left(\hat{\beta}_{0,n} - \beta_{0}\right)\left(\hat{\beta}_{1,n} - \beta_{1}\right)\frac{1}{n}\sum_{i=1}^{n}X_{i}. \end{aligned} \]
By the LLN, \[ \frac{1}{n}\sum_{i=1}^{n}U_{i}^{2} \rightarrow_{p} \mathrm{E}\left[U_{i}^{2}\right] = \sigma^{2}. \]
Because \(\hat{\beta}_{0,n}\) and \(\hat{\beta}_{1,n}\) are consistent, \[ \hat{\beta}_{0,n} - \beta_{0} \rightarrow_{p} 0 \text{ and } \hat{\beta}_{1,n} - \beta_{1} \rightarrow_{p} 0. \] Since \(\frac{1}{n}\sum_{i=1}^{n}X_{i},\) \(\frac{1}{n}\sum_{i=1}^{n}X_{i}^{2},\) \(\frac{1}{n}\sum_{i=1}^{n}U_{i},\) and \(\frac{1}{n}\sum_{i=1}^{n}U_{i}X_{i}\) converge in probability to finite limits by the LLN, every term except the first vanishes, and therefore \(\hat{\sigma}_{n}^{2} \rightarrow_{p} \sigma^{2}.\)
Using \(s^2\) instead of \(\hat{\sigma}^2_n\)
Thus, when the errors are homoskedastic, \[ \hat{V}_{n} = \frac{\hat{\sigma}_{n}^{2}}{\frac{1}{n}\sum_{i=1}^{n}\left(X_{i} - \bar{X}_{n}\right)^{2}}, \text{ with } \hat{\sigma}_{n}^{2} = \frac{1}{n}\sum_{i=1}^{n}\hat{U}_{i}^{2}, \] is a consistent estimator of \(V = \frac{\sigma^{2}}{\mathrm{Var}\left(X_{i}\right)}.\)
Similarly, \[ s^{2} = \frac{1}{n-2}\sum_{i=1}^{n}\hat{U}_{i}^{2} \rightarrow_{p} \sigma^{2}, \] and therefore \[ \hat{V}_{n} = \frac{s^{2}}{\frac{1}{n}\sum_{i=1}^{n}\left(X_{i} - \bar{X}_{n}\right)^{2}} \] is also a consistent estimator of \(V = \frac{\sigma^{2}}{\mathrm{Var}\left(X_{i}\right)}.\)
This version has an advantage over the one with \(\hat{\sigma}_{n}^{2}\): in addition to being consistent, \(s^{2}\) is also an unbiased estimator of \(\sigma^{2}\) if the regressors are strongly exogenous.
Asymptotic approximation
The result \(\sqrt{n}\left(\hat{\beta}_{1,n} - \beta_{1}\right) \rightarrow_{d} N\left(0, V\right)\) is used as the following approximation: \[ \hat{\beta}_{1,n} \overset{a}{\sim} N\left(\beta_{1}, \frac{V}{n}\right), \] where \(\overset{a}{\sim}\) denotes approximately in large samples. Thus, the variance of \(\hat{\beta}_{1,n}\) can be taken as approximately \(V/n.\)
With \(\hat{V}_{n} = \frac{s^{2}}{\frac{1}{n}\sum_{i=1}^{n}\left(X_{i} - \bar{X}_{n}\right)^{2}}\) we have \[ \frac{\hat{V}_{n}}{n} = \frac{s^{2}}{\frac{1}{n}\sum_{i=1}^{n}\left(X_{i} - \bar{X}_{n}\right)^{2}} \cdot \frac{1}{n} = \frac{s^{2}}{\sum_{i=1}^{n}\left(X_{i} - \bar{X}_{n}\right)^{2}}. \]
Connection to the exact result
From the previous slide: \[ \frac{\hat{V}_{n}}{n} = \frac{s^{2}}{\sum_{i=1}^{n}\left(X_{i} - \bar{X}_{n}\right)^{2}} \]
Thus, in the case of homoskedastic errors we have the following asymptotic approximation: \[ \hat{\beta}_{1,n} \overset{a}{\sim} N\left(\beta_{1}, \frac{s^{2}}{\sum_{i=1}^{n}\left(X_{i} - \bar{X}_{n}\right)^{2}}\right). \]
In finite samples, the same result holds exactly (with \(\sigma^{2}\) in place of \(s^{2}\)) when the regressors are strongly exogenous and the errors are normally distributed conditional on \(\mathbf{X}\).
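The quality of this approximation can be illustrated by simulation: across many samples, the empirical standard deviation of \(\hat{\beta}_{1,n}\) should match the average of \(\sqrt{s^{2}/\sum_{i}(X_i-\bar{X}_n)^2}\). A Python sketch under an assumed homoskedastic design (all numbers illustrative):

```python
# Illustrative check (assumed design): the spread of beta1_hat across
# simulated samples matches the standard error implied by
# s^2 / sum((X_i - Xbar)^2), as the asymptotic approximation predicts.
import numpy as np

rng = np.random.default_rng(1)
n, reps, sigma = 200, 2000, 1.5
beta0, beta1 = 1.0, 2.0

slopes, ses = [], []
for _ in range(reps):
    X = rng.normal(size=n)
    Y = beta0 + beta1 * X + rng.normal(scale=sigma, size=n)
    xd = X - X.mean()
    b1 = (xd @ Y) / (xd @ xd)
    b0 = Y.mean() - b1 * X.mean()
    resid = Y - b0 - b1 * X
    s2 = resid @ resid / (n - 2)
    slopes.append(b1)
    ses.append(np.sqrt(s2 / (xd @ xd)))  # sqrt(V_hat / n) rewritten

print(np.std(slopes), np.mean(ses))  # the two numbers nearly coincide
```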
Asymptotic \(T\)-test
Consider testing \(H_{0}: \beta_{1} = \beta_{1,0}\) vs \(H_{1}: \beta_{1} \neq \beta_{1,0}.\)
Consider the behavior of the \(T\) statistic under \(H_{0}: \beta_{1} = \beta_{1,0}\). Since \[ \sqrt{n}\left(\hat{\beta}_{1,n} - \beta_{1}\right) \rightarrow_{d} N\left(0, V\right) \text{ and } \hat{V}_{n} \rightarrow_{p} V, \] we have \[ \begin{aligned} T = \frac{\hat{\beta}_{1,n} - \beta_{1,0}}{\sqrt{\hat{V}_{n}/n}} &= \frac{\sqrt{n}\left(\hat{\beta}_{1,n} - \beta_{1,0}\right)}{\sqrt{\hat{V}_{n}}} \\ &\overset{H_{0}}{=} \frac{\sqrt{n}\left(\hat{\beta}_{1,n} - \beta_{1}\right)}{\sqrt{\hat{V}_{n}}} \\ &\rightarrow_{d} \frac{N\left(0, V\right)}{\sqrt{V}} \stackrel{d}{=} N\left(0, 1\right). \end{aligned} \]
Asymptotic \(T\)-test: rejection rule
Under \(H_{0}: \beta_{1} = \beta_{1,0},\) \[ T = \frac{\hat{\beta}_{1,n} - \beta_{1,0}}{\sqrt{\hat{V}_{n}/n}} \rightarrow_{d} N\left(0, 1\right), \] provided that \(\hat{V}_{n} \rightarrow_{p} V\) (the asymptotic variance of \(\hat{\beta}_{1,n}\)).
An asymptotic size \(\alpha\) test rejects \(H_{0}: \beta_{1} = \beta_{1,0}\) against \(H_{1}: \beta_{1} \neq \beta_{1,0}\) when \[ \left|T\right| > z_{1-\alpha/2}, \] where \(z_{1-\alpha/2}\) is the \(1-\alpha/2\) quantile of the standard normal distribution (e.g., \(z_{0.975} \approx 1.96\) for \(\alpha = 0.05\)).
Because \(\hat{V}_{n}\) consistently estimates the asymptotic variance, we can behave asymptotically as if the variance of the OLS estimator were known, and use standard normal critical values instead of \(t\) critical values.
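The point of the asymptotic test is that normality of the errors is not required. A Python sketch with deliberately non-normal (centered exponential) errors, an illustrative design assumed for this example, shows the test still rejecting a true null at close to its nominal 5% rate:

```python
# Illustrative simulation (assumed design): under H0 with skewed,
# non-normal errors, |T| > 1.96 still occurs in roughly 5% of samples.
import numpy as np

rng = np.random.default_rng(2)
n, reps = 500, 2000
beta1 = 2.0
z_crit = 1.96  # z_{0.975}

rejections = 0
for _ in range(reps):
    X = rng.normal(size=n)
    U = rng.exponential(scale=1.0, size=n) - 1.0  # mean-zero, skewed
    Y = beta1 * X + U
    xd = X - X.mean()
    b1 = (xd @ Y) / (xd @ xd)
    resid = (Y - Y.mean()) - b1 * xd
    s2 = resid @ resid / (n - 2)
    T = (b1 - beta1) / np.sqrt(s2 / (xd @ xd))  # test H0 at true value
    rejections += abs(T) > z_crit

print(rejections / reps)  # close to the nominal size 0.05
```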
Heteroskedastic errors
In general, the errors are heteroskedastic: \(\mathrm{E}\left[U_{i}^{2} \mid X_{i}\right]\) is not constant and changes with \(X_{i}.\)
In this case, \(\hat{V}_{n} = \frac{s^{2}}{\frac{1}{n}\sum_{i=1}^{n}\left(X_{i} - \bar{X}_{n}\right)^{2}}\) is not a consistent estimator of the asymptotic variance \(V = \frac{\mathrm{E}\left[\left(X_{i} - \mathrm{E}\left[X_{i}\right]\right)^{2}U_{i}^{2}\right]}{\left(\mathrm{Var}\left(X_{i}\right)\right)^{2}}\): \[ \begin{aligned} \frac{s^{2}}{\frac{1}{n}\sum_{i=1}^{n}\left(X_{i} - \bar{X}_{n}\right)^{2}} &\rightarrow_{p} \frac{\mathrm{E}\left[U_{i}^{2}\right]}{\mathrm{Var}\left(X_{i}\right)} \\ &= \frac{\mathrm{Var}\left(X_{i}\right)\cdot\mathrm{E}\left[U_{i}^{2}\right]}{\left(\mathrm{Var}\left(X_{i}\right)\right)^{2}} \\ &\neq \frac{\mathrm{E}\left[\left(X_{i} - \mathrm{E}\left[X_{i}\right]\right)^{2}U_{i}^{2}\right]}{\left(\mathrm{Var}\left(X_{i}\right)\right)^{2}}. \end{aligned} \]
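The gap between the two limits can be made concrete. In the assumed design \(X \sim N(0,1)\) and \(U = X \cdot e\) with \(e \sim N(0,1)\) (an illustrative choice, not from the lecture), \(\mathrm{E}[U_i^2] = 1\) but \(\mathrm{E}[(X_i - \mathrm{E}[X_i])^2 U_i^2] = \mathrm{E}[X_i^4] = 3\), so the homoskedastic formula converges to 1 while the true asymptotic variance is 3:

```python
# Illustrative simulation (assumed design): Var(U | X) = X^2, so the
# homoskedastic plug-in converges to E[U^2]/Var(X) = 1 while the correct
# asymptotic variance E[(X - EX)^2 U^2]/Var(X)^2 = E[X^4] = 3.
import numpy as np

rng = np.random.default_rng(3)
n = 500_000

X = rng.normal(size=n)
U = X * rng.normal(size=n)  # heteroskedastic errors

xd = X - X.mean()
var_x = np.mean(xd**2)

V_homo = np.mean(U**2) / var_x             # limit of the homoskedastic formula
V_true = np.mean(xd**2 * U**2) / var_x**2  # sample analogue of the true V

print(V_homo, V_true)  # approximately 1 and 3
```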
HC estimator of asymptotic variance
In the case of heteroskedastic errors, a consistent estimator of \(V = \frac{\mathrm{E}\left[\left(X_{i} - \mathrm{E}\left[X_{i}\right]\right)^{2}U_{i}^{2}\right]}{\left(\mathrm{Var}\left(X_{i}\right)\right)^{2}}\) can be constructed as follows: \[ \hat{V}_{n}^{HC} = \frac{\frac{1}{n}\sum_{i=1}^{n}\left(X_{i} - \bar{X}_{n}\right)^{2}\hat{U}_{i}^{2}}{\left(\frac{1}{n}\sum_{i=1}^{n}\left(X_{i} - \bar{X}_{n}\right)^{2}\right)^{2}}. \]
One can show that \(\hat{V}_{n}^{HC} \rightarrow_{p} V\) whether the errors are heteroskedastic or homoskedastic.
We have the following asymptotic approximation: \[ \hat{\beta}_{1,n} \overset{a}{\sim} N\left(\beta_{1}, \frac{\hat{V}_{n}^{HC}}{n}\right), \] and the standard errors can be computed as \(\mathrm{se}\left(\hat{\beta}_{1,n}\right) = \sqrt{\hat{V}_{n}^{HC}/n}.\)
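The HC estimator is straightforward to compute from OLS residuals. A Python sketch (reusing the assumed heteroskedastic design \(\mathrm{Var}(U_i \mid X_i) = X_i^2\) from the earlier illustration, for which \(V = 3\)):

```python
# Illustrative computation of V_hat_n^{HC} from the slide's formula,
# using OLS residuals; under the assumed design the true V equals 3.
import numpy as np

rng = np.random.default_rng(4)
n = 200_000
beta0, beta1 = 1.0, 2.0

X = rng.normal(size=n)
Y = beta0 + beta1 * X + X * rng.normal(size=n)  # Var(U | X) = X^2

xd = X - X.mean()
b1 = (xd @ Y) / (xd @ xd)
b0 = Y.mean() - b1 * X.mean()
U_hat = Y - b0 - b1 * X

# numerator: (1/n) sum (X_i - Xbar)^2 U_hat_i^2;
# denominator: ((1/n) sum (X_i - Xbar)^2)^2
V_hc = np.mean(xd**2 * U_hat**2) / np.mean(xd**2) ** 2
se_b1 = np.sqrt(V_hc / n)  # robust standard error of beta1_hat

print(V_hc, se_b1)
```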
HC standard errors in R
In R, HC standard errors can be obtained using the sandwich package (together with lmtest for coeftest):

library(wooldridge)
library(lmtest)
library(sandwich)

data("wage1")
reg <- lm(wage ~ educ + exper + tenure, data = wage1)

Standard (homoskedastic) standard errors:

coeftest(reg)

t test of coefficients:

             Estimate Std. Error t value  Pr(>|t|)
(Intercept) -2.872735   0.728964 -3.9408 9.225e-05 ***
educ         0.598965   0.051284 11.6795 < 2.2e-16 ***
exper        0.022340   0.012057  1.8528   0.06447 .
tenure       0.169269   0.021645  7.8204 2.935e-14 ***
---
Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1

HC (robust) standard errors:

coeftest(reg, vcov = vcovHC(reg, type = "HC1"))

t test of coefficients:

             Estimate Std. Error t value  Pr(>|t|)
(Intercept) -2.872735   0.807415 -3.5579 0.0004078 ***
educ         0.598965   0.061014  9.8169 < 2.2e-16 ***
exper        0.022340   0.010555  2.1165 0.0347731 *
tenure       0.169269   0.029278  5.7814 1.277e-08 ***
---
Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1