Lecture 17: Asymptotics

Economics 326 — Introduction to Econometrics II

Vadim Marmer, UBC

Why we need large-sample theory

\(\newcommand{\fragment}[1]{\class{mjxfrag}{#1}}\)
  • The OLS estimator \(\hat{\beta}\) has desirable properties:

    • \(\hat{\beta}\) is unbiased if the errors are strongly exogenous: \(\mathrm{E}\left[U_i \mid \mathbf{X}\right] =0.\)

    • If in addition the errors are homoskedastic, then \(\widehat{\mathrm{Var}}\left(\hat{\beta}\right)=s^{2}/\sum_{i=1}^{n}\left( X_{i}-\bar{X}\right) ^{2}\) is an unbiased estimator of the conditional variance of \(\hat{\beta}\).

    • If in addition the errors are normally distributed (given \(\mathbf{X}\)), then \(T=\left( \hat{\beta}-\beta \right) /\sqrt{\widehat{\mathrm{Var}}\left(\hat{\beta}\right)}\) has a \(t\) distribution which can be used for hypothesis testing.

Limitations of finite-sample theory

  • If the errors are only weakly exogenous: \[ \mathrm{E}\left[X_{i}U_{i}\right] =0, \] the OLS estimator is in general biased.

  • If the errors are heteroskedastic: \[ \mathrm{E}\left[U_{i}^{2} \mid X_{i}\right] =h\left( X_{i}\right), \] the “usual” variance formula is invalid; we also do not have an unbiased estimator for the variance in this case.

  • If the errors are not normally distributed conditional on \(\mathbf{X}\), then \(T\)- and \(F\)-statistics do not have \(t\) and \(F\) distributions under the null hypothesis.

  • Asymptotic (large-sample) theory allows us to derive approximate properties and distributions of estimators and test statistics by assuming that the sample size \(n\) is very large.

Part I: Consistency

Convergence of a sequence

  • A sequence of real numbers \(a_{1}, a_{2}, \ldots\) converges to \(a\) if for every \(\varepsilon > 0\) there exists \(N\) such that \(|a_{n} - a| < \varepsilon\) for all \(n \geq N\). We write \(a_{n} \to a\).

    For two tolerances \(\varepsilon_{1} > \varepsilon_{2}\), the \(\varepsilon_{2}\)-band is narrower, so it takes more terms before the sequence stays inside it: the corresponding \(N_{2}\) exceeds \(N_{1}\). Smaller \(\varepsilon\) requires larger \(N\).

  • A sequence that does not converge: \(a_{n} = a + c\sin(n)\) oscillates indefinitely around \(a\).

    For \(\varepsilon_{1} > c\), all terms lie within the \(\varepsilon_{1}\)-band. But for \(\varepsilon_{2} < c\), terms keep falling outside the \(\varepsilon_{2}\)-band no matter how far along the sequence we go. Convergence requires the condition to hold for all \(\varepsilon > 0\), so the sequence does not converge.

  • Our estimator \(\hat{\beta}_{n}\) is random: its value changes with each sample. To apply the concept of convergence, we construct from it a non-random sequence indexed by \(n\).

  • We take \(a_{n} = P\left(\left\vert \hat{\beta}_{n}-\beta \right\vert \geq \varepsilon \right)\), which is a non-random number for each \(n\). We say \(\hat{\beta}_{n}\) converges in probability to \(\beta\) if \(a_{n} \to 0\) for all \(\varepsilon > 0\).
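  • As an illustrative sketch (not part of the lecture; the Uniform\((0,1)\) draws, tolerance \(\varepsilon = 0.1\), and sample sizes are arbitrary choices), the sequence \(a_{n} = P\left(\left\vert \bar{X}_{n}-\mu \right\vert \geq \varepsilon \right)\) can be estimated by Monte Carlo and seen to shrink toward zero:

```python
import numpy as np

rng = np.random.default_rng(0)
mu, eps = 0.5, 0.1      # Uniform(0,1) mean and an arbitrary tolerance
reps = 20_000           # Monte Carlo replications per sample size

# a_n = P(|X_bar_n - mu| >= eps), estimated by simulation;
# convergence in probability means a_n -> 0 for every eps > 0.
probs = {}
for n in (10, 100, 1000):
    xbar = rng.random((reps, n)).mean(axis=1)
    probs[n] = np.mean(np.abs(xbar - mu) >= eps)

print(probs)
```

The estimated probabilities fall rapidly with \(n\), which is exactly the defining property of convergence in probability.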

Convergence in probability and LLN

  • More generally, let \(\theta _{n}\) be a sequence of random variables indexed by the sample size \(n.\) We say that \(\theta _{n}\) converges in probability to \(\theta\) if \[ \lim_{n\rightarrow \infty }P\left( \left\vert \theta _{n}-\theta \right\vert \geq \varepsilon \right) =0\text{ for all }\varepsilon >0. \]

  • We denote this as \(\theta _{n}\rightarrow _{p}\theta\) or \(p\lim \theta _{n}=\theta.\)

  • An example of convergence in probability is a Law of Large Numbers (LLN):

    Let \(X_{1},X_{2},\ldots ,X_{n}\) be a random sample such that \(\mathrm{E}\left[X_{i}\right] =\mu\) for all \(i=1,\ldots ,n,\) and define \(\bar{X}_{n}=\frac{1}{n}\sum_{i=1}^{n}X_{i}.\) Then, under certain conditions, \[ \bar{X}_{n}\rightarrow _{p}\mu. \]

LLN

  • Let \(X_{1},\ldots ,X_{n}\) be a sample of independent identically distributed (iid) random variables. Let \(\mathrm{E}\left[X_{i}\right]=\mu\). If \(\mathrm{Var}\left(X_{i}\right)=\sigma ^{2}<\infty\), then \[ \bar{X}_{n}\rightarrow _{p}\mu. \]

  • In fact, when the data are iid, the LLN holds under the weaker condition \[ \mathrm{E}\left[\left\vert X_{i}\right\vert\right] <\infty, \] but we prove the result under the stronger assumption that \(\mathrm{Var}\left(X_{i}\right)<\infty.\)

Markov’s inequality

  • Markov’s inequality. Let \(W\) be a random variable. For \(\varepsilon >0\) and \(r>0\), \[ P\left( \left\vert W\right\vert \geq \varepsilon \right) \leq \frac{\mathrm{E}\left[\left\vert W\right\vert ^{r}\right]}{\varepsilon ^{r}}. \]

  • With \(r=2,\) we have Chebyshev’s inequality. Suppose that \(\mathrm{E}\left[X\right]=\mu.\) Take \(W\equiv X-\mu\) and apply Markov’s inequality with \(r=2\). For \(\varepsilon >0,\)

    \[ \begin{aligned} P\left( \left\vert X-\mu \right\vert \geq \varepsilon \right) &\leq \frac{\mathrm{E}\left[\left\vert X-\mu \right\vert ^{2}\right]}{\varepsilon ^{2}} \\ &= \frac{\mathrm{Var}\left(X\right)}{\varepsilon ^{2}}. \end{aligned} \]

  • The probability of observing an outlier (a large deviation of \(X\) from its mean \(\mu\)) can be bounded by the variance.
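  • A quick numerical sanity check (illustrative only; the Exponential distribution with mean 2 and \(\varepsilon = 3\) are arbitrary choices) confirms that the empirical tail probability sits well below the Chebyshev bound:

```python
import numpy as np

rng = np.random.default_rng(1)
x = rng.exponential(scale=2.0, size=200_000)  # mean 2, variance 4
mu, var, eps = 2.0, 4.0, 3.0

# Empirical tail probability vs. the Chebyshev bound Var(X)/eps^2
tail = np.mean(np.abs(x - mu) >= eps)
bound = var / eps**2

print(tail, bound)   # the tail probability is well below the bound
```

The bound is conservative (here roughly \(0.44\) against a true tail probability near \(0.08\)), but it requires nothing beyond a finite variance.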

Proof of Markov’s inequality

  • For any event \(A\), the expectation of its indicator equals the probability of the event: \[ \mathrm{E}\left[\mathbf{1}(A)\right] = 1 \cdot P(A) + 0 \cdot P(A^c) = P(A). \]

  • Define the indicator \(\mathbf{1}\left(\left\vert W \right\vert \geq \varepsilon\right)\), which equals \(1\) when \(\left\vert W \right\vert \geq \varepsilon\) and \(0\) otherwise. Then:

\[ \begin{aligned} &\mathbf{1}\left(\left\vert W \right\vert \geq \varepsilon\right)\\ &\fragment{{}\quad= \mathbf{1}\left(\left\vert W \right\vert^r \geq \varepsilon^r\right)} \\ &\fragment{{}\quad= \mathbf{1}\!\left(\frac{\left\vert W \right\vert^r}{\varepsilon^r} \geq 1\right)} \\ &\fragment{{}\quad\leq \frac{\left\vert W \right\vert^r}{\varepsilon^r}} \\ &\fragment{{}\Longrightarrow }\\ &\fragment{{} P\left(\left\vert W \right\vert \geq \varepsilon\right) = \mathrm{E}\left[\mathbf{1}\left(\left\vert W \right\vert \geq \varepsilon\right)\right] \leq \frac{\mathrm{E}\left[\left\vert W \right\vert^r\right]}{\varepsilon^r}.} \end{aligned} \]

Proof of the LLN

  • Fix \(\varepsilon >0\) and apply Markov’s inequality with \(r=2:\)

\[ \begin{aligned} P\left( \left\vert \bar{X}_{n}-\mu \right\vert \geq \varepsilon \right) &\fragment{{}= P\left( \left\vert \frac{1}{n}\sum_{i=1}^{n}X_{i}-\mu \right\vert \geq \varepsilon \right)} \\ &\fragment{{}= P\left( \left\vert \frac{1}{n}\sum_{i=1}^{n}\left( X_{i}-\mu \right) \right\vert \geq \varepsilon \right)} \\ &\fragment{{}\leq \frac{\mathrm{E}\left[\left( \frac{1}{n}\sum_{i=1}^{n}\left( X_{i}-\mu \right) \right) ^{2}\right]}{\varepsilon ^{2}}} \\ &\fragment{{}= \frac{1}{n^{2}\varepsilon ^{2}}\left( \sum_{i=1}^{n}\mathrm{E}\left[\left( X_{i}-\mu \right) ^{2}\right]+\sum_{i=1}^{n}\sum_{j\neq i}\mathrm{E}\left[\left( X_{i}-\mu \right) \left( X_{j}-\mu \right)\right] \right)} \\ &\fragment{{}= \frac{1}{n^{2}\varepsilon ^{2}}\left( \sum_{i=1}^{n}\mathrm{Var}\left(X_{i}\right)+\sum_{i=1}^{n}\sum_{j\neq i}\mathrm{Cov}\left(X_{i},X_{j}\right)\right)} \\ &\fragment{{}= \frac{n\sigma ^{2}}{n^{2}\varepsilon ^{2}} = \frac{\sigma ^{2}}{n\varepsilon ^{2}} \rightarrow 0 \text{ as }n\rightarrow \infty \text{ for all }\varepsilon >0.} \end{aligned} \]

Averaging and variance reduction

  • Let \(X_{1},\ldots ,X_{n}\) be a sample and suppose that

    \[ \begin{aligned} \mathrm{E}\left[X_{i}\right] &= \mu \text{ for all }i=1,\ldots ,n, \\ \mathrm{Var}\left(X_{i}\right) &= \sigma ^{2}\text{ for all }i=1,\ldots ,n, \\ \mathrm{Cov}\left(X_{i},X_{j}\right) &= 0\text{ for all }j\neq i. \end{aligned} \]

  • The mean of the sample average:

    \[ \begin{aligned} \mathrm{E}\left[\bar{X}_{n}\right] &= \mathrm{E}\left[\frac{1}{n}\sum_{i=1}^{n}X_{i}\right] \\ &= \frac{1}{n}\sum_{i=1}^{n}\mathrm{E}\left[X_{i}\right] \\ &= \frac{1}{n}\sum_{i=1}^{n}\mu = \frac{1}{n}n\mu =\mu. \end{aligned} \]

Variance of the sample average

  • The variance of the sample average:

    \[ \begin{aligned} \mathrm{Var}\left(\bar{X}_{n}\right) &= \mathrm{Var}\left(\frac{1}{n}\sum_{i=1}^{n}X_{i}\right) \\ &= \frac{1}{n^{2}}\mathrm{Var}\left(\sum_{i=1}^{n}X_{i}\right) \\ &= \frac{1}{n^{2}}\left( \sum_{i=1}^{n}\mathrm{Var}\left(X_{i}\right)+\sum_{i=1}^{n}\sum_{j\neq i}\mathrm{Cov}\left(X_{i},X_{j}\right)\right) \\ &= \frac{1}{n^{2}}\left( \sum_{i=1}^{n}\sigma ^{2}+\sum_{i=1}^{n}\sum_{j\neq i}0\right) \\ &= \frac{1}{n^{2}}n\sigma ^{2}=\frac{\sigma ^{2}}{n}. \end{aligned} \]

  • The variance of the average approaches zero as \(n\rightarrow \infty\) if the observations are uncorrelated.
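  • The \(\sigma^{2}/n\) rate is easy to verify by simulation (a sketch; the value \(\sigma^{2} = 4\), the normal draws, and the sample sizes are arbitrary choices):

```python
import numpy as np

rng = np.random.default_rng(2)
sigma2, reps = 4.0, 20_000   # arbitrary variance; MC replications

# Simulated variance of X_bar_n for several n, each scaled by n/sigma^2;
# every ratio should be close to 1 if Var(X_bar_n) = sigma^2 / n.
ratios = {}
for n in (5, 50, 500):
    xbar = rng.normal(0.0, np.sqrt(sigma2), size=(reps, n)).mean(axis=1)
    ratios[n] = xbar.var() * n / sigma2

print(ratios)
```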

Convergence in probability: properties

  • Slutsky’s Lemma. Suppose that \(\theta _{n}\rightarrow _{p}\theta,\) and let \(g\) be a function continuous at \(\theta.\) Then, \[ g\left( \theta _{n}\right) \rightarrow _{p}g\left( \theta \right). \]

    • If \(\theta _{n}\rightarrow _{p}\theta,\) then \(\theta _{n}^{2}\rightarrow _{p}\theta ^{2}.\)

    • If \(\theta _{n}\rightarrow _{p}\theta\) and \(\theta \neq 0,\) then \(1/\theta _{n}\rightarrow _{p}1/\theta.\)

  • Suppose that \(\theta _{n}\rightarrow _{p}\theta\) and \(\lambda _{n}\rightarrow _{p}\lambda.\) Then,

    • \(\theta _{n}+\lambda _{n}\rightarrow _{p}\theta +\lambda.\)

    • \(\theta _{n}\lambda _{n}\rightarrow _{p}\theta \lambda.\)

    • \(\theta _{n}/\lambda _{n}\rightarrow _{p}\theta /\lambda\) provided that \(\lambda \neq 0.\)
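  • For instance, take \(g(\theta) = 1/\theta\): the sample mean of Exponential draws converges in probability to \(\mu\), so by Slutsky's Lemma its reciprocal converges to \(1/\mu\). A minimal simulation sketch (the value \(\mu = 2\) and the 0.05 tolerance are arbitrary choices):

```python
import numpy as np

rng = np.random.default_rng(10)
mu, reps = 2.0, 2_000   # Exponential mean and MC replications (arbitrary)

# X_bar_n ->_p mu, and g(t) = 1/t is continuous at mu != 0,
# so 1/X_bar_n ->_p 1/mu: the deviation probability shrinks with n.
probs = {}
for n in (10, 2_000):
    xbar = rng.exponential(scale=mu, size=(reps, n)).mean(axis=1)
    probs[n] = np.mean(np.abs(1.0 / xbar - 1.0 / mu) >= 0.05)

print(probs)
```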

Consistency

  • Let \(\hat{\beta}_{n}\) be an estimator of \(\beta\) based on a sample of size \(n.\)

  • We say that \(\hat{\beta}_{n}\) is a consistent estimator of \(\beta\) if as \(n\rightarrow \infty,\) \[ \hat{\beta}_{n}\rightarrow _{p}\beta. \]

  • Consistency means that the probability of the event that the distance between \(\hat{\beta}_{n}\) and \(\beta\) exceeds \(\varepsilon >0\) can be made arbitrarily small by increasing the sample size.

Consistency of OLS

  • Suppose that:

    1. The data \(\left\{ \left( Y_{i},X_{i}\right) :i=1,\ldots ,n\right\}\) are iid.

    2. \(Y_{i}=\beta _{0}+\beta _{1}X_{i}+U_{i},\) where \(\mathrm{E}\left[U_{i}\right] =0.\)

    3. \(\mathrm{E}\left[X_{i}U_{i}\right] =0.\)

    4. \(0<\mathrm{Var}\left(X_{i}\right)<\infty.\)

  • Let \(\hat{\beta}_{0,n}\) and \(\hat{\beta}_{1,n}\) be the OLS estimators of \(\beta _{0}\) and \(\beta _{1}\) based on a sample of size \(n\). Under Assumptions 1–4, \[ \begin{aligned} \hat{\beta}_{0,n} &\rightarrow _{p}\beta _{0}, \\ \hat{\beta}_{1,n} &\rightarrow _{p}\beta _{1}. \end{aligned} \]

  • The key identifying assumption is Assumption 3: \(\mathrm{E}\left[X_{i}U_{i}\right]=0,\) which together with \(\mathrm{E}\left[U_{i}\right]=0\) is equivalent to \(\mathrm{Cov}\left(X_{i},U_{i}\right)=0.\)

Proof of consistency

  • Write

    \[ \begin{aligned} \hat{\beta}_{1,n} = \frac{\sum_{i=1}^{n}\left( X_{i}-\bar{X}_{n}\right) Y_{i}}{\sum_{i=1}^{n}\left( X_{i}-\bar{X}_{n}\right) ^{2}} &= \beta _{1}+\frac{\sum_{i=1}^{n}\left( X_{i}-\bar{X}_{n}\right) U_{i}}{\sum_{i=1}^{n}\left( X_{i}-\bar{X}_{n}\right) ^{2}} \\ &= \beta _{1}+\frac{\frac{1}{n}\sum_{i=1}^{n}\left( X_{i}-\bar{X}_{n}\right) U_{i}}{\frac{1}{n}\sum_{i=1}^{n}\left( X_{i}-\bar{X}_{n}\right) ^{2}}. \end{aligned} \]

  • We will show that \[ \begin{aligned} \frac{1}{n}\sum_{i=1}^{n}\left( X_{i}-\bar{X}_{n}\right) U_{i} &\rightarrow _{p}0, \\ \frac{1}{n}\sum_{i=1}^{n}\left( X_{i}-\bar{X}_{n}\right) ^{2} &\rightarrow _{p}\mathrm{Var}\left(X_{i}\right), \end{aligned} \]

  • Since \(\mathrm{Var}\left(X_{i}\right)\neq 0,\) \[ \hat{\beta}_{1,n} = \beta _{1}+\frac{\frac{1}{n}\sum_{i=1}^{n}\left( X_{i}-\bar{X}_{n}\right) U_{i}}{\frac{1}{n}\sum_{i=1}^{n}\left( X_{i}-\bar{X}_{n}\right) ^{2}} \rightarrow _{p} \beta _{1}+\frac{0}{\mathrm{Var}\left(X_{i}\right)}= \beta _{1}. \]

Numerator converges to zero

\[ \frac{1}{n}\sum_{i=1}^{n}\left( X_{i}-\bar{X}_{n}\right) U_{i} = \frac{1}{n}\sum_{i=1}^{n}X_{i}U_{i}-\bar{X}_{n}\left( \frac{1}{n}\sum_{i=1}^{n}U_{i}\right). \]

By the LLN,

\[ \begin{aligned} \frac{1}{n}\sum_{i=1}^{n}X_{i}U_{i} &\rightarrow _{p}\mathrm{E}\left[X_{i}U_{i}\right] =0, \\ \bar{X}_{n} &\rightarrow _{p}\mathrm{E}\left[X_{i}\right], \\ \frac{1}{n}\sum_{i=1}^{n}U_{i} &\rightarrow _{p}\mathrm{E}\left[U_{i}\right] =0. \end{aligned} \]

Hence,

\[ \begin{aligned} \frac{1}{n}\sum_{i=1}^{n}\left( X_{i}-\bar{X}_{n}\right) U_{i} &= \frac{1}{n}\sum_{i=1}^{n}X_{i}U_{i}-\bar{X}_{n}\left( \frac{1}{n}\sum_{i=1}^{n}U_{i}\right) \\ &\rightarrow _{p}0-\mathrm{E}\left[X_{i}\right] \cdot 0 = 0. \end{aligned} \]

Denominator converges to \(\mathrm{Var}\left(X_i\right)\)

  • The sample variance can be written as \[ \frac{1}{n}\sum_{i=1}^{n}\left( X_{i}-\bar{X}_{n}\right) ^{2} = \frac{1}{n}\sum_{i=1}^{n}X_{i}^{2}-\bar{X}_{n}^{2}. \]

  • By the LLN, \(\frac{1}{n}\sum_{i=1}^{n}X_{i}^{2}\rightarrow _{p}\mathrm{E}\left[X_{i}^{2}\right]\) and \(\bar{X}_{n}\rightarrow _{p}\mathrm{E}\left[X_{i}\right].\)

  • By Slutsky’s Lemma, \(\bar{X}_{n}^{2}\rightarrow _{p}\left( \mathrm{E}\left[X_{i}\right]\right) ^{2}.\)

  • Thus, \[ \frac{1}{n}\sum_{i=1}^{n}\left( X_{i}-\bar{X}_{n}\right) ^{2}=\frac{1}{n}\sum_{i=1}^{n}X_{i}^{2}-\bar{X}_{n}^{2}\rightarrow _{p}\mathrm{E}\left[X_{i}^{2}\right] -\left( \mathrm{E}\left[X_{i}\right]\right) ^{2}=\mathrm{Var}\left(X_{i}\right). \]
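  • The algebraic identity used in the first step holds exactly for any dataset, as a quick numerical check confirms (illustrative; the simulated sample is an arbitrary choice):

```python
import numpy as np

rng = np.random.default_rng(3)
x = rng.normal(1.0, 2.0, size=1_000)   # arbitrary simulated sample
xbar = x.mean()

lhs = np.mean((x - xbar) ** 2)         # (1/n) sum (X_i - X_bar)^2
rhs = np.mean(x ** 2) - xbar ** 2      # (1/n) sum X_i^2 - X_bar^2

print(lhs, rhs)   # identical up to floating-point rounding
```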

Multiple regression

  • Under similar conditions to 1–4, one can establish consistency of OLS for the multiple linear regression model: \[ Y_{i}=\beta _{0}+\beta _{1}X_{1,i}+\ldots +\beta _{k}X_{k,i}+U_{i}, \] where \(\mathrm{E}\left[U_{i}\right]=0.\)

  • The key assumption is that the errors and regressors are uncorrelated: \[ \mathrm{E}\left[X_{1,i}U_{i}\right] =\ldots =\mathrm{E}\left[X_{k,i}U_{i}\right] =0. \]

Omitted variables and OLS inconsistency

  • Suppose that the true model has two regressors: \[ \begin{aligned} &Y_{i}=\beta _{0}+\beta _{1}X_{1,i}+\beta _{2}X_{2,i}+U_{i}, \\ &\mathrm{E}\left[X_{1,i}U_{i}\right] =\mathrm{E}\left[X_{2,i}U_{i}\right] =0. \end{aligned} \]

  • Suppose that the econometrician includes only \(X_{1}\) in the regression when estimating \(\beta _{1}\):

    \[ \begin{aligned} \tilde{\beta}_{1,n} &= \frac{\sum_{i=1}^{n}\left( X_{1,i}-\bar{X}_{1,n}\right) Y_{i}}{\sum_{i=1}^{n}\left( X_{1,i}-\bar{X}_{1,n}\right) ^{2}} \\ &= \frac{\sum_{i=1}^{n}\left( X_{1,i}-\bar{X}_{1,n}\right) \left( \beta _{0}+\beta _{1}X_{1,i}+\beta _{2}X_{2,i}+U_{i}\right) }{\sum_{i=1}^{n}\left( X_{1,i}-\bar{X}_{1,n}\right) ^{2}} \\ &= \beta _{1}+\beta _{2}\frac{\sum_{i=1}^{n}\left( X_{1,i}-\bar{X}_{1,n}\right) X_{2,i}}{\sum_{i=1}^{n}\left( X_{1,i}-\bar{X}_{1,n}\right) ^{2}} \\ &\quad +\frac{\sum_{i=1}^{n}\left( X_{1,i}-\bar{X}_{1,n}\right) U_{i}}{\sum_{i=1}^{n}\left( X_{1,i}-\bar{X}_{1,n}\right) ^{2}}. \end{aligned} \]

  • Dividing numerator and denominator by \(n\) and applying the LLN as before:

    • The noise term vanishes: \[ \frac{\frac{1}{n}\sum_{i=1}^{n}\left( X_{1,i}-\bar{X}_{1,n}\right) U_{i}}{\frac{1}{n}\sum_{i=1}^{n}\left( X_{1,i}-\bar{X}_{1,n}\right) ^{2}} \rightarrow _{p} \frac{\mathrm{Cov}\left(X_{1,i},U_{i}\right)}{\mathrm{Var}\left(X_{1,i}\right)} = 0. \]

    • The bias term converges: \[ \frac{\frac{1}{n}\sum_{i=1}^{n}\left( X_{1,i}-\bar{X}_{1,n}\right) X_{2,i}}{\frac{1}{n}\sum_{i=1}^{n}\left( X_{1,i}-\bar{X}_{1,n}\right) ^{2}} \rightarrow _{p} \frac{\mathrm{Cov}\left(X_{1,i},X_{2,i}\right)}{\mathrm{Var}\left(X_{1,i}\right)}. \]

  • Therefore, \[ \tilde{\beta}_{1,n} \rightarrow _{p} \beta _{1}+\beta _{2}\frac{\mathrm{Cov}\left(X_{1,i},X_{2,i}\right)}{\mathrm{Var}\left(X_{1,i}\right)}. \]

  • \(\tilde{\beta}_{1,n}\) is inconsistent unless:

    1. \(\beta _{2}=0\) (the model is correctly specified).

    2. \(\mathrm{Cov}\left(X_{1,i},X_{2,i}\right)=0\) (the omitted variable is uncorrelated with the included regressor).
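  • The omitted-variable-bias formula can be checked by simulation. A sketch (all parameter values are arbitrary choices; here \(\mathrm{Var}\left(X_{1,i}\right)=1\) and \(\mathrm{Cov}\left(X_{1,i},X_{2,i}\right)=0.5\) by construction, so the probability limit is \(2 + 3 \times 0.5 = 3.5\)):

```python
import numpy as np

rng = np.random.default_rng(4)
n = 200_000
beta0, beta1, beta2 = 1.0, 2.0, 3.0

# Var(X1) = 1 and Cov(X1, X2) = 0.5 by construction
x1 = rng.normal(size=n)
x2 = 0.5 * x1 + np.sqrt(0.75) * rng.normal(size=n)
u = rng.normal(size=n)
y = beta0 + beta1 * x1 + beta2 * x2 + u

# Short regression of Y on X1 only (OLS slope)
x1d = x1 - x1.mean()
b1_short = x1d @ y / (x1d @ x1d)

# Probability limit predicted by the OVB formula
plim = beta1 + beta2 * 0.5 / 1.0   # = 3.5

print(b1_short, plim)
```

With a large \(n\), the short-regression slope lands near 3.5 rather than near the true \(\beta_{1}=2\): more data does not cure the inconsistency.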

OVB through the composite error

  • In this example, the model contains two regressors: \[ \begin{aligned} &Y_{i}=\beta _{0}+\beta _{1}X_{1,i}+\beta _{2}X_{2,i}+U_{i}, \\ &\mathrm{E}\left[X_{1,i}U_{i}\right] =\mathrm{E}\left[X_{2,i}U_{i}\right] =0. \end{aligned} \]

  • However, since \(X_{2}\) is not controlled for, it goes into the error term: \[ \begin{aligned} Y_{i} &= \beta _{0}+\beta _{1}X_{1,i}+V_{i},\text{ where} \\ V_{i} &= \beta _{2}X_{2,i}+U_{i}. \end{aligned} \]

  • For consistency of \(\tilde{\beta}_{1,n}\) we need \(\mathrm{Cov}\left(X_{1,i},V_{i}\right) = 0\); however,

    \[ \begin{aligned} \mathrm{Cov}\left(X_{1,i},V_{i}\right) &= \mathrm{Cov}\left(X_{1,i},\beta _{2}X_{2,i}+U_{i}\right) \\ &= \mathrm{Cov}\left(X_{1,i},\beta _{2}X_{2,i}\right)+\mathrm{Cov}\left(X_{1,i},U_{i}\right) \\ &= \beta _{2}\mathrm{Cov}\left(X_{1,i},X_{2,i}\right)+0 \\ &\neq 0\text{, unless }\beta _{2}=0\text{ or }\mathrm{Cov}\left(X_{1,i},X_{2,i}\right)=0. \end{aligned} \]

Part II: Asymptotic Normality

Why do we need asymptotic normality?

  • In the previous lectures, we showed that the OLS estimator has an exact normal distribution when the errors are normally distributed.

    • The same assumption is needed to show that the \(T\) statistic has a \(t\)-distribution and the \(F\) statistic has an \(F\)-distribution.
  • In this lecture, we argue that even when the errors are not normally distributed, the OLS estimator has an approximately normal distribution in large samples, provided that some additional conditions hold.

    • This property is used for hypothesis testing: in large samples, the \(T\) statistic has a standard normal distribution and the \(F\) statistic has a \(\chi^{2}\) distribution (approximately).

Asymptotic normality

  • Let \(W_{n}\) be a sequence of random variables indexed by the sample size \(n.\)

    • Typically, \(W_{n}\) will be a function of some estimator, such as \(W_{n}=\sqrt{n}\left( \hat{\beta}_{n}-\beta \right)\).
  • We say that \(W_{n}\) has an asymptotically normal distribution if its CDF converges to a normal CDF.

  • Let \(W\) be any random variable with a normal \(N\left( 0,\sigma^{2}\right)\) distribution and let \(F\) denote its CDF. We say that \(W_{n}\) has an asymptotically normal distribution if for all \(x\in \mathbb{R}\):

    \[ F_{n}\left( x\right) =P\left( W_{n}\leq x\right) \rightarrow P\left( W\leq x\right) =F\left( x\right) \text{ as }n\rightarrow \infty . \]

    • We denote this as \(W_{n}\rightarrow _{d}W\) or \(W_{n}\rightarrow _{d}N\left( 0,\sigma ^{2}\right).\)

Convergence in distribution

  • Asymptotic normality is an example of convergence in distribution.

  • We say that a sequence of random variables \(W_{n}\) converges in distribution to \(W\) (denoted as \(W_{n}\rightarrow _{d}W\)) if the CDF of \(W_{n}\) converges to the CDF of \(W\) at all points where the CDF of \(W\) is continuous.

  • Convergence in distribution is convergence of the CDFs.

Central Limit Theorem (CLT)

  • An example of convergence in distribution is a CLT.

  • Let \(X_{1},\ldots ,X_{n}\) be a sample of iid random variables such that \(\mathrm{E}\left[X_{i}\right] =0\) and \(\mathrm{Var}\left(X_{i}\right) =\sigma ^{2}>0\) (finite). Then, as \(n\rightarrow \infty,\)

    \[ \frac{1}{\sqrt{n}}\sum_{i=1}^{n}X_{i}\rightarrow _{d}N\left( 0,\sigma^{2}\right) . \]

  • \(\rightarrow_{d}\) means that the CDF of the scaled sum converges to the normal CDF: for every \(x\), \[ P\left(\frac{1}{\sigma\sqrt{n}}\sum_{i=1}^{n}X_{i} \leq x\right) \rightarrow \Phi(x) \text{ as } n \rightarrow \infty, \] where \(\Phi\) is the standard normal CDF. For large \(n\), the distribution of the scaled sum is approximately normal.
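  • A simulation sketch of the CLT with deliberately non-normal draws (the Exponential(1) distribution and \(n = 500\) are arbitrary choices; \(X_{i}-1\) has mean 0 and variance 1):

```python
import numpy as np

rng = np.random.default_rng(5)
n, reps = 500, 20_000   # arbitrary sample size; MC replications

# Skewed draws: Exponential(1), so X_i - 1 has mean 0 and variance 1.
x = rng.exponential(size=(reps, n))
z = np.sqrt(n) * (x.mean(axis=1) - 1.0)   # (1/sqrt(n)) sum (X_i - 1)

print(z.mean(), z.std())   # approximately 0 and 1
```

Despite the strong skewness of each \(X_{i}\), the scaled sum has mean and standard deviation matching \(N(0,1)\) closely.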

CLT with non-zero mean

  • For the CLT we impose three assumptions: (1) the data are iid; (2) the mean is zero; (3) the variance is finite and non-zero.

  • If \(X_{1},\ldots ,X_{n}\) are iid but \(\mathrm{E}\left[X_{i}\right] =\mu \neq 0,\) then consider \(X_{i}-\mu.\) Since \(\mathrm{E}\left[X_{i}-\mu\right] =0,\) we have

    \[ \frac{1}{\sqrt{n}}\sum_{i=1}^{n}\left( X_{i}-\mu \right) \rightarrow_{d}N\left( 0,\mathrm{Var}\left(X_{i}\right) \right) . \]

    Then

    \[ \begin{aligned} \frac{1}{\sqrt{n}}\sum_{i=1}^{n}\left( X_{i}-\mu \right) &= \sqrt{n}\frac{1}{n}\sum_{i=1}^{n}\left( X_{i}-\mu \right) \\ &= \sqrt{n}\left( \frac{1}{n}\sum_{i=1}^{n}X_{i}-\frac{1}{n}\sum_{i=1}^{n}\mu \right) \\ &= \sqrt{n}\left( \bar{X}_{n}-\mu \right) . \end{aligned} \]

CLT for the sample average

  • From the previous slide: \[ \frac{1}{\sqrt{n}}\sum_{i=1}^{n}\left( X_{i}-\mu \right) = \sqrt{n}\left( \bar{X}_{n}-\mu \right) . \]

  • Thus, the CLT can be stated as

    \[ \sqrt{n}\left( \bar{X}_{n}-\mu \right) \rightarrow _{d}N\left( 0,\mathrm{Var}\left(X_{i}\right) \right) . \]

  • By the LLN,

    \[ \bar{X}_{n}-\mu \rightarrow _{p}0, \]

    and

    \[ \mathrm{Var}\left(\sqrt{n}\left( \bar{X}_{n}-\mu \right)\right) = n\mathrm{Var}\left(\bar{X}_{n}\right) = n\frac{\mathrm{Var}\left(X_{i}\right)}{n} = \mathrm{Var}\left(X_{i}\right). \]

Properties

  • Suppose that \(W_{n}\rightarrow _{d}N\left( 0,\sigma ^{2}\right)\) and \(\theta _{n}\rightarrow _{p}\theta.\) Then,

    \[ \theta _{n}W_{n}\rightarrow _{d}\theta N\left( 0,\sigma ^{2}\right) \stackrel{d}{=} N\left( 0,\theta ^{2}\sigma ^{2}\right) , \]

    and

    \[ \theta _{n}+W_{n}\rightarrow _{d}\theta +N\left( 0,\sigma ^{2}\right) \stackrel{d}{=} N\left( \theta ,\sigma ^{2}\right) . \]

  • Suppose that \(Z_{n}\rightarrow _{d}Z\sim N\left( 0,1\right).\) Then,

    \[ Z_{n}^{2}\rightarrow _{d}Z^{2}\equiv \chi _{1}^{2}. \]

  • If \(W_{n}\rightarrow _{d}c,\) where \(c\) is a constant, then \(W_{n}\rightarrow _{p}c.\)

Asymptotic normality of OLS

  • Suppose that:

    1. The data \(\left\{ \left( Y_{i},X_{i}\right) :i=1,\ldots ,n\right\}\) are iid.
    2. \(Y_{i}=\beta _{0}+\beta _{1}X_{i}+U_{i},\) where \(\mathrm{E}\left[U_{i}\right] =0.\)
    3. \(\mathrm{E}\left[X_{i}U_{i}\right] =0.\)
    4. \(0<\mathrm{Var}\left(X_{i}\right) <\infty.\)
    5. \(0<\mathrm{E}\left[\left( X_{i}-\mathrm{E}\left[X_{i}\right]\right) ^{2}U_{i}^{2}\right] <\infty\) and \(0<\mathrm{E}\left[U_{i}^{2}\right] <\infty.\)
  • Let \(\hat{\beta}_{1,n}\) be the OLS estimator of \(\beta _{1}.\) Then,

    \[ \sqrt{n}\left( \hat{\beta}_{1,n}-\beta _{1}\right) \rightarrow _{d}N\left( 0,\frac{\mathrm{E}\left[\left( X_{i}-\mathrm{E}\left[X_{i}\right]\right) ^{2}U_{i}^{2}\right]}{\left(\mathrm{Var}\left(X_{i}\right) \right) ^{2}}\right). \]

  • \(V=\dfrac{\mathrm{E}\left[\left( X_{i}-\mathrm{E}\left[X_{i}\right]\right) ^{2}U_{i}^{2}\right]}{\left(\mathrm{Var}\left(X_{i}\right) \right) ^{2}}\) is called the asymptotic variance of \(\hat{\beta}_{1,n}.\)

Large-sample approximation for OLS

  • Let \(\overset{a}{\sim}\) denote “approximately in large samples.”

  • The asymptotic normality

    \[ \sqrt{n}\left( \hat{\beta}_{1,n}-\beta _{1}\right) \rightarrow _{d}N\left(0,V\right) \]

    can be viewed as the following large-sample approximation:

    \[ \sqrt{n}\left( \hat{\beta}_{1,n}-\beta _{1}\right) \overset{a}{\sim} N\left(0,V\right) , \]

    or

    \[ \hat{\beta}_{1,n}\overset{a}{\sim} N\left( \beta _{1},V/n\right) . \]
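The approximation can be checked by simulating the sampling distribution of \(\hat{\beta}_{1,n}\) under heteroskedastic, non-normal errors (a sketch; every distributional choice below, such as \(U_{i}=(|X_{i}|+0.5)(E_{i}-1)\) with \(E_{i}\sim\) Exponential(1), is an arbitrary illustration):

```python
import numpy as np

rng = np.random.default_rng(6)
n, reps = 500, 10_000
beta0, beta1 = 1.0, 2.0

# Heteroskedastic, skewed errors: U = (|X| + 0.5) * (Exp(1) - 1)
x = rng.normal(size=(reps, n))
u = (np.abs(x) + 0.5) * (rng.exponential(size=(reps, n)) - 1.0)
y = beta0 + beta1 * x + u

xd = x - x.mean(axis=1, keepdims=True)
slopes = (xd * y).sum(axis=1) / (xd**2).sum(axis=1)

# Asymptotic variance V = E[(X - EX)^2 U^2] / Var(X)^2, with Var(X) = 1
xs = rng.normal(size=1_000_000)
us = (np.abs(xs) + 0.5) * (rng.exponential(size=1_000_000) - 1.0)
V = np.mean(xs**2 * us**2)

# sd of sqrt(n)(b1 - beta1) across samples should be close to sqrt(V)
print(np.sqrt(n) * slopes.std(), np.sqrt(V))
```

The simulated standard deviation of \(\sqrt{n}\left(\hat{\beta}_{1,n}-\beta_{1}\right)\) matches \(\sqrt{V}\), even though the errors are neither normal nor homoskedastic.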

Proof: decomposition

Write

\[ \hat{\beta}_{1,n}=\beta _{1}+\frac{\sum_{i=1}^{n}\left( X_{i}-\bar{X}_{n}\right) U_{i}}{\sum_{i=1}^{n}\left( X_{i}-\bar{X}_{n}\right) ^{2}}. \]

Now

\[ \hat{\beta}_{1,n}-\beta _{1}=\frac{\frac{1}{n}\sum_{i=1}^{n}\left( X_{i}-\bar{X}_{n}\right) U_{i}}{\frac{1}{n}\sum_{i=1}^{n}\left( X_{i}-\bar{X}_{n}\right) ^{2}}, \]

and

\[ \sqrt{n}\left( \hat{\beta}_{1,n}-\beta _{1}\right) =\frac{\frac{1}{\sqrt{n}}\sum_{i=1}^{n}\left( X_{i}-\bar{X}_{n}\right) U_{i}}{\frac{1}{n}\sum_{i=1}^{n}\left( X_{i}-\bar{X}_{n}\right) ^{2}}. \]

Proof: combining the limits

\[ \sqrt{n}\left( \hat{\beta}_{1,n}-\beta _{1}\right) =\frac{\frac{1}{\sqrt{n}}\sum_{i=1}^{n}\left( X_{i}-\bar{X}_{n}\right) U_{i}}{\frac{1}{n}\sum_{i=1}^{n}\left( X_{i}-\bar{X}_{n}\right) ^{2}}. \]

In Part I, we established

\[ \frac{1}{n}\sum_{i=1}^{n}\left( X_{i}-\bar{X}_{n}\right) ^{2}\rightarrow_{p}\mathrm{Var}\left(X_{i}\right). \]

We will show that

\[ \frac{1}{\sqrt{n}}\sum_{i=1}^{n}\left( X_{i}-\bar{X}_{n}\right) U_{i}\rightarrow _{d}N\left( 0,\mathrm{E}\left[\left( X_{i}-\mathrm{E}\left[X_{i}\right]\right)^{2}U_{i}^{2}\right] \right), \]

so that

\[\begin{align*} \sqrt{n}\left( \hat{\beta}_{1,n}-\beta _{1}\right) &= \frac{\frac{1}{\sqrt{n}}\sum_{i=1}^{n}\left( X_{i}-\bar{X}_{n}\right) U_{i}}{\frac{1}{n}\sum_{i=1}^{n}\left( X_{i}-\bar{X}_{n}\right) ^{2}} \\ &\rightarrow _{d}\frac{N\left( 0,\mathrm{E}\left[\left( X_{i}-\mathrm{E}\left[X_{i}\right]\right) ^{2}U_{i}^{2}\right] \right)}{\mathrm{Var}\left(X_{i}\right)} \\ &\stackrel{d}{=} N\left( 0,\frac{\mathrm{E}\left[\left( X_{i}-\mathrm{E}\left[X_{i}\right]\right)^{2}U_{i}^{2}\right]}{\left(\mathrm{Var}\left(X_{i}\right) \right) ^{2}}\right). \end{align*}\]

Proof: numerator CLT

\[ \begin{aligned} \frac{1}{\sqrt{n}}\sum_{i=1}^{n}\left( X_{i}-\bar{X}_{n}\right) U_{i} &= \frac{1}{\sqrt{n}}\sum_{i=1}^{n}\left( X_{i}-\mathrm{E}\left[X_{i}\right]+\mathrm{E}\left[X_{i}\right]-\bar{X}_{n}\right) U_{i} \\ &= \frac{1}{\sqrt{n}}\sum_{i=1}^{n}\left( X_{i}-\mathrm{E}\left[X_{i}\right]\right) U_{i}+\left( \mathrm{E}\left[X_{i}\right]-\bar{X}_{n}\right) \frac{1}{\sqrt{n}}\sum_{i=1}^{n}U_{i}. \end{aligned} \]

We have

\[ \mathrm{E}\left[\left( X_{i}-\mathrm{E}\left[X_{i}\right]\right) U_{i}\right] = \mathrm{E}\left[X_{i}U_{i}\right]-\mathrm{E}\left[X_{i}\right]\mathrm{E}\left[U_{i}\right]=0, \]

and \(0<\mathrm{E}\left[\left( X_{i}-\mathrm{E}\left[X_{i}\right]\right) ^{2}U_{i}^{2}\right] <\infty,\) so by the CLT,

\[ \frac{1}{\sqrt{n}}\sum_{i=1}^{n}\left( X_{i}-\mathrm{E}\left[X_{i}\right]\right) U_{i}\rightarrow_{d}N\left( 0,\mathrm{E}\left[\left( X_{i}-\mathrm{E}\left[X_{i}\right]\right) ^{2}U_{i}^{2}\right]\right). \]

Proof: second term vanishes

It is left to show that

\[ \left( \mathrm{E}\left[X_{i}\right]-\bar{X}_{n}\right) \frac{1}{\sqrt{n}}\sum_{i=1}^{n}U_{i}\rightarrow _{p}0. \]

We have \(\mathrm{E}\left[U_{i}\right]=0\) and \(0<\mathrm{E}\left[U_{i}^{2}\right]<\infty.\) By the CLT,

\[ \frac{1}{\sqrt{n}}\sum_{i=1}^{n}U_{i}\rightarrow _{d}N\left(0,\mathrm{E}\left[U_{i}^{2}\right]\right). \]

By the LLN,

\[ \mathrm{E}\left[X_{i}\right]-\bar{X}_{n}\rightarrow _{p}0. \]

The first factor converges in probability to zero, and the second is bounded in probability since it converges in distribution, so their product converges in probability to zero. Hence, the result follows.

Part III: Asymptotic Variance

Asymptotic variance

  • In Part II, we showed that when the data are iid and the regressors are exogenous, \[ \begin{aligned} Y_{i} &= \beta_{0} + \beta_{1}X_{i} + U_{i}, \\ \mathrm{E}\left[U_{i}\right] &= \mathrm{E}\left[X_{i}U_{i}\right] = 0, \end{aligned} \] the OLS estimator of \(\beta_{1}\) is asymptotically normal: \[ \begin{aligned} \sqrt{n}\left(\hat{\beta}_{1,n} - \beta_{1}\right) &\rightarrow_{d} N\left(0, V\right), \\ V &= \frac{\mathrm{E}\left[\left(X_{i} - \mathrm{E}\left[X_{i}\right]\right)^{2}U_{i}^{2}\right]}{\left(\mathrm{Var}\left(X_{i}\right)\right)^{2}}. \end{aligned} \]

  • For hypothesis testing, we need a consistent estimator of the asymptotic variance \(V\): \[ \hat{V}_{n} \rightarrow_{p} V. \]

Simplifying \(V\) under homoskedasticity

  • Assume that the errors are homoskedastic: \[ \mathrm{E}\left[U_{i}^{2} \mid X_{i}\right] = \sigma^{2} \text{ for all } X_{i}\text{'s.} \]

  • In this case, the asymptotic variance can be simplified using the Law of Iterated Expectation: \[ \begin{aligned} \mathrm{E}\left[\left(X_{i} - \mathrm{E}\left[X_{i}\right]\right)^{2}U_{i}^{2}\right] &= \mathrm{E}\left[\mathrm{E}\left[\left(X_{i} - \mathrm{E}\left[X_{i}\right]\right)^{2}U_{i}^{2} \mid X_{i}\right]\right] \\ &= \mathrm{E}\left[\left(X_{i} - \mathrm{E}\left[X_{i}\right]\right)^{2} \mathrm{E}\left[U_{i}^{2} \mid X_{i}\right]\right] \\ &= \mathrm{E}\left[\left(X_{i} - \mathrm{E}\left[X_{i}\right]\right)^{2} \sigma^{2}\right] \\ &= \sigma^{2}\,\mathrm{E}\left[\left(X_{i} - \mathrm{E}\left[X_{i}\right]\right)^{2}\right] = \sigma^{2}\mathrm{Var}\left(X_{i}\right). \end{aligned} \]

Estimating \(V\): method of moments

  • Thus, when the errors are homoskedastic with \(\mathrm{E}\left[U_{i}^{2}\right] = \sigma^{2},\) \[ V = \frac{\mathrm{E}\left[\left(X_{i} - \mathrm{E}\left[X_{i}\right]\right)^{2}U_{i}^{2}\right]}{\left(\mathrm{Var}\left(X_{i}\right)\right)^{2}} = \frac{\sigma^{2}\mathrm{Var}\left(X_{i}\right)}{\left(\mathrm{Var}\left(X_{i}\right)\right)^{2}} = \frac{\sigma^{2}}{\mathrm{Var}\left(X_{i}\right)}. \]

  • Let \(\hat{U}_{i} = Y_{i} - \hat{\beta}_{0,n} - \hat{\beta}_{1,n}X_{i}\), where \(\hat{\beta}_{0,n}\) and \(\hat{\beta}_{1,n}\) are the OLS estimators of \(\beta_{0}\) and \(\beta_{1}.\)

  • A consistent estimator for the asymptotic variance can be constructed using the Method of Moments: \[ \begin{aligned} \hat{\sigma}_{n}^{2} &= \frac{1}{n}\sum_{i=1}^{n}\hat{U}_{i}^{2}, \\ \widehat{\mathrm{Var}}\left(X_{i}\right) &= \frac{1}{n}\sum_{i=1}^{n}\left(X_{i} - \bar{X}_{n}\right)^{2}, \text{ and} \\ \hat{V}_{n} &= \frac{\hat{\sigma}_{n}^{2}}{\frac{1}{n}\sum_{i=1}^{n}\left(X_{i} - \bar{X}_{n}\right)^{2}}. \end{aligned} \]
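  • A sketch of this estimator on simulated homoskedastic data (the values \(\beta_{0}=1,\) \(\beta_{1}=2,\) \(\sigma=1.5,\) and \(\mathrm{Var}\left(X_{i}\right)=4\) are arbitrary choices, so the target is \(V = 2.25/4\)):

```python
import numpy as np

rng = np.random.default_rng(7)
n = 50_000
beta0, beta1, sigma = 1.0, 2.0, 1.5

x = rng.normal(0.0, 2.0, size=n)     # Var(X) = 4
u = rng.normal(0.0, sigma, size=n)   # homoskedastic errors
y = beta0 + beta1 * x + u

# OLS estimates
xd = x - x.mean()
b1 = xd @ y / (xd @ xd)
b0 = y.mean() - b1 * x.mean()

# Method-of-moments estimator of V = sigma^2 / Var(X)
uhat = y - b0 - b1 * x
sigma2_hat = np.mean(uhat**2)
varx_hat = np.mean(xd**2)
V_hat = sigma2_hat / varx_hat

print(V_hat, sigma**2 / 4.0)
```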

Why the LLN does not apply directly

  • From the previous slide: \[ \begin{aligned} \hat{V}_{n} &= \frac{\hat{\sigma}_{n}^{2}}{\frac{1}{n}\sum_{i=1}^{n}\left(X_{i} - \bar{X}_{n}\right)^{2}}, \\ \hat{\sigma}_{n}^{2} &= \frac{1}{n}\sum_{i=1}^{n}\hat{U}_{i}^{2}, \\ \hat{U}_{i} &= Y_{i} - \hat{\beta}_{0,n} - \hat{\beta}_{1,n}X_{i}. \end{aligned} \]

  • When proving the consistency of OLS (Part I), we showed that \[ \frac{1}{n}\sum_{i=1}^{n}\left(X_{i} - \bar{X}_{n}\right)^{2} \rightarrow_{p} \mathrm{Var}\left(X_{i}\right), \] and to establish \(\hat{V}_{n} \rightarrow_{p} V,\) we need to show that \(\hat{\sigma}_{n}^{2} \rightarrow_{p} \sigma^{2}.\)

  • The LLN cannot be applied directly to \[ \frac{1}{n}\sum_{i=1}^{n}\hat{U}_{i}^{2} \] because the \(\hat{U}_{i}\)’s are not iid: they are dependent through \(\hat{\beta}_{0,n}\) and \(\hat{\beta}_{1,n}.\)

Proof: \(\hat{\sigma}^{2}_{n} \rightarrow_{p} \sigma^{2}\)

  • First, write \[ \begin{aligned} \hat{U}_{i} &= Y_{i} - \hat{\beta}_{0,n} - \hat{\beta}_{1,n}X_{i} \\ &= \left(\beta_{0} + \beta_{1}X_{i} + U_{i}\right) - \hat{\beta}_{0,n} - \hat{\beta}_{1,n}X_{i} \\ &= U_{i} - \left(\hat{\beta}_{0,n} - \beta_{0}\right) - \left(\hat{\beta}_{1,n} - \beta_{1}\right)X_{i}. \end{aligned} \]

  • Now, \[ \hat{\sigma}_{n}^{2} = \frac{1}{n}\sum_{i=1}^{n}\hat{U}_{i}^{2} = \frac{1}{n}\sum_{i=1}^{n}\left(U_{i} - \left(\hat{\beta}_{0,n} - \beta_{0}\right) - \left(\hat{\beta}_{1,n} - \beta_{1}\right)X_{i}\right)^{2}. \]

Completing the consistency proof

  • We have \[ \begin{aligned} \hat{\sigma}_{n}^{2} &= \frac{1}{n}\sum_{i=1}^{n}\left(U_{i} - \left(\hat{\beta}_{0,n} - \beta_{0}\right) - \left(\hat{\beta}_{1,n} - \beta_{1}\right)X_{i}\right)^{2} \\ &= \frac{1}{n}\sum_{i=1}^{n}U_{i}^{2} + \left(\hat{\beta}_{0,n} - \beta_{0}\right)^{2} + \left(\hat{\beta}_{1,n} - \beta_{1}\right)^{2}\frac{1}{n}\sum_{i=1}^{n}X_{i}^{2} \\ &\quad -2\left(\hat{\beta}_{0,n} - \beta_{0}\right)\frac{1}{n}\sum_{i=1}^{n}U_{i} - 2\left(\hat{\beta}_{1,n} - \beta_{1}\right)\frac{1}{n}\sum_{i=1}^{n}U_{i}X_{i} \\ &\quad +2\left(\hat{\beta}_{0,n} - \beta_{0}\right)\left(\hat{\beta}_{1,n} - \beta_{1}\right)\frac{1}{n}\sum_{i=1}^{n}X_{i}. \end{aligned} \]

  • By the LLN, \[ \frac{1}{n}\sum_{i=1}^{n}U_{i}^{2} \rightarrow_{p} \mathrm{E}\left[U_{i}^{2}\right] = \sigma^{2}. \]

  • Because \(\hat{\beta}_{0,n}\) and \(\hat{\beta}_{1,n}\) are consistent, \[ \hat{\beta}_{0,n} - \beta_{0} \rightarrow_{p} 0 \text{ and } \hat{\beta}_{1,n} - \beta_{1} \rightarrow_{p} 0, \] while by the LLN the remaining sample averages \(\frac{1}{n}\sum_{i=1}^{n}U_{i},\) \(\frac{1}{n}\sum_{i=1}^{n}U_{i}X_{i},\) \(\frac{1}{n}\sum_{i=1}^{n}X_{i},\) and \(\frac{1}{n}\sum_{i=1}^{n}X_{i}^{2}\) converge in probability to finite limits. Hence every term after the first vanishes, and \(\hat{\sigma}_{n}^{2} \rightarrow_{p} \sigma^{2}.\)

Using \(s^2\) instead of \(\hat{\sigma}^2_n\)

  • Thus, when the errors are homoskedastic, \[ \hat{V}_{n} = \frac{\hat{\sigma}_{n}^{2}}{\frac{1}{n}\sum_{i=1}^{n}\left(X_{i} - \bar{X}_{n}\right)^{2}}, \text{ with } \hat{\sigma}_{n}^{2} = \frac{1}{n}\sum_{i=1}^{n}\hat{U}_{i}^{2}, \] is a consistent estimator of \(V = \frac{\sigma^{2}}{\mathrm{Var}\left(X_{i}\right)}.\)

  • Similarly, \[ s^{2} = \frac{1}{n-2}\sum_{i=1}^{n}\hat{U}_{i}^{2} \rightarrow_{p} \sigma^{2}, \] and therefore \[ \hat{V}_{n} = \frac{s^{2}}{\frac{1}{n}\sum_{i=1}^{n}\left(X_{i} - \bar{X}_{n}\right)^{2}} \] is also a consistent estimator of \(V = \frac{\sigma^{2}}{\mathrm{Var}\left(X_{i}\right)}.\)

  • This version has an advantage over the one with \(\hat{\sigma}_{n}^{2}\): in addition to being consistent, \(s^{2}\) is also an unbiased estimator of \(\sigma^{2}\) if the regressors are strongly exogenous.

Asymptotic approximation

  • The result \(\sqrt{n}\left(\hat{\beta}_{1,n} - \beta_{1}\right) \rightarrow_{d} N\left(0, V\right)\) is used as the following approximation: \[ \hat{\beta}_{1,n} \overset{a}{\sim} N\left(\beta_{1}, \frac{V}{n}\right), \] where \(\overset{a}{\sim}\) means "approximately distributed as in large samples." Thus, the variance of \(\hat{\beta}_{1,n}\) can be treated as approximately \(V/n.\)

  • With \(\hat{V}_{n} = \frac{s^{2}}{\frac{1}{n}\sum_{i=1}^{n}\left(X_{i} - \bar{X}_{n}\right)^{2}}\) we have \[ \frac{\hat{V}_{n}}{n} = \frac{s^{2}}{\frac{1}{n}\sum_{i=1}^{n}\left(X_{i} - \bar{X}_{n}\right)^{2}} \cdot \frac{1}{n} = \frac{s^{2}}{\sum_{i=1}^{n}\left(X_{i} - \bar{X}_{n}\right)^{2}}. \]

Connection to the exact result

  • From the previous slide: \[ \frac{\hat{V}_{n}}{n} = \frac{s^{2}}{\sum_{i=1}^{n}\left(X_{i} - \bar{X}_{n}\right)^{2}} \]

  • Thus, in the case of homoskedastic errors we have the following asymptotic approximation: \[ \hat{\beta}_{1,n} \overset{a}{\sim} N\left(\beta_{1}, \frac{s^{2}}{\sum_{i=1}^{n}\left(X_{i} - \bar{X}_{n}\right)^{2}}\right). \]

  • When the regressors are strongly exogenous and the errors are normal, the same result holds exactly in finite samples.
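As a sanity check, the formula \(s^{2}/\sum_{i=1}^{n}\left(X_{i}-\bar{X}_{n}\right)^{2}\) is exactly the standard error that R reports for the slope. Simulated data with an assumed DGP:

```r
# The slide's variance formula reproduces the slope's standard error from
# summary(lm(...)), which uses the same homoskedastic formula.
set.seed(1)
n <- 200
x <- rnorm(n)
y <- 1 + 0.5 * x + rnorm(n)                   # assumed DGP
fit <- lm(y ~ x)
s2 <- sum(resid(fit)^2) / (n - 2)             # s^2 with the n - 2 correction
se_manual <- sqrt(s2 / sum((x - mean(x))^2))  # sqrt(V_hat_n / n)
se_lm <- summary(fit)$coefficients["x", "Std. Error"]
all.equal(se_manual, se_lm)                   # TRUE
```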

Asymptotic \(T\)-test

  • Consider testing \(H_{0}: \beta_{1} = \beta_{1,0}\) vs \(H_{1}: \beta_{1} \neq \beta_{1,0}.\)

  • Consider the behavior of the \(T\) statistic under \(H_{0}: \beta_{1} = \beta_{1,0}\). Since \[ \sqrt{n}\left(\hat{\beta}_{1,n} - \beta_{1}\right) \rightarrow_{d} N\left(0, V\right) \text{ and } \hat{V}_{n} \rightarrow_{p} V, \] we have \[ \begin{aligned} T = \frac{\hat{\beta}_{1,n} - \beta_{1,0}}{\sqrt{\hat{V}_{n}/n}} &= \frac{\sqrt{n}\left(\hat{\beta}_{1,n} - \beta_{1,0}\right)}{\sqrt{\hat{V}_{n}}} \\ &\overset{H_{0}}{=} \frac{\sqrt{n}\left(\hat{\beta}_{1,n} - \beta_{1}\right)}{\sqrt{\hat{V}_{n}}} \\ &\rightarrow_{d} \frac{N\left(0, V\right)}{\sqrt{V}} \stackrel{d}{=} N\left(0, 1\right). \end{aligned} \]

Asymptotic \(T\)-test: rejection rule

  • Under \(H_{0}: \beta_{1} = \beta_{1,0},\) \[ T = \frac{\hat{\beta}_{1,n} - \beta_{1,0}}{\sqrt{\hat{V}_{n}/n}} \rightarrow_{d} N\left(0, 1\right), \] provided that \(\hat{V}_{n} \rightarrow_{p} V\) (the asymptotic variance of \(\hat{\beta}_{1,n}\)).

  • An asymptotic size \(\alpha\) test rejects \(H_{0}: \beta_{1} = \beta_{1,0}\) against \(H_{1}: \beta_{1} \neq \beta_{1,0}\) when \[ \left|T\right| > z_{1-\alpha/2}, \] where \(z_{1-\alpha/2}\) is the \(\left(1-\alpha/2\right)\) quantile of the standard normal distribution.

  • Asymptotically, we can behave as if the variance of the OLS estimator were known, because \(\hat{V}_{n}\) consistently estimates \(V\).
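A simulation sketch (with an assumed DGP using deliberately non-normal errors) illustrates that the asymptotic test has approximately the right size even without normality:

```r
# Under H0, |T| > z_{0.975} should reject about 5% of the time in large
# samples, even though the errors here are skewed (centered chi-squared).
set.seed(42)
reject <- replicate(2000, {
  n <- 500
  x <- rnorm(n)
  u <- rchisq(n, df = 2) - 2                  # mean-zero, non-normal errors
  y <- 1 + 0.5 * x + u                        # true beta_1 = 0.5
  est <- summary(lm(y ~ x))$coefficients
  t_stat <- (est[2, 1] - 0.5) / est[2, 2]     # test H0: beta_1 = 0.5
  abs(t_stat) > qnorm(0.975)
})
mean(reject)  # close to the nominal 0.05
```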

Heteroskedastic errors

  • In general, the errors are heteroskedastic: \(\mathrm{E}\left[U_{i}^{2} \mid X_{i}\right]\) is not constant and changes with \(X_{i}.\)

  • In this case, \(\hat{V}_{n} = \frac{s^{2}}{\frac{1}{n}\sum_{i=1}^{n}\left(X_{i} - \bar{X}_{n}\right)^{2}}\) is not a consistent estimator of the asymptotic variance \(V = \frac{\mathrm{E}\left[\left(X_{i} - \mathrm{E}\left[X_{i}\right]\right)^{2}U_{i}^{2}\right]}{\left(\mathrm{Var}\left(X_{i}\right)\right)^{2}}\): \[ \begin{aligned} \frac{s^{2}}{\frac{1}{n}\sum_{i=1}^{n}\left(X_{i} - \bar{X}_{n}\right)^{2}} &\rightarrow_{p} \frac{\mathrm{E}\left[U_{i}^{2}\right]}{\mathrm{Var}\left(X_{i}\right)} \\ &= \frac{\mathrm{Var}\left(X_{i}\right)\cdot\mathrm{E}\left[U_{i}^{2}\right]}{\left(\mathrm{Var}\left(X_{i}\right)\right)^{2}} \\ &\neq \frac{\mathrm{E}\left[\left(X_{i} - \mathrm{E}\left[X_{i}\right]\right)^{2}U_{i}^{2}\right]}{\left(\mathrm{Var}\left(X_{i}\right)\right)^{2}}. \end{aligned} \]

HC estimator of asymptotic variance

  • In the case of heteroskedastic errors, a consistent estimator of \(V = \frac{\mathrm{E}\left[\left(X_{i} - \mathrm{E}\left[X_{i}\right]\right)^{2}U_{i}^{2}\right]}{\left(\mathrm{Var}\left(X_{i}\right)\right)^{2}}\) can be constructed as follows: \[ \hat{V}_{n}^{HC} = \frac{\frac{1}{n}\sum_{i=1}^{n}\left(X_{i} - \bar{X}_{n}\right)^{2}\hat{U}_{i}^{2}}{\left(\frac{1}{n}\sum_{i=1}^{n}\left(X_{i} - \bar{X}_{n}\right)^{2}\right)^{2}}. \]

  • One can show that \(\hat{V}_{n}^{HC} \rightarrow_{p} V\) whether the errors are heteroskedastic or homoskedastic.

  • We have the following asymptotic approximation: \[ \hat{\beta}_{1,n} \overset{a}{\sim} N\left(\beta_{1}, \frac{\hat{V}_{n}^{HC}}{n}\right), \] and the standard errors can be computed as \(\mathrm{se}\left(\hat{\beta}_{1,n}\right) = \sqrt{\hat{V}_{n}^{HC}/n}.\)
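For the simple regression model, the slide's formula for \(\hat{V}_{n}^{HC}/n\) coincides with the \((2,2)\) element of the sandwich matrix \(\left(X'X\right)^{-1}X'\,\mathrm{diag}\left(\hat{U}_{i}^{2}\right)X\left(X'X\right)^{-1}\), which is what the HC0 estimator computes. A check on simulated data (assumed DGP):

```r
# For simple regression, the slide's HC variance of the slope, divided by n,
# equals the (2,2) element of the sandwich matrix (the HC0 estimator).
set.seed(7)
n <- 100
x <- rnorm(n)
u <- rnorm(n) * abs(x)                 # heteroskedastic errors (assumed DGP)
y <- 1 + 0.5 * x + u
fit <- lm(y ~ x)
uhat <- resid(fit)
sxx <- sum((x - mean(x))^2)
vhc_over_n <- sum((x - mean(x))^2 * uhat^2) / sxx^2   # V_hat^HC / n
X <- cbind(1, x)
bread <- solve(t(X) %*% X)
sandwich22 <- (bread %*% t(X) %*% diag(uhat^2) %*% X %*% bread)[2, 2]
all.equal(vhc_over_n, sandwich22)      # TRUE
```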

HC standard errors in R

  • In R, heteroskedasticity-consistent (HC) standard errors can be obtained with the sandwich package:

    library(wooldridge)  # provides the wage1 data set
    library(lmtest)      # coeftest()
    library(sandwich)    # vcovHC()
    data("wage1")
    reg <- lm(wage ~ educ + exper + tenure, data = wage1)
  • Conventional (homoskedasticity-only) standard errors:

    coeftest(reg)
    
    t test of coefficients:
    
                 Estimate Std. Error t value  Pr(>|t|)    
    (Intercept) -2.872735   0.728964 -3.9408 9.225e-05 ***
    educ         0.598965   0.051284 11.6795 < 2.2e-16 ***
    exper        0.022340   0.012057  1.8528   0.06447 .  
    tenure       0.169269   0.021645  7.8204 2.935e-14 ***
    ---
    Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
  • HC (robust) standard errors:

    coeftest(reg, vcov = vcovHC(reg, type = "HC1"))
    
    t test of coefficients:
    
                 Estimate Std. Error t value  Pr(>|t|)    
    (Intercept) -2.872735   0.807415 -3.5579 0.0004078 ***
    educ         0.598965   0.061014  9.8169 < 2.2e-16 ***
    exper        0.022340   0.010555  2.1165 0.0347731 *  
    tenure       0.169269   0.029278  5.7814 1.277e-08 ***
    ---
    Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1