If the \(U_i\)’s are continuously distributed, then with probability one, \(\hat{\beta}_{1}\neq \beta _{1}\): \[
\hat{\beta}_{1} = \beta_{1} + \frac{\sum_{i=1}^{n}(X_{i}-\bar{X})U_{i}}{\sum_{i=1}^{n}(X_{i}-\bar{X})^{2}}.
\]
An interval estimator is a random interval \([LB, UB]\) that contains the true parameter value with a pre-specified probability.
To construct an interval estimator for \(\beta_{1}\), we need to know the distribution of \(\hat{\beta}_{1}\).
This requires an additional assumption about the distribution of \(U_i\)’s. Let’s first review the normal distribution.
Normal distribution
A normal rv is a continuous rv that can take on any value. The PDF of a normal rv \(X\) is \[
f(x) = \frac{1}{\sqrt{2\pi\sigma^2}} \exp\left(-\frac{(x - \mu)^2}{2\sigma^2}\right), \text{ where}
\]\[
\mu = \mathrm{E}\left[X\right] \text{ and } \sigma^2 = \mathrm{Var}\left(X\right).
\] We usually write \(X \sim N(\mu, \sigma^2)\).
If \(X \sim N(\mu, \sigma^2)\), then \(a + bX \sim N(a + b\mu, b^2\sigma^2)\).
Standard normal distribution
A standard normal rv has \(\mu = 0\) and \(\sigma^2 = 1\). Its PDF is \(\phi(z) = \frac{1}{\sqrt{2\pi}} \exp\left(-\frac{z^2}{2}\right)\).
Symmetric around zero (mean): if \(Z \sim N(0, 1)\), \[\begin{align*}
P(Z > 0) &= P(Z < 0)=0.5,\\
P(Z > z) &= P(Z < -z) \text{ for any } z.
\end{align*}\]
Thin tails: \(P(-1.96 \leq Z \leq 1.96) = 0.95\).
If \(X \sim N(\mu, \sigma^2)\), then \((X - \mu)/\sigma \sim N(0, 1)\).
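Both facts are easy to check numerically with the standard normal CDF pnorm(); a small sketch (the values \(\mu = 2\), \(\sigma = 3\), \(x = 5\) are arbitrary illustrations):

```r
# "Thin tails": mass of N(0, 1) between -1.96 and 1.96
pnorm(1.96) - pnorm(-1.96)        # approximately 0.95

# Standardization: P(X <= x) = P(Z <= (x - mu)/sigma) for X ~ N(mu, sigma^2)
mu <- 2; sigma <- 3; x <- 5       # illustrative values
pnorm(x, mean = mu, sd = sigma)   # P(X <= 5)
pnorm((x - mu) / sigma)           # same probability via the standardized Z
```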
Bivariate normal distribution
\(X\) and \(Y\) have a bivariate normal distribution if their joint PDF is given by: \[
f(x, y) = \frac{1}{2\pi\sqrt{(1-\rho^2) \sigma_X^2 \sigma_Y^2}} \exp\left[-\frac{Q}{2(1-\rho^2)}\right],
\] where \[\begin{align*}
Q &= \frac{(x-\mu_X)^2}{\sigma_X^2} + \frac{(y-\mu_Y)^2}{\sigma_Y^2} \\
&\quad - 2\rho\frac{(x-\mu_X)(y-\mu_Y)}{\sigma_X\sigma_Y},
\end{align*}\]\(\mu_X = \mathrm{E}\left[X\right]\), \(\mu_Y = \mathrm{E}\left[Y\right]\), \(\sigma_X^2 = \mathrm{Var}\left(X\right)\), \(\sigma_Y^2 = \mathrm{Var}\left(Y\right)\), and \(\rho = \mathrm{Corr}(X, Y)\).
Properties of bivariate normal
If \(X\) and \(Y\) have a bivariate normal distribution, then \(a + bX + cY \sim N(\mu^*, (\sigma^*)^2)\), where \[\begin{align*}
\mu^* &= a + b\mu_X + c\mu_Y, \\
(\sigma^*)^2 &= b^2\sigma_X^2 + c^2\sigma_Y^2 + 2bc\rho\sigma_X\sigma_Y.
\end{align*}\]
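As a sanity check, the formula for \(\mu^*\) and \((\sigma^*)^2\) can be compared against a simulation; the parameter values below are illustrative, and the correlated pair is built from two independent standard normals:

```r
# Illustrative parameters (assumed for this sketch)
a <- 1; b <- 2; c <- -1
mu_X <- 0; mu_Y <- 1; sd_X <- 1; sd_Y <- 2; rho <- 0.5

# Theoretical mean and variance of a + b*X + c*Y
mu_star  <- a + b * mu_X + c * mu_Y
var_star <- b^2 * sd_X^2 + c^2 * sd_Y^2 + 2 * b * c * rho * sd_X * sd_Y

# Simulate a bivariate normal (X, Y) from independent standard normals
set.seed(1)
n  <- 1e5
Z1 <- rnorm(n); Z2 <- rnorm(n)
X  <- mu_X + sd_X * Z1
Y  <- mu_Y + sd_Y * (rho * Z1 + sqrt(1 - rho^2) * Z2)
W  <- a + b * X + c * Y

c(theory = mu_star,  simulated = mean(W))   # means agree closely
c(theory = var_star, simulated = var(W))    # variances agree closely
```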
\(\mathrm{Cov}\left(X, Y\right) = 0 \Longrightarrow X\) and \(Y\) are independent.
This can be generalized to more than 2 variables (multivariate normal).
Normality of the OLS estimator
Assumption 5: \(U\)’s are jointly normally distributed conditional on \(\mathbf{X}\).
Then \(Y_{i}=\beta _{0}+\beta _{1}X_{i}+U_{i}\) are also jointly normally distributed conditional on \(\mathbf{X}\).
Since \(\hat{\beta}_{1}=\sum_{i=1}^{n}w_{i}Y_{i}\), where \(w_{i}=\frac{X_{i}-\bar{X}}{\sum_{l=1}^{n}\left( X_{l}-\bar{X}\right) ^{2}}\) depend only on \(\mathbf{X}\), \(\hat{\beta}_{1}\) is also normally distributed conditional on \(\mathbf{X}\).
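The representation \(\hat{\beta}_{1}=\sum_{i=1}^{n}w_{i}Y_{i}\) can be confirmed directly on simulated data (the true coefficients below are assumptions of this sketch, not part of the lecture's example):

```r
# beta1_hat as a weighted sum of the Y_i's, with weights depending only on X
set.seed(42)
n <- 50
X <- rnorm(n)
Y <- 1 + 2 * X + rnorm(n)                  # assumed: beta0 = 1, beta1 = 2

w <- (X - mean(X)) / sum((X - mean(X))^2)  # OLS weights w_i
beta1_weights <- sum(w * Y)
beta1_lm <- unname(coef(lm(Y ~ X))[2])

c(beta1_weights, beta1_lm)                 # identical up to rounding error
```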
Interval estimation problem
We want to construct an interval estimator for \(\beta _{1}\):
The interval estimator is called a confidence interval (CI).
A CI contains the true value \(\beta _{1}\) with some pre-specified probability \(1-\alpha\), where \(\alpha\) is a small probability of error.
For example, if \(\alpha =0.05\), then the random CI will contain \(\beta _{1}\) with probability 0.95.
\(1-\alpha\) is called the coverage probability.
Confidence interval: \(CI_{1-\alpha }=[LB_{1-\alpha },UB_{1-\alpha }].\) The lower bound (LB) and upper bound (UB) should depend on the coverage probability \(1-\alpha.\)
The formal definition of CI: It is a random interval \(CI_{1-\alpha}\) such that, conditionally on \(\mathbf{X}\), \[
P\left( \beta _{1}\in CI_{1-\alpha } \mid \mathbf{X}\right) =1-\alpha .
\] Note that the random element is \(CI_{1-\alpha}\).
Sometimes, a CI is defined as \(P\left( \beta _{1}\in CI_{1-\alpha}\right) \geq 1-\alpha .\)
Symmetric CIs
One approach to constructing CIs is to consider a symmetric interval around the estimator \(\hat{\beta}_{1}\): \[
CI_{1-\alpha }=\left[ \hat{\beta}_{1}-c_{1-\alpha },\hat{\beta}_{1}+c_{1-\alpha }\right]
\]
The problem is choosing \(c_{1-\alpha }\) such that \(P\left( \beta_{1}\in CI_{1-\alpha } \mid \mathbf{X}\right) =1-\alpha .\)
In choosing \(c_{1-\alpha }\), we will be relying on the fact that, given our assumptions and conditionally on \(\mathbf{X}\): \[\begin{align*}
&\hat{\beta}_{1} \mid \mathbf{X} \sim N\left( \beta _{1},\mathrm{Var}\left(\hat{\beta}_{1} \mid \mathbf{X}\right)\right), \\
&\mathrm{Var}\left(\hat{\beta}_{1} \mid \mathbf{X}\right) =\frac{\sigma ^{2}}{\sum_{i=1}^{n}\left( X_{i}-\bar{X}\right) ^{2}}.
\end{align*}\]
Note that conditionally on \(\mathbf{X}\): \[
\frac{\hat{\beta}_{1}-\beta _{1}}{\sqrt{\mathrm{Var}\left(\hat{\beta}_{1} \mid \mathbf{X}\right) }}\sim N\left( 0,1\right) .
\]
Standard normal quantiles
Let \(Z\sim N\left( 0,1\right) .\) The \(\tau\)-th quantile (percentile) of the standard normal distribution is \(z_{\tau }\) such that \[
P\left( Z\leq z_{\tau }\right) =\tau .
\]
Median: \(\tau =0.5\) and \(z_{0.5}=0.\) (\(P\left( Z\leq 0\right) =0.5\)).
If \(\tau =0.975\) then \(z_{0.975}=1.96\). Due to symmetry, if \(\tau =0.025\) then \(z_{0.025}=-1.96.\)
\(\sigma^2\) is known (infeasible CIs)
Suppose (for a moment) that \(\sigma ^{2}\) is known, and we can compute exactly the variance of \(\hat{\beta}_{1}\): \[
\mathrm{Var}\left(\hat{\beta}_{1} \mid \mathbf{X}\right) =\frac{\sigma ^{2}}{\sum_{i=1}^{n}\left( X_{i}-\bar{X}\right) ^{2}}.
\]
Consider the following CI: \[\begin{align*}
CI_{1-\alpha } = \Big[ &\hat{\beta}_{1}-z_{1-\alpha /2}\sqrt{\mathrm{Var}\left(\hat{\beta}_{1} \mid \mathbf{X}\right) }, \\
&\hat{\beta}_{1}+z_{1-\alpha /2}\sqrt{\mathrm{Var}\left(\hat{\beta}_{1} \mid \mathbf{X}\right) }\Big] .
\end{align*}\]
For example, if \(1-\alpha =0.95\), then \(\alpha =0.05\), \(z_{1-\alpha/2}=z_{0.975}=1.96\), and \(CI_{0.95}\) is \[\begin{align*}
\hat{\beta}_{1} \pm 1.96\sqrt{\mathrm{Var}\left(\hat{\beta}_{1} \mid \mathbf{X}\right) } .
\end{align*}\]
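In a simulation we control \(\sigma^2\), so the infeasible CI can actually be computed; a minimal sketch (all parameter values are illustrative):

```r
# Infeasible 95% CI: usable here only because sigma is known by construction
set.seed(123)
n <- 100; beta0 <- 1; beta1 <- 2; sigma <- 3   # assumed true values
X <- rnorm(n)
Y <- beta0 + beta1 * X + rnorm(n, sd = sigma)

beta1_hat <- unname(coef(lm(Y ~ X))[2])
var_beta1 <- sigma^2 / sum((X - mean(X))^2)    # exact conditional variance
z <- qnorm(0.975)
ci <- c(beta1_hat - z * sqrt(var_beta1),
        beta1_hat + z * sqrt(var_beta1))
ci
```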
Infeasible CI validity (\(\sigma^2\) known)
Goal: show that \(P\left( \beta _{1}\in CI_{1-\alpha} \mid \mathbf{X}\right) =1-\alpha\).
Notation: \(\sigma_{\hat{\beta}_{1}} = \sqrt{\mathrm{Var}\left(\hat{\beta}_{1} \mid \mathbf{X}\right)}\).
Key fact: \(Z=\dfrac{\hat{\beta}_{1}-\beta _{1}}{\sigma_{\hat{\beta}_{1}}}\sim N(0,1)\) conditionally on \(\mathbf{X}\). Then \[\begin{aligned}
&P\left(\beta _{1} \in CI_{1-\alpha} \mid \mathbf{X}\right) \\
&= P\left(\hat{\beta}_{1} - z_{1-\alpha /2}\,\sigma_{\hat{\beta}_{1}} \leq \beta _{1} \leq \hat{\beta}_{1} + z_{1-\alpha /2}\,\sigma_{\hat{\beta}_{1}} \mid \mathbf{X}\right) \\
&= P\left(-z_{1-\alpha /2}\,\sigma_{\hat{\beta}_{1}} \leq \beta _{1} - \hat{\beta}_{1} \leq z_{1-\alpha /2}\,\sigma_{\hat{\beta}_{1}} \mid \mathbf{X}\right) \\
&= P\left(-z_{1-\alpha /2}\,\sigma_{\hat{\beta}_{1}} \leq \hat{\beta}_{1} - \beta _{1} \leq z_{1-\alpha /2}\,\sigma_{\hat{\beta}_{1}} \mid \mathbf{X}\right) \\
&= P\left(-z_{1-\alpha /2} \leq \frac{\hat{\beta}_{1} - \beta _{1}}{\sigma_{\hat{\beta}_{1}}} \leq z_{1-\alpha /2} \mid \mathbf{X}\right) \\
&= P\left(Z \leq z_{1-\alpha /2} \mid \mathbf{X}\right) - P\left(Z \leq -z_{1-\alpha /2} \mid \mathbf{X}\right) \\
&= \left(1-\alpha /2\right) - \alpha /2 = 1-\alpha .
\end{aligned}\]
Feasible CIs (\(\sigma^2\) unknown)
Since \(\sigma ^{2}\) is unknown, we must estimate it from the data: \[\begin{align*}
s^{2} &= \frac{1}{n-2}\sum_{i=1}^{n}\hat{U}_{i}^{2} \\
&= \frac{1}{n-2}\sum_{i=1}^{n}\left( Y_{i}-\hat{\beta}_{0}-\hat{\beta}_{1}X_{i}\right) ^{2}.
\end{align*}\]
The standard error of \(\hat{\beta}_{1}\) is defined as \[\begin{align*}
\mathrm{se}\left(\hat{\beta}_{1}\right) &= \sqrt{\widehat{\mathrm{Var}}\left(\hat{\beta}_{1}\right)} \\
&= \sqrt{\frac{s^{2}}{\sum_{i=1}^{n}\left( X_{i}-\bar{X}\right) ^{2}}}.
\end{align*}\]
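The standard error reported by summary() for an lm fit is exactly this formula; a sketch on simulated data (true coefficients assumed for illustration):

```r
# se(beta1_hat) computed by hand vs. the value reported by summary()
set.seed(7)
n <- 60
X <- rnorm(n)
Y <- 1 + 2 * X + rnorm(n)                    # assumed true coefficients

fit <- lm(Y ~ X)
u_hat <- residuals(fit)
s2 <- sum(u_hat^2) / (n - 2)                 # s^2 with n - 2 degrees of freedom
se_beta1 <- sqrt(s2 / sum((X - mean(X))^2))

c(by_hand = se_beta1,
  from_summary = summary(fit)$coefficients["X", "Std. Error"])  # identical
```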
Replacing \(\sigma\) with its estimate \(s\) no longer yields a normal distribution: \[
\frac{\hat{\beta}_{1}-\beta _{1}}{\mathrm{se}\left(\hat{\beta}_{1}\right)}\mid \mathbf{X}\sim t_{n-2}.
\] Here \(t_{n-2}\) denotes the \(t\)-distribution with \(n-2\) degrees of freedom.
The degrees of freedom depend on
the sample size (\(n\)),
and the number of parameters one has to estimate to compute \(s^{2}\) (two in this case, \(\beta _{0}\) and \(\beta _{1}\)).
Let \(t_{df,\tau }\) be the \(\tau\)-th quantile of the \(t\)-distribution with the number of degrees of freedom \(df\): If \(T\sim t_{df}\) then \[
P\left( T\leq t_{df,\tau }\right) =\tau .
\]
Similarly to the normal distribution, the \(t\)-distribution is centered at zero and is symmetric around zero: \(t_{n-2,1-\alpha /2}=-t_{n-2,\alpha/2}.\)
We can now construct a feasible \(CI_{1-\alpha }\) as \[
\hat{\beta}_{1}\pm t_{n-2,1-\alpha /2} \times \mathrm{se}\left(\hat{\beta}_{1}\right).
\]
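This feasible CI is what confint() computes for lm objects; a sketch comparing the hand formula with confint() on simulated data (true coefficients assumed for illustration):

```r
# Feasible 95% CI by hand vs. confint()
set.seed(7)
n <- 60
X <- rnorm(n)
Y <- 1 + 2 * X + rnorm(n)                  # assumed true coefficients
fit <- lm(Y ~ X)

b1    <- unname(coef(fit)[2])
se1   <- summary(fit)$coefficients["X", "Std. Error"]
tcrit <- qt(0.975, df = n - 2)
manual_ci <- c(b1 - tcrit * se1, b1 + tcrit * se1)

rbind(by_hand = manual_ci,
      confint = as.numeric(confint(fit, "X", level = 0.95)))  # identical rows
```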
Example
Data: rental from the wooldridge R package. 64 US cities in 1990.
rent: average monthly rent ($)
avginc: per capita income ($)
Model: \(\text{Rent}_{i}=\beta _{0}+\beta _{1}\text{AvgInc}_{i}+U_{i}.\)
# Load data and run OLS regression
library(wooldridge)
data("rental")
rental90 <- subset(rental, y90 == 1)
reg <- lm(rent ~ avginc, data = rental90)
summary(reg)
Call:
lm(formula = rent ~ avginc, data = rental90)
Residuals:
Min 1Q Median 3Q Max
-94.67 -47.27 -13.68 25.65 228.46
Coefficients:
Estimate Std. Error t value Pr(>|t|)
(Intercept) 1.488e+02 3.210e+01 4.635 1.89e-05 ***
avginc 1.158e-02 1.308e-03 8.851 1.34e-12 ***
---
Signif. codes: 0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
Residual standard error: 66.56 on 62 degrees of freedom
Multiple R-squared: 0.5582, Adjusted R-squared: 0.5511
F-statistic: 78.34 on 1 and 62 DF, p-value: 1.341e-12
95% CI for the slope coefficient:
confint(reg, "avginc", level = 0.95)
2.5 % 97.5 %
avginc 0.008964625 0.01419539
90% CI for the slope coefficient:
confint(reg, "avginc", level = 0.90)
5 % 95 %
avginc 0.009395296 0.01376472
The effect of estimating \(\sigma^2\)
The \(t\)-distribution has heavier tails than the normal.
\(t_{df,1-\alpha /2}>z_{1-\alpha /2}\), but as \(df\) increases \(t_{df,1-\alpha /2}\rightarrow z_{1-\alpha /2}.\)
When the sample size \(n\) is large, \(t_{n-2,1-\alpha /2}\) can be replaced with \(z_{1-\alpha /2}.\)
In R, use qt() for \(t\)-quantiles and qnorm() for \(z\)-quantiles:
# z critical value for 95% CI
qnorm(0.975)
[1] 1.959964
# t critical values for 95% CI with different df
qt(0.975, df = 30)
[1] 2.042272
qt(0.975, df = 100)
[1] 1.983972
qt(0.975, df = 1000)
[1] 1.962339
qt(0.975, df = 10000)
[1] 1.960201
Interpretation of confidence intervals
The confidence interval \(CI_{1-\alpha }\) is a function of the sample \(\left\{ \left( Y_{i},X_{i}\right) :i=1,\ldots ,n\right\}\), and therefore is random. This allows us to talk about the probability of \(CI_{1-\alpha }\) containing the true value of \(\beta _{1}.\)
Once the confidence interval is computed given the data, we have its one realization. The realization of \(CI_{1-\alpha }\) (the computed confidence interval) is not random, and it does not make sense anymore to talk about the probability that it includes the true \(\beta _{1}.\)
Once the confidence interval is computed, it either contains the true value \(\beta _{1}\) or it does not.
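The repeated-sampling interpretation can be illustrated by simulation: draw many samples, compute a 95% CI in each, and record how often it covers the true slope (the true values below are assumptions of this sketch):

```r
# Coverage experiment: about 95% of the random CIs should contain beta1
set.seed(1)
nsim <- 1000; n <- 50; beta1 <- 2          # assumed true slope
covered <- logical(nsim)
for (r in seq_len(nsim)) {
  X <- rnorm(n)
  Y <- 1 + beta1 * X + rnorm(n)
  ci <- confint(lm(Y ~ X), "X", level = 0.95)
  covered[r] <- ci[1] <= beta1 && beta1 <= ci[2]
}
mean(covered)                              # close to 0.95
```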
Lecture 7: Confidence intervals. Economics 326: Introduction to Econometrics II. Vadim Marmer, UBC.