If the \(U_i\)’s are continuously distributed, then with probability one, \(\hat{\beta}_{1}\neq \beta _{1}\): \[
\hat{\beta}_{1} = \beta_{1} + \frac{\sum_{i=1}^{n}(X_{i}-\bar{X})U_{i}}{\sum_{i=1}^{n}(X_{i}-\bar{X})^{2}}.
\]
An interval estimator is a random interval \([LB, UB]\) that contains the true parameter value with a pre-specified probability.
To construct an interval estimator for \(\beta_{1}\), we need to know the distribution of \(\hat{\beta}_{1}\).
This requires an additional assumption about the distribution of \(U_i\)’s. Let’s first review the normal distribution.
Normal distribution
A normal rv is a continuous rv that can take on any value. The PDF of a normal rv \(X\) is \[
f(x) = \frac{1}{\sqrt{2\pi\sigma^2}} \exp\left(-\frac{(x - \mu)^2}{2\sigma^2}\right), \text{ where}
\]\[
\mu = \mathrm{E}\left[X\right] \text{ and } \sigma^2 = \mathrm{Var}\left(X\right).
\] We usually write \(X \sim N(\mu, \sigma^2)\).
If \(X \sim N(\mu, \sigma^2)\), then \(a + bX \sim N(a + b\mu, b^2\sigma^2)\).
Standard normal distribution
A standard normal rv has \(\mu = 0\) and \(\sigma^2 = 1\). Its PDF is \(\phi(z) = \frac{1}{\sqrt{2\pi}} \exp\left(-\frac{z^2}{2}\right)\).
Symmetric around zero (mean): if \(Z \sim N(0, 1)\), \[\begin{align*}
P(Z > 0) &= P(Z < 0)=0.5,\\
P(Z > z) &= P(Z < -z) \text{ for any } z.
\end{align*}\]
Thin tails: \(P(-1.96 \leq Z \leq 1.96) = 0.95\).
If \(X \sim N(\mu, \sigma^2)\), then \((X - \mu)/\sigma \sim N(0, 1)\).
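Both facts are easy to check numerically with the standard normal CDF pnorm(); a small sketch (the values \(\mu = 2\), \(\sigma = 3\), \(x = 5\) are arbitrary illustrations):

```r
# "Thin tails": mass of N(0, 1) between -1.96 and 1.96
pnorm(1.96) - pnorm(-1.96)        # approximately 0.95

# Standardization: P(X <= x) = P(Z <= (x - mu)/sigma) for X ~ N(mu, sigma^2)
mu <- 2; sigma <- 3; x <- 5       # illustrative values
pnorm(x, mean = mu, sd = sigma)   # P(X <= 5)
pnorm((x - mu) / sigma)           # same probability via the standardized Z
```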
Bivariate normal distribution
\(X\) and \(Y\) have a bivariate normal distribution if their joint PDF is given by: \[
f(x, y) = \frac{1}{2\pi\sqrt{(1-\rho^2) \sigma_X^2 \sigma_Y^2}} \exp\left[-\frac{Q}{2(1-\rho^2)}\right],
\] where \[\begin{align*}
Q &= \frac{(x-\mu_X)^2}{\sigma_X^2} + \frac{(y-\mu_Y)^2}{\sigma_Y^2} \\
&\quad - 2\rho\frac{(x-\mu_X)(y-\mu_Y)}{\sigma_X\sigma_Y},
\end{align*}\]\(\mu_X = \mathrm{E}\left[X\right]\), \(\mu_Y = \mathrm{E}\left[Y\right]\), \(\sigma_X^2 = \mathrm{Var}\left(X\right)\), \(\sigma_Y^2 = \mathrm{Var}\left(Y\right)\), and \(\rho = \mathrm{Corr}(X, Y)\).
Properties of bivariate normal
If \(X\) and \(Y\) have a bivariate normal distribution, then \(a + bX + cY \sim N(\mu^*, (\sigma^*)^2)\), where \[\begin{align*}
\mu^* &= a + b\mu_X + c\mu_Y, \\
(\sigma^*)^2 &= b^2\sigma_X^2 + c^2\sigma_Y^2 + 2bc\rho\sigma_X\sigma_Y.
\end{align*}\]
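As a sanity check, the formula for \(\mu^*\) and \((\sigma^*)^2\) can be compared against a simulation; the parameter values below are illustrative, and the correlated pair is built from two independent standard normals:

```r
# Illustrative parameters (assumed for this sketch)
a <- 1; b <- 2; c <- -1
mu_X <- 0; mu_Y <- 1; sd_X <- 1; sd_Y <- 2; rho <- 0.5

# Theoretical mean and variance of a + b*X + c*Y
mu_star  <- a + b * mu_X + c * mu_Y
var_star <- b^2 * sd_X^2 + c^2 * sd_Y^2 + 2 * b * c * rho * sd_X * sd_Y

# Simulate a bivariate normal (X, Y) from independent standard normals
set.seed(1)
n  <- 1e5
Z1 <- rnorm(n); Z2 <- rnorm(n)
X  <- mu_X + sd_X * Z1
Y  <- mu_Y + sd_Y * (rho * Z1 + sqrt(1 - rho^2) * Z2)
W  <- a + b * X + c * Y

c(theory = mu_star,  simulated = mean(W))   # means agree closely
c(theory = var_star, simulated = var(W))    # variances agree closely
```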
\(\mathrm{Cov}\left(X, Y\right) = 0 \Longrightarrow X\) and \(Y\) are independent.
This can be generalized to more than 2 variables (multivariate normal).
Normality of the OLS estimator
Assumption 5: \(U\)’s are jointly normally distributed conditional on \(\mathbf{X}\).
Then \(Y_{i}=\beta _{0}+\beta _{1}X_{i}+U_{i}\) are also jointly normally distributed conditional on \(\mathbf{X}\).
Since \(\hat{\beta}_{1}=\sum_{i=1}^{n}w_{i}Y_{i}\), where \(w_{i}=\frac{X_{i}-\bar{X}}{\sum_{l=1}^{n}\left( X_{l}-\bar{X}\right) ^{2}}\) depend only on \(\mathbf{X}\), \(\hat{\beta}_{1}\) is also normally distributed conditional on \(\mathbf{X}\).
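The representation \(\hat{\beta}_{1}=\sum_{i=1}^{n}w_{i}Y_{i}\) can be confirmed directly on simulated data (the true coefficients below are assumptions of this sketch, not part of the lecture's example):

```r
# beta1_hat as a weighted sum of the Y_i's, with weights depending only on X
set.seed(42)
n <- 50
X <- rnorm(n)
Y <- 1 + 2 * X + rnorm(n)                  # assumed: beta0 = 1, beta1 = 2

w <- (X - mean(X)) / sum((X - mean(X))^2)  # OLS weights w_i
beta1_weights <- sum(w * Y)
beta1_lm <- unname(coef(lm(Y ~ X))[2])

c(beta1_weights, beta1_lm)                 # identical up to rounding error
```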
Interval estimation problem
We want to construct an interval estimator for \(\beta _{1}\):
The interval estimator is called a confidence interval (CI).
A CI contains the true value \(\beta _{1}\) with some pre-specified probability \(1-\alpha\), where \(\alpha\) is a small probability of error.
For example, if \(\alpha =0.05\), then the random CI will contain \(\beta _{1}\) with probability 0.95.
\(1-\alpha\) is called the coverage probability.
Confidence interval: \(CI_{1-\alpha }=[LB_{1-\alpha },UB_{1-\alpha }].\) The lower bound (LB) and upper bound (UB) should depend on the coverage probability \(1-\alpha.\)
The formal definition of CI: It is a random interval \(CI_{1-\alpha}\) such that, conditionally on \(\mathbf{X}\), \[
P\left( \beta _{1}\in CI_{1-\alpha } \mid \mathbf{X}\right) =1-\alpha .
\] Note that the random element is \(CI_{1-\alpha}\).
Sometimes, a CI is defined as \(P\left( \beta _{1}\in CI_{1-\alpha}\right) \geq 1-\alpha .\)
Symmetric CIs
One approach to constructing CIs is to consider a symmetric interval around the estimator \(\hat{\beta}_{1}\): \[
CI_{1-\alpha }=\left[ \hat{\beta}_{1}-c_{1-\alpha },\hat{\beta}_{1}+c_{1-\alpha }\right]
\]
The problem is choosing \(c_{1-\alpha }\) such that \(P\left( \beta_{1}\in CI_{1-\alpha } \mid \mathbf{X}\right) =1-\alpha .\)
In choosing \(c_{1-\alpha }\), we will be relying on the fact that, given our assumptions and conditionally on \(\mathbf{X}\): \[\begin{align*}
&\hat{\beta}_{1} \mid \mathbf{X} \sim N\left( \beta _{1},\mathrm{Var}\left(\hat{\beta}_{1} \mid \mathbf{X}\right)\right), \\
&\mathrm{Var}\left(\hat{\beta}_{1} \mid \mathbf{X}\right) =\frac{\sigma ^{2}}{\sum_{i=1}^{n}\left( X_{i}-\bar{X}\right) ^{2}}.
\end{align*}\]
Note that conditionally on \(\mathbf{X}\): \[
\frac{\hat{\beta}_{1}-\beta _{1}}{\sqrt{\mathrm{Var}\left(\hat{\beta}_{1} \mid \mathbf{X}\right) }}\sim N\left( 0,1\right) .
\]
Standard normal quantiles
Let \(Z\sim N\left( 0,1\right) .\) The \(\tau\)-th quantile (percentile) of the standard normal distribution is \(z_{\tau }\) such that \[
P\left( Z\leq z_{\tau }\right) =\tau .
\]
Median: \(\tau =0.5\) and \(z_{0.5}=0.\) (\(P\left( Z\leq 0\right) =0.5\)).
If \(\tau =0.975\) then \(z_{0.975}=1.96\). Due to symmetry, if \(\tau =0.025\) then \(z_{0.025}=-1.96.\)
\(\sigma^2\) is known (infeasible CIs)
Suppose (for a moment) that \(\sigma ^{2}\) is known, and we can compute exactly the variance of \(\hat{\beta}_{1}\): \[
\mathrm{Var}\left(\hat{\beta}_{1} \mid \mathbf{X}\right) =\frac{\sigma ^{2}}{\sum_{i=1}^{n}\left( X_{i}-\bar{X}\right) ^{2}}.
\]
Consider the following CI: \[\begin{align*}
CI_{1-\alpha } = \Big[ &\hat{\beta}_{1}-z_{1-\alpha /2}\sqrt{\mathrm{Var}\left(\hat{\beta}_{1} \mid \mathbf{X}\right) }, \\
&\hat{\beta}_{1}+z_{1-\alpha /2}\sqrt{\mathrm{Var}\left(\hat{\beta}_{1} \mid \mathbf{X}\right) }\Big] .
\end{align*}\]
For example, if \(1-\alpha =0.95\), then \(\alpha =0.05\), \(z_{1-\alpha/2}=z_{0.975}=1.96\), and \(CI_{0.95}\) is \[\begin{align*}
\hat{\beta}_{1} \pm 1.96\sqrt{\mathrm{Var}\left(\hat{\beta}_{1} \mid \mathbf{X}\right) } .
\end{align*}\]
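In a simulation we control \(\sigma^2\), so the infeasible CI can actually be computed; a minimal sketch (all parameter values are illustrative):

```r
# Infeasible 95% CI: usable here only because sigma is known by construction
set.seed(123)
n <- 100; beta0 <- 1; beta1 <- 2; sigma <- 3   # assumed true values
X <- rnorm(n)
Y <- beta0 + beta1 * X + rnorm(n, sd = sigma)

beta1_hat <- unname(coef(lm(Y ~ X))[2])
var_beta1 <- sigma^2 / sum((X - mean(X))^2)    # exact conditional variance
z <- qnorm(0.975)
ci <- c(beta1_hat - z * sqrt(var_beta1),
        beta1_hat + z * sqrt(var_beta1))
ci
```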
Infeasible CI validity (\(\sigma^2\) known)
Goal: show that \(P\left( \beta _{1}\in CI_{1-\alpha} \mid \mathbf{X}\right) =1-\alpha\).
Notation: \(\sigma_{\hat{\beta}_{1}} = \sqrt{\mathrm{Var}\left(\hat{\beta}_{1} \mid \mathbf{X}\right)}\).
Key fact: \(Z=\dfrac{\hat{\beta}_{1}-\beta _{1}}{\sigma_{\hat{\beta}_{1}}}\sim N(0,1)\) conditionally on \(\mathbf{X}\). Then \[\begin{aligned}
&P\left(\beta _{1} \in CI_{1-\alpha} \mid \mathbf{X}\right) \\
&= P\left(\hat{\beta}_{1} - z_{1-\alpha /2}\,\sigma_{\hat{\beta}_{1}} \leq \beta _{1} \leq \hat{\beta}_{1} + z_{1-\alpha /2}\,\sigma_{\hat{\beta}_{1}} \mid \mathbf{X}\right) \\
&= P\left(-z_{1-\alpha /2}\,\sigma_{\hat{\beta}_{1}} \leq \beta _{1} - \hat{\beta}_{1} \leq z_{1-\alpha /2}\,\sigma_{\hat{\beta}_{1}} \mid \mathbf{X}\right) \\
&= P\left(-z_{1-\alpha /2}\,\sigma_{\hat{\beta}_{1}} \leq \hat{\beta}_{1} - \beta _{1} \leq z_{1-\alpha /2}\,\sigma_{\hat{\beta}_{1}} \mid \mathbf{X}\right) \\
&= P\left(-z_{1-\alpha /2} \leq \frac{\hat{\beta}_{1} - \beta _{1}}{\sigma_{\hat{\beta}_{1}}} \leq z_{1-\alpha /2} \mid \mathbf{X}\right) \\
&= P\left(Z \leq z_{1-\alpha /2} \mid \mathbf{X}\right) - P\left(Z \leq -z_{1-\alpha /2} \mid \mathbf{X}\right) \\
&= \left(1-\alpha /2\right) - \alpha /2 = 1-\alpha .
\end{aligned}\]
Feasible CIs (\(\sigma^2\) unknown)
Since \(\sigma ^{2}\) is unknown, we must estimate it from the data: \[\begin{align*}
s^{2} &= \frac{1}{n-2}\sum_{i=1}^{n}\hat{U}_{i}^{2} \\
&= \frac{1}{n-2}\sum_{i=1}^{n}\left( Y_{i}-\hat{\beta}_{0}-\hat{\beta}_{1}X_{i}\right) ^{2}.
\end{align*}\]
The standard error of \(\hat{\beta}_{1}\) is defined as \[\begin{align*}
\mathrm{se}\left(\hat{\beta}_{1}\right) &= \sqrt{\widehat{\mathrm{Var}}\left(\hat{\beta}_{1}\right)} \\
&= \sqrt{\frac{s^{2}}{\sum_{i=1}^{n}\left( X_{i}-\bar{X}\right) ^{2}}}.
\end{align*}\]
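The standard error reported by summary() for an lm fit is exactly this formula; a sketch on simulated data (true coefficients assumed for illustration):

```r
# se(beta1_hat) computed by hand vs. the value reported by summary()
set.seed(7)
n <- 60
X <- rnorm(n)
Y <- 1 + 2 * X + rnorm(n)                    # assumed true coefficients

fit <- lm(Y ~ X)
u_hat <- residuals(fit)
s2 <- sum(u_hat^2) / (n - 2)                 # s^2 with n - 2 degrees of freedom
se_beta1 <- sqrt(s2 / sum((X - mean(X))^2))

c(by_hand = se_beta1,
  from_summary = summary(fit)$coefficients["X", "Std. Error"])  # identical
```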
Replacing \(\sigma\) with its estimate \(s\) no longer yields a normal distribution: \[
\frac{\hat{\beta}_{1}-\beta _{1}}{\mathrm{se}\left(\hat{\beta}_{1}\right)}\mid \mathbf{X}\sim t_{n-2}.
\] Here \(t_{n-2}\) denotes the \(t\)-distribution with \(n-2\) degrees of freedom.
The degrees of freedom depend on
the sample size (\(n\)),
and the number of parameters one has to estimate to compute \(s^{2}\) (two in this case, \(\beta _{0}\) and \(\beta _{1}\)).
Let \(t_{df,\tau }\) be the \(\tau\)-th quantile of the \(t\)-distribution with the number of degrees of freedom \(df\): If \(T\sim t_{df}\) then \[
P\left( T\leq t_{df,\tau }\right) =\tau .
\]
Similarly to the normal distribution, the \(t\)-distribution is centered at zero and is symmetric around zero: \(t_{n-2,1-\alpha /2}=-t_{n-2,\alpha/2}.\)
We can now construct a feasible \(CI_{1-\alpha }\) as \[
\hat{\beta}_{1}\pm t_{n-2,1-\alpha /2} \times \mathrm{se}\left(\hat{\beta}_{1}\right).
\]
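This feasible CI is what confint() computes for lm objects; a sketch comparing the hand formula with confint() on simulated data (true coefficients assumed for illustration):

```r
# Feasible 95% CI by hand vs. confint()
set.seed(7)
n <- 60
X <- rnorm(n)
Y <- 1 + 2 * X + rnorm(n)                  # assumed true coefficients
fit <- lm(Y ~ X)

b1    <- unname(coef(fit)[2])
se1   <- summary(fit)$coefficients["X", "Std. Error"]
tcrit <- qt(0.975, df = n - 2)
manual_ci <- c(b1 - tcrit * se1, b1 + tcrit * se1)

rbind(by_hand = manual_ci,
      confint = as.numeric(confint(fit, "X", level = 0.95)))  # identical rows
```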
Example
Data: rental from the wooldridge R package. 64 US cities in 1990.
rent: average monthly rent ($)
avginc: per capita income ($)
Model: \(\text{Rent}_{i}=\beta _{0}+\beta _{1}\text{AvgInc}_{i}+U_{i}.\)
# Load data and run OLS regression
library(wooldridge)
data("rental")
rental90 <- subset(rental, y90 == 1)
reg <- lm(rent ~ avginc, data = rental90)
summary(reg)
Call:
lm(formula = rent ~ avginc, data = rental90)
Residuals:
Min 1Q Median 3Q Max
-94.67 -47.27 -13.68 25.65 228.46
Coefficients:
Estimate Std. Error t value Pr(>|t|)
(Intercept) 1.488e+02 3.210e+01 4.635 1.89e-05 ***
avginc 1.158e-02 1.308e-03 8.851 1.34e-12 ***
---
Signif. codes: 0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
Residual standard error: 66.56 on 62 degrees of freedom
Multiple R-squared: 0.5582, Adjusted R-squared: 0.5511
F-statistic: 78.34 on 1 and 62 DF, p-value: 1.341e-12
95% CI for the slope coefficient:
confint(reg, "avginc", level = 0.95)
2.5 % 97.5 %
avginc 0.008964625 0.01419539
90% CI for the slope coefficient:
confint(reg, "avginc", level = 0.90)
5 % 95 %
avginc 0.009395296 0.01376472
The effect of estimating \(\sigma^2\)
The \(t\)-distribution has heavier tails than the normal.
\(t_{df,1-\alpha /2}>z_{1-\alpha /2}\), but as \(df\) increases \(t_{df,1-\alpha /2}\rightarrow z_{1-\alpha /2}.\)
When the sample size \(n\) is large, \(t_{n-2,1-\alpha /2}\) can be replaced with \(z_{1-\alpha /2}.\)
In R, use qt() for \(t\)-quantiles and qnorm() for \(z\)-quantiles:
# z critical value for 95% CI
qnorm(0.975)
[1] 1.959964
# t critical values for 95% CI with different df
qt(0.975, df = 30)
[1] 2.042272
qt(0.975, df = 100)
[1] 1.983972
qt(0.975, df = 1000)
[1] 1.962339
qt(0.975, df = 10000)
[1] 1.960201
Interpretation of confidence intervals
The confidence interval \(CI_{1-\alpha }\) is a function of the sample \(\left\{ \left( Y_{i},X_{i}\right) :i=1,\ldots ,n\right\}\), and therefore is random. This allows us to talk about the probability of \(CI_{1-\alpha }\) containing the true value of \(\beta _{1}.\)
Once the confidence interval is computed given the data, we have its one realization. The realization of \(CI_{1-\alpha }\) (the computed confidence interval) is not random, and it does not make sense anymore to talk about the probability that it includes the true \(\beta _{1}.\)
Once the confidence interval is computed, it either contains the true value \(\beta _{1}\) or it does not.
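The repeated-sampling interpretation can be illustrated by simulation: draw many samples, compute a 95% CI in each, and record how often it covers the true slope (the true values below are assumptions of this sketch):

```r
# Coverage experiment: about 95% of the random CIs should contain beta1
set.seed(1)
nsim <- 1000; n <- 50; beta1 <- 2          # assumed true slope
covered <- logical(nsim)
for (r in seq_len(nsim)) {
  X <- rnorm(n)
  Y <- 1 + beta1 * X + rnorm(n)
  ci <- confint(lm(Y ~ X), "X", level = 0.95)
  covered[r] <- ci[1] <= beta1 && beta1 <= ci[2]
}
mean(covered)                              # close to 0.95
```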
Lecture 7: Confidence intervals. Economics 326: Introduction to Econometrics II. Vadim Marmer, UBC.