Lecture 7: Confidence intervals

Economics 326 — Introduction to Econometrics II

Vadim Marmer, UBC

Point vs interval estimators

\(\newcommand{\fragment}[1]{\class{mjxfrag}{#1}}\)
  • Recall our model:

    1. \(Y_{i}=\beta _{0}+\beta _{1}X_{i}+U_{i},\qquad i=1,\ldots ,n.\)
    2. \(\mathrm{E}\left[U_{i}\mid \mathbf{X}\right] =0\) for all \(i\)’s.
    3. \(\mathrm{E}\left[U_{i}^{2}\mid \mathbf{X}\right] =\sigma ^{2}\) for all \(i\)’s.
    4. \(\mathrm{E}\left[U_{i}U_{j}\mid \mathbf{X}\right] =0\) for all \(i\neq j\).
  • So far we have established that conditionally on \(\mathbf{X}\):

    • \(\mathrm{E}\left[\hat{\beta}_{1} \mid \mathbf{X}\right] = \beta_{1}\) (unbiasedness),
    • \(\mathrm{Var}\left(\hat{\beta}_{1} \mid \mathbf{X}\right) = \dfrac{\sigma^{2}}{\sum_{i=1}^{n}(X_{i}-\bar{X})^{2}}\).
  • If the \(U_i\)’s are continuously distributed, then with probability one, \(\hat{\beta}_{1}\neq \beta _{1}\): \[ \hat{\beta}_{1} = \beta_{1} + \frac{\sum_{i=1}^{n}(X_{i}-\bar{X})U_{i}}{\sum_{i=1}^{n}(X_{i}-\bar{X})^{2}}. \]

  • An interval estimator is a random interval \([LB, UB]\) that contains the true parameter value with a pre-specified probability.

  • To construct an interval estimator for \(\beta_{1}\), we need to know the distribution of \(\hat{\beta}_{1}\).

  • This requires an additional assumption about the distribution of \(U_i\)’s. Let’s first review the normal distribution.

Normal distribution

  • A normal rv is a continuous rv that can take any value on the real line. The PDF of a normal rv \(X\) is \[ f(x) = \frac{1}{\sqrt{2\pi\sigma^2}} \exp\left(-\frac{(x - \mu)^2}{2\sigma^2}\right), \text{ where} \] \[ \mu = \mathrm{E}\left[X\right] \text{ and } \sigma^2 = \mathrm{Var}\left(X\right). \] We usually write \(X \sim N(\mu, \sigma^2)\).

  • If \(X \sim N(\mu, \sigma^2)\), then \(a + bX \sim N(a + b\mu, b^2\sigma^2)\).
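  • This closure under affine transformations is easy to check numerically. A minimal Python sketch (used here for self-containedness; Python's statistics.NormalDist implements exactly this rule):

```python
from statistics import NormalDist

# X ~ N(2, 3^2); NormalDist applies the affine-transform rule directly:
# a + b*X ~ N(a + b*mu, (b*sigma)^2)
X = NormalDist(mu=2, sigma=3)
Y = 1 + 2 * X  # a = 1, b = 2

print(Y.mean)   # 5.0  (= 1 + 2*2)
print(Y.stdev)  # 6.0  (= 2*3)
```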

Standard normal distribution

  • A standard normal rv has \(\mu = 0\) and \(\sigma^2 = 1\). Its PDF is \(\phi(z) = \frac{1}{\sqrt{2\pi}} \exp\left(-\frac{z^2}{2}\right)\).

  • Symmetric around zero (mean): if \(Z \sim N(0, 1)\), \[\begin{align*} P(Z > 0) &= P(Z < 0)=0.5,\\ P(Z > z) &= P(Z < -z) \text{ for any } z. \end{align*}\]

  • Thin tails: \(P(-1.96 \leq Z \leq 1.96) = 0.95\).

  • If \(X \sim N(\mu, \sigma^2)\), then \((X - \mu)/\sigma \sim N(0, 1)\).
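  • Both the tail fact and the standardization rule can be verified numerically. A quick Python sketch with the standard library (in R, pnorm() plays the role of the CDF calls below):

```python
from statistics import NormalDist

Z = NormalDist()  # standard normal: mu = 0, sigma = 1

# "Thin tails": P(-1.96 <= Z <= 1.96) is approximately 0.95
print(round(Z.cdf(1.96) - Z.cdf(-1.96), 4))  # 0.95

# Standardization: X ~ N(3, 2^2), so P(X <= 4.5) = P(Z <= (4.5 - 3)/2)
X = NormalDist(mu=3, sigma=2)
print(abs(X.cdf(4.5) - Z.cdf((4.5 - 3) / 2)) < 1e-12)  # True
```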

Bivariate normal distribution

  • \(X\) and \(Y\) have a bivariate normal distribution if their joint PDF is given by: \[ f(x, y) = \frac{1}{2\pi\sqrt{(1-\rho^2) \sigma_X^2 \sigma_Y^2}} \exp\left[-\frac{Q}{2(1-\rho^2)}\right], \] where \[\begin{align*} Q &= \frac{(x-\mu_X)^2}{\sigma_X^2} + \frac{(y-\mu_Y)^2}{\sigma_Y^2} \\ &\quad - 2\rho\frac{(x-\mu_X)(y-\mu_Y)}{\sigma_X\sigma_Y}, \end{align*}\] \(\mu_X = \mathrm{E}\left[X\right]\), \(\mu_Y = \mathrm{E}\left[Y\right]\), \(\sigma_X^2 = \mathrm{Var}\left(X\right)\), \(\sigma_Y^2 = \mathrm{Var}\left(Y\right)\), and \(\rho = \mathrm{Corr}(X, Y)\).

Properties of bivariate normal

  • If \(X\) and \(Y\) have a bivariate normal distribution, then \(a + bX + cY \sim N(\mu^*, (\sigma^*)^2)\), where \[\begin{align*} \mu^* &= a + b\mu_X + c\mu_Y, \\ (\sigma^*)^2 &= b^2\sigma_X^2 + c^2\sigma_Y^2 + 2bc\rho\sigma_X\sigma_Y. \end{align*}\]

  • For jointly normal \(X\) and \(Y\): \(\mathrm{Cov}\left(X, Y\right) = 0 \Longrightarrow X\) and \(Y\) are independent. (This implication is special to the normal case; in general, uncorrelated random variables need not be independent.)

  • This can be generalized to more than 2 variables (multivariate normal).
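  • A worked example of the linear-combination rule, with illustrative numbers: take \(\mu_X = 1\), \(\mu_Y = 2\), \(\sigma_X^2 = 4\), \(\sigma_Y^2 = 9\), \(\rho = 0.5\), and consider \(3 + 2X - Y\) (so \(a = 3\), \(b = 2\), \(c = -1\)): \[\begin{align*} \mu^* &= 3 + 2(1) + (-1)(2) = 3, \\ (\sigma^*)^2 &= 2^2(4) + (-1)^2(9) + 2(2)(-1)(0.5)(2)(3) \\ &= 16 + 9 - 12 = 13, \end{align*}\] so \(3 + 2X - Y \sim N(3, 13)\).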

Normality of the OLS estimator

  • Assumption 5: \(U\)’s are jointly normally distributed conditional on \(\mathbf{X}\).

  • Then \(Y_{i}=\beta _{0}+\beta _{1}X_{i}+U_{i}\) are also jointly normally distributed conditional on \(\mathbf{X}\).

  • Since \(\hat{\beta}_{1}=\sum_{i=1}^{n}w_{i}Y_{i}\), where \(w_{i}=\frac{X_{i}-\bar{X}}{\sum_{l=1}^{n}\left( X_{l}-\bar{X}\right) ^{2}}\) depend only on \(\mathbf{X}\), \(\hat{\beta}_{1}\) is also normally distributed conditional on \(\mathbf{X}\).

  • Conditionally on \(\mathbf{X}\): \[\begin{align*} &\hat{\beta}_{1} \mid \mathbf{X} \sim N\left( \beta _{1},\mathrm{Var}\left(\hat{\beta}_{1} \mid \mathbf{X}\right) \right), \\ &\mathrm{Var}\left(\hat{\beta}_{1} \mid \mathbf{X}\right) =\frac{\sigma ^{2}}{\sum_{i=1}^{n}\left( X_{i}-\bar{X}\right) ^{2}}. \end{align*}\]

Interval estimation problem

  • We want to construct an interval estimator for \(\beta _{1}\):

    • The interval estimator is called a confidence interval (CI).

    • A CI contains the true value \(\beta _{1}\) with some pre-specified probability \(1-\alpha\), where \(\alpha\) is a small probability of error.

    • For example, if \(\alpha =0.05\), then the random CI will contain \(\beta _{1}\) with probability 0.95.

  • \(1-\alpha\) is called the coverage probability.

  • Confidence interval: \(CI_{1-\alpha }=[LB_{1-\alpha },UB_{1-\alpha }].\) The lower bound (LB) and upper bound (UB) should depend on the coverage probability \(1-\alpha.\)

  • The formal definition of CI: It is a random interval \(CI_{1-\alpha}\) such that conditionally on \(\mathbf{X}\), \[ P\left( \beta _{1}\in CI_{1-\alpha } \mid \mathbf{X}\right) =1-\alpha . \] Note that the random element is \(CI_{1-\alpha}\).

  • Sometimes, a CI is defined by the weaker requirement \(P\left( \beta _{1}\in CI_{1-\alpha}\right) \geq 1-\alpha\), i.e., coverage of at least \(1-\alpha\).

Symmetric CIs

  • One approach to constructing CIs is to consider a symmetric interval around the estimator \(\hat{\beta}_{1}\): \[ CI_{1-\alpha }=\left[ \hat{\beta}_{1}-c_{1-\alpha },\hat{\beta}_{1}+c_{1-\alpha }\right] \]

  • The problem is choosing \(c_{1-\alpha }\) such that \(P\left( \beta_{1}\in CI_{1-\alpha } \mid \mathbf{X}\right) =1-\alpha .\)

  • In choosing \(c_{1-\alpha }\), we will be relying on the fact that, given our assumptions and conditionally on \(\mathbf{X}\): \[\begin{align*} &\hat{\beta}_{1} \mid \mathbf{X} \sim N\left( \beta _{1},\mathrm{Var}\left(\hat{\beta}_{1} \mid \mathbf{X}\right)\right), \\ &\mathrm{Var}\left(\hat{\beta}_{1} \mid \mathbf{X}\right) =\frac{\sigma ^{2}}{\sum_{i=1}^{n}\left( X_{i}-\bar{X}\right) ^{2}}. \end{align*}\]

  • Note that conditionally on \(\mathbf{X}\): \[ \frac{\hat{\beta}_{1}-\beta _{1}}{\sqrt{\mathrm{Var}\left(\hat{\beta}_{1} \mid \mathbf{X}\right) }}\sim N\left( 0,1\right) . \]

Standard normal quantiles

  • Let \(Z\sim N\left( 0,1\right) .\) The \(\tau\)-th quantile (percentile) of the standard normal distribution is \(z_{\tau }\) such that \[ P\left( Z\leq z_{\tau }\right) =\tau . \]

  • Median: \(\tau =0.5\) and \(z_{0.5}=0.\) (\(P\left( Z\leq 0\right) =0.5\)).

  • If \(\tau =0.975\) then \(z_{0.975}=1.96\). Due to symmetry, if \(\tau =0.025\) then \(z_{0.025}=-1.96.\)
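  • These quantiles can be computed numerically (in R, qnorm(0.975)). A Python sketch using the standard library's inverse CDF:

```python
from statistics import NormalDist

Z = NormalDist()  # standard normal

print(round(Z.inv_cdf(0.975), 2))  # 1.96  (z_0.975)
print(round(Z.inv_cdf(0.025), 2))  # -1.96 (symmetry: z_0.025 = -z_0.975)
print(Z.inv_cdf(0.5))              # 0.0   (median)
```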

\(\sigma^2\) is known (infeasible CIs)

  • Suppose (for a moment) that \(\sigma ^{2}\) is known, and we can compute exactly the variance of \(\hat{\beta}_{1}\): \[ \mathrm{Var}\left(\hat{\beta}_{1} \mid \mathbf{X}\right) =\frac{\sigma ^{2}}{\sum_{i=1}^{n}\left( X_{i}-\bar{X}\right) ^{2}}. \]

  • Consider the following CI: \[\begin{align*} CI_{1-\alpha } = \Big[ &\hat{\beta}_{1}-z_{1-\alpha /2}\sqrt{\mathrm{Var}\left(\hat{\beta}_{1} \mid \mathbf{X}\right) }, \\ &\hat{\beta}_{1}+z_{1-\alpha /2}\sqrt{\mathrm{Var}\left(\hat{\beta}_{1} \mid \mathbf{X}\right) }\Big] . \end{align*}\]

  • For example, if \(1-\alpha =0.95\), then \(\alpha =0.05\) and \(z_{1-\alpha/2}=z_{0.975}=1.96\), so \(CI_{0.95}\) is \[\begin{align*} \hat{\beta}_{1} \pm 1.96\sqrt{\mathrm{Var}\left(\hat{\beta}_{1} \mid \mathbf{X}\right) } . \end{align*}\]

Infeasible CI validity (\(\sigma^2\) known)

  • Goal: show that \(P\left( \beta _{1}\in CI_{1-\alpha} \mid \mathbf{X}\right) =1-\alpha\).

  • Notation: \(\sigma_{\hat{\beta}_{1}} = \sqrt{\mathrm{Var}\left(\hat{\beta}_{1} \mid \mathbf{X}\right)}\).

  • Key fact: \(Z=\dfrac{\hat{\beta}_{1}-\beta _{1}}{\sigma_{\hat{\beta}_{1}}}\sim N(0,1)\) conditionally on \(\mathbf{X}\).

\[ \begin{aligned} &P\left(\beta _{1} \in CI_{1-\alpha} \mid \mathbf{X}\right) \\ &\fragment{{}= P\left(\hat{\beta}_{1} - z_{1-\alpha /2}\,\sigma_{\hat{\beta}_{1}} \leq \beta _{1} \leq \hat{\beta}_{1} + z_{1-\alpha /2}\,\sigma_{\hat{\beta}_{1}} \mid \mathbf{X}\right)} \\ &\fragment{{}= P\left(-z_{1-\alpha /2}\,\sigma_{\hat{\beta}_{1}} \leq \beta _{1} - \hat{\beta}_{1} \leq z_{1-\alpha /2}\,\sigma_{\hat{\beta}_{1}} \mid \mathbf{X}\right)} \\ &\fragment{{}= P\left(-z_{1-\alpha /2}\,\sigma_{\hat{\beta}_{1}} \leq \hat{\beta}_{1} - \beta _{1} \leq z_{1-\alpha /2}\,\sigma_{\hat{\beta}_{1}} \mid \mathbf{X}\right)} \\ &\fragment{{}= P\left(-z_{1-\alpha /2} \leq \frac{\hat{\beta}_{1} - \beta _{1}}{\sigma_{\hat{\beta}_{1}}} \leq z_{1-\alpha /2} \mid \mathbf{X}\right)} \\ &\fragment{{}= P\left(-z_{1-\alpha /2} \leq Z \leq z_{1-\alpha /2} \mid \mathbf{X}\right)} \\ &\fragment{{}= P\left(Z \leq z_{1-\alpha /2} \mid \mathbf{X}\right) - P\left(Z \leq -z_{1-\alpha /2} \mid \mathbf{X}\right)} \\ &\fragment{{}= \left(1-\alpha /2\right) - \alpha /2} \\ &\fragment{{}= 1-\alpha .} \end{aligned} \]
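  • The algebra above can also be checked by simulation: draw many samples from the model with known \(\sigma^2\), form the infeasible CI each time, and count how often it covers the true \(\beta_1\). A Python sketch with made-up design values (the same experiment could be written in R with rnorm()):

```python
import math
import random

random.seed(42)

# Illustrative design: fixed regressors, known error variance
beta0, beta1, sigma = 2.0, 0.5, 1.0
x = [i / 2 for i in range(1, 21)]            # n = 20 fixed X values
xbar = sum(x) / len(x)
sxx = sum((xi - xbar) ** 2 for xi in x)
sd_b1 = sigma / math.sqrt(sxx)               # sd of beta1-hat given X (sigma known)
z = 1.96                                     # z_{0.975}

reps, covered = 5000, 0
for _ in range(reps):
    y = [beta0 + beta1 * xi + random.gauss(0, sigma) for xi in x]
    ybar = sum(y) / len(y)
    b1 = sum((xi - xbar) * (yi - ybar) for xi, yi in zip(x, y)) / sxx
    # infeasible 95% CI: does it cover the true beta1?
    if b1 - z * sd_b1 <= beta1 <= b1 + z * sd_b1:
        covered += 1

print(covered / reps)  # close to 0.95
```

The random element across repetitions is the interval, not \(\beta_1\): the true slope never moves, while the CI does.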

Feasible CIs (\(\sigma^2\) unknown)

  • Since \(\sigma ^{2}\) is unknown, we must estimate it from the data: \[\begin{align*} s^{2} &= \frac{1}{n-2}\sum_{i=1}^{n}\hat{U}_{i}^{2} \\ &= \frac{1}{n-2}\sum_{i=1}^{n}\left( Y_{i}-\hat{\beta}_{0}-\hat{\beta}_{1}X_{i}\right) ^{2}. \end{align*}\]

  • The standard error of \(\hat{\beta}_{1}\) is defined as \[\begin{align*} \mathrm{se}\left(\hat{\beta}_{1}\right) &= \sqrt{\widehat{\mathrm{Var}}\left(\hat{\beta}_{1}\right)} \\ &= \sqrt{\frac{s^{2}}{\sum_{i=1}^{n}\left( X_{i}-\bar{X}\right) ^{2}}}. \end{align*}\]

  • Replacing \(\sigma\) with its estimate \(s\) changes the distribution: the standardized ratio is no longer normal. Instead, \[ \frac{\hat{\beta}_{1}-\beta _{1}}{\mathrm{se}\left(\hat{\beta}_{1}\right)}\mid \mathbf{X}\sim t_{n-2}, \] where \(t_{n-2}\) denotes the \(t\)-distribution with \(n-2\) degrees of freedom.

  • The degrees of freedom depend on

    • the sample size (\(n\)),

    • and the number of parameters one has to estimate to compute \(s^{2}\) (two in this case, \(\beta _{0}\) and \(\beta _{1}\)).

  • Let \(t_{df,\tau }\) be the \(\tau\)-th quantile of the \(t\)-distribution with \(df\) degrees of freedom: if \(T\sim t_{df}\), then \[ P\left( T\leq t_{df,\tau }\right) =\tau . \]

  • Similarly to the normal distribution, the \(t\)-distribution is centered at zero and is symmetric around zero: \(t_{n-2,1-\alpha /2}=-t_{n-2,\alpha/2}.\)

  • We can now construct a feasible \(CI_{1-\alpha }\) as \[ \hat{\beta}_{1}\pm t_{n-2,1-\alpha /2} \times \mathrm{se}\left(\hat{\beta}_{1}\right). \]
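  • A small end-to-end numerical illustration of these formulas, on a made-up data set with \(n=5\) (Python arithmetic; the critical value \(t_{3,0.975}\approx 3.182\) is taken as given, as R's qt(0.975, 3) would return):

```python
import math

# Toy data, n = 5 (illustrative values)
x = [1, 2, 3, 4, 5]
y = [2, 4, 5, 4, 5]
n = len(x)

xbar, ybar = sum(x) / n, sum(y) / n
sxx = sum((xi - xbar) ** 2 for xi in x)

# OLS slope and intercept
b1 = sum((xi - xbar) * (yi - ybar) for xi, yi in zip(x, y)) / sxx
b0 = ybar - b1 * xbar

# s^2 uses n - 2 degrees of freedom; then the standard error of b1
resid = [yi - b0 - b1 * xi for xi, yi in zip(x, y)]
s2 = sum(u ** 2 for u in resid) / (n - 2)
se_b1 = math.sqrt(s2 / sxx)

# Feasible 95% CI: b1 +/- t_{n-2, 0.975} * se(b1)
t_crit = 3.182  # ~= qt(0.975, 3) in R
lo, hi = b1 - t_crit * se_b1, b1 + t_crit * se_b1

print(round(b1, 3), round(s2, 3))  # 0.6 0.8
print(round(lo, 3), round(hi, 3))  # -0.3 1.5
```

With only 3 degrees of freedom, the \(t\) critical value (3.182) is much larger than 1.96, so the interval is wide.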

Example

  • Data: rental from the wooldridge R package. 64 US cities in 1990.

    • rent: average monthly rent ($)
    • avginc: per capita income ($)
  • Model: Rent\(_{i}=\beta _{0}+\beta _{1}\)AvgInc\(_{i}+U_{i}.\)

  • R implementation:

    # Load data and run OLS regression
    library(wooldridge)
    data("rental")
    rental90 <- subset(rental, y90 == 1)
    reg <- lm(rent ~ avginc, data = rental90)
    summary(reg)
    
    Call:
    lm(formula = rent ~ avginc, data = rental90)
    
    Residuals:
       Min     1Q Median     3Q    Max 
    -94.67 -47.27 -13.68  25.65 228.46 
    
    Coefficients:
                 Estimate Std. Error t value Pr(>|t|)    
    (Intercept) 1.488e+02  3.210e+01   4.635 1.89e-05 ***
    avginc      1.158e-02  1.308e-03   8.851 1.34e-12 ***
    ---
    Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
    
    Residual standard error: 66.56 on 62 degrees of freedom
    Multiple R-squared:  0.5582,  Adjusted R-squared:  0.5511 
    F-statistic: 78.34 on 1 and 62 DF,  p-value: 1.341e-12
  • 95% CI for the slope coefficient:

    confint(reg, "avginc", level = 0.95)
                 2.5 %     97.5 %
    avginc 0.008964625 0.01419539
  • 90% CI for the slope coefficient:

    confint(reg, "avginc", level = 0.90)
                   5 %       95 %
    avginc 0.009395296 0.01376472
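  • The confint() output can be reproduced by hand from the regression summary: estimate \(\pm\) \(t_{62,0.975}\times\) standard error. A quick arithmetic check in Python (\(t_{62,0.975}\approx 1.999\), R's qt(0.975, 62)):

```python
# Slope estimate and standard error, read off summary(reg) above
est, se = 1.158e-2, 1.308e-3
t_crit = 1.999  # ~= qt(0.975, 62) in R

lo, hi = est - t_crit * se, est + t_crit * se
print(round(lo, 5), round(hi, 5))  # 0.00897 0.01419 -- matches confint(reg)
```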

The effect of estimating \(\sigma^2\)

  • The \(t\)-distribution has heavier tails than the normal.

  • \(t_{df,1-\alpha /2}>z_{1-\alpha /2}\), but as \(df\) increases \(t_{df,1-\alpha /2}\rightarrow z_{1-\alpha /2}.\)

  • When the sample size \(n\) is large, \(t_{n-2,1-\alpha /2}\) can be replaced with \(z_{1-\alpha /2}.\)

  • In R, use qt() for \(t\)-quantiles and qnorm() for \(z\)-quantiles:

    # z critical value for 95% CI
    qnorm(0.975)
    [1] 1.959964
    # t critical values for 95% CI with different df
    qt(0.975, df = 30)
    [1] 2.042272
    qt(0.975, df = 100)
    [1] 1.983972
    qt(0.975, df = 1000)
    [1] 1.962339
    qt(0.975, df = 10000)
    [1] 1.960201

Interpretation of confidence intervals

  • The confidence interval \(CI_{1-\alpha }\) is a function of the sample \(\left\{ \left( Y_{i},X_{i}\right) :i=1,\ldots ,n\right\}\), and therefore is random. This allows us to talk about the probability of \(CI_{1-\alpha }\) containing the true value of \(\beta _{1}.\)

  • Once the confidence interval is computed from the data, we observe a single realization of it. That realized interval is not random, so it no longer makes sense to talk about the probability that it contains the true \(\beta _{1}\).

  • Once the confidence interval is computed, it either contains the true value \(\beta _{1}\) or it does not.