Lecture 7: Confidence intervals

Economics 326 — Methods of Empirical Research in Economics

Vadim Marmer, UBC

Point estimation

We use the notation \mathrm{E}\left[\cdot \mid \mathbf{X}\right] = \mathrm{E}\left[\cdot \mid X_1, \ldots, X_n\right].

  • Our model:

    1. Y_{i}=\beta _{0}+\beta _{1}X_{i}+U_{i},\qquad i=1,\ldots ,n.

    2. \mathrm{E}\left[U_{i}\mid \mathbf{X}\right] =0 for all i.

    3. \mathrm{E}\left[U_{i}^{2}\mid \mathbf{X}\right] =\sigma ^{2} for all i.

    4. \mathrm{E}\left[U_{i}U_{j}\mid \mathbf{X}\right] =0 for all i\neq j.

    5. The U_{i}’s are jointly normally distributed conditional on \mathbf{X}.

  • The OLS estimator \hat{\beta}_{1} is a point estimator of \beta _{1}.

  • For our model, conditional on \mathbf{X}: \begin{aligned} \hat{\beta}_{1} &\sim N\left( \beta _{1},\mathrm{Var}\left(\hat{\beta}_{1}\right) \right), \\ \mathrm{Var}\left(\hat{\beta}_{1}\right) &=\frac{\sigma ^{2}}{\sum_{i=1}^{n}\left( X_{i}-\bar{X}\right) ^{2}}. \end{aligned}

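  • Since \hat{\beta}_{1} has a continuous (normal) distribution, P\left( \hat{\beta}_{1}=\beta _{1}\right) =0. That is, with probability one, \hat{\beta}_{1}\neq \beta _{1}.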
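To make these formulas concrete, here is a minimal Python sketch (standard library only) that computes \hat{\beta}_{1} and \mathrm{Var}\left(\hat{\beta}_{1}\right) for one simulated sample. The simulation design (\beta_{0}=1, \beta_{1}=0.5, \sigma=2, n=50, X drawn uniformly on [0,10]) and the function name ols_slope_and_var are hypothetical choices for illustration. \sigma^{2} is treated as known.

```python
import random


def ols_slope_and_var(x, y, sigma2):
    """OLS slope estimate and its variance conditional on X:
    Var(beta1_hat) = sigma^2 / sum_i (X_i - Xbar)^2, with sigma^2 assumed known."""
    n = len(x)
    xbar = sum(x) / n
    ybar = sum(y) / n
    sxx = sum((xi - xbar) ** 2 for xi in x)
    sxy = sum((xi - xbar) * (yi - ybar) for xi, yi in zip(x, y))
    return sxy / sxx, sigma2 / sxx


# Hypothetical design: beta0 = 1, beta1 = 0.5, sigma = 2, n = 50.
random.seed(326)
beta0, beta1, sigma = 1.0, 0.5, 2.0
x = [random.uniform(0, 10) for _ in range(50)]
y = [beta0 + beta1 * xi + random.gauss(0, sigma) for xi in x]  # normal errors

b1_hat, var_b1 = ols_slope_and_var(x, y, sigma ** 2)
print(b1_hat, var_b1)  # b1_hat differs from the true beta1 = 0.5
```

Rerunning with a different seed gives a different \hat{\beta}_{1}. The estimate is random, and it never lands exactly on \beta_{1}.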

Interval estimation problem

  • We want to construct an interval estimator for \beta _{1}:

    • The interval estimator is called a confidence interval (CI).

    • A CI contains the true value \beta _{1} with some pre-specified probability 1-\alpha, where \alpha is a small probability of error.

    • For example, if \alpha =0.05, then the random CI will contain \beta _{1} with probability 0.95.

  • 1-\alpha is called the coverage probability.

  • Confidence interval: CI_{1-\alpha }=[LB_{1-\alpha },UB_{1-\alpha }]. The lower bound (LB) and upper bound (UB) should depend on the coverage probability 1-\alpha.

  • Formal definition: a CI is a random interval CI_{1-\alpha} such that, conditionally on \mathbf{X}, P\left( \beta _{1}\in CI_{1-\alpha }\right) =1-\alpha . Note that the random element is CI_{1-\alpha}, not \beta _{1}, which is a fixed (non-random) parameter.

  • Sometimes a CI is defined by the weaker requirement P\left( \beta _{1}\in CI_{1-\alpha}\right) \geq 1-\alpha .

Symmetric CIs

  • One approach to constructing CIs is to consider a symmetric interval around the estimator \hat{\beta}_{1}: CI_{1-\alpha }=\left[ \hat{\beta}_{1}-c_{1-\alpha },\hat{\beta}_{1}+c_{1-\alpha }\right]

  • The problem is to choose c_{1-\alpha } such that P\left( \beta_{1}\in CI_{1-\alpha }\right) =1-\alpha .

  • In choosing c_{1-\alpha }, we rely on the fact that, under our assumptions and conditionally on \mathbf{X}: \hat{\beta}_{1}\sim N\left( \beta _{1},\mathrm{Var}\left(\hat{\beta}_{1}\right)\right) \text{ and }\mathrm{Var}\left(\hat{\beta}_{1}\right) =\frac{\sigma ^{2}}{\sum_{i=1}^{n}\left( X_{i}-\bar{X}\right) ^{2}}.

  • Note that conditionally on \mathbf{X}: \frac{\hat{\beta}_{1}-\beta _{1}}{\sqrt{\mathrm{Var}\left(\hat{\beta}_{1}\right) }}\sim N\left( 0,1\right) .

Quantiles (percentiles) of the standard normal distribution

  • Let Z\sim N\left( 0,1\right) . The \tau-th quantile (percentile) of the standard normal distribution is z_{\tau } such that P\left( Z\leq z_{\tau }\right) =\tau .

  • Median: \tau =0.5 and z_{0.5}=0. (P\left( Z\leq 0\right) =0.5).

  • If \tau =0.975, then z_{0.975}=1.96. More generally, by the symmetry of N\left( 0,1\right) around zero, z_{\tau }=-z_{1-\tau }. For example, if \tau =0.025, then z_{0.025}=-1.96.
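These quantiles can be checked numerically. The sketch below assumes Python 3.8+, where the standard library provides statistics.NormalDist and its inv_cdf method, which is the inverse of the CDF:

```python
from statistics import NormalDist

std_normal = NormalDist(mu=0.0, sigma=1.0)  # Z ~ N(0, 1)


def z(tau):
    """tau-th quantile z_tau of N(0, 1): P(Z <= z_tau) = tau."""
    return std_normal.inv_cdf(tau)


print(round(z(0.5), 4))    # 0.0  (median)
print(round(z(0.975), 4))  # 1.96
print(round(z(0.025), 4))  # -1.96 (symmetry: z_tau = -z_{1-tau})
```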

\sigma^2 is known (infeasible CIs)

  • Suppose (for a moment) that \sigma ^{2} is known, and we can compute exactly the variance of \hat{\beta}_{1} as \mathrm{Var}\left(\hat{\beta}_{1}\right) =\sigma ^{2}/\sum_{i=1}^{n}\left( X_{i}-\bar{X}\right) ^{2}.

  • Consider the following CI: CI_{1-\alpha }=\left[ \hat{\beta}_{1}-z_{1-\alpha /2}\sqrt{\mathrm{Var}\left(\hat{\beta}_{1}\right) },\hat{\beta}_{1}+z_{1-\alpha /2}\sqrt{\mathrm{Var}\left(\hat{\beta}_{1}\right) }\right] .

  • For example, if 1-\alpha =0.95 \Longleftrightarrow \alpha =0.05\Longleftrightarrow z_{1-\alpha/2}=z_{0.975}=1.96, and CI_{0.95}=\left[ \hat{\beta}_{1}-1.96\sqrt{\mathrm{Var}\left(\hat{\beta}_{1}\right) },\hat{\beta}_{1}+1.96\sqrt{\mathrm{Var}\left(\hat{\beta}_{1}\right) }\right] .
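The infeasible CI translates directly into code. This is a minimal sketch in which the user supplies \sigma^{2}, which is exactly what makes this CI infeasible in practice. infeasible_ci is a hypothetical helper name.

```python
import math
from statistics import NormalDist


def infeasible_ci(b1_hat, x, sigma2, alpha=0.05):
    """CI_{1-alpha} = b1_hat -/+ z_{1-alpha/2} * sqrt(Var(beta1_hat)),
    where Var(beta1_hat) = sigma^2 / sum_i (X_i - Xbar)^2 and sigma^2 is known."""
    xbar = sum(x) / len(x)
    sxx = sum((xi - xbar) ** 2 for xi in x)
    se = math.sqrt(sigma2 / sxx)                # sqrt(Var(beta1_hat))
    crit = NormalDist().inv_cdf(1 - alpha / 2)  # z_{1-alpha/2}; about 1.96 when alpha = 0.05
    return b1_hat - crit * se, b1_hat + crit * se


# Hypothetical inputs: sum (X_i - Xbar)^2 = 5 and sigma^2 = 5, so sqrt(Var) = 1.
print(infeasible_ci(2.0, [0, 1, 2, 3], 5.0))  # roughly (0.04, 3.96)
```

Lowering the coverage (for example alpha=0.10) shrinks the critical value and therefore the interval.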

Validity of the infeasible CIs (\sigma^2 is known)

  • We need to show that P\left( \beta _{1}\in \left[ \hat{\beta}_{1}-z_{1-\alpha /2}\sqrt{\mathrm{Var}\left(\hat{\beta}_{1}\right) },\hat{\beta}_{1}+z_{1-\alpha /2}\sqrt{\mathrm{Var}\left(\hat{\beta}_{1}\right) }\right] \right)=1-\alpha .

  • Next, rewrite the event that the interval contains \beta _{1}. First subtract \hat{\beta}_{1} from all three terms. Then multiply by -1; the band is symmetric around zero, so the inequalities keep the same form. Finally, divide by \sqrt{\mathrm{Var}\left(\hat{\beta}_{1}\right) }>0: \begin{aligned} &\hat{\beta}_{1}-z_{1-\alpha /2}\sqrt{\mathrm{Var}\left(\hat{\beta}_{1}\right) }\leq \beta _{1}\leq \hat{\beta}_{1}+z_{1-\alpha /2}\sqrt{\mathrm{Var}\left(\hat{\beta}_{1}\right) } \\ &\Longleftrightarrow -z_{1-\alpha /2}\sqrt{\mathrm{Var}\left(\hat{\beta}_{1}\right) }\leq \beta _{1}-\hat{\beta}_{1}\leq z_{1-\alpha /2}\sqrt{\mathrm{Var}\left(\hat{\beta}_{1}\right) } \\ &\Longleftrightarrow -z_{1-\alpha /2}\sqrt{\mathrm{Var}\left(\hat{\beta}_{1}\right) }\leq \hat{\beta}_{1}-\beta _{1}\leq z_{1-\alpha /2}\sqrt{\mathrm{Var}\left(\hat{\beta}_{1}\right) } \\ &\Longleftrightarrow -z_{1-\alpha /2}\leq \frac{\hat{\beta}_{1}-\beta _{1}}{\sqrt{\mathrm{Var}\left(\hat{\beta}_{1}\right) }}\leq z_{1-\alpha /2} \end{aligned}

  • We have that \begin{aligned} &\beta _{1}\in \left[ \hat{\beta}_{1}-z_{1-\alpha /2}\sqrt{\mathrm{Var}\left(\hat{\beta}_{1}\right) },\hat{\beta}_{1}+z_{1-\alpha /2}\sqrt{\mathrm{Var}\left(\hat{\beta}_{1}\right) }\right] \\ &\Longleftrightarrow -z_{1-\alpha /2}\leq \frac{\hat{\beta}_{1}-\beta _{1}}{\sqrt{\mathrm{Var}\left(\hat{\beta}_{1}\right) }}\leq z_{1-\alpha /2}. \end{aligned}

  • Let Z=\frac{\hat{\beta}_{1}-\beta _{1}}{\sqrt{\mathrm{Var}\left(\hat{\beta}_{1}\right) }}\sim N\left( 0,1\right) conditionally on \mathbf{X}. Using the symmetry z_{\alpha /2}=-z_{1-\alpha /2}: \begin{aligned} &P\left( -z_{1-\alpha /2}\leq \frac{\hat{\beta}_{1}-\beta _{1}}{\sqrt{\mathrm{Var}\left(\hat{\beta}_{1}\right) }}\leq z_{1-\alpha /2}\right) \\ &=P\left( -z_{1-\alpha /2}\leq Z\leq z_{1-\alpha /2}\right) \\ &=P\left( z_{\alpha /2}\leq Z\leq z_{1-\alpha /2}\right) \\ &=P\left( Z\leq z_{1-\alpha /2}\right) -P\left( Z<z_{\alpha /2}\right) \\ &=\left( 1-\alpha /2\right) -\alpha /2=1-\alpha . \end{aligned}
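The result P\left( \beta _{1}\in CI_{1-\alpha }\right) =1-\alpha can also be checked by simulation. The Python sketch below (standard library only) works as follows:
  • It holds X fixed across replications, which mimics conditioning on \mathbf{X}.
  • It draws normal errors in each replication.
  • It records how often the infeasible CI covers \beta_{1}.
The design values (n=30, \beta_{0}=1, \beta_{1}=0.5, \sigma=2, 4000 replications, the seed) are hypothetical. Because of Monte Carlo noise, the share will be near 1-\alpha rather than exactly equal to it.

```python
import math
import random
from statistics import NormalDist


def coverage_known_sigma(n=30, reps=4000, alpha=0.05,
                         beta0=1.0, beta1=0.5, sigma=2.0, seed=7):
    """Share of replications in which the infeasible CI contains beta1
    (hypothetical simulation design given by the default arguments)."""
    rng = random.Random(seed)
    x = [rng.uniform(0, 10) for _ in range(n)]  # drawn once: we condition on X
    xbar = sum(x) / n
    sxx = sum((xi - xbar) ** 2 for xi in x)
    se = math.sqrt(sigma ** 2 / sxx)            # exact sqrt(Var(beta1_hat)), sigma known
    crit = NormalDist().inv_cdf(1 - alpha / 2)  # z_{1-alpha/2}
    hits = 0
    for _ in range(reps):
        y = [beta0 + beta1 * xi + rng.gauss(0.0, sigma) for xi in x]
        ybar = sum(y) / n
        b1_hat = sum((xi - xbar) * (yi - ybar) for xi, yi in zip(x, y)) / sxx
        if b1_hat - crit * se <= beta1 <= b1_hat + crit * se:
            hits += 1
    return hits / reps


print(coverage_known_sigma())  # close to 0.95
```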

Feasible confidence intervals (\sigma^2 is unknown)

  • Since \sigma ^{2} is unknown, we must estimate it from the data: s^{2}=\frac{1}{n-2}\sum_{i=1}^{n}\hat{U}_{i}^{2}=\frac{1}{n-2}\sum_{i=1}^{n}\left( Y_{i}-\hat{\beta}_{0}-\hat{\beta}_{1}X_{i}\right) ^{2}.

  • We can replace \sigma ^{2} by s^{2}. However, the resulting standardized statistic no longer has a normal distribution. Conditionally on \mathbf{X}, \frac{\hat{\beta}_{1}-\beta _{1}}{\sqrt{\widehat{\mathrm{Var}}\left( \hat{\beta}_{1}\right) }}\sim t_{n-2},\text{ where }\widehat{\mathrm{Var}}\left( \hat{\beta}_{1}\right) =\frac{s^{2}}{\sum_{i=1}^{n}\left( X_{i}-\bar{X}\right) ^{2}}. Here t_{n-2} denotes the t-distribution with n-2 degrees of freedom.

  • The degrees of freedom depend on

    • the sample size (n),

    • and the number of parameters that must be estimated to compute s^{2} (two in this case: \beta _{0} and \beta _{1}), which gives df=n-2.
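The sketch below computes s^{2} and the estimated standard error \sqrt{\widehat{\mathrm{Var}}\left( \hat{\beta}_{1}\right) } from data. The sum of squared residuals is divided by n-2 because two parameters were estimated. The code uses the usual OLS formulas \hat{\beta}_{1}=\sum(X_{i}-\bar{X})(Y_{i}-\bar{Y})/\sum(X_{i}-\bar{X})^{2} and \hat{\beta}_{0}=\bar{Y}-\hat{\beta}_{1}\bar{X}. The function name and the four-observation data set are hypothetical.

```python
import math


def ols_with_s2(x, y):
    """Return (b0_hat, b1_hat, s2, se_b1) for the simple regression of y on x."""
    n = len(x)
    xbar, ybar = sum(x) / n, sum(y) / n
    sxx = sum((xi - xbar) ** 2 for xi in x)
    b1 = sum((xi - xbar) * (yi - ybar) for xi, yi in zip(x, y)) / sxx
    b0 = ybar - b1 * xbar
    resid = [yi - b0 - b1 * xi for xi, yi in zip(x, y)]  # U_hat_i
    s2 = sum(r ** 2 for r in resid) / (n - 2)            # n - 2 degrees of freedom
    se_b1 = math.sqrt(s2 / sxx)                          # sqrt of estimated Var(b1_hat)
    return b0, b1, s2, se_b1


# Hypothetical data: residuals are 0.1, -0.3, 0.3, -0.1, so SSR = 0.2 and s^2 = 0.2 / 2.
print(ols_with_s2([0, 1, 2, 3], [1, 2, 4, 5]))  # approx (0.9, 1.4, 0.1, 0.1414)
```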

  • Let t_{df,\tau } be the \tau-th quantile of the t-distribution with df degrees of freedom: if T\sim t_{df}, then P\left( T\leq t_{df,\tau }\right) =\tau .

  • Similarly to the normal distribution, the t-distribution is centered at zero and is symmetric around zero: t_{n-2,1-\alpha /2}=-t_{n-2,\alpha/2}.

  • We can now construct a feasible confidence interval with 1-\alpha coverage as: \begin{aligned} CI_{1-\alpha } &=\left[ \hat{\beta}_{1}-t_{n-2,1-\alpha /2}\sqrt{\widehat{\mathrm{Var}}\left( \hat{\beta}_{1}\right) },\hat{\beta}_{1}+t_{n-2,1-\alpha /2}\sqrt{\widehat{\mathrm{Var}}\left( \hat{\beta}_{1}\right) }\right], \\ &\text{where }\widehat{\mathrm{Var}}\left( \hat{\beta}_{1}\right) =\frac{s^{2}}{\sum_{i=1}^{n}\left( X_{i}-\bar{X}\right) ^{2}}. \end{aligned}
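The feasible CI can be written in code as follows. In this sketch the caller passes in the critical value t_{n-2,1-\alpha/2}, for example read from a t table, because Python's standard library has no t-distribution quantile function. The helper name feasible_ci and the example numbers are hypothetical.

```python
import math


def feasible_ci(x, y, t_crit):
    """CI for beta1 with sigma^2 unknown:
    b1_hat -/+ t_crit * sqrt(s^2 / sum_i (X_i - Xbar)^2), t_crit = t_{n-2,1-alpha/2}."""
    n = len(x)
    xbar, ybar = sum(x) / n, sum(y) / n
    sxx = sum((xi - xbar) ** 2 for xi in x)
    b1 = sum((xi - xbar) * (yi - ybar) for xi, yi in zip(x, y)) / sxx
    b0 = ybar - b1 * xbar
    s2 = sum((yi - b0 - b1 * xi) ** 2 for xi, yi in zip(x, y)) / (n - 2)
    se = math.sqrt(s2 / sxx)  # sqrt of the estimated variance of b1_hat
    return b1 - t_crit * se, b1 + t_crit * se


# Hypothetical data with n = 4, so df = 2; t_{2,0.975} is about 4.303 (table value).
print(feasible_ci([0, 1, 2, 3], [1, 2, 4, 5], 4.303))
```

The interval is centered at \hat{\beta}_{1}. Its half-width is t_crit times the estimated standard error.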

Example: Rent rates and average income

  • Data (RENTAL.DTA): 64 cities in 1990, Rent = average rent, AvgInc = per capita income: Rent_{i}=\beta _{0}+\beta _{1}AvgInc_{i}+U_{i}.

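  • The OLS estimate is \hat{\beta}_{1}=0.0115 with standard error \sqrt{\widehat{\mathrm{Var}}\left( \hat{\beta}_{1}\right) }=0.0013. With n=64, the degrees of freedom are n-2=62.

  • t_{62,0.975}\approx 2.00 \Longrightarrow the 95% confidence interval for \beta _{1} is \left[ 0.0115-2\times 0.0013,0.0115+2\times 0.0013\right] = [0.0089,0.0141].

  • For 90% coverage, \alpha =0.10, so 1-\alpha /2=0.95. Then t_{62,0.95}\approx 1.671 \Longrightarrow the 90% confidence interval for \beta _{1} is \left[ 0.0115-1.671\times 0.0013,0.0115+1.671\times 0.0013\right] = [0.0093,0.0137].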
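The arithmetic behind the two intervals can be checked with a short Python snippet. It uses only the numbers on this slide: \hat{\beta}_{1}=0.0115, standard error 0.0013, and the t critical values quoted above.

```python
b1_hat, se = 0.0115, 0.0013  # slope estimate and its standard error from the slide


def ci(crit):
    """Symmetric CI b1_hat -/+ crit * se, rounded to 4 decimals as on the slide."""
    return round(b1_hat - crit * se, 4), round(b1_hat + crit * se, 4)


print(ci(2.00))   # 95% CI with t_{62,0.975} ~ 2.00 -> (0.0089, 0.0141)
print(ci(1.671))  # 90% CI with t_{62,0.95} ~ 1.671 -> (0.0093, 0.0137)
```

The 90% interval is narrower because it uses a smaller critical value.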

The effect of estimation of \sigma^2

  • The t-distribution has heavier tails than the normal distribution.

  • t_{df,1-\alpha /2}>z_{1-\alpha /2}, but as df increases t_{df,1-\alpha /2}\rightarrow z_{1-\alpha /2}.

  • When the sample size n is large, t_{n-2,1-\alpha /2} can be replaced with z_{1-\alpha /2}.
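The convergence t_{df,1-\alpha/2}\rightarrow z_{1-\alpha/2} can be seen numerically without third-party libraries. The sketch below integrates the Student t density with Simpson's rule and inverts the CDF by bisection. This is a hand-rolled illustration; the helper names t_cdf and t_quantile are hypothetical, and the search for the quantile is capped at 50, which suffices for the quantiles used here. In practice one would call a statistics library's t quantile function instead.

```python
import math
from statistics import NormalDist


def t_cdf(t, df, steps=1000):
    """P(T <= t) for T ~ t_df and t >= 0: 0.5 plus the integral of the t density
    over [0, t], computed with Simpson's rule (steps must be even)."""
    c = math.exp(math.lgamma((df + 1) / 2) - math.lgamma(df / 2)) / math.sqrt(df * math.pi)

    def density(u):
        return c * (1 + u * u / df) ** (-(df + 1) / 2)

    h = t / steps
    total = density(0.0) + density(t)
    for k in range(1, steps):
        total += (4 if k % 2 else 2) * density(k * h)  # Simpson weights 4, 2, 4, ...
    return 0.5 + total * h / 3


def t_quantile(tau, df, tol=1e-8):
    """t_{df,tau} for 0.5 < tau < 1 by bisection on [0, 50];
    for tau < 0.5 use the symmetry t_{df,tau} = -t_{df,1-tau}."""
    lo, hi = 0.0, 50.0
    while hi - lo > tol:
        mid = (lo + hi) / 2
        if t_cdf(mid, df) < tau:
            lo = mid
        else:
            hi = mid
    return (lo + hi) / 2


z = NormalDist().inv_cdf(0.975)
for df in (5, 10, 30, 62, 1000):
    # t_{df,0.975} exceeds z_{0.975} and shrinks toward it as df grows
    print(df, round(t_quantile(0.975, df), 3), round(z, 3))
```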

Interpretation of confidence intervals

  • The confidence interval CI_{1-\alpha } is a function of the sample \left\{ \left( Y_{i},X_{i}\right) :i=1,\ldots ,n\right\} and is therefore random. This is what allows us to talk about the probability that CI_{1-\alpha } contains the true value of \beta _{1}.

  • Once the confidence interval is computed from the data, we have a single realization of it. The realization of CI_{1-\alpha } (the computed confidence interval) is not random, so it no longer makes sense to talk about the probability that it contains the true \beta _{1}.

  • Once the confidence interval is computed, it either contains the true value \beta _{1} or it does not.
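Repeated sampling makes this interpretation concrete. Each simulated sample yields one realized interval, which either contains \beta_{1} or does not. Only the long-run share of covering intervals is close to 1-\alpha. The Python sketch below assumes a hypothetical design: n=20, \beta_{0}=1, \beta_{1}=0.5, \sigma=2, X fixed across samples, and 4000 samples. It uses the table value t_{18,0.975}\approx 2.101 for the feasible 95% CI.

```python
import math
import random

T_CRIT = 2.101  # t_{18,0.975} (table value) for n = 20, i.e. df = n - 2 = 18


def realized_cis(n=20, reps=4000, beta0=1.0, beta1=0.5, sigma=2.0, seed=2024):
    """Feasible 95% CIs for beta1, one realized interval per simulated sample
    (hypothetical simulation design given by the default arguments)."""
    rng = random.Random(seed)
    x = [rng.uniform(0, 10) for _ in range(n)]  # held fixed: conditioning on X
    xbar = sum(x) / n
    sxx = sum((xi - xbar) ** 2 for xi in x)
    intervals = []
    for _ in range(reps):
        y = [beta0 + beta1 * xi + rng.gauss(0.0, sigma) for xi in x]
        ybar = sum(y) / n
        b1 = sum((xi - xbar) * (yi - ybar) for xi, yi in zip(x, y)) / sxx
        b0 = ybar - b1 * xbar
        s2 = sum((yi - b0 - b1 * xi) ** 2 for xi, yi in zip(x, y)) / (n - 2)
        se = math.sqrt(s2 / sxx)
        intervals.append((b1 - T_CRIT * se, b1 + T_CRIT * se))
    return intervals


cis = realized_cis()
covers = [lb <= 0.5 <= ub for lb, ub in cis]  # 0.5 is the true beta1; each entry is True or False
print(cis[0], covers[0])          # one computed interval and whether it covers beta1
print(sum(covers) / len(covers))  # long-run share of covering intervals, near 0.95
```

No single entry of covers is "95% true"; the 0.95 describes the procedure across samples, not any one computed interval.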