Economics 326 — Methods of Empirical Research in Economics
We use the notation \mathrm{E}\left[\cdot \mid \mathbf{X}\right] = \mathrm{E}\left[\cdot \mid X_1, \ldots, X_n\right].
Our model:
Y_{i}=\beta _{0}+\beta _{1}X_{i}+U_{i},\qquad i=1,\ldots ,n.
\mathrm{E}\left[U_{i}\mid \mathbf{X}\right] =0 for all i.
\mathrm{E}\left[U_{i}^{2}\mid \mathbf{X}\right] =\sigma ^{2} for all i.
\mathrm{E}\left[U_{i}U_{j}\mid \mathbf{X}\right] =0 for all i\neq j.
The U_{i}'s are jointly normally distributed conditional on \mathbf{X}.
The OLS estimator \hat{\beta}_{1} is a point estimator of \beta _{1}.
For our model, conditional on \mathbf{X}: \begin{aligned} \hat{\beta}_{1} &\sim N\left( \beta _{1},\mathrm{Var}\left(\hat{\beta}_{1}\right) \right), \\ \mathrm{Var}\left(\hat{\beta}_{1}\right) &=\frac{\sigma ^{2}}{\sum_{i=1}^{n}\left( X_{i}-\bar{X}\right) ^{2}}. \end{aligned}
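This sampling distribution can be checked by simulation. The sketch below (all parameter values are made up for illustration) holds one draw of \mathbf{X} fixed and repeatedly redraws the errors, then compares the simulated mean and variance of \hat{\beta}_{1} with \beta_{1} and \sigma^{2}/\sum_{i}(X_{i}-\bar{X})^{2}:

```python
import random

# Hypothetical parameter values, chosen only for this illustration.
random.seed(0)
beta0, beta1, sigma = 1.0, 2.0, 1.0
n = 50
X = [random.gauss(0, 1) for _ in range(n)]   # X is held fixed across replications
Xbar = sum(X) / n
SXX = sum((x - Xbar) ** 2 for x in X)        # sum_i (X_i - Xbar)^2

def ols_slope(Y):
    """OLS slope estimate for the fixed X sample."""
    Ybar = sum(Y) / n
    return sum((X[i] - Xbar) * (Y[i] - Ybar) for i in range(n)) / SXX

# Redraw the errors R times, conditional on X.
R = 10000
slopes = []
for _ in range(R):
    Y = [beta0 + beta1 * X[i] + random.gauss(0, sigma) for i in range(n)]
    slopes.append(ols_slope(Y))

mean_hat = sum(slopes) / R
var_hat = sum((b - mean_hat) ** 2 for b in slopes) / R
print(mean_hat)                    # close to beta1 = 2
print(var_hat, sigma**2 / SXX)     # simulated vs. theoretical variance
```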
With probability one, we have that \hat{\beta}_{1}\neq \beta _{1}.
We want to construct an interval estimator for \beta _{1}:
The interval estimator is called a confidence interval (CI).
A CI contains the true value \beta _{1} with some pre-specified probability 1-\alpha, where \alpha is a small probability of error.
For example, if \alpha =0.05, then the random CI will contain \beta _{1} with probability 0.95.
1-\alpha is called the coverage probability.
Confidence interval: CI_{1-\alpha }=[LB_{1-\alpha },UB_{1-\alpha }]. The lower bound (LB) and upper bound (UB) should depend on the coverage probability 1-\alpha.
The formal definition of a CI: it is a random interval CI_{1-\alpha} such that, conditional on \mathbf{X}, P\left( \beta _{1}\in CI_{1-\alpha }\right) =1-\alpha . Note that the random element is CI_{1-\alpha}, not \beta_{1}.
Sometimes, a CI is defined by the weaker requirement P\left( \beta _{1}\in CI_{1-\alpha}\right) \geq 1-\alpha .
One approach to constructing CIs is to consider a symmetric interval around the estimator \hat{\beta}_{1}: CI_{1-\alpha }=\left[ \hat{\beta}_{1}-c_{1-\alpha },\hat{\beta}_{1}+c_{1-\alpha }\right]
The problem is choosing c_{1-\alpha } such that P\left( \beta_{1}\in CI_{1-\alpha }\right) =1-\alpha .
In choosing c_{1-\alpha } we will be relying on the fact that given our assumptions and conditionally on \mathbf{X}: \hat{\beta}_{1}\sim N\left( \beta _{1},\mathrm{Var}\left(\hat{\beta}_{1}\right)\right) \text{ and }\mathrm{Var}\left(\hat{\beta}_{1}\right) =\frac{\sigma ^{2}}{\sum_{i=1}^{n}\left( X_{i}-\bar{X}\right) ^{2}}.
Note that conditionally on \mathbf{X}: \frac{\hat{\beta}_{1}-\beta _{1}}{\sqrt{\mathrm{Var}\left(\hat{\beta}_{1}\right) }}\sim N\left( 0,1\right) .
Let Z\sim N\left( 0,1\right) . The \tau-th quantile (percentile) of the standard normal distribution is z_{\tau } such that P\left( Z\leq z_{\tau }\right) =\tau .
Median: \tau =0.5 and z_{0.5}=0. (P\left( Z\leq 0\right) =0.5).
If \tau =0.975 then z_{0.975}=1.96. Due to symmetry, if \tau =0.025 then z_{0.025}=-1.96.
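These quantiles can be computed with Python's standard library (no statistical package needed), e.g. via `statistics.NormalDist`:

```python
from statistics import NormalDist

Z = NormalDist()                 # standard normal, N(0, 1)
z_975 = Z.inv_cdf(0.975)         # tau = 0.975 quantile
z_025 = Z.inv_cdf(0.025)         # tau = 0.025 quantile
print(round(z_975, 2))           # 1.96
print(round(z_025, 2))           # -1.96 (symmetry)
print(Z.inv_cdf(0.5))            # median: 0.0
```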
Suppose (for a moment) that \sigma ^{2} is known, and we can compute exactly the variance of \hat{\beta}_{1} as \mathrm{Var}\left(\hat{\beta}_{1}\right) =\sigma ^{2}/\sum_{i=1}^{n}\left( X_{i}-\bar{X}\right) ^{2}.
Consider the following CI: CI_{1-\alpha }=\left[ \hat{\beta}_{1}-z_{1-\alpha /2}\sqrt{\mathrm{Var}\left(\hat{\beta}_{1}\right) },\hat{\beta}_{1}+z_{1-\alpha /2}\sqrt{\mathrm{Var}\left(\hat{\beta}_{1}\right) }\right] .
For example, if 1-\alpha =0.95 \Longleftrightarrow \alpha =0.05\Longleftrightarrow z_{1-\alpha/2}=z_{0.975}=1.96, and CI_{0.95}=\left[ \hat{\beta}_{1}-1.96\sqrt{\mathrm{Var}\left(\hat{\beta}_{1}\right) },\hat{\beta}_{1}+1.96\sqrt{\mathrm{Var}\left(\hat{\beta}_{1}\right) }\right] .
We need to show that P\left( \beta _{1}\in \left[ \hat{\beta}_{1}-z_{1-\alpha /2}\sqrt{\mathrm{Var}\left(\hat{\beta}_{1}\right) },\hat{\beta}_{1}+z_{1-\alpha /2}\sqrt{\mathrm{Var}\left(\hat{\beta}_{1}\right) }\right] \right)=1-\alpha .
Next, \begin{aligned} &\hat{\beta}_{1}-z_{1-\alpha /2}\sqrt{\mathrm{Var}\left(\hat{\beta}_{1}\right) }\leq \beta _{1}\leq \hat{\beta}_{1}+z_{1-\alpha /2}\sqrt{\mathrm{Var}\left(\hat{\beta}_{1}\right) } \\ &\Longleftrightarrow -z_{1-\alpha /2}\sqrt{\mathrm{Var}\left(\hat{\beta}_{1}\right) }\leq \beta _{1}-\hat{\beta}_{1}\leq z_{1-\alpha /2}\sqrt{\mathrm{Var}\left(\hat{\beta}_{1}\right) } \\ &\Longleftrightarrow -z_{1-\alpha /2}\sqrt{\mathrm{Var}\left(\hat{\beta}_{1}\right) }\leq \hat{\beta}_{1}-\beta _{1}\leq z_{1-\alpha /2}\sqrt{\mathrm{Var}\left(\hat{\beta}_{1}\right) } \\ &\Longleftrightarrow -z_{1-\alpha /2}\leq \frac{\hat{\beta}_{1}-\beta _{1}}{\sqrt{\mathrm{Var}\left(\hat{\beta}_{1}\right) }}\leq z_{1-\alpha /2} \end{aligned}
We have that \begin{aligned} &\beta _{1}\in \left[ \hat{\beta}_{1}-z_{1-\alpha /2}\sqrt{\mathrm{Var}\left(\hat{\beta}_{1}\right) },\hat{\beta}_{1}+z_{1-\alpha /2}\sqrt{\mathrm{Var}\left(\hat{\beta}_{1}\right) }\right] \\ &\Longleftrightarrow -z_{1-\alpha /2}\leq \frac{\hat{\beta}_{1}-\beta _{1}}{\sqrt{\mathrm{Var}\left(\hat{\beta}_{1}\right) }}\leq z_{1-\alpha /2}. \end{aligned}
Let Z=\frac{\hat{\beta}_{1}-\beta _{1}}{\sqrt{\mathrm{Var}\left(\hat{\beta}_{1}\right) }}\sim N\left( 0,1\right). \begin{aligned} &P\left( -z_{1-\alpha /2}\leq \frac{\hat{\beta}_{1}-\beta _{1}}{\sqrt{\mathrm{Var}\left(\hat{\beta}_{1}\right) }}\leq z_{1-\alpha /2}\right) \\ &=P\left( -z_{1-\alpha /2}\leq Z\leq z_{1-\alpha /2}\right) \\ &=P\left( z_{\alpha /2}\leq Z\leq z_{1-\alpha /2}\right) \\ &=1-\alpha /2-\alpha /2=1-\alpha . \end{aligned}
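The coverage claim just derived can be verified by simulation. A sketch (with made-up parameter values, and \sigma^{2} treated as known): in repeated samples, the interval \hat{\beta}_{1}\pm z_{1-\alpha/2}\sqrt{\mathrm{Var}(\hat{\beta}_{1})} should contain \beta_{1} about 95% of the time when \alpha = 0.05.

```python
import random
from statistics import NormalDist

# Hypothetical parameter values, chosen only for this illustration.
random.seed(1)
beta0, beta1, sigma, n, alpha = 1.0, 2.0, 1.0, 40, 0.05
X = [random.uniform(0, 10) for _ in range(n)]
Xbar = sum(X) / n
SXX = sum((x - Xbar) ** 2 for x in X)
sd_b1 = (sigma ** 2 / SXX) ** 0.5          # exact, since sigma^2 is known
z = NormalDist().inv_cdf(1 - alpha / 2)    # 1.96 for alpha = 0.05

R, covered = 10000, 0
for _ in range(R):
    Y = [beta0 + beta1 * X[i] + random.gauss(0, sigma) for i in range(n)]
    Ybar = sum(Y) / n
    b1 = sum((X[i] - Xbar) * (Y[i] - Ybar) for i in range(n)) / SXX
    if b1 - z * sd_b1 <= beta1 <= b1 + z * sd_b1:
        covered += 1

print(covered / R)   # close to 1 - alpha = 0.95
```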
Since \sigma ^{2} is unknown, we must estimate it from the data: s^{2}=\frac{1}{n-2}\sum_{i=1}^{n}\hat{U}_{i}^{2}=\frac{1}{n-2}\sum_{i=1}^{n}\left( Y_{i}-\hat{\beta}_{0}-\hat{\beta}_{1}X_{i}\right) ^{2}.
We can replace \sigma ^{2} by s^{2}; however, the resulting ratio no longer has a normal distribution: \frac{\hat{\beta}_{1}-\beta _{1}}{\sqrt{\widehat{\mathrm{Var}}\left( \hat{\beta}_{1}\right) }}\sim t_{n-2},\text{ where }\widehat{\mathrm{Var}}\left( \hat{\beta}_{1}\right) =\frac{s^{2}}{\sum_{i=1}^{n}\left( X_{i}-\bar{X}\right) ^{2}}. Here t_{n-2} denotes the t-distribution with n-2 degrees of freedom.
The degrees of freedom depend on
the sample size (n),
and the number of parameters one has to estimate to compute s^{2} (two in this case, \beta _{0} and \beta _{1}).
Let t_{df,\tau } be the \tau-th quantile of the t-distribution with the number of degrees of freedom df: If T\sim t_{df} then P\left( T\leq t_{df,\tau }\right) =\tau .
Like the normal distribution, the t-distribution is centered at zero and symmetric around zero: t_{n-2,1-\alpha /2}=-t_{n-2,\alpha/2}.
We can now construct a feasible confidence interval with 1-\alpha coverage as: \begin{aligned} CI_{1-\alpha } &=\left[ \hat{\beta}_{1}-t_{n-2,1-\alpha /2}\sqrt{\widehat{\mathrm{Var}}\left( \hat{\beta}_{1}\right) },\hat{\beta}_{1}+t_{n-2,1-\alpha /2}\sqrt{\widehat{\mathrm{Var}}\left( \hat{\beta}_{1}\right) }\right], \\ &\text{where }\widehat{\mathrm{Var}}\left( \hat{\beta}_{1}\right) =\frac{s^{2}}{\sum_{i=1}^{n}\left( X_{i}-\bar{X}\right) ^{2}}. \end{aligned}
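The feasible CI can be computed step by step. The sketch below simulates one sample (the parameter values are made up for illustration) and uses `scipy.stats.t.ppf` for the quantile t_{n-2,1-\alpha/2}; it is a sketch, not part of the course software:

```python
import math
import random
from scipy import stats  # for the t quantile t_{n-2, 1-alpha/2}

# Hypothetical parameter values, chosen only for this illustration.
random.seed(2)
beta0, beta1, sigma, n, alpha = 1.0, 2.0, 1.0, 30, 0.05
X = [random.uniform(0, 5) for _ in range(n)]
Y = [beta0 + beta1 * x + random.gauss(0, sigma) for x in X]

# OLS estimates of the slope and intercept.
Xbar, Ybar = sum(X) / n, sum(Y) / n
SXX = sum((x - Xbar) ** 2 for x in X)
b1 = sum((X[i] - Xbar) * (Y[i] - Ybar) for i in range(n)) / SXX
b0 = Ybar - b1 * Xbar

# s^2 divides by n - 2: two parameters were estimated.
resid = [Y[i] - b0 - b1 * X[i] for i in range(n)]
s2 = sum(u ** 2 for u in resid) / (n - 2)
se_b1 = math.sqrt(s2 / SXX)                 # sqrt of estimated Var(b1)

t_crit = stats.t.ppf(1 - alpha / 2, df=n - 2)
ci = (b1 - t_crit * se_b1, b1 + t_crit * se_b1)
print(ci)   # contains beta1 = 2 in about 95% of repeated samples
```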
Data (RENTAL.DTA): 64 cities in 1990, Rent = average rent, AvgInc = per capita income: Rent_{i}=\beta _{0}+\beta _{1}AvgInc_{i}+U_{i}.
t_{62,0.975}\approx 2.00 \Longrightarrow The 95% confidence interval for \beta _{1} is \left[ 0.0115-2\times 0.0013,0.0115+2\times 0.0013\right] = [0.0089,0.0141].
t_{62,0.95}\approx 1.671 \Longrightarrow The 90% confidence interval for \beta _{1} is \left[ 0.0115-1.671\times 0.0013,0.0115+1.671\times 0.0013\right] = [0.0093,0.0137].
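The interval arithmetic above can be reproduced directly from the reported estimate (0.0115) and standard error (0.0013); the data file itself is not needed for this check:

```python
# Point estimate and standard error as reported for the RENTAL.DTA regression.
b1_hat, se = 0.0115, 0.0013

t95 = 2.00    # t_{62, 0.975}, as approximated above
ci_95 = (b1_hat - t95 * se, b1_hat + t95 * se)
print(round(ci_95[0], 4), round(ci_95[1], 4))   # 0.0089 0.0141

t90 = 1.671   # t_{62, 0.95}
ci_90 = (b1_hat - t90 * se, b1_hat + t90 * se)
print(round(ci_90[0], 4), round(ci_90[1], 4))   # 0.0093 0.0137
```

Note that the 90% interval is narrower than the 95% one: lowering the coverage probability shrinks the critical value and hence the interval.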
The t-distribution has heavier tails than the normal distribution.
Consequently, t_{df,1-\alpha /2}>z_{1-\alpha /2}; however, as df increases, t_{df,1-\alpha /2}\rightarrow z_{1-\alpha /2}.
When the sample size n is large, t_{n-2,1-\alpha /2} can be replaced with z_{1-\alpha /2}.
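This convergence is easy to see numerically; a sketch using `scipy.stats.t.ppf` for the t quantiles and the standard library for z_{0.975}:

```python
from statistics import NormalDist
from scipy import stats

# t_{df, 0.975} shrinks toward z_{0.975} = 1.96 as df grows.
z = NormalDist().inv_cdf(0.975)
for df in (5, 30, 62, 200, 1000):
    t = stats.t.ppf(0.975, df=df)
    print(df, round(t, 3))   # decreasing toward 1.96

print(round(z, 3))           # 1.96
```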
The confidence interval CI_{1-\alpha } is a function of the sample \left\{ \left( Y_{i},X_{i}\right) :i=1,\ldots ,n\right\}, and therefore is random. This allows us to talk about probability of CI_{1-\alpha } containing the true value of \beta _{1}.
Once the confidence interval is computed from the data, we have one realization of it. The realization of CI_{1-\alpha } (the computed confidence interval) is not random, and it no longer makes sense to talk about the probability that it includes the true \beta _{1}.
Once the confidence interval is computed, it either contains the true value \beta _{1} or it does not.