Lecture 11: Properties of OLS in multiple regression
Economics 326 — Introduction to Econometrics II
Multiple regression and OLS
Consider the multiple regression model with k regressors:
Y_{i}=\beta _{0}+\beta _{1}X_{1,i}+\beta _{2}X_{2,i}+\ldots +\beta _{k}X_{k,i}+U_{i}.
Let \hat{\beta}_{0},\hat{\beta}_{1},\ldots ,\hat{\beta}_{k} be the OLS estimators: if
\hat{U}_{i}=Y_{i}-\hat{\beta}_{0}-\hat{\beta}_{1}X_{1,i}-\hat{\beta}_{2}X_{2,i}-\ldots -\hat{\beta}_{k}X_{k,i},
then
\sum_{i=1}^{n}\hat{U}_{i}=\sum_{i=1}^{n}X_{1,i}\hat{U}_{i}=\ldots =\sum_{i=1}^{n}X_{k,i}\hat{U}_{i}=0.
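These first-order conditions can be verified numerically. A minimal sketch with simulated data (the sample size, coefficients, and variable names are all hypothetical):

```python
import numpy as np

# Check the OLS first-order conditions: residuals sum to zero and are
# orthogonal to every regressor. Data are simulated for illustration.
rng = np.random.default_rng(0)
n, k = 200, 3
X = rng.normal(size=(n, k))
U = rng.normal(size=n)
Y = 1.0 + X @ np.array([0.5, -1.0, 2.0]) + U

Z = np.column_stack([np.ones(n), X])      # design matrix with a constant
beta_hat, *_ = np.linalg.lstsq(Z, Y, rcond=None)
U_hat = Y - Z @ beta_hat                  # fitted residuals

# both quantities below are zero up to floating-point rounding
print(abs(U_hat.sum()))
print(np.max(np.abs(X.T @ U_hat)))
```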
Multiple regression and OLS
As in Lecture 9, we can write \hat{\beta}_{1} as
\hat{\beta}_{1}=\frac{\sum_{i=1}^{n}\tilde{X}_{1,i}Y_{i}}{\sum_{i=1}^{n}\tilde{X}_{1,i}^{2}},\text{ where}
\tilde{X}_{1,i} are the OLS residuals from regressing X_{1} on a constant and X_{2},\ldots ,X_{k}: \tilde{X}_{1,i}=X_{1,i}-\hat{\gamma}_{0}-\hat{\gamma}_{2}X_{2,i}-\ldots -\hat{\gamma}_{k}X_{k,i}.
Because \hat{\gamma}_{0},\hat{\gamma}_{2},\ldots ,\hat{\gamma}_{k} are OLS coefficients, the residuals satisfy \sum_{i=1}^{n}\tilde{X}_{1,i}=\sum_{i=1}^{n}\tilde{X}_{1,i}X_{2,i}=\ldots =\sum_{i=1}^{n}\tilde{X}_{1,i}X_{k,i}=0.
Similarly, we can write \hat{\beta}_{2} as
\hat{\beta}_{2}=\frac{\sum_{i=1}^{n}\tilde{X}_{2,i}Y_{i}}{\sum_{i=1}^{n}\tilde{X}_{2,i}^{2}},\text{ where}
\tilde{X}_{2,i} are the OLS residuals from regressing X_{2} on a constant and X_{1},X_{3},\ldots ,X_{k}: \tilde{X}_{2,i}=X_{2,i}-\hat{\delta}_{0}-\hat{\delta}_{1}X_{1,i}-\hat{\delta}_{3}X_{3,i}-\ldots -\hat{\delta}_{k}X_{k,i}.
Because \hat{\delta}_{0},\hat{\delta}_{1},\hat{\delta}_{3},\ldots ,\hat{\delta}_{k} are OLS coefficients, the residuals satisfy \sum_{i=1}^{n}\tilde{X}_{2,i}=\sum_{i=1}^{n}\tilde{X}_{2,i}X_{1,i}=\sum_{i=1}^{n}\tilde{X}_{2,i}X_{3,i}=\ldots =\sum_{i=1}^{n}\tilde{X}_{2,i}X_{k,i}=0.
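The partialling-out representation above can also be checked by direct computation: the slope on X_{1} from the full regression equals the ratio formula built from the residualized X_{1}. A sketch with simulated data (all numbers are illustrative):

```python
import numpy as np

# Frisch-Waugh-Lovell check: beta_1_hat from the full regression equals
# sum(X1_tilde * Y) / sum(X1_tilde^2), where X1_tilde are the residuals
# from regressing X_1 on a constant and the other regressors.
rng = np.random.default_rng(1)
n = 300
X2 = rng.normal(size=n)
X3 = rng.normal(size=n)
X1 = 0.8 * X2 - 0.3 * X3 + rng.normal(size=n)   # X_1 correlated with X_2, X_3
Y = 2.0 + 1.5 * X1 - 0.7 * X2 + 0.4 * X3 + rng.normal(size=n)

Z_full = np.column_stack([np.ones(n), X1, X2, X3])
beta_full, *_ = np.linalg.lstsq(Z_full, Y, rcond=None)

# partial the other regressors out of X_1
Z_aux = np.column_stack([np.ones(n), X2, X3])
gamma_hat, *_ = np.linalg.lstsq(Z_aux, X1, rcond=None)
X1_tilde = X1 - Z_aux @ gamma_hat

beta1_fwl = (X1_tilde @ Y) / (X1_tilde @ X1_tilde)
print(beta_full[1], beta1_fwl)                   # the two agree
```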
The OLS estimators are linear
Consider \hat{\beta}_{1}:
\hat{\beta}_{1}=\frac{\sum_{i=1}^{n}\tilde{X}_{1,i}Y_{i}}{\sum_{i=1}^{n}\tilde{X}_{1,i}^{2}}=\sum_{i=1}^{n}\frac{\tilde{X}_{1,i}}{\sum_{l=1}^{n}\tilde{X}_{1,l}^{2}}Y_{i}=\sum_{i=1}^{n}w_{1,i}Y_{i},
where
w_{1,i}=\frac{\tilde{X}_{1,i}}{\sum_{l=1}^{n}\tilde{X}_{1,l}^{2}}.
Recall that \tilde{X}_{1} are the residuals from a regression of X_{1} on X_{2},\ldots ,X_{k} and a constant, and therefore w_{1,i} depends only on \mathbf{X}.
Unbiasedness
Suppose that
Y_{i}=\beta _{0}+\beta _{1}X_{1,i}+\beta _{2}X_{2,i}+\ldots +\beta _{k}X_{k,i}+U_{i}.
Conditional on \mathbf{X}, \mathrm{E}\left[U_{i} \mid \mathbf{X}\right] = 0 for all i.
- Conditioning on \mathbf{X} means that we condition on all regressors for all observations: \mathbf{X} = \{(X_{1,i}, X_{2,i}, \ldots, X_{k,i}): i = 1, \ldots, n\}.
Under the above assumptions, conditional on \mathbf{X}:
\begin{aligned} \mathrm{E}\left[\hat{\beta}_{0} \mid \mathbf{X}\right] &= \beta _{0}, \\ \mathrm{E}\left[\hat{\beta}_{1} \mid \mathbf{X}\right] &= \beta _{1}, \\ &\;\;\vdots \\ \mathrm{E}\left[\hat{\beta}_{k} \mid \mathbf{X}\right] &= \beta _{k}. \end{aligned}
Proof of unbiasedness
Substituting Y_i and expanding:
\begin{aligned} \hat{\beta}_{1}=\frac{\sum_{i=1}^{n}\tilde{X}_{1,i}Y_{i}}{\sum_{i=1}^{n}\tilde{X}_{1,i}^{2}} &=\frac{\sum_{i=1}^{n}\tilde{X}_{1,i}\left( \beta _{0}+\beta _{1}X_{1,i}+\beta _{2}X_{2,i}+\ldots +\beta _{k}X_{k,i}+U_{i}\right) }{\sum_{i=1}^{n}\tilde{X}_{1,i}^{2}} \\ &=\beta _{0}\frac{\sum_{i=1}^{n}\tilde{X}_{1,i}}{\sum_{i=1}^{n}\tilde{X}_{1,i}^{2}}+\beta _{1}\frac{\sum_{i=1}^{n}\tilde{X}_{1,i}X_{1,i}}{\sum_{i=1}^{n}\tilde{X}_{1,i}^{2}}+\beta _{2}\frac{\sum_{i=1}^{n}\tilde{X}_{1,i}X_{2,i}}{\sum_{i=1}^{n}\tilde{X}_{1,i}^{2}} \\ &\quad +\ldots +\beta _{k}\frac{\sum_{i=1}^{n}\tilde{X}_{1,i}X_{k,i}}{\sum_{i=1}^{n}\tilde{X}_{1,i}^{2}}+\frac{\sum_{i=1}^{n}\tilde{X}_{1,i}U_{i}}{\sum_{i=1}^{n}\tilde{X}_{1,i}^{2}}. \end{aligned}
Using the partitioned regression results from Lecture 9:
\begin{aligned} &\sum_{i=1}^{n}\tilde{X}_{1,i}=\sum_{i=1}^{n}\tilde{X}_{1,i}X_{2,i}=\ldots =\sum_{i=1}^{n}\tilde{X}_{1,i}X_{k,i}=0, \\ &\sum_{i=1}^{n}\tilde{X}_{1,i}X_{1,i}=\sum_{i=1}^{n}\tilde{X}_{1,i}^{2}. \end{aligned}
Therefore,
\hat{\beta}_{1}=\beta _{1}+\frac{\sum_{i=1}^{n}\tilde{X}_{1,i}U_{i}}{\sum_{i=1}^{n}\tilde{X}_{1,i}^{2}}.
Proof of unbiasedness
We have
\hat{\beta}_{1}=\beta _{1}+\frac{\sum_{i=1}^{n}\tilde{X}_{1,i}U_{i}}{\sum_{i=1}^{n}\tilde{X}_{1,i}^{2}}.
Conditional on \mathbf{X},
\mathrm{E}\left[U_{i} \mid \mathbf{X}\right] = 0.
Therefore, conditional on \mathbf{X},
\begin{aligned} \mathrm{E}\left[\hat{\beta}_{1} \mid \mathbf{X}\right] &= \mathrm{E}\left[\beta _{1}+\frac{\sum_{i=1}^{n}\tilde{X}_{1,i}U_{i}}{\sum_{i=1}^{n}\tilde{X}_{1,i}^{2}} \mid \mathbf{X}\right] \\ &=\beta _{1}+\frac{\sum_{i=1}^{n}\tilde{X}_{1,i}\mathrm{E}\left[U_{i} \mid \mathbf{X}\right]}{\sum_{i=1}^{n}\tilde{X}_{1,i}^{2}} \\ &=\beta _{1}. \end{aligned}
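The unbiasedness result can be illustrated with a small Monte Carlo: holding the regressors fixed across replications (i.e., conditioning on \mathbf{X}) and redrawing only the errors, the average of \hat{\beta}_{1} should be close to the true \beta_{1}. The setup below is hypothetical:

```python
import numpy as np

# Monte Carlo illustration of conditional unbiasedness of OLS.
rng = np.random.default_rng(2)
n, reps, beta1 = 100, 5000, 1.5
X1 = rng.normal(size=n)
X2 = 0.6 * X1 + rng.normal(size=n)           # regressors fixed across reps
Z = np.column_stack([np.ones(n), X1, X2])
A = np.linalg.solve(Z.T @ Z, Z.T)            # maps Y to the OLS coefficients

estimates = np.empty(reps)
for r in range(reps):
    U = rng.normal(size=n)                   # fresh errors each replication
    Y = 0.5 + beta1 * X1 - 0.7 * X2 + U
    estimates[r] = (A @ Y)[1]

print(estimates.mean())                      # close to the true value 1.5
```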
Conditional variance of the OLS estimators
Suppose that:
Y_{i}=\beta _{0}+\beta _{1}X_{1,i}+\beta _{2}X_{2,i}+\ldots +\beta _{k}X_{k,i}+U_{i}.
Conditional on \mathbf{X}, \mathrm{E}\left[U_{i} \mid \mathbf{X}\right] = 0 for all i.
Conditional on \mathbf{X}, \mathrm{E}\left[U_{i}^{2} \mid \mathbf{X}\right] = \sigma ^{2} for all i (homoskedasticity).
Conditional on \mathbf{X}, \mathrm{E}\left[U_{i}U_{j} \mid \mathbf{X}\right] = 0 for all i\neq j (no correlation across observations).
Denote by SSR_{1}=\sum_{i=1}^{n}\tilde{X}_{1,i}^{2} the residual sum-of-squares from regressing X_{1} on a constant and the other regressors. The conditional variance of \hat{\beta}_{1} given \mathbf{X} is
\mathrm{Var}\left(\hat{\beta}_{1} \mid \mathbf{X}\right) =\frac{\sigma ^{2}}{\sum_{i=1}^{n}\tilde{X}_{1,i}^{2}}=\frac{\sigma ^{2}}{SSR_{1}}.
Gauss-Markov Theorem: Under Assumptions 1–4, the OLS estimators are BLUE (Best Linear Unbiased Estimators): among all linear unbiased estimators, OLS has the smallest conditional variance.
Derivation of the conditional variance
We have \hat{\beta}_{1}=\beta _{1}+\dfrac{\sum_{i=1}^{n}\tilde{X}_{1,i}U_{i}}{\sum_{i=1}^{n}\tilde{X}_{1,i}^{2}}.
Conditional on \mathbf{X},
\begin{aligned} \mathrm{Var}\left(\hat{\beta}_{1} \mid \mathbf{X}\right) &= \mathrm{E}\left[\left( \hat{\beta}_{1}-\mathrm{E}\left[\hat{\beta}_{1} \mid \mathbf{X}\right]\right) ^{2} \mid \mathbf{X}\right] \\ &= \mathrm{E}\left[\left( \frac{\sum_{i=1}^{n}\tilde{X}_{1,i}U_{i}}{\sum_{i=1}^{n}\tilde{X}_{1,i}^{2}}\right) ^{2} \mid \mathbf{X}\right] \\ &=\frac{1}{\left(\sum_{i=1}^{n}\tilde{X}_{1,i}^{2}\right) ^{2}}\mathrm{E}\left[\left( \sum_{i=1}^{n}\tilde{X}_{1,i}U_{i}\right) ^{2} \mid \mathbf{X}\right] \\ &=\frac{1}{\left(\sum_{i=1}^{n}\tilde{X}_{1,i}^{2}\right) ^{2}}\left( \sum_{i=1}^{n}\tilde{X}_{1,i}^{2}\sigma ^{2}+\sum_{i\neq j}\tilde{X}_{1,i}\tilde{X}_{1,j}\cdot 0\right) \\ &=\frac{\sigma ^{2}\sum_{i=1}^{n}\tilde{X}_{1,i}^{2}}{\left(\sum_{i=1}^{n}\tilde{X}_{1,i}^{2}\right)^{2}} =\frac{\sigma ^{2}}{\sum_{i=1}^{n}\tilde{X}_{1,i}^{2}}=\frac{\sigma ^{2}}{SSR_{1}}. \end{aligned}
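The formula \mathrm{Var}(\hat{\beta}_{1}\mid \mathbf{X})=\sigma^{2}/SSR_{1} can likewise be checked by simulation, comparing the Monte Carlo variance of \hat{\beta}_{1} (regressors fixed, errors redrawn) with the theoretical value. A sketch under hypothetical parameter values:

```python
import numpy as np

# Simulation check of Var(beta_1_hat | X) = sigma^2 / SSR_1.
rng = np.random.default_rng(3)
n, reps, sigma = 100, 20000, 1.0
X1 = rng.normal(size=n)
X2 = 0.6 * X1 + rng.normal(size=n)
Z_aux = np.column_stack([np.ones(n), X2])
g, *_ = np.linalg.lstsq(Z_aux, X1, rcond=None)
X1_tilde = X1 - Z_aux @ g
SSR1 = X1_tilde @ X1_tilde
theoretical = sigma**2 / SSR1

Z = np.column_stack([np.ones(n), X1, X2])
A = np.linalg.solve(Z.T @ Z, Z.T)
draws = np.empty(reps)
for r in range(reps):
    U = sigma * rng.normal(size=n)
    Y = 0.5 + 1.5 * X1 - 0.7 * X2 + U
    draws[r] = (A @ Y)[1]

print(draws.var(), theoretical)              # the two should be close
```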
Conditional covariance of the OLS estimators
Consider \hat{\beta}_{1} and \hat{\beta}_{2}:
\begin{aligned} \hat{\beta}_{1} &=\beta _{1}+\frac{\sum_{i=1}^{n}\tilde{X}_{1,i}U_{i}}{SSR_{1}}, \\ \hat{\beta}_{2} &=\beta _{2}+\frac{\sum_{i=1}^{n}\tilde{X}_{2,i}U_{i}}{SSR_{2}}, \end{aligned}
where SSR_{2}=\sum_{i=1}^{n}\tilde{X}_{2,i}^{2} is the residual sum-of-squares from regressing X_{2} on a constant and X_{1},X_{3},\ldots ,X_{k}.
We will show that given Assumptions 1–4, conditional on \mathbf{X}:
\mathrm{Cov}\left(\hat{\beta}_{1},\hat{\beta}_{2} \mid \mathbf{X}\right) =\sigma ^{2}\frac{\sum_{i=1}^{n}\tilde{X}_{1,i}\tilde{X}_{2,i}}{SSR_{1} \cdot SSR_{2}}
Conditional covariance of the OLS estimators
Conditional on \mathbf{X},
\begin{aligned} &\mathrm{Cov}\left(\hat{\beta}_{1},\hat{\beta}_{2} \mid \mathbf{X}\right) \\ &= \mathrm{E}\left[\left( \hat{\beta}_{1}-\mathrm{E}\left[\hat{\beta}_{1} \mid \mathbf{X}\right]\right) \left( \hat{\beta}_{2}-\mathrm{E}\left[\hat{\beta}_{2} \mid \mathbf{X}\right]\right) \mid \mathbf{X}\right] \\ &= \frac{1}{SSR_{1} \cdot SSR_{2}} \mathrm{E}\left[\left( \textstyle\sum_{i=1}^{n}\tilde{X}_{1,i}U_{i}\right) \left( \textstyle\sum_{i=1}^{n}\tilde{X}_{2,i}U_{i}\right) \mid \mathbf{X}\right] \\ &= \frac{1}{SSR_{1} \cdot SSR_{2}} \left(\sum_{i=1}^{n}\tilde{X}_{1,i}\tilde{X}_{2,i}\sigma^{2}+\sum_{i\neq j}\tilde{X}_{1,i}\tilde{X}_{2,j}\cdot 0\right) \\ &=\sigma ^{2}\frac{\sum_{i=1}^{n}\tilde{X}_{1,i}\tilde{X}_{2,i}}{SSR_{1} \cdot SSR_{2}}. \end{aligned}
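The covariance formula admits the same kind of numerical check: simulate many error draws with the regressors fixed and compare the Monte Carlo covariance of (\hat{\beta}_{1},\hat{\beta}_{2}) with \sigma^{2}\sum_{i}\tilde{X}_{1,i}\tilde{X}_{2,i}/(SSR_{1}\cdot SSR_{2}). A sketch with illustrative numbers:

```python
import numpy as np

# Simulation check of the conditional covariance of two OLS slopes.
rng = np.random.default_rng(4)
n, reps, sigma = 80, 20000, 1.0
X1 = rng.normal(size=n)
X2 = 0.7 * X1 + rng.normal(size=n)           # positively correlated regressors
Z = np.column_stack([np.ones(n), X1, X2])
A = np.linalg.solve(Z.T @ Z, Z.T)

def resid(x, W):
    """Residuals from an OLS regression of x on the columns of W."""
    c, *_ = np.linalg.lstsq(W, x, rcond=None)
    return x - W @ c

X1_t = resid(X1, np.column_stack([np.ones(n), X2]))
X2_t = resid(X2, np.column_stack([np.ones(n), X1]))
theoretical = sigma**2 * (X1_t @ X2_t) / ((X1_t @ X1_t) * (X2_t @ X2_t))

b = np.empty((reps, 2))
for r in range(reps):
    Y = 0.5 + 1.5 * X1 - 0.7 * X2 + sigma * rng.normal(size=n)
    est = A @ Y
    b[r] = est[1], est[2]

print(np.cov(b.T)[0, 1], theoretical)        # should be close (and negative here)
```

With positively correlated regressors, \sum_{i}\tilde{X}_{1,i}\tilde{X}_{2,i} is negative, so the slope estimators covary negatively.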
Normality of the OLS estimators
In addition to Assumptions 1–4, assume that conditional on \mathbf{X}, U_{i}’s are jointly normally distributed.
\hat{\beta}_{0},\hat{\beta}_{1},\ldots ,\hat{\beta}_{k} are linear estimators:
\hat{\beta}_{j}=\sum_{i=1}^{n}w_{j,i}Y_{i}=\beta _{j}+\sum_{i=1}^{n}w_{j,i}U_{i},
where
w_{j,i}=\frac{\tilde{X}_{j,i}}{\sum_{l=1}^{n}\tilde{X}_{j,l}^{2}},
and \tilde{X}_{j,i} are the residuals from the regression of X_{j} on a constant and the remaining regressors, for j=1,\ldots ,k (\hat{\beta}_{0} is likewise linear in Y with weights that depend only on \mathbf{X}).
It follows that \hat{\beta}_{0},\hat{\beta}_{1},\ldots ,\hat{\beta}_{k} are jointly normally distributed (conditional on \mathbf{X}).
Inclusion of irrelevant regressors: No bias
Suppose that the true model is Y_{i}=\beta _{0}+\beta _{1}X_{1,i}+U_{i}.
We could estimate \beta _{1} by
\hat{\beta}_{1}=\frac{\sum_{i=1}^{n}\left( X_{1,i}-\bar{X}_{1}\right) Y_{i}}{\sum_{i=1}^{n}\left( X_{1,i}-\bar{X}_{1}\right) ^{2}}.
Suppose that instead we regress Y on a constant, X_{1}, and k-1 additional regressors X_{2},\ldots ,X_{k}, i.e., we estimate \beta _{1} by
\tilde{\beta}_{1}=\frac{\sum_{i=1}^{n}\tilde{X}_{1,i}Y_{i}}{\sum_{i=1}^{n}\tilde{X}_{1,i}^{2}}.
We have
\begin{aligned} \tilde{\beta}_{1} &=\frac{\sum_{i=1}^{n}\tilde{X}_{1,i}\left( \beta _{0}+\beta _{1}X_{1,i}+U_{i}\right) }{\sum_{i=1}^{n}\tilde{X}_{1,i}^{2}} \\ &=\beta _{1}+\frac{\sum_{i=1}^{n}\tilde{X}_{1,i}U_{i}}{\sum_{i=1}^{n}\tilde{X}_{1,i}^{2}}. \end{aligned}
Since conditional on \mathbf{X}, \mathrm{E}\left[U_{i} \mid \mathbf{X}\right] = 0, \tilde{\beta}_{1} is unbiased!
Inclusion of irrelevant regressors: Variance inflation
When Y_{i}=\beta _{0}+\beta _{1}X_{1,i}+U_{i},
\begin{aligned} \hat{\beta}_{1}&=\frac{\sum_{i=1}^{n}\left( X_{1,i}-\bar{X}_{1}\right) Y_{i}}{\sum_{i=1}^{n}\left( X_{1,i}-\bar{X}_{1}\right) ^{2}} \quad \text{and} \\ \tilde{\beta}_{1}&=\frac{\sum_{i=1}^{n}\tilde{X}_{1,i}Y_{i}}{\sum_{i=1}^{n}\tilde{X}_{1,i}^{2}} \end{aligned}
are both unbiased.
Their variances, conditional on \mathbf{X}:
\begin{aligned} \mathrm{Var}\left(\hat{\beta}_{1} \mid \mathbf{X}\right) &=\frac{\sigma ^{2}}{\sum_{i=1}^{n}\left( X_{1,i}-\bar{X}_{1}\right) ^{2}} \quad \text{and} \\ \mathrm{Var}\left(\tilde{\beta}_{1} \mid \mathbf{X}\right) &=\frac{\sigma ^{2}}{SSR_{1}}. \end{aligned}
In the short regression, X_{1} is regressed on a constant only, so SSR_{1}=\sum_{i=1}^{n}\left( X_{1,i}-\bar{X}_{1}\right) ^{2}.
In the long regression, X_{2},\ldots ,X_{k} are added. From Lecture 10, this cannot increase SSR_{1}, so
\mathrm{Var}\left(\hat{\beta}_{1} \mid \mathbf{X}\right) \leq \mathrm{Var}\left(\tilde{\beta}_{1} \mid \mathbf{X}\right).
Including irrelevant regressors inflates the variance of the estimator of \beta _{1}: \tilde{\beta}_{1} is (weakly) noisier than \hat{\beta}_{1}.
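The variance inflation can be seen directly in a simulation where X_{2} is irrelevant (its true coefficient is zero) but correlated with X_{1}; the long regression's slope estimator is noisier. All numbers below are hypothetical:

```python
import numpy as np

# Compare the short-regression and long-regression slope estimators when
# the added regressor is irrelevant but correlated with X_1.
rng = np.random.default_rng(5)
n, reps = 100, 10000
X1 = rng.normal(size=n)
X2 = 0.8 * X1 + 0.6 * rng.normal(size=n)     # irrelevant but correlated
Zs = np.column_stack([np.ones(n), X1])
Zl = np.column_stack([np.ones(n), X1, X2])
As = np.linalg.solve(Zs.T @ Zs, Zs.T)
Al = np.linalg.solve(Zl.T @ Zl, Zl.T)

short = np.empty(reps)
long_ = np.empty(reps)
for r in range(reps):
    Y = 1.0 + 1.5 * X1 + rng.normal(size=n)  # X_2 does not enter the DGP
    short[r] = (As @ Y)[1]
    long_[r] = (Al @ Y)[1]

# both are centered at 1.5, but the long regression has larger variance
print(short.mean(), long_.mean())
print(short.var(), long_.var())
```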
Variance and the number of regressors
Recall the variance formula:
\mathrm{Var}\left(\hat{\beta}_{1} \mid \mathbf{X}\right) =\frac{\sigma ^{2}}{SSR_{1}}.
When we add a new regressor to the model, two quantities may change:
- \sigma^{2} (the error variance): if the new regressor genuinely explains variation in Y, including it in the model moves that variation from U_{i} into the explained part, reducing \mathrm{Var}\left(U_{i} \mid \mathbf{X}\right)=\sigma^{2}
- SSR_{1} (the variation in X_{1} net of the other regressors): can only decrease or stay the same; stays the same only when the new regressor is uncorrelated with X_{1}
Case A: Irrelevant regressor
Suppose the new regressor does not affect Y (its population coefficient is zero).
\sigma^{2} is unchanged, because the error variance is determined by the true data-generating process.
SSR_{1} decreases if the new regressor is correlated with X_{1}.
Net effect: \mathrm{Var}\left(\hat{\beta}_{1} \mid \mathbf{X}\right) increases (strictly so when the new regressor is correlated with X_{1}).
Conclusion: do not include irrelevant regressors.
Estimation of variances and covariances
In Y_{i}=\hat{\beta}_{0}+\hat{\beta}_{1}X_{1,i}+\hat{\beta}_{2}X_{2,i}+\ldots +\hat{\beta}_{k}X_{k,i}+\hat{U}_{i},
\begin{aligned} \mathrm{Var}\left(\hat{\beta}_{1} \mid \mathbf{X}\right) &=\frac{\sigma ^{2}}{SSR_{1}}, \\ \mathrm{Cov}\left(\hat{\beta}_{1},\hat{\beta}_{2} \mid \mathbf{X}\right) &=\sigma ^{2}\frac{\sum_{i=1}^{n}\tilde{X}_{1,i}\tilde{X}_{2,i}}{SSR_{1} \cdot SSR_{2}}. \end{aligned}
Variances and covariances can be estimated by replacing \sigma ^{2} with
s^{2}=\frac{1}{n-k-1}\sum_{i=1}^{n}\hat{U}_{i}^{2}.
Estimated variance and covariance:
\begin{aligned} \widehat{\mathrm{Var}}\left(\hat{\beta}_{1}\right) &=\frac{s^{2}}{SSR_{1}}, \\ \widehat{\mathrm{Cov}}\left(\hat{\beta}_{1},\hat{\beta}_{2}\right) &=s^{2}\frac{\sum_{i=1}^{n}\tilde{X}_{1,i}\tilde{X}_{2,i}}{SSR_{1} \cdot SSR_{2}}. \end{aligned}
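These estimates can be computed by hand and cross-checked against the familiar matrix formula s^{2}(Z^{\prime}Z)^{-1}, whose (1,1) slope entry equals s^{2}/SSR_{1}. A sketch with simulated data:

```python
import numpy as np

# Form s^2 with the degrees-of-freedom correction n - k - 1, then the
# estimated variance s^2 / SSR_1, and compare with s^2 * inv(Z'Z).
rng = np.random.default_rng(6)
n, k = 120, 2
X1 = rng.normal(size=n)
X2 = 0.5 * X1 + rng.normal(size=n)
Y = 1.0 + 1.5 * X1 - 0.7 * X2 + rng.normal(size=n)

Z = np.column_stack([np.ones(n), X1, X2])
beta_hat, *_ = np.linalg.lstsq(Z, Y, rcond=None)
U_hat = Y - Z @ beta_hat
s2 = (U_hat @ U_hat) / (n - k - 1)

W = np.column_stack([np.ones(n), X2])
g, *_ = np.linalg.lstsq(W, X1, rcond=None)
X1_tilde = X1 - W @ g
SSR1 = X1_tilde @ X1_tilde
var_b1 = s2 / SSR1

var_matrix = s2 * np.linalg.inv(Z.T @ Z)     # full covariance matrix estimate
print(var_b1, var_matrix[1, 1])              # identical up to rounding
```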
Standard errors in terms of R-squared
Auxiliary R^{2}: Let R_{1}^{2} be the R-squared from regressing X_{1} on a constant and X_{2},\ldots,X_{k}. By definition of R-squared,
SSR_{1} = SST_{1}(1 - R_{1}^{2}), \quad \text{where } SST_{1} = \sum_{i=1}^{n}(X_{1,i} - \bar{X}_{1})^{2}.
Adjusted R^{2}: From s^{2} = s_{Y}^{2}(1 - \bar{R}^{2}), where s_{Y}^{2} = SST/(n-1) is the sample variance of Y and SST=\sum_{i=1}^{n}(Y_{i}-\bar{Y})^{2},
\begin{aligned} \mathrm{se}\left(\hat{\beta}_{1}\right) &= \sqrt{\frac{s^{2}}{SSR_{1}}} = \sqrt{\frac{s_{Y}^{2}(1 - \bar{R}^{2})}{SST_{1}(1 - R_{1}^{2})}}. \end{aligned}
Define s_{X_{1}} = \sqrt{SST_{1}/(n-1)}. Then
\mathrm{se}\left(\hat{\beta}_{1}\right) = \frac{s_{Y}}{s_{X_{1}}} \cdot \frac{1}{\sqrt{n - 1}} \cdot \sqrt{\frac{1 - \bar{R}^{2}}{1 - R_{1}^{2}}}.
Three factors behind standard errors
The SE formula has three interpretable factors:
\mathrm{se}\left(\hat{\beta}_{1}\right) = \underbrace{\frac{s_{Y}}{s_{X_{1}}}}_{\text{scaling}} \cdot \underbrace{\frac{1}{\sqrt{n - 1}}}_{\text{sample size}} \cdot \underbrace{\sqrt{\frac{1 - \bar{R}^{2}}{1 - R_{1}^{2}}}}_{\text{fit vs. collinearity}}
- s_{Y}/s_{X_{1}} — scaling: the ratio of sample standard deviations of Y and X_{1}
- 1/\sqrt{n - 1} — sample size effect; more data reduces SE
- \sqrt{(1 - \bar{R}^{2})/(1 - R_{1}^{2})} — fit vs. multicollinearity trade-off:
- (1 - \bar{R}^{2}): unexplained variation in Y (adjusted for degrees of freedom); higher \bar{R}^{2} reduces SE
- (1 - R_{1}^{2}): unique variation in X_{1}; more collinearity (higher R_{1}^{2}) inflates SE
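The three-factor decomposition is an exact algebraic identity, which the following sketch verifies on simulated data by computing the standard error both directly (as \sqrt{s^{2}/SSR_{1}}) and from the decomposition:

```python
import numpy as np

# Verify se(beta_1_hat) = (s_Y/s_X1) * (1/sqrt(n-1)) * sqrt((1-R_bar^2)/(1-R1^2)).
rng = np.random.default_rng(7)
n, k = 150, 2
X1 = rng.normal(size=n)
X2 = 0.6 * X1 + rng.normal(size=n)
Y = 1.0 + 1.5 * X1 - 0.7 * X2 + rng.normal(size=n)

Z = np.column_stack([np.ones(n), X1, X2])
beta_hat, *_ = np.linalg.lstsq(Z, Y, rcond=None)
U_hat = Y - Z @ beta_hat
s2 = (U_hat @ U_hat) / (n - k - 1)

# direct standard error: sqrt(s^2 / SSR_1)
W = np.column_stack([np.ones(n), X2])
g, *_ = np.linalg.lstsq(W, X1, rcond=None)
X1_tilde = X1 - W @ g
se_direct = np.sqrt(s2 / (X1_tilde @ X1_tilde))

# decomposition pieces: adjusted R^2, auxiliary R_1^2, sample std devs
SST = ((Y - Y.mean()) ** 2).sum()
SST1 = ((X1 - X1.mean()) ** 2).sum()
R2 = 1 - (U_hat @ U_hat) / SST
R2_bar = 1 - (1 - R2) * (n - 1) / (n - k - 1)
R1_2 = 1 - (X1_tilde @ X1_tilde) / SST1
sY = np.sqrt(SST / (n - 1))
sX1 = np.sqrt(SST1 / (n - 1))
se_decomp = (sY / sX1) / np.sqrt(n - 1) * np.sqrt((1 - R2_bar) / (1 - R1_2))

print(se_direct, se_decomp)                  # agree up to rounding
```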
Connection to Cases A, B, C
The SE formula clarifies the three cases:
- Case A (irrelevant regressor): \bar{R}^{2} typically decreases, R_{1}^{2} may increase \Longrightarrow SE increases
- Case B (relevant, uncorrelated with X_{1}): \bar{R}^{2} increases, R_{1}^{2} \approx unchanged \Longrightarrow SE decreases
- Case C (relevant, correlated with X_{1}): \bar{R}^{2} increases but R_{1}^{2} also increases \Longrightarrow ambiguous
Remark. An equivalent expression uses the unadjusted R^{2}:
\mathrm{se}\left(\hat{\beta}_{1}\right) = \frac{s_{Y}}{s_{X_{1}}} \cdot \frac{1}{\sqrt{n - k - 1}} \cdot \sqrt{\frac{1 - R^{2}}{1 - R_{1}^{2}}}.
This follows from s^{2} = SST(1 - R^{2})/(n - k - 1), which keeps R^{2} and the degrees-of-freedom correction separate.