Lecture 10: R-squared

Economics 326 — Introduction to Econometrics II

Vadim Marmer, UBC

Fitted values

  • Consider the multiple regression model with k regressors: Y_{i}=\beta _{0}+\beta _{1}X_{1,i}+\beta _{2}X_{2,i}+\ldots +\beta _{k}X_{k,i}+U_{i}.

  • Let \hat{\beta}_{0},\hat{\beta}_{1},\ldots ,\hat{\beta}_{k} be the OLS estimators.

  • The fitted (or predicted) value of Y is: \hat{Y}_{i}=\hat{\beta}_{0}+\hat{\beta}_{1}X_{1,i}+\hat{\beta}_{2}X_{2,i}+\ldots +\hat{\beta}_{k}X_{k,i}.

  • The residual is: \hat{U}_{i}=Y_{i}-\hat{Y}_{i}.

  • Consider the average of \hat{Y}:

    \begin{aligned} \overline{\hat{Y}} &=\frac{1}{n}\sum_{i=1}^{n}\hat{Y}_{i} \\ &=\frac{1}{n}\sum_{i=1}^{n}\left( Y_{i}-\hat{U}_{i}\right) \\ &=\bar{Y}-\frac{1}{n}\sum_{i=1}^{n}\hat{U}_{i} =\bar{Y}, \end{aligned}

    because when there is an intercept, \sum_{i=1}^{n}\hat{U}_{i}=0.
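  • These facts are easy to verify numerically. The following sketch (Python/NumPy on simulated data, for illustration only) fits OLS with an intercept and checks that the residuals sum to zero, so the fitted values average to \bar{Y}:

```python
import numpy as np

rng = np.random.default_rng(0)
n = 200
X = rng.normal(size=(n, 2))                      # two regressors
Y = 1.0 + 0.5 * X[:, 0] - 0.3 * X[:, 1] + rng.normal(size=n)

Z = np.column_stack([np.ones(n), X])             # design matrix with an intercept column
beta_hat, *_ = np.linalg.lstsq(Z, Y, rcond=None) # OLS coefficients

Y_hat = Z @ beta_hat                             # fitted values
U_hat = Y - Y_hat                                # residuals

print(np.isclose(U_hat.sum(), 0.0))              # True: residuals sum to zero
print(np.isclose(Y_hat.mean(), Y.mean()))        # True: mean of fitted values equals Ybar
```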

Sum-of-Squares

  • The total variation of Y in the sample is:

    SST=\sum_{i=1}^{n}\left( Y_{i}-\bar{Y}\right) ^{2}\text{ (Total Sum-of-Squares).}

  • The explained variation of Y in the sample is:

    SSE=\sum_{i=1}^{n}\left( \hat{Y}_{i}-\bar{Y}\right) ^{2}\text{ (Explained or Model Sum-of-Squares).}

  • The residual (unexplained or error) variation of Y in the sample is:

    SSR=\sum_{i=1}^{n}\hat{U}_{i}^{2}\text{ (Residual Sum-of-Squares).}

  • If the regression contains an intercept:

    SST=SSE+SSR.
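  • The decomposition can be checked numerically; the sketch below (Python/NumPy on simulated data, for illustration only) computes all three sums of squares from an OLS fit with an intercept:

```python
import numpy as np

rng = np.random.default_rng(1)
n = 150
X = rng.normal(size=(n, 3))
Y = 2.0 + X @ np.array([1.0, -0.5, 0.25]) + rng.normal(size=n)

Z = np.column_stack([np.ones(n), X])             # intercept included
beta_hat, *_ = np.linalg.lstsq(Z, Y, rcond=None)
Y_hat = Z @ beta_hat
U_hat = Y - Y_hat

SST = ((Y - Y.mean()) ** 2).sum()                # total sum of squares
SSE = ((Y_hat - Y.mean()) ** 2).sum()            # explained sum of squares
SSR = (U_hat ** 2).sum()                         # residual sum of squares
print(np.isclose(SST, SSE + SSR))                # True: the decomposition holds
```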

Proof of SST=SSE+SSR

  • First,

    \begin{aligned} SST &=\sum_{i=1}^{n}\left( Y_{i}-\bar{Y}\right) ^{2} \\ &=\sum_{i=1}^{n}\left( \hat{Y}_{i}+\hat{U}_{i}-\bar{Y}\right) ^{2} \\ &=\sum_{i=1}^{n}\left( \left( \hat{Y}_{i}-\bar{Y}\right) +\hat{U}_{i}\right) ^{2} \\ &=\sum_{i=1}^{n}\left( \hat{Y}_{i}-\bar{Y}\right) ^{2} +\sum_{i=1}^{n}\hat{U}_{i}^{2} +2\sum_{i=1}^{n}\left( \hat{Y}_{i}-\bar{Y}\right) \hat{U}_{i} \\ &=SSE+SSR+2\sum_{i=1}^{n}\left( \hat{Y}_{i}-\bar{Y}\right) \hat{U}_{i}. \end{aligned}

  • Next, we will show that \sum_{i=1}^{n}\left( \hat{Y}_{i}-\bar{Y}\right) \hat{U}_{i}=0.

Proof of SST=SSE+SSR (continued)

  • Since \hat{Y}_{i}=\hat{\beta}_{0}+\hat{\beta}_{1}X_{1,i}+\ldots +\hat{\beta}_{k}X_{k,i},

    \begin{aligned} &\sum_{i=1}^{n}\left( \hat{Y}_{i}-\bar{Y}\right) \hat{U}_{i} \\ &=\sum_{i=1}^{n}\left( \left( \hat{\beta}_{0}+\hat{\beta}_{1}X_{1,i}+\ldots +\hat{\beta}_{k}X_{k,i}\right) -\bar{Y}\right) \hat{U}_{i} \\ &=\hat{\beta}_{0}\sum_{i=1}^{n}\hat{U}_{i} +\hat{\beta}_{1}\sum_{i=1}^{n}X_{1,i}\hat{U}_{i}+\ldots +\hat{\beta}_{k}\sum_{i=1}^{n}X_{k,i}\hat{U}_{i} -\bar{Y}\sum_{i=1}^{n}\hat{U}_{i}. \end{aligned}

  • The OLS normal equations for a model with an intercept:

    \sum_{i=1}^{n}\hat{U}_{i}=\sum_{i=1}^{n}X_{1,i}\hat{U}_{i}=\ldots =\sum_{i=1}^{n}X_{k,i}\hat{U}_{i}=0.

  • It follows that \sum_{i=1}^{n}\left( \hat{Y}_{i}-\bar{Y}\right) \hat{U}_{i}=0.

R^2

  • Consider the following measure of goodness of fit:

    \begin{aligned} R^{2} &=\frac{\sum_{i=1}^{n}\left( \hat{Y}_{i}-\bar{Y}\right) ^{2}}{\sum_{i=1}^{n}\left( Y_{i}-\bar{Y}\right) ^{2}} \\ &=\frac{SSE}{SST} \\ &=1-\frac{SSR}{SST} \\ &=1-\frac{\sum_{i=1}^{n}\hat{U}_{i}^{2}}{\sum_{i=1}^{n}\left( Y_{i}-\bar{Y}\right) ^{2}}. \end{aligned}

  • 0\leq R^{2}\leq 1, provided the regression includes an intercept (so that SST=SSE+SSR holds).

  • R^{2} measures the proportion of variation in Y in the sample explained by the X’s.
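  • Both forms of the definition give the same number, as this sketch confirms (Python/NumPy on simulated data, for illustration only):

```python
import numpy as np

rng = np.random.default_rng(5)
n = 100
X = rng.normal(size=(n, 2))
Y = 0.5 + X @ np.array([1.0, -1.0]) + rng.normal(size=n)

Z = np.column_stack([np.ones(n), X])
b, *_ = np.linalg.lstsq(Z, Y, rcond=None)
Y_hat = Z @ b
U_hat = Y - Y_hat

SST = ((Y - Y.mean()) ** 2).sum()
SSE = ((Y_hat - Y.mean()) ** 2).sum()
SSR = (U_hat ** 2).sum()

R2_explained = SSE / SST                         # R^2 as SSE / SST
R2_residual = 1 - SSR / SST                      # R^2 as 1 - SSR / SST
print(np.isclose(R2_explained, R2_residual))     # True: both definitions agree
```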

R^2 is non-decreasing in the number of regressors

  • Consider two models:

    \begin{aligned} Y_{i} &=\tilde{\beta}_{0}+\tilde{\beta}_{1}X_{1,i}+\tilde{U}_{i}, \\ Y_{i} &=\hat{\beta}_{0}+\hat{\beta}_{1}X_{1,i}+\hat{\beta}_{2}X_{2,i}+\hat{U}_{i}. \end{aligned}

  • We will show that

    \sum_{i=1}^{n}\tilde{U}_{i}^{2}\geq \sum_{i=1}^{n}\hat{U}_{i}^{2}

    and therefore the R^{2} from the regression with one regressor is less than or equal to the R^{2} from the regression with two regressors.

  • This can be generalized to the case of k and k+1 regressors.
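  • The inequality is easy to see in a simulation; in this sketch (Python/NumPy, simulated data, for illustration only) X_{2} is unrelated to Y by construction, yet adding it still cannot increase the SSR:

```python
import numpy as np

rng = np.random.default_rng(2)
n = 120
X1 = rng.normal(size=n)
X2 = rng.normal(size=n)                          # irrelevant to Y by construction
Y = 1.0 + 0.8 * X1 + rng.normal(size=n)

def ssr(design, y):
    """Sum of squared OLS residuals for a given design matrix."""
    b, *_ = np.linalg.lstsq(design, y, rcond=None)
    return ((y - design @ b) ** 2).sum()

ssr_one = ssr(np.column_stack([np.ones(n), X1]), Y)      # one regressor
ssr_two = ssr(np.column_stack([np.ones(n), X1, X2]), Y)  # two regressors
print(ssr_two <= ssr_one)                                # True: SSR cannot increase
```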

Proof

  • Consider

    \sum_{i=1}^{n}\left( \tilde{U}_{i}-\hat{U}_{i}\right) ^{2}=\sum_{i=1}^{n}\tilde{U}_{i}^{2}+\sum_{i=1}^{n}\hat{U}_{i}^{2}-2\sum_{i=1}^{n}\tilde{U}_{i}\hat{U}_{i}.

  • We will show that

    \sum_{i=1}^{n}\tilde{U}_{i}\hat{U}_{i}=\sum_{i=1}^{n}\hat{U}_{i}^{2}.

  • Then,

    0\leq \sum_{i=1}^{n}\left( \tilde{U}_{i}-\hat{U}_{i}\right) ^{2}=\sum_{i=1}^{n}\tilde{U}_{i}^{2}-\sum_{i=1}^{n}\hat{U}_{i}^{2},

    or

    \sum_{i=1}^{n}\tilde{U}_{i}^{2}\geq \sum_{i=1}^{n}\hat{U}_{i}^{2}.

Proof (continued)

\begin{aligned} \sum_{i=1}^{n}\tilde{U}_{i}\hat{U}_{i} &=\sum_{i=1}^{n}\left( Y_{i}-\tilde{\beta}_{0}-\tilde{\beta}_{1}X_{1,i}\right) \hat{U}_{i} \\ &=\sum_{i=1}^{n}Y_{i}\hat{U}_{i}-\tilde{\beta}_{0}\sum_{i=1}^{n}\hat{U}_{i}-\tilde{\beta}_{1}\sum_{i=1}^{n}X_{1,i}\hat{U}_{i} \\ &=\sum_{i=1}^{n}Y_{i}\hat{U}_{i}-\tilde{\beta}_{0}\cdot 0-\tilde{\beta}_{1}\cdot 0 \\ &=\sum_{i=1}^{n}\left( \hat{\beta}_{0}+\hat{\beta}_{1}X_{1,i}+\hat{\beta}_{2}X_{2,i}+\hat{U}_{i}\right) \hat{U}_{i} \\ &=\sum_{i=1}^{n}\hat{U}_{i}^{2}. \end{aligned}

  • The third equality uses the normal equations of the two-regressor model (\sum_{i=1}^{n}\hat{U}_{i}=\sum_{i=1}^{n}X_{1,i}\hat{U}_{i}=0), and the last equality uses them again, together with \sum_{i=1}^{n}X_{2,i}\hat{U}_{i}=0, after substituting Y_{i}=\hat{\beta}_{0}+\hat{\beta}_{1}X_{1,i}+\hat{\beta}_{2}X_{2,i}+\hat{U}_{i}.

Adjusted R^2

  • Since R^{2} cannot decrease when more regressors are added, even if the additional regressors are irrelevant, an alternative measure of goodness-of-fit has been developed.

  • Adjusted R^{2}: the idea is to adjust SSR and SST for degrees of freedom:

    \bar{R}^{2}=1-\frac{SSR/\left( n-k-1\right) }{SST/\left( n-1\right) }.

  • \bar{R}^{2}<R^{2} whenever k\geq 1; unlike R^{2}, \bar{R}^{2} can even be negative.

  • \bar{R}^{2} can decrease when more regressors are added.
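  • The adjusted R^{2} computation can be sketched numerically (Python/NumPy on simulated data with a deliberately irrelevant second regressor; for illustration only):

```python
import numpy as np

rng = np.random.default_rng(3)
n, k = 100, 2
X = rng.normal(size=(n, k))
Y = 1.0 + X @ np.array([0.6, 0.0]) + rng.normal(size=n)  # second regressor is irrelevant

Z = np.column_stack([np.ones(n), X])
b, *_ = np.linalg.lstsq(Z, Y, rcond=None)
U_hat = Y - Z @ b
SSR = (U_hat ** 2).sum()
SST = ((Y - Y.mean()) ** 2).sum()

R2 = 1 - SSR / SST
R2_adj = 1 - (SSR / (n - k - 1)) / (SST / (n - 1))       # degrees-of-freedom adjustment
print(R2_adj < R2)                                       # True whenever k >= 1 and SSR > 0
```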

Estimation of \sigma^2

  • In the multiple linear regression model, we can estimate \sigma ^{2}=\mathrm{E}\left[U_{i}^{2} \mid \mathbf{X}\right] as follows:

    Let

    \hat{U}_{i}=Y_{i}-\hat{\beta}_{0}-\hat{\beta}_{1}X_{1,i}-\hat{\beta}_{2}X_{2,i}-\ldots -\hat{\beta}_{k}X_{k,i}.

  • An estimator for \sigma ^{2} is

    \begin{aligned} s^{2} &=\frac{1}{n-k-1}\sum_{i=1}^{n}\hat{U}_{i}^{2} \\ &=\frac{SSR}{n-k-1}. \end{aligned}

  • The degrees-of-freedom adjustment k+1 equals the number of parameters we have to estimate in order to construct the \hat{U}’s:

    \hat{\beta}_{0},\hat{\beta}_{1},\ldots ,\hat{\beta}_{k}.

Unbiasedness of s^2

s^{2}=\frac{1}{n-k-1}\sum_{i=1}^{n}\hat{U}_{i}^{2}.

  • s^{2} is an unbiased estimator of \sigma ^{2} (i.e., \mathrm{E}\left[s^{2} \mid \mathbf{X}\right]=\sigma ^{2}) if the following conditions hold:

    1. Y_{i}=\beta _{0}+\beta _{1}X_{1,i}+\beta _{2}X_{2,i}+\ldots +\beta _{k}X_{k,i}+U_{i}.

    2. Conditional on \mathbf{X}, \mathrm{E}\left[U_{i} \mid \mathbf{X}\right]=0 for all i’s.

    3. Conditional on \mathbf{X}, \mathrm{E}\left[U_{i}^{2} \mid \mathbf{X}\right]=\sigma ^{2} for all i’s (homoskedasticity).

    4. Conditional on \mathbf{X}, \mathrm{E}\left[U_{i}U_{j} \mid \mathbf{X}\right]=0 for all i\neq j.
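  • Unbiasedness can be illustrated by simulation (a Python/NumPy Monte Carlo sketch under conditions 1–4 with normal errors; for illustration only):

```python
import numpy as np

rng = np.random.default_rng(4)
n, k, sigma2 = 50, 2, 4.0
X = rng.normal(size=(n, k))                      # design held fixed across replications
Z = np.column_stack([np.ones(n), X])
beta = np.array([1.0, 0.5, -0.5])

reps = 5000
s2_draws = np.empty(reps)
for r in range(reps):
    U = np.sqrt(sigma2) * rng.normal(size=n)     # homoskedastic, uncorrelated errors
    Y = Z @ beta + U
    b, *_ = np.linalg.lstsq(Z, Y, rcond=None)
    s2_draws[r] = ((Y - Z @ b) ** 2).sum() / (n - k - 1)

print(abs(s2_draws.mean() - sigma2) < 0.1)       # True: Monte Carlo average is close to sigma^2
```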

R example

  • Using the hprice1 dataset from the wooldridge package, we regress house price on square footage, number of bedrooms, and lot size (n = 88, k = 3):

    library(wooldridge)
    m <- lm(price ~ sqrft + bdrms + lotsize, data = hprice1)
    summary(m)
  • The summary() output reports:

    Residual standard error: 59.83 on 84 degrees of freedom
    Multiple R-squared:  0.6724, Adjusted R-squared:  0.6607
  • From here we can read off R^{2} = 0.6724, \bar{R}^{2} = 0.6607, and s = 59.83.

  • The residual degrees of freedom is n - k - 1 = 88 - 3 - 1 = 84.

Recovering SSR, SST, SSE from R output

  • Since s = 59.83, we have s^{2} = 59.83^{2} \approx 3{,}580.

  • SSR = s^{2}\cdot(n-k-1) \approx 3{,}580 \times 84 \approx 300{,}720.

  • From R^{2} = 1 - SSR/SST:

    SST = \frac{SSR}{1-R^{2}} \approx \frac{300{,}720}{1-0.6724} = \frac{300{,}720}{0.3276} \approx 917{,}950.

  • SSE = SST - SSR \approx 917{,}950 - 300{,}720 = 617{,}230.
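  • The same arithmetic can be scripted; this sketch (Python, for illustration only) carries s at its full reported precision rather than rounding s^{2} to 3,580, so the recovered sums of squares differ slightly from the rounded values above:

```python
# Recover SSR, SST, SSE from the reported R summary statistics
n, k = 88, 3
s = 59.83                  # residual standard error
R2 = 0.6724                # multiple R-squared

s2 = s ** 2                # about 3,580
SSR = s2 * (n - k - 1)     # about 300,700
SST = SSR / (1 - R2)       # about 917,900
SSE = SST - SSR            # about 617,200
print(round(SSR), round(SST), round(SSE))
```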