Lecture 10: R-squared
Economics 326 — Introduction to Econometrics II
Fitted values
Consider the multiple regression model with k regressors: Y_{i}=\beta _{0}+\beta _{1}X_{1,i}+\beta _{2}X_{2,i}+\ldots +\beta _{k}X_{k,i}+U_{i}.
Let \hat{\beta}_{0},\hat{\beta}_{1},\ldots ,\hat{\beta}_{k} be the OLS estimators.
The fitted (or predicted) value of Y is: \hat{Y}_{i}=\hat{\beta}_{0}+\hat{\beta}_{1}X_{1,i}+\hat{\beta}_{2}X_{2,i}+\ldots +\hat{\beta}_{k}X_{k,i}.
The residual is: \hat{U}_{i}=Y_{i}-\hat{Y}_{i}.
Consider the average of \hat{Y}:
\begin{aligned} \overline{\hat{Y}} &=\frac{1}{n}\sum_{i=1}^{n}\hat{Y}_{i} \\ &=\frac{1}{n}\sum_{i=1}^{n}\left( Y_{i}-\hat{U}_{i}\right) \\ &=\bar{Y}-\frac{1}{n}\sum_{i=1}^{n}\hat{U}_{i} =\bar{Y}, \end{aligned}
because when there is an intercept, \sum_{i=1}^{n}\hat{U}_{i}=0.
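Both facts can be checked numerically. A minimal sketch in base R, using the built-in mtcars data (an arbitrary illustrative dataset, not one from the course):

```r
# Fit a regression with an intercept (built-in mtcars data, illustrative only)
m <- lm(mpg ~ wt + hp, data = mtcars)
Y_hat <- fitted(m)  # fitted values
U_hat <- resid(m)   # residuals

# With an intercept, the residuals sum to zero ...
sum(U_hat)                      # zero up to floating-point error
# ... so the mean of the fitted values equals the mean of Y
mean(Y_hat) - mean(mtcars$mpg)  # zero up to floating-point error
```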
Sum-of-Squares
The total variation of Y in the sample is:
SST=\sum_{i=1}^{n}\left( Y_{i}-\bar{Y}\right) ^{2}\text{ (Total Sum-of-Squares).}
The explained variation of Y in the sample is:
SSE=\sum_{i=1}^{n}\left( \hat{Y}_{i}-\bar{Y}\right) ^{2}\text{ (Explained or Model Sum-of-Squares).}
The residual (unexplained or error) variation of Y in the sample is:
SSR=\sum_{i=1}^{n}\hat{U}_{i}^{2}\text{ (Residual Sum-of-Squares).}
If the regression contains an intercept:
SST=SSE+SSR.
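The decomposition can be verified numerically. A sketch in base R with the built-in mtcars data; any regression with an intercept works:

```r
# Verify SST = SSE + SSR for a regression with an intercept (mtcars data)
m <- lm(mpg ~ wt + hp, data = mtcars)
Y <- mtcars$mpg

SST <- sum((Y - mean(Y))^2)          # total sum of squares
SSE <- sum((fitted(m) - mean(Y))^2)  # explained sum of squares
SSR <- sum(resid(m)^2)               # residual sum of squares

SST - (SSE + SSR)  # zero up to floating-point error
```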
Proof of SST=SSE+SSR
First,
\begin{aligned} SST &=\sum_{i=1}^{n}\left( Y_{i}-\bar{Y}\right) ^{2} \\ &=\sum_{i=1}^{n}\left( \hat{Y}_{i}+\hat{U}_{i}-\bar{Y}\right) ^{2} \\ &=\sum_{i=1}^{n}\left( \left( \hat{Y}_{i}-\bar{Y}\right) +\hat{U}_{i}\right) ^{2} \\ &=\sum_{i=1}^{n}\left( \hat{Y}_{i}-\bar{Y}\right) ^{2} +\sum_{i=1}^{n}\hat{U}_{i}^{2} +2\sum_{i=1}^{n}\left( \hat{Y}_{i}-\bar{Y}\right) \hat{U}_{i} \\ &=SSE+SSR+2\sum_{i=1}^{n}\left( \hat{Y}_{i}-\bar{Y}\right) \hat{U}_{i}. \end{aligned}
Next, we will show that \sum_{i=1}^{n}\left( \hat{Y}_{i}-\bar{Y}\right) \hat{U}_{i}=0.
Since \hat{Y}_{i}=\hat{\beta}_{0}+\hat{\beta}_{1}X_{1,i}+\ldots +\hat{\beta}_{k}X_{k,i},
\begin{aligned} &\sum_{i=1}^{n}\left( \hat{Y}_{i}-\bar{Y}\right) \hat{U}_{i} \\ &=\sum_{i=1}^{n}\left( \left( \hat{\beta}_{0}+\hat{\beta}_{1}X_{1,i}+\ldots +\hat{\beta}_{k}X_{k,i}\right) -\bar{Y}\right) \hat{U}_{i} \\ &=\hat{\beta}_{0}\sum_{i=1}^{n}\hat{U}_{i} +\hat{\beta}_{1}\sum_{i=1}^{n}X_{1,i}\hat{U}_{i}+\ldots +\hat{\beta}_{k}\sum_{i=1}^{n}X_{k,i}\hat{U}_{i} -\bar{Y}\sum_{i=1}^{n}\hat{U}_{i}. \end{aligned}
The OLS normal equations for a model with an intercept:
\sum_{i=1}^{n}\hat{U}_{i}=\sum_{i=1}^{n}X_{1,i}\hat{U}_{i}=\ldots =\sum_{i=1}^{n}X_{k,i}\hat{U}_{i}=0.
It follows that \sum_{i=1}^{n}\left( \hat{Y}_{i}-\bar{Y}\right) \hat{U}_{i}=0.
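The normal equations, and the resulting orthogonality, can be checked numerically. A sketch in base R using the built-in mtcars data:

```r
# Check the OLS normal equations and the vanishing cross term (mtcars data)
m <- lm(mpg ~ wt + hp, data = mtcars)
U_hat <- resid(m)

# Residuals are orthogonal to the intercept and to each regressor
c(sum(U_hat), sum(mtcars$wt * U_hat), sum(mtcars$hp * U_hat))

# Hence the cross term in the SST decomposition vanishes
cross <- sum((fitted(m) - mean(mtcars$mpg)) * U_hat)
cross  # zero up to floating-point error
```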
R^2
Consider the following measure of goodness of fit:
\begin{aligned} R^{2} &=\frac{\sum_{i=1}^{n}\left( \hat{Y}_{i}-\bar{Y}\right) ^{2}}{\sum_{i=1}^{n}\left( Y_{i}-\bar{Y}\right) ^{2}} \\ &=\frac{SSE}{SST} \\ &=1-\frac{SSR}{SST} \\ &=1-\frac{\sum_{i=1}^{n}\hat{U}_{i}^{2}}{\sum_{i=1}^{n}\left( Y_{i}-\bar{Y}\right) ^{2}}. \end{aligned}
When the regression contains an intercept, 0\leq R^{2}\leq 1.
R^{2} measures the proportion of variation in Y in the sample explained by the X’s.
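Computing R^{2} directly from its definition and comparing with what lm() reports is a useful sanity check. A sketch in base R with the built-in mtcars data:

```r
# Compute R^2 from its definition and compare with lm()'s reported value
m <- lm(mpg ~ wt + hp, data = mtcars)
Y <- mtcars$mpg

R2 <- 1 - sum(resid(m)^2) / sum((Y - mean(Y))^2)
R2 - summary(m)$r.squared  # zero up to floating-point error
```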
R^2 is non-decreasing in regressors
Consider two models:
\begin{aligned} Y_{i} &=\tilde{\beta}_{0}+\tilde{\beta}_{1}X_{1,i}+\tilde{U}_{i}, \\ Y_{i} &=\hat{\beta}_{0}+\hat{\beta}_{1}X_{1,i}+\hat{\beta}_{2}X_{2,i}+\hat{U}_{i}. \end{aligned}
We will show that
\sum_{i=1}^{n}\tilde{U}_{i}^{2}\geq \sum_{i=1}^{n}\hat{U}_{i}^{2}
and therefore the R^{2} from the regression with one regressor is less than or equal to the R^{2} from the regression with two regressors.
This can be generalized to the case of k and k+1 regressors.
Proof
Consider
\sum_{i=1}^{n}\left( \tilde{U}_{i}-\hat{U}_{i}\right) ^{2}=\sum_{i=1}^{n}\tilde{U}_{i}^{2}+\sum_{i=1}^{n}\hat{U}_{i}^{2}-2\sum_{i=1}^{n}\tilde{U}_{i}\hat{U}_{i}.
We will show that
\sum_{i=1}^{n}\tilde{U}_{i}\hat{U}_{i}=\sum_{i=1}^{n}\hat{U}_{i}^{2}.
Then,
0\leq \sum_{i=1}^{n}\left( \tilde{U}_{i}-\hat{U}_{i}\right) ^{2}=\sum_{i=1}^{n}\tilde{U}_{i}^{2}-\sum_{i=1}^{n}\hat{U}_{i}^{2},
or
\sum_{i=1}^{n}\tilde{U}_{i}^{2}\geq \sum_{i=1}^{n}\hat{U}_{i}^{2}.
Proof that \sum_{i=1}^{n}\tilde{U}_{i}\hat{U}_{i}=\sum_{i=1}^{n}\hat{U}_{i}^{2}
\begin{aligned} \sum_{i=1}^{n}\tilde{U}_{i}\hat{U}_{i} &=\sum_{i=1}^{n}\left( Y_{i}-\tilde{\beta}_{0}-\tilde{\beta}_{1}X_{1,i}\right) \hat{U}_{i} \\ &=\sum_{i=1}^{n}Y_{i}\hat{U}_{i}-\tilde{\beta}_{0}\sum_{i=1}^{n}\hat{U}_{i}-\tilde{\beta}_{1}\sum_{i=1}^{n}X_{1,i}\hat{U}_{i} \\ &=\sum_{i=1}^{n}Y_{i}\hat{U}_{i}-\tilde{\beta}_{0}\cdot 0-\tilde{\beta}_{1}\cdot 0 \\ &=\sum_{i=1}^{n}\left( \hat{\beta}_{0}+\hat{\beta}_{1}X_{1,i}+\hat{\beta}_{2}X_{2,i}+\hat{U}_{i}\right) \hat{U}_{i} \\ &=\sum_{i=1}^{n}\hat{U}_{i}^{2}. \end{aligned}
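The inequality can be illustrated numerically. A sketch in base R with the built-in mtcars data (wt as the first regressor, hp as the added one):

```r
# Adding a regressor cannot increase SSR, so R^2 cannot fall (mtcars data)
m1 <- lm(mpg ~ wt, data = mtcars)       # one regressor
m2 <- lm(mpg ~ wt + hp, data = mtcars)  # same model plus hp

sum(resid(m1)^2) >= sum(resid(m2)^2)            # TRUE
summary(m2)$r.squared >= summary(m1)$r.squared  # TRUE
```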
Adjusted R^2
Since R^{2} cannot decrease when more regressors are added, even if the additional regressors are irrelevant, an alternative measure of goodness-of-fit has been developed.
Adjusted R^{2}: the idea is to adjust SSR and SST for degrees of freedom:
\bar{R}^{2}=1-\frac{SSR/\left( n-k-1\right) }{SST/\left( n-1\right) }.
\bar{R}^{2}<R^{2} whenever k\geq 1 and SSR>0.
\bar{R}^{2} can decrease when more regressors are added, and unlike R^{2} it can be negative.
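A sketch in base R (built-in mtcars data) that reproduces \bar{R}^{2} from the formula above; the noise variable is artificial, added purely to illustrate an irrelevant regressor:

```r
# Reproduce adjusted R^2 from the degrees-of-freedom formula (mtcars data)
m <- lm(mpg ~ wt + hp, data = mtcars)
n <- nrow(mtcars)
k <- 2  # number of regressors

SSR <- sum(resid(m)^2)
SST <- sum((mtcars$mpg - mean(mtcars$mpg))^2)
adj_R2 <- 1 - (SSR / (n - k - 1)) / (SST / (n - 1))
adj_R2 - summary(m)$adj.r.squared  # zero up to floating-point error

# An irrelevant regressor (pure noise) typically lowers adjusted R^2
set.seed(1)
noise <- rnorm(n)
summary(lm(mpg ~ wt + hp + noise, data = mtcars))$adj.r.squared
```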
Estimation of \sigma^2
In the multiple linear regression model, we can estimate \sigma ^{2}=\mathrm{E}\left[U_{i}^{2} \mid \mathbf{X}\right] as follows:
Let
\hat{U}_{i}=Y_{i}-\hat{\beta}_{0}-\hat{\beta}_{1}X_{1,i}-\hat{\beta}_{2}X_{2,i}-\ldots -\hat{\beta}_{k}X_{k,i}.
An estimator for \sigma ^{2} is
\begin{aligned} s^{2} &=\frac{1}{n-k-1}\sum_{i=1}^{n}\hat{U}_{i}^{2} \\ &=\frac{SSR}{n-k-1}. \end{aligned}
The adjustment by k+1 accounts for the number of parameters that must be estimated in order to construct the \hat{U}’s:
\hat{\beta}_{0},\hat{\beta}_{1},\ldots ,\hat{\beta}_{k}.
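The estimator s=\sqrt{s^{2}} is exactly what R's summary() labels the residual standard error. A sketch with the built-in mtcars data:

```r
# s^2 = SSR/(n - k - 1); s is summary()'s "Residual standard error" (mtcars data)
m <- lm(mpg ~ wt + hp, data = mtcars)
n <- nrow(mtcars)
k <- 2  # number of regressors

s2 <- sum(resid(m)^2) / (n - k - 1)
sqrt(s2) - sigma(m)  # zero up to floating-point error
```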
Unbiasedness of s^2
s^{2}=\frac{1}{n-k-1}\sum_{i=1}^{n}\hat{U}_{i}^{2}.
s^{2} is an unbiased estimator of \sigma ^{2} (i.e., \mathrm{E}\left[s^{2} \mid \mathbf{X}\right]=\sigma ^{2}) if the following conditions hold:
Y_{i}=\beta _{0}+\beta _{1}X_{1,i}+\beta _{2}X_{2,i}+\ldots +\beta _{k}X_{k,i}+U_{i}.
Conditional on \mathbf{X}, \mathrm{E}\left[U_{i} \mid \mathbf{X}\right]=0 for all i’s.
Conditional on \mathbf{X}, \mathrm{E}\left[U_{i}^{2} \mid \mathbf{X}\right]=\sigma ^{2} for all i’s (homoskedasticity).
Conditional on \mathbf{X}, \mathrm{E}\left[U_{i}U_{j} \mid \mathbf{X}\right]=0 for all i\neq j.
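Unbiasedness can be illustrated by Monte Carlo simulation. A sketch in base R; the design, coefficients, and error variance below are arbitrary choices for illustration:

```r
# Monte Carlo illustration that E[s^2 | X] = sigma^2 under the conditions above.
# Design, coefficients, and sigma^2 are arbitrary illustrative choices.
set.seed(123)
n <- 30; k <- 2; sigma2 <- 4
X1 <- rnorm(n); X2 <- rnorm(n)  # regressors held fixed across replications

s2_draws <- replicate(2000, {
  U <- rnorm(n, sd = sqrt(sigma2))  # homoskedastic, uncorrelated errors
  Y <- 1 + 0.5 * X1 - 0.5 * X2 + U
  sum(resid(lm(Y ~ X1 + X2))^2) / (n - k - 1)
})

mean(s2_draws)  # close to sigma2 = 4
```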
R example
Using the hprice1 dataset from the wooldridge package, we regress house price on square footage, number of bedrooms, and lot size (n = 88, k = 3):

library(wooldridge)
m <- lm(price ~ sqrft + bdrms + lotsize, data = hprice1)
summary(m)

The summary() output reports:

Residual standard error: 59.83 on 84 degrees of freedom
Multiple R-squared: 0.6724, Adjusted R-squared: 0.6607

From here we can read off R^{2} = 0.6724, \bar{R}^{2} = 0.6607, and s = 59.83.
The residual degrees of freedom is n - k - 1 = 88 - 3 - 1 = 84.
Recovering SSR, SST, SSE from R output
Since s = 59.83, we have s^{2} = 59.83^{2} \approx 3{,}580.
SSR = s^{2}\cdot(n-k-1) \approx 3{,}580 \times 84 \approx 300{,}720.
From R^{2} = 1 - SSR/SST:
SST = \frac{SSR}{1-R^{2}} \approx \frac{300{,}720}{1-0.6724} = \frac{300{,}720}{0.3276} \approx 917{,}950.
SSE = SST - SSR \approx 917{,}950 - 300{,}720 = 617{,}230.
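The same recovery of SSR, SST, and SSE from reported summary quantities can be automated in R. A sketch using the built-in mtcars data, so the numbers differ from the hprice1 example:

```r
# Recover SSR, SST, SSE from summary() quantities alone (mtcars data)
m <- lm(mpg ~ wt + hp, data = mtcars)
sm <- summary(m)

df <- m$df.residual              # n - k - 1
SSR <- sm$sigma^2 * df           # from the residual standard error
SST <- SSR / (1 - sm$r.squared)  # from R^2 = 1 - SSR/SST
SSE <- SST - SSR

# Check against direct computation from the data
SST - sum((mtcars$mpg - mean(mtcars$mpg))^2)  # zero up to floating-point error
```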