suppressMessages({
library(stargazer)
library(hdm)
})
\[ (\Delta\log GDP)_i=\alpha \cdot GDP^0_i+U_i. \]
data("GrowthData")
Initial GDP per capita \(GDP^0\): gdpsh465.
Growth rate \(\Delta\log GDP\): Outcome.
simple<-lm(Outcome~gdpsh465,data=GrowthData)
suppressWarnings(
stargazer(simple,
header=FALSE,
title="Testing the simple catching up hypothesis",
omit.stat = "all",
#type="text"
type="html",notes.append = FALSE,notes = c("<sup>⋆</sup>p<0.1; <sup>⋆⋆</sup>p<0.05; <sup>⋆⋆⋆</sup>p<0.01")
)
)
Dependent variable: | Outcome |
gdpsh465 | 0.001 (0.006) |
Constant | 0.035 (0.047) |
Note: | ⋆p<0.1; ⋆⋆p<0.05; ⋆⋆⋆p<0.01 |
There are a lot of potential controls in the data:
dim(GrowthData)
## [1] 90 63
names(GrowthData)
## [1] "Outcome" "intercept" "gdpsh465" "bmp1l" "freeop" "freetar"
## [7] "h65" "hm65" "hf65" "p65" "pm65" "pf65"
## [13] "s65" "sm65" "sf65" "fert65" "mort65" "lifee065"
## [19] "gpop1" "fert1" "mort1" "invsh41" "geetot1" "geerec1"
## [25] "gde1" "govwb1" "govsh41" "gvxdxe41" "high65" "highm65"
## [31] "highf65" "highc65" "highcm65" "highcf65" "human65" "humanm65"
## [37] "humanf65" "hyr65" "hyrm65" "hyrf65" "no65" "nom65"
## [43] "nof65" "pinstab1" "pop65" "worker65" "pop1565" "pop6565"
## [49] "sec65" "secm65" "secf65" "secc65" "seccm65" "seccf65"
## [55] "syr65" "syrm65" "syrf65" "teapri65" "teasec65" "ex1"
## [61] "im1" "xr65" "tot1"
y=as.vector(GrowthData$Outcome)
D=as.vector(GrowthData$gdpsh465)
Controls=as.matrix(GrowthData)[,-c(1,2,3)]
y = GDP per capita growth rate.
D = initial GDP per capita.
-c(1,2,3) excludes the first 3 variables in GrowthData: Outcome, intercept, and gdpsh465.
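As a small illustration of the negative-index idiom, the sketch below applies it to a toy data frame. The data frame `toy` and its values are made up for illustration; only the column names mirror GrowthData.

```r
# Toy stand-in for GrowthData (hypothetical values, real column names).
toy <- data.frame(
  Outcome   = c(0.10, 0.20),
  intercept = c(1, 1),
  gdpsh465  = c(6.5, 7.0),
  bmp1l     = c(0.30, 0.10),
  freeop    = c(0.20, 0.40)
)

# Negative indices drop columns 1 to 3 and keep everything else.
toy_controls <- as.matrix(toy)[, -c(1, 2, 3)]
colnames(toy_controls)
## [1] "bmp1l"  "freeop"
```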
OLS regression with all controls:
conditional=lm(y~D+Controls)
suppressWarnings(
stargazer(conditional,
header=FALSE,
title="Testing the conditional catching up hypothesis",
omit.stat = "all",
omit="Controls",
#type="text"
type="html",notes.append = FALSE,notes = c("<sup>⋆</sup>p<0.1; <sup>⋆⋆</sup>p<0.05; <sup>⋆⋆⋆</sup>p<0.01")
)
)
Dependent variable: | y |
D | -0.009 (0.030) |
Constant | 0.247 (0.785) |
Note: | ⋆p<0.1; ⋆⋆p<0.05; ⋆⋆⋆p<0.01 |
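The standard error on D (0.030) is about five times the one on gdpsh465 in the simple regression (0.006). The simulation below is a rough sketch of the mechanism, not a reproduction of the GrowthData results: all data are simulated, and the sample size, number of controls, and coefficients are assumptions chosen to resemble the setting (90 observations, 60 controls that are correlated with the treatment).

```r
set.seed(1)
n <- 90   # observations, as in GrowthData
p <- 60   # number of controls, as in Controls

# Simulated controls; the treatment d is strongly related to them.
X <- matrix(rnorm(n * p), n, p)
d <- as.vector(X %*% rep(0.3, p)) + rnorm(n)

# Hypothetical outcome: depends on d only, with effect -0.05.
y <- -0.05 * d + rnorm(n)

# Standard error of the coefficient on d, without and with all controls.
se_simple <- summary(lm(y ~ d))$coefficients["d", "Std. Error"]
se_full   <- summary(lm(y ~ d + X))$coefficients["d", "Std. Error"]
c(simple = se_simple, full = se_full)
```

Once the controls absorb most of the variation in d, little independent variation is left to identify its coefficient, so the standard error grows.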
The standard error on initial GDP per capita rises from 0.006 to 0.030 once all controls are included.
?rlassoEffect
Usage:
x= specifies the matrix of controls.
y= specifies the outcome variable.
d= specifies the treatment variable (the main regressor of interest).
Effect<-rlassoEffect(x=Controls,y=y,d=D,method="double selection")
summary(Effect)
## [1] "Estimates and significance testing of the effect of target variables"
## Estimate. Std. Error t value Pr(>|t|)
## d1 -0.05001 0.01579 -3.167 0.00154 **
## ---
## Signif. codes: 0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
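The columns of this summary are linked by simple arithmetic. As a rough check, the sketch below recomputes the t value and p-value from the printed estimate and standard error, and adds an approximate 95% confidence interval. It assumes a normal approximation for the p-value and the interval; that assumption reproduces the printed p-value here, but it is not taken from the package documentation. The names `est`, `se`, and `ci` are illustrative.

```r
# Values copied from the printed summary of the double selection fit.
est <- -0.05001
se  <- 0.01579

t_stat <- est / se                          # about -3.167
p_val  <- 2 * pnorm(-abs(t_stat))           # about 0.0015 (two-sided, normal)
ci     <- est + c(-1, 1) * qnorm(0.975) * se

round(t_stat, 3)
round(p_val, 5)
round(ci, 3)                                # roughly -0.081 to -0.019
```

The interval lies entirely below zero, which is what the two significance stars express.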
names(Effect)
## [1] "alpha" "se" "t" "pval"
## [5] "no.selected" "coefficients" "coefficient" "coefficients.reg"
## [9] "selection.index" "residuals" "call" "samplesize"
Included controls:
sum(Effect$selection.index==TRUE)
## [1] 7
Effect$selection.index[Effect$selection.index==TRUE]
## bmp1l freetar hm65 sf65 lifee065 humanf65 pop6565
## TRUE TRUE TRUE TRUE TRUE TRUE TRUE
bmp1l: Log of the black market premium.
freetar: Measure of tariff restrictions.
hm65: Male gross enrollment ratio for higher education in 1965.
sf65: Female gross enrollment ratio for secondary education in 1965.
lifee065: Life expectancy at 0 in 1965.
humanf65: Average schooling years in the female population over age 25 in 1965.
pop6565: Population proportion over 65 in 1965.
Effect_PO<-rlassoEffect(x=Controls,y=y,d=D,method="partialling out")
summary(Effect_PO)
## [1] "Estimates and significance testing of the effect of target variables"
## Estimate. Std. Error t value Pr(>|t|)
## [1,] -0.04981 0.01394 -3.574 0.000351 ***
## ---
## Signif. codes: 0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
sum(Effect_PO$selection.index==TRUE)
## [1] 7
Effect_PO$selection.index[Effect_PO$selection.index==TRUE]
## bmp1l freetar hm65 sf65 lifee065 humanf65 pop6565
## TRUE TRUE TRUE TRUE TRUE TRUE TRUE
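The idea behind partialling out can be illustrated on simulated data. The sketch below is not what rlassoEffect runs internally: it residualizes with plain OLS on three made-up controls instead of Lasso, and all names and coefficients are hypothetical. It only shows the principle that regressing the residualized outcome on the residualized treatment recovers the same coefficient as the full regression (the Frisch-Waugh-Lovell result).

```r
set.seed(2)
n <- 200

# Simulated controls, treatment, and outcome (hypothetical coefficients).
X <- matrix(rnorm(n * 3), n, 3)
d <- as.vector(X %*% c(1, -1, 0.5)) + rnorm(n)
y <- -0.05 * d + as.vector(X %*% c(0.5, 0.5, 0)) + rnorm(n)

# Steps 1 and 2: remove the part of y and d explained by the controls.
y_res <- resid(lm(y ~ X))
d_res <- resid(lm(d ~ X))

# Step 3: regress residualized outcome on residualized treatment.
alpha_po   <- unname(coef(lm(y_res ~ d_res))["d_res"])
alpha_full <- unname(coef(lm(y ~ d + X))["d"])

all.equal(alpha_po, alpha_full)
## [1] TRUE
```

With Lasso in place of OLS in steps 1 and 2, the same three-step structure remains usable when the number of controls is large relative to the sample size.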
This data set contains many potential controls that are highly correlated with one another.
These controls are correlated not only with one another but also with the main regressor (treatment), the “initial” GDP per capita. As a result, including all potential controls inflates the standard error and produces an insignificant estimate: the regression uses 60 controls with only 90 observations.
It is plausible to assume that the model is sparse: only a few demographic, education, and similar variables matter. This makes the problem well suited to Lasso, since we need to select a few controls out of many.
Lasso selects the controls that predict the outcome. The double Lasso step also selects the controls that predict the main regressor, which guards against omitted variable bias.
Post-Lasso produces a significant estimate on the main regressor (about -0.05 under both double selection and partialling out). The result supports the conditional catching up hypothesis: growth rates converge for countries with similar economic and demographic characteristics.
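The double selection logic can be written out in the notation of the equation at the top. This is a sketch of the standard procedure under assumed notation: \(X_i\) (the vector of controls), \(\beta\), \(\hat S_y\), and \(\hat S_d\) are symbols introduced here and do not appear in the original text. The conditional model adds the controls to the simple regression:

\[ (\Delta\log GDP)_i=\alpha \cdot GDP^0_i+X_i'\beta+U_i. \]

The estimate of \(\alpha\) is then obtained in three steps:

\[
\begin{aligned}
\hat S_y &= \text{controls selected by Lasso of } (\Delta\log GDP)_i \text{ on } X_i,\\
\hat S_d &= \text{controls selected by Lasso of } GDP^0_i \text{ on } X_i,\\
\hat\alpha &= \text{OLS coefficient on } GDP^0_i \text{ when regressing } (\Delta\log GDP)_i \text{ on } GDP^0_i \text{ and } X_{i,\hat S_y\cup\hat S_d}.
\end{aligned}
\]

In the output above, the selected set contains 7 of the 60 controls.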