suppressMessages({
  library(stargazer)
  library(hdm)
  })

Model

\[ (\Delta\log GDP)_i=\alpha \cdot GDP^0_i+U_i. \]

Data

data("GrowthData")
  • Initial GDP per capita: gdpsh465.
  • GDP per capita growth rate: Outcome.

Testing the hypothesis

simple<-lm(Outcome~gdpsh465,data=GrowthData)
suppressWarnings(
  stargazer(simple,
            header=FALSE, 
            title="Testing the simple catching up hypothesis",
            omit.stat = "all",
            #type="text"
            type="html",notes.append = FALSE,notes = c("<sup>&sstarf;</sup>p<0.1; <sup>&sstarf;&sstarf;</sup>p<0.05; <sup>&sstarf;&sstarf;&sstarf;</sup>p<0.01")
  )
)
Testing the simple catching up hypothesis

            Dependent variable: Outcome
gdpsh465    0.001
            (0.006)
Constant    0.035
            (0.047)

Note: ⋆p<0.1; ⋆⋆p<0.05; ⋆⋆⋆p<0.01

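The estimate on initial GDP per capita is positive and small relative to its std. err. (0.001 vs. 0.006): no support for the catching up hypothesis.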
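
To make the "no support" reading concrete, the t-statistic can be recomputed by hand from the rounded table entries. This is only a rough sketch: the inputs are the rounded values from the table, the degrees of freedom assume 90 observations and 2 estimated coefficients, and the names est, se, and df_resid are introduced here for illustration.

```r
# Rounded values copied from the table; the exact estimates differ slightly.
est <- 0.001
se  <- 0.006
df_resid <- 90 - 2   # assumption: 90 observations, intercept + slope

t_stat <- est / se                             # t-statistic for H0: alpha = 0
p_two  <- 2 * pt(-abs(t_stat), df = df_resid)  # two-sided p-value
p_one  <- pt(t_stat, df = df_resid)            # one-sided, H1: alpha < 0 (catching up)

round(c(t = t_stat, p_two_sided = p_two, p_one_sided = p_one), 3)
```

Catching up predicts a negative coefficient, so the one-sided p-value is arguably the more relevant of the two.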

Conditional model:

There are a lot of potential controls in the data:

dim(GrowthData)
## [1] 90 63
names(GrowthData)
##  [1] "Outcome"   "intercept" "gdpsh465"  "bmp1l"     "freeop"    "freetar"  
##  [7] "h65"       "hm65"      "hf65"      "p65"       "pm65"      "pf65"     
## [13] "s65"       "sm65"      "sf65"      "fert65"    "mort65"    "lifee065" 
## [19] "gpop1"     "fert1"     "mort1"     "invsh41"   "geetot1"   "geerec1"  
## [25] "gde1"      "govwb1"    "govsh41"   "gvxdxe41"  "high65"    "highm65"  
## [31] "highf65"   "highc65"   "highcm65"  "highcf65"  "human65"   "humanm65" 
## [37] "humanf65"  "hyr65"     "hyrm65"    "hyrf65"    "no65"      "nom65"    
## [43] "nof65"     "pinstab1"  "pop65"     "worker65"  "pop1565"   "pop6565"  
## [49] "sec65"     "secm65"    "secf65"    "secc65"    "seccm65"   "seccf65"  
## [55] "syr65"     "syrm65"    "syrf65"    "teapri65"  "teasec65"  "ex1"      
## [61] "im1"       "xr65"      "tot1"
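
With controls included, the model can be written as an extension of the simple model. This is a sketch in the notation of the earlier equation. The control vector \(X_i\) and its coefficient vector \(\beta\) are symbols introduced here for illustration, assuming the controls enter linearly and that \(X_i\) may contain a constant.

```latex
\[
(\Delta\log GDP)_i=\alpha \cdot GDP^0_i + X_i'\beta + U_i .
\]
```

Under this formulation the conditional catching up hypothesis corresponds to \(\alpha<0\): holding the controls fixed, countries with lower initial GDP per capita grow faster.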

Estimation of the conditional model

y=as.vector(GrowthData$Outcome)
D=as.vector(GrowthData$gdpsh465)
Controls=as.matrix(GrowthData)[,-c(1,2,3)]
  • y = GDP per capita growth rate.
  • D = initial GDP per capita.
  • -c(1,2,3) drops the first 3 columns of GrowthData, which are not controls:
    • Outcome
    • intercept
    • gdpsh465
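
Dropping columns by position depends on the column order staying fixed. A name-based alternative is sketched below on a small toy data frame. The values are made up, and only the column names are borrowed from GrowthData.

```r
# Toy data frame standing in for GrowthData (values are made up).
toy <- data.frame(
  Outcome   = c(0.02, -0.01, 0.03),
  intercept = 1,
  gdpsh465  = c(6.5, 7.2, 8.1),
  bmp1l     = c(0.10, 0.00, 0.30),
  freeop    = c(0.20, 0.25, 0.15)
)

# Positional exclusion, as in the text: drop columns 1 to 3.
controls_by_position <- as.matrix(toy)[, -c(1, 2, 3)]

# Name-based exclusion: drop the same columns by name.
drop_cols <- c("Outcome", "intercept", "gdpsh465")
controls_by_name <- as.matrix(toy[, setdiff(names(toy), drop_cols)])

# Both approaches should keep the same control columns.
colnames(controls_by_name)
all(controls_by_position == controls_by_name)
```

The name-based version keeps working if columns are reordered, at the cost of being slightly more verbose.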

OLS regression with all controls:

conditional=lm(y~D+Controls)
suppressWarnings(
  stargazer(conditional,
            header=FALSE, 
            title="Testing the conditional catching up hypothesis",
            omit.stat = "all",
            omit="Controls",
            #type="text"
            type="html",notes.append = FALSE,notes = c("<sup>&sstarf;</sup>p<0.1; <sup>&sstarf;&sstarf;</sup>p<0.05; <sup>&sstarf;&sstarf;&sstarf;</sup>p<0.01")
  )
)
Testing the conditional catching up hypothesis

            Dependent variable: y
D           -0.009
            (0.030)
Constant    0.247
            (0.785)

Note: ⋆p<0.1; ⋆⋆p<0.05; ⋆⋆⋆p<0.01

No support for the conditional catching up hypothesis.
  • The estimate is negative (-0.009), but its std. err. (0.030) is too large for it to be statistically significant: with so many controls, there is still no support.
  • With all controls included, the std. err. on initial GDP per capita increased from 0.006 to 0.030.
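
The jump in the standard error can be reproduced in a small simulation. The sketch below uses made-up data, not GrowthData. It assumes a treatment that is largely explained by 60 controls, with 90 observations, and compares the standard error on the treatment with and without the controls. The coefficients, seed, and variable names are arbitrary choices for illustration.

```r
set.seed(1)                       # arbitrary seed for reproducibility
n <- 90                           # same sample size as GrowthData
p <- 60                           # same number of controls

X <- matrix(rnorm(n * p), n, p)   # simulated controls
D <- as.vector(X %*% rep(0.3, p)) + rnorm(n, sd = 0.5)  # treatment driven mostly by X
y <- -0.05 * D + 0.1 * X[, 1] + rnorm(n)                # assumed true effect: -0.05

fit_short <- lm(y ~ D)            # no controls
fit_full  <- lm(y ~ D + X)        # all 60 controls

se_short <- summary(fit_short)$coefficients["D", "Std. Error"]
se_full  <- summary(fit_full)$coefficients["D", "Std. Error"]

# Little independent variation in D remains once X is controlled for,
# and few degrees of freedom are left, so the std. error grows.
c(se_short = se_short, se_full = se_full, ratio = se_full / se_short)
```

In this setup the precision loss comes from two sources: the controls absorb most of the variation in the treatment, and 62 coefficients are estimated from 90 observations.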

Post-Lasso with Double Lasso

?rlassoEffect

Usage:

Effect<-rlassoEffect(x=Controls,y=y,d=D,method="double selection")
summary(Effect)
## [1] "Estimates and significance testing of the effect of target variables"
##    Estimate. Std. Error t value Pr(>|t|)   
## d1  -0.05001    0.01579  -3.167  0.00154 **
## ---
## Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
names(Effect)
##  [1] "alpha"            "se"               "t"                "pval"            
##  [5] "no.selected"      "coefficients"     "coefficient"      "coefficients.reg"
##  [9] "selection.index"  "residuals"        "call"             "samplesize"
A negative and statistically significant estimate: -0.050 with a std. err. of 0.016 (p = 0.0015)!

Included controls:

sum(Effect$selection.index)
## [1] 7
Effect$selection.index[Effect$selection.index]
##    bmp1l  freetar     hm65     sf65 lifee065 humanf65  pop6565 
##     TRUE     TRUE     TRUE     TRUE     TRUE     TRUE     TRUE

Double Lasso selected 7 controls:

  • bmp1l: Log of the black market premium.
  • freetar: Measure of tariff restrictions.
  • hm65: Male gross enrollment ratio for higher education in 1965.
  • sf65: Female gross enrollment ratio for secondary education in 1965.
  • lifee065: Life expectancy at 0 in 1965.
  • humanf65: Average schooling years in the female population over age 25 in 1965.
  • pop6565: Proportion of the population over age 65 in 1965.
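
The selection behind this list can be sketched step by step with rlasso from the hdm package. This is a hedged reconstruction of the double selection idea, not necessarily what rlassoEffect does internally. The selected set should be similar to the list above, but the standard error reported by lm is the classical one and may differ from the one rlassoEffect reports. The names sel_y, sel_d, keep, and post_fit are introduced here for illustration.

```r
library(hdm)
data("GrowthData")

y <- GrowthData$Outcome
D <- GrowthData$gdpsh465
Controls <- as.matrix(GrowthData)[, -c(1, 2, 3)]

# Step 1: Lasso of the outcome on the controls.
sel_y <- rlasso(Controls, y)$index
# Step 2: Lasso of the treatment on the controls.
sel_d <- rlasso(Controls, D)$index
# Step 3: keep a control if either Lasso selected it.
keep <- sel_y | sel_d

# Step 4: OLS of the outcome on the treatment and the union of selected controls.
post_fit <- lm(y ~ D + Controls[, keep, drop = FALSE])

names(keep)[keep]               # controls kept by the union
coef(summary(post_fit))["D", ]  # estimate on initial GDP per capita
```

Taking the union in step 3 is the point of the procedure: a control that strongly predicts the treatment is kept even if the outcome Lasso alone would drop it.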

Using the partialling out approach:

Effect_PO<-rlassoEffect(x=Controls,y=y,d=D,method="partialling out")
summary(Effect_PO)
## [1] "Estimates and significance testing of the effect of target variables"
##      Estimate. Std. Error t value Pr(>|t|)    
## [1,]  -0.04981    0.01394  -3.574 0.000351 ***
## ---
## Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
  • A very similar estimate to the Double Lasso approach.
sum(Effect_PO$selection.index)
## [1] 7
Effect_PO$selection.index[Effect_PO$selection.index]
##    bmp1l  freetar     hm65     sf65 lifee065 humanf65  pop6565 
##     TRUE     TRUE     TRUE     TRUE     TRUE     TRUE     TRUE
  • The same selected controls.
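
The logic of partialling out can be illustrated with plain OLS on simulated data. The sketch below swaps Lasso for OLS in the residualization step, so it shows the underlying Frisch-Waugh-Lovell identity rather than the hdm procedure itself. The data, coefficients, and names are made up for illustration.

```r
set.seed(2)                        # arbitrary seed
n <- 90
X <- matrix(rnorm(n * 5), n, 5)    # five simulated controls (hypothetical)
D <- as.vector(X %*% c(0.5, -0.3, 0.2, 0, 0)) + rnorm(n)
y <- -0.05 * D + as.vector(X %*% c(0.4, 0, 0, 0.1, 0)) + rnorm(n)

# Step 1: residualize the outcome and the treatment on the controls.
y_res <- resid(lm(y ~ X))
D_res <- resid(lm(D ~ X))

# Step 2: regress residual on residual (no intercept: residuals have mean zero).
alpha_po <- coef(lm(y_res ~ D_res - 1))[["D_res"]]

# Benchmark: coefficient on D from the regression with all controls.
alpha_full <- coef(lm(y ~ D + X))[["D"]]

all.equal(alpha_po, alpha_full)
```

With OLS residuals the two estimates coincide up to numerical precision. With Lasso-based residuals, as in the "partialling out" method, they would generally differ slightly, which is in line with the close but not identical estimates reported above.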

Conclusion

This data set contains many potential controls that are highly correlated with one another:

  • Many similar education variables.
  • Many similar demographic variables.
  • Many similar political variables.
  • Etc.

These controls are related not only to one another, but also to the main regressor (treatment): the "initial" GDP per capita. As a result, including all potential controls produces an imprecise, insignificant estimate: there are 60 controls and only 90 observations.

It is plausible to assume that the model is sparse: only a few of the demographic, education, and other variables matter. This is an appropriate problem for Lasso, as we need to select a few out of many controls.

Lasso selects the controls that are important for the outcome. The double Lasso step also selects the controls that are related to the main regressor, which guards against omitted variable bias.

Post-Lasso produces a negative and statistically significant estimate on the main regressor. The result supports the conditional catching up hypothesis: among countries with similar economic and demographic characteristics, those with lower initial GDP per capita grow faster.