suppressMessages({
library(hdm)
library(AER)
data("AJR")
})

The original paper: D. Acemoglu, S. Johnson, J. A. Robinson (2001). “Colonial origins of comparative development: an empirical investigation.” American Economic Review, 91, 1369–1401.

Model

\[\begin{equation} \log (\text{GDP}_i) = \alpha\cdot \text{PropertyRights}_i+X_i'\beta+U_i. \end{equation}\]

First stage:

\[\begin{equation} \text{PropertyRights}_i = \pi_1\cdot \text{logMortality}_i+X_i'\Pi_2+V_i \end{equation}\]

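Potential controls in \(X_i\): Latitude, Latitude2, Africa, Asia, Namer, Samer, and Neo (the variable names used in the AJR data set).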

2SLS estimation with all controls:

TSLS<-ivreg(log(GDP)~Exprop+Latitude+Latitude2+Africa+Asia+Namer+Samer+Neo
            | logMort+Latitude+Latitude2+Africa+Asia+Namer+Samer+Neo,data=AJR)
coeftest(TSLS,vcov. = vcovHC(TSLS,type="HC0"))
## 
## t test of coefficients:
## 
##              Estimate Std. Error t value Pr(>|t|)  
## (Intercept)  0.931620   0.813325  1.1454  0.25698  
## Exprop       0.186333   0.125776  1.4815  0.14419  
## Latitude     0.136689   0.365317  0.3742  0.70972  
## Latitude2   -0.326563   0.536654 -0.6085  0.54535  
## Africa      -0.044775   0.094473 -0.4739  0.63741  
## Asia        -0.191401   0.112574 -1.7002  0.09474 .
## Namer        0.032541   0.066610  0.4885  0.62712  
## Samer       -0.045516   0.061363 -0.7418  0.46139  
## Neo         -0.443098   0.376698 -1.1763  0.24455  
## ---
## Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1

First stage:

FS<-lm(Exprop~logMort+Latitude+Latitude2+Africa+Asia+Namer+Samer+Neo,data=AJR)
coeftest(FS,vcov. = vcovHC(FS,type="HC0"))
## 
## t test of coefficients:
## 
##             Estimate Std. Error t value  Pr(>|t|)    
## (Intercept)  7.38313    0.94101  7.8459 1.555e-10 ***
## logMort     -0.21410    0.16710 -1.2813    0.2055    
## Latitude    -0.94759    2.78426 -0.3403    0.7349    
## Latitude2    3.08143    3.74088  0.8237    0.4137    
## Africa      -0.32094    0.44153 -0.7269    0.4704    
## Asia         0.68685    0.53425  1.2856    0.2040    
## Namer       -0.21402    0.36735 -0.5826    0.5625    
## Samer        0.29918    0.42294  0.7074    0.4823    
## Neo          2.67028    0.45059  5.9262 2.117e-07 ***
## ---
## Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
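
With a single instrument, the first-stage F-statistic for instrument relevance is the square of the t-statistic on logMort. The sketch below recomputes it from the robust estimate and standard error printed above. The numbers are copied by hand from that output, the variable names are illustrative, and the threshold of 10 is the common rule of thumb, used here as an assumption rather than something taken from the original analysis.

```r
# Values copied from the HC0-robust first-stage output above
est_logMort <- -0.21410
se_logMort  <-  0.16710

t_logMort <- est_logMort / se_logMort   # robust t-statistic on the instrument
F_first   <- t_logMort^2                # with one instrument, F = t^2

# Rule-of-thumb check (assumed threshold of 10 for a strong instrument)
weak_instrument <- F_first < 10

print(round(c(t = t_logMort, F = F_first), 3))
print(weak_instrument)
```

An F-statistic this far below 10 suggests that logMort is a weak instrument once all seven controls are included, which may help explain why the 2SLS estimate with all controls is imprecise.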

Lasso-based approach

We construct the vectors and matrices for the dependent variable (Y), the potential controls (X), the instrument (Z), and the main regressor (D):

Y<-log(AJR$GDP)
X<-model.matrix(Exprop~Latitude+Latitude2+Africa+Asia+Namer+Samer+Neo,data=AJR)[,-1]
Z<-AJR$logMort
D<-AJR$Exprop

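Lasso implementation with rlassoIV(): select.X = TRUE applies Lasso selection to the controls, and select.Z = FALSE keeps the single instrument without selection: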

rTSLS<-rlassoIV(y=Y,x=X,d=D,z=Z,select.X = TRUE,select.Z = FALSE)
summary(rTSLS)
## [1] "Estimation and significance testing of the effect of target variables in the IV regression model"
##     coeff.     se. t-value p-value  
## d1 0.11948 0.05705   2.094  0.0362 *
## ---
## Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
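
The p-value and a 95% confidence interval can be recomputed by hand from the coefficient and standard error in the summary above. This is a minimal sketch that assumes a standard normal approximation for the test statistic; the numbers are copied from the printed output and the variable names are illustrative.

```r
# Values copied from the summary(rTSLS) output above
coef_d <- 0.11948
se_d   <- 0.05705

z_stat  <- coef_d / se_d                 # test statistic for H0: alpha = 0
p_value <- 2 * pnorm(-abs(z_stat))       # two-sided p-value, normal approximation

# 95% confidence interval under the same normal approximation
ci_95 <- coef_d + c(-1, 1) * qnorm(0.975) * se_d

print(round(c(z = z_stat, p = p_value), 4))
print(round(ci_95, 4))
```

The normal-based p-value agrees with the one reported by summary() up to rounding, and the interval excludes zero.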

Step by step with Lasso selection (Many controls, few IVs)

Step 1: Controls and \(D\)

model.D<-rlasso(D~X)
model.D$index
##  Latitude Latitude2    Africa      Asia     Namer     Samer       Neo 
##     FALSE     FALSE     FALSE     FALSE     FALSE     FALSE      TRUE
  • Only Neo is selected

Residuals:

Dtilde<-model.D$residuals
  • The residuals stored in the rlasso() output are post-Lasso residuals: they come from an OLS refit on the selected controls.
  • rlasso() controls this with the post argument: post=TRUE (the default) or post=FALSE.
coef(model.D)
## (Intercept)    Latitude   Latitude2      Africa        Asia       Namer 
##    6.304167    0.000000    0.000000    0.000000    0.000000    0.000000 
##       Samer         Neo 
##    0.000000    3.390833
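
Because only Neo is selected, the post-Lasso fit amounts to an OLS regression of \(D\) on an intercept and Neo. Using the coefficients printed above, the residual stored in Dtilde can be written as follows (the notation \(\tilde D_i\) is shorthand introduced here for illustration):

\[\begin{equation} \tilde D_i = \text{PropertyRights}_i - 6.304167 - 3.390833\cdot \text{Neo}_i. \end{equation}\]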

Step 2: Controls and \(Y\)

model.Y<-rlasso(Y~X)
Ytilde<-model.Y$residuals
model.Y$index
##  Latitude Latitude2    Africa      Asia     Namer     Samer       Neo 
##     FALSE      TRUE      TRUE     FALSE     FALSE      TRUE      TRUE
  • Selected controls for \(Y\):
    • Latitude2
    • Africa
    • Samer
    • Neo

Step 3: Controls and \(Z\)

model.Z<-rlasso(Z~X)
Ztilde<-model.Z$residuals
model.Z$index
##  Latitude Latitude2    Africa      Asia     Namer     Samer       Neo 
##     FALSE     FALSE      TRUE     FALSE     FALSE     FALSE      TRUE
  • Selected controls for \(Z\):
    • Africa
    • Neo

Step 4: IV estimation using the residuals
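
With one instrument, one endogenous regressor, and no intercept, the IV estimator applied to the residuals from Steps 1 to 3 reduces to a ratio of cross-products. Writing \(\tilde Y_i\), \(\tilde D_i\), and \(\tilde Z_i\) for the residuals Ytilde, Dtilde, and Ztilde (shorthand introduced here):

\[\begin{equation} \hat\alpha = \frac{\sum_i \tilde Z_i \tilde Y_i}{\sum_i \tilde Z_i \tilde D_i}. \end{equation}\]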

Run the IV regression without an intercept, since all variables are in residual form:

rTSLStilde<-ivreg(Ytilde ~ -1 +Dtilde | Ztilde)
coeftest(rTSLStilde,vcov.=vcovHC(rTSLStilde,type="HC0"))
## 
## t test of coefficients:
## 
##        Estimate Std. Error t value Pr(>|t|)  
## Dtilde 0.119478   0.057769  2.0682  0.04273 *
## ---
## Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
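
The partialling-out logic behind Steps 1 to 4 can be checked on simulated data. The sketch below is a simplified illustration, not the analysis above: it uses made-up data, plain OLS residuals in place of rlasso(), the same controls in all three residual regressions, and hypothetical names (W, resid_on_W, alpha_resid, alpha_full). In that setting, the IV estimate computed from the residuals coincides with the coefficient from the just-identified IV estimator that includes all controls directly.

```r
set.seed(1)
n <- 200

# Simulated data (assumption: illustrative only, not the AJR data)
W <- cbind(1, matrix(rnorm(n * 3), n, 3))  # intercept plus 3 controls
z <- rnorm(n) + W[, 2]                     # instrument, correlated with a control
v <- rnorm(n)
d <- 0.8 * z + W[, 3] + v                  # endogenous regressor
u <- 0.5 * v + rnorm(n)                    # error correlated with v (endogeneity)
y <- 0.3 * d + W[, 4] + u                  # outcome; 0.3 is the simulated effect

# Partial the controls out of y, d and z by OLS
resid_on_W <- function(a) lm.fit(W, a)$residuals
y_t <- resid_on_W(y)
d_t <- resid_on_W(d)
z_t <- resid_on_W(z)

# IV on residuals, no intercept: ratio of cross-products
alpha_resid <- sum(z_t * y_t) / sum(z_t * d_t)

# Just-identified IV with controls included: solve (Z'X) b = Z'y
X_full <- cbind(d, W)
Z_full <- cbind(z, W)
alpha_full <- solve(crossprod(Z_full, X_full), crossprod(Z_full, y))[1]

print(c(residual_IV = alpha_resid, full_IV = alpha_full))
```

With Lasso selection, the residuals for \(Y\), \(D\), and \(Z\) are built from different selected controls, so this exact equality does not carry over to the estimate above; the sketch only illustrates why an intercept-free IV regression on residuals is the natural final step.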