Empirical Economics

Tutorial 8: Instrumental Variables

Course Evaluations

We would really like you to take the time to fill out the course evaluation form. Your feedback is very important to us and helps us improve the course for future students.

You can fill it in at https://entry.caracal.uu.nl/44007 under “Course Evaluations”.

Recapitulation of the Lecture

Endogeneity

Our goal is to estimate the causal effect of a variable \(X\) on an outcome \(Y\) using a linear model: \(Y_i = \beta_0 + \beta_1 X_i + u_i\)

For the Ordinary Least Squares (OLS) estimator of \(\beta_1\) to be consistent, \(X\) must be uncorrelated with the error term \(u_i\), a condition known as exogeneity (\(Cov(X_i, u_i) = 0\)). Unbiasedness requires the slightly stronger zero-conditional-mean assumption \(E[u_i \mid X_i] = 0\).

Endogeneity occurs when this assumption is violated (\(Cov(X_i, u_i) \neq 0\)), making OLS estimates biased and inconsistent. The main sources of endogeneity are:

Omitted Variable Bias (OVB): An unobserved variable (e.g., “ability”) affects both the outcome (wages) and the regressor (education).
Simultaneity / Reverse Causality: While \(X\) causes \(Y\), \(Y\) also causes \(X\) (e.g., police presence and crime rates).
Measurement Error: The variable \(X\) is measured with error, which can attenuate the estimated effect towards zero.
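
As an illustration of the first source, the sketch below simulates omitted variable bias. The data-generating process and every number in it are assumptions chosen for illustration, not taken from the lecture: an unobserved `ability` raises both education and wages, so the simple OLS slope overstates the true return to education.

```python
import numpy as np

rng = np.random.default_rng(0)
n = 100_000

# Hypothetical data-generating process (all coefficients are illustrative assumptions)
ability = rng.normal(size=n)                    # unobserved confounder
educ = 12 + 2 * ability + rng.normal(size=n)    # education depends on ability
beta_true = 0.10                                # true causal effect of education
log_wage = 1.0 + beta_true * educ + 0.5 * ability + rng.normal(size=n)

def ols_slope(x, y):
    """Slope from a simple regression of y on x: Cov(x, y) / Var(x)."""
    return np.cov(x, y)[0, 1] / np.var(x, ddof=1)

beta_ols = ols_slope(educ, log_wage)

# Large-sample bias: 0.5 * Cov(educ, ability) / Var(educ) = 0.5 * 2 / 5 = 0.2,
# so the OLS slope converges to roughly 0.30 instead of 0.10.
print(f"true beta_1 = {beta_true:.2f}, OLS estimate = {beta_ols:.3f}")
```

Because `ability` is omitted from the regression, its effect on wages is attributed to education, which is exactly the upward bias described in the wage example.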

A Solution: Instrumental Variables (IV)

The IV strategy uses a third variable, the instrument (\(Z\)), to isolate the part of the variation in \(X\) that is not correlated with the error term \(u_i\). A valid instrument must satisfy two core conditions:

Relevance: The instrument must be correlated with the endogenous variable \(X\): \(Cov(Z_i, X_i) \neq 0\)

This is a testable condition, often assessed in the “first stage” of the analysis.
Exclusion Restriction: The instrument must be uncorrelated with the error term \(u_i\). This means \(Z\) can only affect the outcome \(Y\) through its effect on \(X\): \(Cov(Z_i, u_i) = 0\)

This is a theoretical assumption that cannot be proven with data and requires a strong justification.
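
The two conditions can be made concrete with a small simulation. This is a sketch under assumed values: the instrument `z` is drawn independently of the error `u`, so the exclusion restriction holds by construction. With real data \(u\) is unobserved and only the relevance condition can be checked.

```python
import numpy as np

rng = np.random.default_rng(1)
n = 200_000

# Illustrative assumptions: u is the structural error, z is a valid instrument
u = rng.normal(size=n)
z = rng.normal(size=n)                        # independent of u by construction
x = 0.8 * z + 0.6 * u + rng.normal(size=n)    # x is endogenous because it depends on u
beta_true = 1.5
y = 2.0 + beta_true * x + u

def cov(a, b):
    """Sample covariance between two 1-D arrays."""
    return np.cov(a, b)[0, 1]

relevance = cov(z, x)              # clearly non-zero: the instrument moves x
exclusion = cov(z, u)              # close to zero; computable only because u is simulated
beta_ols = cov(x, y) / cov(x, x)   # inconsistent: picks up Cov(x, u)
beta_iv = cov(z, y) / cov(z, x)    # uses only the variation in x induced by z

print(f"Cov(z, x) = {relevance:.3f}, Cov(z, u) = {exclusion:.3f}")
print(f"true = {beta_true}, OLS = {beta_ols:.3f}, IV = {beta_iv:.3f}")
```

In this setup the OLS slope settles near 1.8 in large samples (the true 1.5 plus \(Cov(x,u)/Var(x) = 0.6/2\)), while the IV ratio stays close to 1.5.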

IV Estimation and Interpretation

Unlike OLS, which aims to estimate an average effect for the entire population, IV provides a more specific estimate. With a binary instrument, the population can be divided into four groups based on their response to the instrument: Compliers, Always-Takers, Never-Takers, and Defiers.

Assuming monotonicity (ruling out “defiers”), the IV estimator does not recover the Average Treatment Effect (ATE) for the whole population.

Instead, it identifies the Local Average Treatment Effect (LATE), which is the average causal effect specifically for the subpopulation of compliers—those individuals whose treatment status is actually changed by the instrument. \[ \beta_{IV} = E[Y_i(1) - Y_i(0) \mid \text{Compliers}] \]

Wald and 2SLS

The IV estimate can be calculated in several ways:

Wald Estimator: Used when both the instrument \(Z\) and treatment \(X\) are binary. It is the ratio of the reduced-form effect (the effect of \(Z\) on \(Y\)) to the first-stage effect (the effect of \(Z\) on \(X\)). \[ \hat{\beta}_{\text{Wald}} = \frac{E[Y | Z=1] - E[Y | Z=0]}{E[X | Z=1] - E[X | Z=0]} \]
Two-Stage Least Squares (2SLS): The general method, especially with multiple instruments or control variables.
1. First Stage: Regress the endogenous variable \(X\) on the instrument(s) \(Z\) and any other exogenous controls. Obtain the predicted values, \(\hat{X}\). \[ X_i = \pi_0 + \pi_1 Z_{1i} + ... + \delta' W_i + \nu_i \rightarrow \text{get } \hat{X}_i \]
2. Second Stage: Regress the outcome \(Y\) on the predicted values \(\hat{X}\) and the other controls. The coefficient on \(\hat{X}\) is the consistent IV estimate of \(\beta_1\). \[ Y_i = \beta_0 + \beta_1 \hat{X}_i + \gamma' W_i + \zeta_i \]
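
The following sketch ties the Wald estimator, 2SLS, and the LATE together on simulated data. The population shares and treatment effects are hypothetical values chosen for illustration. With one binary instrument and no controls, the two-step procedure reproduces the Wald ratio, and both recover the complier effect rather than the population average effect.

```python
import numpy as np

rng = np.random.default_rng(2)
n = 200_000

# Hypothetical population: shares and treatment effects are illustrative assumptions
types = rng.choice(["complier", "always", "never"], size=n, p=[0.5, 0.2, 0.3])
effect = np.select([types == "complier", types == "always"], [2.0, 5.0], default=1.0)

z = rng.integers(0, 2, size=n)        # randomly assigned binary instrument
x = np.where(types == "always", 1,
             np.where(types == "complier", z, 0)).astype(float)
y0 = 10 + rng.normal(size=n)          # untreated potential outcome
y = y0 + effect * x                   # observed outcome

# Wald estimator: reduced form divided by first stage
reduced_form = y[z == 1].mean() - y[z == 0].mean()
first_stage = x[z == 1].mean() - x[z == 0].mean()
beta_wald = reduced_form / first_stage

# 2SLS by hand: regress x on (1, z), then regress y on (1, x_hat)
Z = np.column_stack([np.ones(n), z])
pi_hat = np.linalg.lstsq(Z, x, rcond=None)[0]
x_hat = Z @ pi_hat
X_hat = np.column_stack([np.ones(n), x_hat])
beta_2sls = np.linalg.lstsq(X_hat, y, rcond=None)[0][1]

ate = effect.mean()                   # population average effect, about 2.3 here
print(f"Wald = {beta_wald:.3f}, 2SLS = {beta_2sls:.3f}, ATE = {ate:.3f}")
```

Both estimates land near the complier effect of 2.0, not the ATE. The manual second stage yields the right point estimate, but its standard errors are not valid, so in practice a dedicated routine such as `ivreg`, `IV2SLS`, or `ivregress` should be used.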

Practical challenges:

Finding a Valid Instrument: The credibility of IV analysis depends entirely on the quality of the instrument. A strong defense of the exclusion restriction is paramount.
Weak Instruments: If the instrument is only weakly correlated with the endogenous variable (the relevance condition is barely met), the IV estimate can be heavily biased and have a large variance.
- Diagnosis: Always check the first-stage F-statistic. A common rule of thumb is that an F-statistic below 10 is a sign of a weak instrument and a serious concern.
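
As a sketch of this diagnostic, the helper below computes the first-stage F-statistic for a single instrument under homoskedastic errors, where it equals the squared t-statistic on the instrument. The function name `first_stage_f` and the simulated coefficients are assumptions made for illustration.

```python
import numpy as np

def first_stage_f(z, x):
    """F-statistic for H0: pi_1 = 0 in x = pi_0 + pi_1 * z + v (one instrument, homoskedastic errors)."""
    n = len(x)
    Z = np.column_stack([np.ones(n), z])
    coef = np.linalg.lstsq(Z, x, rcond=None)[0]
    resid = x - Z @ coef
    sigma2 = resid @ resid / (n - 2)                  # residual variance
    var_pi1 = sigma2 * np.linalg.inv(Z.T @ Z)[1, 1]   # variance of the slope estimate
    return coef[1] ** 2 / var_pi1                     # squared t-statistic

rng = np.random.default_rng(3)
n = 500
z = rng.normal(size=n)
v = rng.normal(size=n)

x_strong = 0.50 * z + v   # instrument explains a sizeable share of x
x_weak = 0.02 * z + v     # instrument is barely related to x

f_strong = first_stage_f(z, x_strong)
f_weak = first_stage_f(z, x_weak)
print(f"strong instrument F = {f_strong:.1f}, weak instrument F = {f_weak:.1f}")
```

With these assumed coefficients the strong instrument produces an F-statistic far above 10, while the weak one typically falls well below that threshold.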

Wooclap

Wooclap Link

Wooclap code: OFZFSD (Tutorial 8)

Questions

Estimating the AJR Model

For this question, we use the dataset by Acemoglu, Johnson, and Robinson (2001). The dataset, colonial_origins, contains the following key variables for a sample of former colonies:

logpgp95: Log of GDP per capita in 1995 (the outcome variable, Y).
avexpr: An index of “protection against expropriation risk” averaged from 1985-1995. This measures the quality of institutions (the endogenous variable, X).
logem4: The log of the mortality rate for European settlers in the 19th century (the instrumental variable, Z).
africa: A dummy indicating whether a country is in Africa (a control variable).

Estimating the AJR Model (Cont.)

We run four different regressions to analyze the relationship between institutions and economic development. The code and simplified outputs are provided below.

R code

library(AER)
url <- 'https://github.com/basm92/ee_website/raw/refs/heads/master/tutorials/datafiles/colonial_origins.csv'
colonial_data <- readr::read_csv2(url)
# Model 1: OLS
ols_model <- lm(logpgp95 ~ avexpr + africa, data = colonial_data)

# Model 2: Reduced Form
reduced_form <- lm(logpgp95 ~ logem4 + africa, data = colonial_data)

# Model 3: First Stage
first_stage <- lm(avexpr ~ logem4 + africa, data = colonial_data)

# Model 4: 2SLS / IV
iv_model <- ivreg(logpgp95 ~ avexpr + africa | logem4 + africa, data = colonial_data)
summary(iv_model, diagnostics = TRUE) # diagnostics = TRUE adds the first-stage F-statistic ("Weak instruments" row)
## 
## Call:
## ivreg(formula = logpgp95 ~ avexpr + africa | logem4 + africa, 
##     data = colonial_data)
## 
## Residuals:
##     Min      1Q  Median      3Q     Max 
## -2.2989 -0.4779 -0.0132  0.6373  1.4721 
## 
## Coefficients:
##             Estimate Std. Error t value Pr(>|t|)    
## (Intercept)   2.9877     1.3271   2.251    0.028 *  
## avexpr        0.8023     0.1891   4.243 7.62e-05 ***
## africa       -0.3625     0.2933  -1.236    0.221    
## 
## Diagnostic tests:
##                  df1 df2 statistic p-value   
## Weak instruments   1  61    11.422 0.00127 **
## Wu-Hausman         1  60     9.382 0.00328 **
## Sargan             0  NA        NA      NA   
## ---
## Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
## 
## Residual standard error: 0.8114 on 61 degrees of freedom
## Multiple R-Squared: 0.4145,  Adjusted R-squared: 0.3953 
## Wald test: 27.55 on 2 and 61 DF,  p-value: 2.982e-09

Python code

import statsmodels.formula.api as smf
from linearmodels.iv import IV2SLS
import pandas as pd

url = 'https://github.com/basm92/ee_website/raw/refs/heads/master/tutorials/datafiles/colonial_origins.csv'

colonial_data = pd.read_csv(url, decimal=',', sep=";")
# Model 1: OLS
ols_model = smf.ols('logpgp95 ~ avexpr + africa', data=colonial_data).fit()

# Model 2: Reduced Form
reduced_form = smf.ols('logpgp95 ~ logem4 + africa', data=colonial_data).fit()

# Model 3: First Stage
first_stage = smf.ols('avexpr ~ logem4 + africa', data=colonial_data).fit()

# Model 4: 2SLS / IV
iv_model = IV2SLS.from_formula('logpgp95 ~ 1 + africa + [avexpr ~ logem4]', data=colonial_data).fit()
print(iv_model)
##                           IV-2SLS Estimation Summary                          
## ==============================================================================
## Dep. Variable:               logpgp95   R-squared:                      0.4145
## Estimator:                    IV-2SLS   Adj. R-squared:                 0.3953
## No. Observations:                  64   F-statistic:                    48.840
## Date:                Wed, Oct 29 2025   P-value (F-stat)                0.0000
## Time:                        12:54:05   Distribution:                  chi2(2)
## Cov. Estimator:                robust                                         
##                                                                               
##                              Parameter Estimates                              
## ==============================================================================
##             Parameter  Std. Err.     T-stat    P-value    Lower CI    Upper CI
## ------------------------------------------------------------------------------
## Intercept      2.9877     1.3963     2.1398     0.0324      0.2511      5.7244
## africa        -0.3625     0.2877    -1.2599     0.2077     -0.9263      0.2014
## avexpr         0.8023     0.1993     4.0262     0.0001      0.4117      1.1928
## ==============================================================================
## 
## Endogenous: avexpr
## Instruments: logem4
## Robust Covariance (Heteroskedastic)
## Debiased: False

print("First Stage F Stat:", iv_model.first_stage.diagnostics['f.stat'])
## First Stage F Stat: avexpr    10.270258
## Name: f.stat, dtype: float64

Stata code

import delimited https://github.com/basm92/ee_website/raw/refs/heads/master/tutorials/datafiles/colonial_origins.csv, delimiters(";") clear
 
destring logpgp95 avexpr logem4 lat_abst, dpcomma replace
 
* Model 1: OLS
reg logpgp95 avexpr africa
 
* Model 2: Reduced Form
reg logpgp95 logem4 africa
 
* Model 3: First Stage
reg avexpr logem4 africa
 
* Model 4: 2SLS / IV
ivregress 2sls logpgp95 africa (avexpr = logem4), small first
 
* To get the first-stage F-statistic
estat firststage

Verify the IV Estimate: The lecture shows that the IV estimator can be calculated as the ratio of the Reduced Form effect to the First Stage effect. Using the coefficients on logem4 from the reduced form and the first stage, manually calculate this ratio. Does your result match the coefficient for avexpr in the 2SLS model?
Compare OLS and 2SLS: The 2SLS estimate for the effect of institutions (0.80) is considerably larger than the OLS estimate (0.42). What does this suggest about the direction of the omitted variable bias in the OLS model? Provide a brief economic explanation for why this might be the case.
Assess Instrument Strength: Based on the results provided in the output above, is settler mortality a weak instrument for institutions? Justify your answer using the relevant statistic.

Colonial Origins IV Strategy

Acemoglu, Johnson, and Robinson (2001) want to estimate the causal effect of institutions (\(X\)) on long-run economic development (\(Y\), measured by log GDP per capita). They know that OLS is likely biased. They propose using the mortality rates of early European settlers (\(Z\)) as an instrumental variable for current institutions.

Relevance: Clearly explain the “first stage” logic. By what mechanism did settler mortality rates influence the type of institutions that were established in a colony?
Exclusion Restriction: State the exclusion restriction in the context of this study. What must be true about settler mortality for it to be a valid instrument?
LATE: In this context, who are the “compliers”? What does the LATE estimated by this IV strategy represent?

Defending and Challenging the Exclusion Restriction

The credibility of the Acemoglu et al. (2001) paper rests on the validity of the exclusion restriction.

Provide a strong argument to defend the exclusion restriction. Why is it plausible that 19th-century settler mortality has no direct effect on GDP per capita in 1995, other than through its effect on the development of institutions?
Provide a plausible counterargument that could violate the exclusion restriction. Propose a specific channel through which early settler mortality might affect current GDP that does not run through political institutions. (For example, think about the disease environment or human capital).

Estimating and Interpreting 2SLS Results

Card (1995) estimates the causal effect of education (educ) on earnings (wage). A simple OLS regression of wages on education is likely biased because unobserved factors, like innate ability or family background, can affect both how much education a person gets and how much they earn. Card’s ingenious solution was to use proximity to a college as an instrument (nearc4).

Create a dummy variable, treated, indicating whether educ>12.
Manually compute the Wald estimator using nearc4 as an instrument for treated.
Estimate the Wald Estimator through 2SLS in R/Python/Stata.
Estimate and interpret the first stage.
Interpret the 2SLS estimate.
Based on the first stage and the 2SLS estimate, what would the coefficient on the instrument in the reduced form equation equal? Interpret this coefficient.

Designing an Alternative IV Study

Imagine someone is unconvinced by the settler mortality instrument of AJR (2001). She wants to re-test the Acemoglu et al. (2001) hypothesis using a different IV strategy for the effect of institutions on development.

Propose a new, potential instrumental variable for “quality of institutions”. Justify your choice by carefully explaining:

Why you believe it would be relevant (correlated with institutions).
How you would defend its validity (the exclusion restriction).
What potential weaknesses or criticisms your proposed instrument might face.

Deriving the IV Estimator

Consider the linear model \(Y_i = \beta_0 + \beta_1 X_i + \epsilon_i\), where \(Cov(X_i, \epsilon_i) \neq 0\). You have a valid instrument \(Z_i\) which satisfies the relevance condition, \(Cov(Z_i, X_i) \neq 0\), and the exclusion restriction, \(Cov(Z_i, \epsilon_i) = 0\).

Starting from the model equation, use the exclusion restriction to show that the IV estimator for \(\beta_1\) is given by: \[ \hat{\beta}_1^{\text{IV}} = \frac{Cov(Z, Y)}{Cov(Z, X)} \] (Hint: Start by taking the covariance of the entire model equation with respect to Z).

The Wald Estimator

Imagine a population where individuals are classified based on their response to an encouragement design (an instrument, Z):

Never-Takers (40% of the population): They never get the treatment (X=0), regardless of encouragement (Z). Their average outcome is 30.
Always-Takers (20% of the population): They always get the treatment (X=1), regardless of encouragement. Their average outcome is 50.
Compliers (40% of the population): They get the treatment only if encouraged (X=1 if Z=1, X=0 if Z=0). If they are treated, their average outcome is 60. If they are not treated, their average outcome is 40.

Assume there are no Defiers.

(a) Calculate the four key quantities for the Wald estimator: \(E[Y | Z=1]\), \(E[Y | Z=0]\), \(E[X | Z=1]\), and \(E[X | Z=0]\).
(b) Using these values, compute the treatment effect using the Wald estimator formula: \(\hat{\beta}_{\text{Wald}} = \frac{E[Y | Z=1] - E[Y | Z=0]}{E[X | Z=1] - E[X | Z=0]}\).
(c) What is the Local Average Treatment Effect (LATE) for this population? How does it compare to your result from part (b)?

The Danger of Weak Instruments

The lecture notes state that the approximate bias of the IV estimator is \(Bias(\hat{\beta}_{IV}) \approx \frac{Cov(Z, \epsilon)}{Cov(Z, X)}\).

Using this formula, explain intuitively why a “weak instrument” (an instrument with low relevance) can lead to a very large bias in the IV estimate, even if the exclusion restriction is only violated by a tiny amount (i.e., \(Cov(Z, \epsilon)\) is very close to, but not exactly, zero).
In practice, how do researchers test for weak instruments? What is the common rule of thumb for the relevant test statistic?

Understanding 2SLS

Consider a model to estimate the effect of endogenous police numbers (\(X_i\)) on the crime rate (\(Y_i\)), controlling for the poverty level (\(W_i\)). You have two instruments for police numbers: the number of firefighters in the city (\(Z_{1i}\)) and whether it was a mayoral election year (\(Z_{2i}\)).

The structural model is: \(Y_i = \beta_0 + \beta_1 X_i + \gamma_1 W_i + \epsilon_i\).

Write down the equation for the First Stage regression of the 2SLS procedure. What are the dependent and independent variables?
Write down the equation for the Second Stage regression. What is the key difference between the regressor of interest in this stage versus the structural model?
The lecture notes that you should not perform these two regressions manually because the standard errors from the second stage will be incorrect. Briefly explain why this is the case. (Hint: Think about the uncertainty involved in the first stage).