Tutorial 7: Difference-in-Differences
A core challenge in economics is to isolate causal effects from mere correlations.
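The Potential Outcomes Framework
To formalize causality, we use potential outcomes. For any individual \(i\) and a treatment (e.g., a job training program), we define two potential outcomes: \(Y_i(1)\), the outcome individual \(i\) would have with the treatment, and \(Y_i(0)\), the outcome the same individual would have without it.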
For any individual, we can only ever observe one of these potential outcomes. We see \(Y_i(1)\) if they get treated or \(Y_i(0)\) if they don’t, but never both.
A simple comparison of average outcomes between treated and untreated groups, \(E[Y_i | T_i=1] - E[Y_i | T_i=0]\), where \(T_i = 1\) if individual \(i\) is treated and \(T_i = 0\) otherwise, is generally a misleading estimate of the causal effect.
The reason is that it combines the true treatment effect on the treated with pre-existing differences between the groups.
Formally:
\[ \begin{align*} E[Y_i|T_i=1] - E[Y_i|T_i=0] = &\underbrace{E[Y_i(1) - Y_i(0) | T_i=1]}_{\text{ATT}} +\\ &\underbrace{E[Y_i(0)|T_i=1] - E[Y_i(0)|T_i=0]}_{\text{Selection Bias}} \end{align*} \]
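The decomposition can be checked numerically. The sketch below simulates a hypothetical job training setting in which an invented "ability" variable raises untreated earnings and also makes enrolment more likely. All numbers, including the constant treatment effect of 2.0, are assumptions chosen for illustration. Both potential outcomes are visible only because the data are simulated.

```python
import random
from statistics import mean

random.seed(0)

TRUE_EFFECT = 2.0  # assumed constant treatment effect, so the ATT equals 2.0

people = []
for _ in range(20_000):
    ability = random.gauss(0, 1)                # hypothetical confounder
    y0 = 10 + 3 * ability + random.gauss(0, 1)  # Y_i(0): outcome without treatment
    y1 = y0 + TRUE_EFFECT                       # Y_i(1): outcome with treatment
    treated = ability + random.gauss(0, 1) > 0  # T_i: higher ability, more likely treated
    people.append((treated, y0, y1))

treated_group = [p for p in people if p[0]]
control_group = [p for p in people if not p[0]]

# What we observe in practice: Y_i(1) for the treated, Y_i(0) for the untreated.
naive = mean(p[2] for p in treated_group) - mean(p[1] for p in control_group)

# Computable here only because the simulation reveals both potential outcomes.
att = mean(p[2] - p[1] for p in treated_group)
selection_bias = mean(p[1] for p in treated_group) - mean(p[1] for p in control_group)

print(f"naive difference : {naive:.3f}")
print(f"ATT              : {att:.3f}")
print(f"selection bias   : {selection_bias:.3f}")
print(f"ATT + bias       : {att + selection_bias:.3f}")
```

In this setup the naive difference exceeds the ATT, because the treated would have earned more than the untreated even without training.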
Difference-in-Differences (DiD) removes selection bias that is constant over time by using data from two groups (Treatment and Control) across two time periods (Pre and Post). It identifies the ATT only if the parallel trends assumption holds.
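The logic is to compute the change over time in the treatment group (\(\Delta_T\)), compute the change over time in the control group (\(\Delta_C\)), which captures the common time trend, and subtract the second from the first: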
\[ \tau_{DiD} = \Delta_T - \Delta_C = (\hat{Y}_{T,Post} - \hat{Y}_{T, Pre}) - (\hat{Y}_{C, Post} - \hat{Y}_{C, Pre}) \]
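As a small worked example of this formula, the sketch below computes \(\Delta_T\), \(\Delta_C\) and \(\tau_{DiD}\) from four group-period means. The helper name `did_estimate` and the four numbers are hypothetical and chosen only to keep the arithmetic easy to follow.

```python
def did_estimate(t_pre, t_post, c_pre, c_post):
    """Return (delta_T, delta_C, tau_DiD) from the four group-period means."""
    delta_t = t_post - t_pre   # change in the treatment group
    delta_c = c_post - c_pre   # change in the control group (common trend)
    return delta_t, delta_c, delta_t - delta_c


# Hypothetical group means, not taken from any dataset in this tutorial.
delta_t, delta_c, tau = did_estimate(t_pre=10.0, t_post=16.0, c_pre=8.0, c_post=11.0)
print(delta_t, delta_c, tau)  # 6.0 3.0 3.0
```

The treatment group's outcome rose by 6, but 3 of that is attributed to the trend shared with the control group, leaving a DiD estimate of 3.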
The DiD estimate can be obtained by estimating the coefficient \(\beta_3\) in the following OLS regression:
\(Y_{it} = \beta_0 + \beta_1 D_i + \beta_2 Post_t + \beta_3(D_i \times Post_t) + u_{it}\)
This regression framework is convenient because it delivers standard errors for inference and allows control variables (\(X_{it}\)) to be included, which can make the parallel trends assumption more plausible by requiring it to hold only conditional on those controls.
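The equivalence between the regression coefficient \(\beta_3\) and the difference of the four cell means can be illustrated on a toy dataset. The sketch below is a minimal illustration under stated assumptions: the twelve observations are invented, and the small `ols` helper is a hand-written solver for the normal equations, used here only to keep the example free of external packages. In practice one would use a regression package that also reports standard errors.

```python
def ols(X, y):
    """OLS coefficients from the normal equations (X'X) b = X'y,
    solved by Gauss-Jordan elimination with partial pivoting."""
    k = len(X[0])
    # Augmented matrix [X'X | X'y].
    A = [
        [sum(row[i] * row[j] for row in X) for j in range(k)]
        + [sum(row[i] * yi for row, yi in zip(X, y))]
        for i in range(k)
    ]
    for col in range(k):
        pivot = max(range(col, k), key=lambda r: abs(A[r][col]))
        A[col], A[pivot] = A[pivot], A[col]
        for r in range(k):
            if r != col:
                factor = A[r][col] / A[col][col]
                A[r] = [a - factor * b for a, b in zip(A[r], A[col])]
    return [A[i][k] / A[i][i] for i in range(k)]


# Hypothetical observations: (D_i, Post_t, Y_it).
data = [
    (0, 0, 8.0), (0, 0, 9.0), (0, 0, 7.0),
    (0, 1, 10.0), (0, 1, 12.0), (0, 1, 11.0),
    (1, 0, 9.0), (1, 0, 11.0), (1, 0, 10.0),
    (1, 1, 15.0), (1, 1, 17.0), (1, 1, 16.0),
]


def cell_mean(d, post):
    """Average outcome in one group-period cell."""
    values = [y for di, pi, y in data if di == d and pi == post]
    return sum(values) / len(values)


did_by_hand = (cell_mean(1, 1) - cell_mean(1, 0)) - (cell_mean(0, 1) - cell_mean(0, 0))

# Regressors: constant, D_i, Post_t, and the interaction D_i * Post_t.
X = [[1.0, d, post, d * post] for d, post, _ in data]
y = [obs[2] for obs in data]
beta = ols(X, y)

print(f"DiD from cell means: {did_by_hand:.3f}")
print(f"beta_3 from OLS    : {beta[3]:.3f}")
```

Because the model is saturated (one parameter per group-period cell), the two numbers coincide: \(\beta_0\) is the control group's pre-period mean, \(\beta_1\) the pre-period gap between groups, \(\beta_2\) the control group's change, and \(\beta_3\) the DiD.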
When data is available for multiple time periods, the DiD model can be extended into an event study to analyze dynamic effects over time. The typical model is:
\(y_{it} = \alpha_i + \lambda_t + \sum_{k=-K}^{L} \delta_k D_{it}^k + u_{it}\)
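To make the notation concrete, one common way to define the event-time indicators is sketched below. It assumes a single treatment date \(t^*\) shared by all treated units and a time-invariant treatment-group dummy \(D_i\); the symbol \(t^*\) and this particular construction are assumptions added here for illustration, not notation taken from the model above.

\[ D_{it}^k = D_i \times \mathbf{1}\{t - t^* = k\}, \qquad k \in \{-K, \dots, L\} \setminus \{-1\} \]

As is conventional, \(\alpha_i\) and \(\lambda_t\) are read as unit and time fixed effects. Omitting the indicator for \(k=-1\) normalizes \(\delta_{-1}=0\), so each \(\delta_k\) measures the treatment-control gap in event period \(k\) relative to the last pre-treatment period. If parallel trends holds, the pre-treatment coefficients \(\delta_k\) for \(k < -1\) should be close to zero, while the coefficients for \(k \ge 0\) trace out the dynamic effect.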
This question uses the classic nlswork.dta dataset, which contains panel data on young working women. We want to estimate the causal effect of joining a union on wages.
For this exercise, define the “treatment group” as women who were non-union in year 77 but became union members by year 78. The “control group” will be women who were non-union in both year 77 and year 78. Create the relevant dummy variables (Treat and Post) for this 2x2 setup (Pre = 77, Post = 78).
Manually calculate the four means for the 2x2 DiD table (\(\hat{Y}_{T, Pre}\), \(\hat{Y}_{T, Post}\), \(\hat{Y}_{C, Pre}\), \(\hat{Y}_{C, Post}\)) and compute the DiD estimate.
Estimate the DiD effect by running the regression \(Y_{it} = \beta_0 + \beta_1 Treat_i + \beta_2 Post_t + \beta_3(Treat_i \times Post_t) + \epsilon_{it}\). Confirm that the coefficient \(\hat{\beta}_3\) matches your manual calculation. Report and interpret the result.
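For the first step of this exercise, the sample restriction and the two dummies can be built as sketched below. This is a minimal sketch on invented toy records, not the real data: the column names `idcode`, `year`, `union` and `ln_wage` are assumed to match those in nlswork.dta, and the handling of missing union status in the real file is left out.

```python
# Toy records in long format (one row per woman and year); values are invented.
records = [
    {"idcode": 1, "year": 77, "union": 0, "ln_wage": 1.50},
    {"idcode": 1, "year": 78, "union": 1, "ln_wage": 1.70},  # joined a union
    {"idcode": 2, "year": 77, "union": 0, "ln_wage": 1.40},
    {"idcode": 2, "year": 78, "union": 0, "ln_wage": 1.45},  # stayed non-union
    {"idcode": 3, "year": 77, "union": 1, "ln_wage": 1.90},  # union in 77: excluded
    {"idcode": 3, "year": 78, "union": 1, "ln_wage": 1.95},
    {"idcode": 4, "year": 77, "union": 0, "ln_wage": 1.30},  # no 78 record: excluded
]

# Look-up table of union status by (woman, year).
status = {(r["idcode"], r["year"]): r["union"] for r in records}

sample = []
for r in records:
    if r["year"] not in (77, 78):
        continue
    u77 = status.get((r["idcode"], 77))
    u78 = status.get((r["idcode"], 78))
    if u77 != 0 or u78 is None:
        continue  # keep only women who were non-union in 77 and observed in 78
    sample.append({**r, "Treat": int(u78 == 1), "Post": int(r["year"] == 78)})

for row in sample:
    print(row["idcode"], row["year"], row["Treat"], row["Post"])
```

Note that Treat is fixed within each woman (it depends on her status in 78), while Post varies only with the year.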
Duflo (2001) analyses a major program to build over 61,000 new primary schools across Indonesia. The number of schools built in different regions depended on the number of school-aged children in that region in 1972. This created a “natural experiment”: some regions got a lot of new schools, while others got very few. The study asks a simple question: Did children in regions that got more schools end up with more education and higher wages?
One can conduct a difference-in-differences analysis comparing regions with high school construction to regions with low school construction, for cohorts that went to school before and after the program started in 1989.
Import the data into your statistical software of choice.
Plot the average educational attainment (yeduc) for the high-construction and low-construction regions for the pre-treatment cohorts (-4 until -1). Does this visual evidence support the parallel trends assumption for a simple DiD analysis around 1989? Discuss what you see in the cohorts leading up to the treatment.
Briefly explain what parallel trends means in this context and outline how you would formally test for pre-trends using a regression framework. What coefficients would you be looking at, and what would you hope to find?
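Conduct this test. Normalize the coefficient for the last pre-treatment cohort to zero (\(\beta_{-1}=0\)), so that all other coefficients are measured relative to that cohort. Report and interpret your findings.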
The lecture shows that the simple difference-in-means estimator, \(E[Y|T=1] - E[Y|T=0]\), is a biased estimator for the Average Treatment Effect on the Treated (ATT).
Starting from the identity \(E[Y|T=1] - E[Y|T=0] = E[Y(1)|T=1] - E[Y(0)|T=0]\), show the full derivation that decomposes this difference into the ATT and the Selection Bias term. Explain each step of your derivation.
Provide an intuitive explanation of the selection bias term, \(E[Y(0)|T=1] - E[Y(0)|T=0]\), using the lecture’s example of a job training program. What would a positive selection bias imply in this context?
A researcher evaluates the impact of a new subway line on local house prices. She collects data from a neighborhood that received the subway line (Treatment) and a similar neighborhood that did not (Control), both before and after the line opened. The average house prices (in thousands of dollars) are as follows:
| | Before Period (Pre) | After Period (Post) |
|---|---|---|
| Treatment Group | 450 | 520 |
| Control Group | 420 | 460 |
Calculate the Difference-in-Differences estimate (\(\tau_{DiD}\)) for the effect of the subway line on house prices. Show your calculations step-by-step.
What is the “secular trend” in house prices according to this data?
Based on your result, what is your conclusion about the causal effect of the new subway line?
Consider a study on the effect of a new irrigation system (the treatment) on the crop yield of farms. Let \(Y_i\) be the crop yield for farm \(i\).
Define the potential outcomes, \(Y_i(1)\) and \(Y_i(0)\), in the context of this specific example.
What is the individual causal effect (\(\tau_i\)) for a single farm \(i\)?
Explain the “fundamental problem of causal inference” using this irrigation example. Why can’t we directly calculate \(\tau_i\) for any farm?
The validity of the DiD estimator hinges on the parallel trends assumption.
Write down the parallel trends assumption mathematically, using the potential outcomes notation provided in the lecture.
Explain in plain English what this assumption means. Why is it essential for the DiD strategy to identify the ATT? What unobservable counterfactual does it allow us to estimate?
If you had access to data from several years before the treatment was implemented, how could you visually build evidence for the credibility of this assumption?
A researcher estimates the effect of a province-level environmental policy on air quality using the following regression model, where AirQuality is an index, Treat is a dummy for provinces that adopted the policy, and Post is a dummy for the period after the policy was adopted.
\(Y_{it} = \beta_0 + \beta_1 Treat_i + \beta_2 Post_t + \beta_3(Treat_i \times Post_t) + \epsilon_{it}\)
The estimated model is:
\(\widehat{\text{AirQuality}}_{it} = 75.2 + 5.5 \cdot Treat_i - 8.1 \cdot Post_t - 4.3 \cdot (Treat_i \times Post_t)\)
What is the average air quality for the control group in the pre-treatment period?
What does the estimated coefficient \(\hat{\beta}_1 = 5.5\) represent? Does this indicate the policy was assigned randomly?
What does the estimated coefficient \(\hat{\beta}_2 = -8.1\) represent?
What is the DiD estimate of the policy’s effect? Interpret this coefficient (\(\hat{\beta}_3 = -4.3\)) in the context of the study.
Imagine you are designing a study to evaluate the impact of a scholarship program (awarded to students in City A) on university enrollment rates. City B, which has no such program, serves as your control.
(a) Policy Anticipation: How might anticipation effects violate the parallel trends assumption? Give a concrete example related to this scholarship program.
(b) Spillovers: How might spillover effects contaminate your control group? Give a concrete example.
(c) Ashenfelter’s Dip: Describe what Ashenfelter’s dip would look like in this context and why it would lead to a biased estimate of the program’s effect.
A researcher evaluates the effect of a new fertilizer on crop yields. They use a DiD design comparing farms that adopted the fertilizer (treatment) to those that did not (control).
After presenting their results, a skeptical audience member points to their pre-treatment trend graph. The graph clearly shows that in the years leading up to the adoption, the yields of the “treatment” farms were already declining, while the yields of the “control” farms were stable.
(a) Does this observation support or violate the parallel trends assumption?
(b) Assuming the researcher found a positive DiD estimate (i.e., the fertilizer appeared to increase yields), is this estimate likely an overestimate or an underestimate of the true causal effect? Explain your reasoning by describing the counterfactual trend for the treatment group.
Empirical Economics: Tutorial - DiD