Empirical Economics

Tutorial 4: Panel Data I

Recapitulation of the Lecture

Panel Data

Data that follows the same individuals (e.g., firms, countries, people) over multiple time periods.
It combines a cross-sectional dimension (\(i=1, ..., N\)) and a time-series dimension (\(t=1, ..., T\)).
The primary reason to use panel data is to control for unobserved heterogeneity—time-invariant factors that are difficult or impossible to measure (e.g., a firm’s management quality, an individual’s innate ability).
These unobserved factors are captured in an individual-specific effect, \(\alpha_i\).
By accounting for \(\alpha_i\), we can get a better estimate of the causal effect of our variables of interest.
Between Variation: Variation across individuals, usually measured by differences in their time-averaged values (\(\bar{X}_i\)).
Within Variation: Variation over time for a single individual.
Panel methods like Fixed Effects primarily use within variation to estimate coefficients.
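The two sources of variation can be made concrete with a small decomposition. The sketch below uses a hypothetical toy panel (the numbers are invented, not taken from the lecture) and treats between variation as the spread of individual time-means around the overall mean, which is the usual decomposition; the total sum of squares then splits into a between part and a within part.

```python
from statistics import mean

# Hypothetical toy panel (not from the tutorial): x for 3 individuals over 3 periods.
panel = {
    "A": [1.0, 2.0, 3.0],
    "B": [4.0, 5.0, 6.0],
    "C": [7.0, 8.0, 9.0],
}

all_values = [v for values in panel.values() for v in values]
overall_mean = mean(all_values)
individual_means = {i: mean(values) for i, values in panel.items()}

# Between: spread of each individual's time-mean around the overall mean,
# weighted by the number of periods observed for that individual.
between_ss = sum(
    len(panel[i]) * (m - overall_mean) ** 2 for i, m in individual_means.items()
)

# Within: spread of each observation around its own individual's time-mean.
within_ss = sum(
    (v - individual_means[i]) ** 2 for i, values in panel.items() for v in values
)

# Total: spread of each observation around the overall mean.
total_ss = sum((v - overall_mean) ** 2 for v in all_values)

print(f"between = {between_ss}, within = {within_ss}, total = {total_ss}")
```

In this toy panel most of the variation is between individuals (54 out of 60), so a Fixed Effects estimator would rely on only the remaining within part (6 out of 60).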

The FE Model

The Fixed Effects model explicitly includes an individual-specific intercept, \(\alpha_i\), for each entity: \(y_{it} = \alpha_i + \beta_1 X_{it} + u_{it}\)
Key Idea: It allows the unobserved effect \(\alpha_i\) to be correlated with the explanatory variables \(X_{it}\). This is crucial for removing bias that arises from selection on time-invariant unobservables.
Two equivalent methods account for \(\alpha_i\) when estimating \(\beta_1\):
1. Within Estimator (De-meaning):
- Subtract the time-mean from each variable for each individual.
- \((y_{it} - \bar{y}_i) = \beta_1(X_{it} - \bar{X}_i) + (u_{it} - \bar{u}_i)\)
- Or: \(\ddot{y}_{it} = \beta_1 \ddot{X}_{it} + \ddot{u}_{it}\)
- Run OLS on the “de-meaned” data.
2. Least Squares Dummy Variable (LSDV):
- Include a dummy variable for each individual (minus one) in a pooled OLS regression.
- \(y_{it} = \beta_0 + \beta_1 X_{it} + \delta_1 D_{1i} + \delta_2 D_{2i} + ... + \delta_{N-1} D_{N-1,i} + u_{it}\), where the intercept \(\beta_0\) belongs to the omitted individual.
The coefficient \(\beta_1\) measures the average change in \(y\) for a one-unit change in \(X\) within an individual over time.
Limitation: Cannot estimate the effect of time-invariant variables (e.g., gender, race), as they are absorbed by \(\alpha_i\).
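The within transformation can be sketched in a few lines of plain Python. The toy panel below is hypothetical and generated without noise from \(y_{it} = \alpha_i + 2 X_{it}\), with \(\alpha_A = 1\) and \(\alpha_B = 10\), so that \(\alpha_i\) is correlated with \(X_{it}\) by construction. The helper names `demean` and `slope_no_intercept` are illustrative, not part of any library. Pooled OLS is included only as a contrast.

```python
from statistics import mean

# Hypothetical toy panel: y = alpha_i + 2 * x with alpha_A = 1, alpha_B = 10.
x = {"A": [1.0, 2.0, 3.0], "B": [4.0, 5.0, 6.0]}
y = {"A": [3.0, 5.0, 7.0], "B": [18.0, 20.0, 22.0]}


def demean(values):
    """Subtract the mean of the list from each element."""
    m = mean(values)
    return [v - m for v in values]


def slope_no_intercept(xs, ys):
    """OLS slope of ys on xs without an intercept."""
    return sum(a * b for a, b in zip(xs, ys)) / sum(a * a for a in xs)


# Pooled OLS with a common intercept: de-mean with the overall means.
pooled_x = demean([v for i in x for v in x[i]])
pooled_y = demean([v for i in x for v in y[i]])
beta_pooled = slope_no_intercept(pooled_x, pooled_y)

# Within estimator: de-mean separately for each individual, then run OLS.
within_x = [v for i in x for v in demean(x[i])]
within_y = [v for i in x for v in demean(y[i])]
beta_fe = slope_no_intercept(within_x, within_y)

# Recover the individual intercepts: alpha_i = ybar_i - beta * xbar_i.
alpha_hat = {i: mean(y[i]) - beta_fe * mean(x[i]) for i in x}

print(f"pooled OLS slope: {beta_pooled:.3f}")
print(f"within (FE) slope: {beta_fe:.3f}")
print(f"estimated fixed effects: {alpha_hat}")
```

Because the individual with the larger \(\alpha_i\) also has larger \(X\) values, the pooled slope overstates the effect, while the within estimator recovers the slope of 2 used to generate the data. The recovered intercepts are what the dummy coefficients in an LSDV regression would capture.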

The FD Model

The First Differences (FD) model is an alternative way to eliminate the fixed effect \(\alpha_i\).
We subtract the previous period’s equation from the current period’s equation: \((y_{it} - y_{i,t-1}) = \beta(X_{it} - X_{i,t-1}) + (u_{it} - u_{i,t-1})\)
This simplifies to: \(\Delta y_{it} = \beta \Delta X_{it} + \Delta u_{it}\)
An OLS regression on these differenced variables yields the FD estimate of \(\beta\).
Key Features:
Like the FE model, FD allows the unobserved effect \(\alpha_i\) to be correlated with the regressors.
It provides a consistent estimate of \(\beta\) by removing the time-invariant heterogeneity.
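A minimal sketch of the first-difference transformation, again on a hypothetical toy panel with invented values. Differences are taken within each firm only, and the regression on the differenced data has no intercept, matching \(\Delta y_{it} = \beta \Delta X_{it} + \Delta u_{it}\).

```python
# Hypothetical toy panel (values invented for illustration): 2 firms, 3 periods.
x = {"A": [1.0, 2.0, 4.0], "B": [4.0, 5.0, 7.0]}
y = {"A": [3.0, 5.0, 8.0], "B": [18.0, 21.0, 24.0]}


def first_difference(values):
    """Return [v_2 - v_1, ..., v_T - v_{T-1}]; the first period is lost."""
    return [curr - prev for prev, curr in zip(values, values[1:])]


# Difference within each firm, never across firms, then stack the results.
dx = [d for i in x for d in first_difference(x[i])]
dy = [d for i in x for d in first_difference(y[i])]

# OLS without an intercept on the differenced data.
beta_fd = sum(a * b for a, b in zip(dx, dy)) / sum(a * a for a in dx)

print(f"differenced observations: {len(dx)}")
print(f"FD estimate of beta: {beta_fd:.2f}")
```

With \(N = 2\) firms and \(T = 3\) periods, differencing leaves \(N(T-1) = 4\) observations, since each firm loses its first period.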

FE vs. FD

For both FE and FD estimators to be unbiased, we need the strict exogeneity assumption.
The idiosyncratic error \(u_{it}\) must be uncorrelated with the explanatory variables in all time periods (past, present, and future) for each individual. \(E(u_{it} | X_{i1}, X_{i2}, ..., X_{iT}) = 0\)
This is a stronger assumption than contemporaneous exogeneity and rules out models with lagged dependent variables or feedback effects.
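To see why a lagged dependent variable is ruled out, consider a short sketch in which the regressor is the lagged outcome, \(X_{it} = y_{i,t-1}\), and \(u_{it}\) is assumed to be uncorrelated with \(\alpha_i\) and with past outcomes. Next period's regressor then contains the current error, so the condition cannot hold for all time periods:

```latex
\begin{aligned}
y_{it} &= \alpha_i + \beta_1 y_{i,t-1} + u_{it}, \qquad X_{it} \equiv y_{i,t-1} \\
X_{i,t+1} &= y_{it} = \alpha_i + \beta_1 y_{i,t-1} + u_{it} \\
\operatorname{Cov}(X_{i,t+1}, u_{it}) &= \operatorname{Var}(u_{it}) \neq 0
\end{aligned}
```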
The choice depends on the properties of the error term \(u_{it}\):
If \(u_{it}\) is serially uncorrelated (i.i.d.):
The Fixed Effects (FE) estimator is more efficient (has smaller standard errors).
In this case, the differenced error \(\Delta u_{it}\) will have an autocorrelation of -0.5.
If \(u_{it}\) follows a random walk (i.e., \(u_{it} = u_{i,t-1} + e_{it}\) where \(e_{it}\) is i.i.d.):
The First Differences (FD) estimator is more efficient.
Practical Rule: If FE and FD estimates are very different, it may signal a violation of the strict exogeneity assumption or model misspecification.
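The value of -0.5 follows from a short calculation: if \(u_{it}\) is i.i.d. with variance \(\sigma^2\), then \(Var(\Delta u_{it}) = 2\sigma^2\) and \(Cov(\Delta u_{it}, \Delta u_{i,t-1}) = -Var(u_{i,t-1}) = -\sigma^2\), so the correlation is \(-\sigma^2 / 2\sigma^2 = -0.5\). The simulation below is a rough check of both cases. As a simplifying assumption it uses one long simulated series with standard normal shocks rather than a panel, and the helper names are illustrative.

```python
import random


def first_difference(series):
    """Return the series of changes series[t] - series[t-1]."""
    return [curr - prev for prev, curr in zip(series, series[1:])]


def lag1_autocorr(series):
    """Sample correlation between series[t] and series[t-1]."""
    curr, prev = series[1:], series[:-1]
    n = len(curr)
    mean_curr = sum(curr) / n
    mean_prev = sum(prev) / n
    cov = sum((a - mean_curr) * (b - mean_prev) for a, b in zip(curr, prev))
    var_curr = sum((a - mean_curr) ** 2 for a in curr)
    var_prev = sum((b - mean_prev) ** 2 for b in prev)
    return cov / (var_curr * var_prev) ** 0.5


rng = random.Random(0)  # fixed seed so the run is reproducible
shocks = [rng.gauss(0.0, 1.0) for _ in range(20000)]

# Case 1: u_t itself is i.i.d.
u_iid = shocks

# Case 2: u_t is a random walk, i.e. the running sum of the i.i.d. shocks.
u_rw = []
level = 0.0
for e in shocks:
    level += e
    u_rw.append(level)

rho_iid = lag1_autocorr(first_difference(u_iid))
rho_rw = lag1_autocorr(first_difference(u_rw))

print(f"autocorrelation of differenced i.i.d. errors: {rho_iid:.3f}")
print(f"autocorrelation of differenced random-walk errors: {rho_rw:.3f}")
```

The first number should be close to -0.5 and the second close to 0, since differencing a random walk returns the i.i.d. shocks themselves.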

Wooclap

Wooclap Code: OFZFSD (Tutorial 4)

Questions

Estimating an FE Model

The JTRAIN dataset contains panel data for 157 manufacturing firms in Michigan from 1987 to 1989. The state of Michigan awarded grants to some firms to train their workers. We want to investigate whether receiving a training grant (grant) reduces a firm’s scrap rate (scrap), which is a measure of worker productivity (a lower scrap rate is better). A key concern is that firms with unobserved, time-invariant characteristics (like better management or a more motivated workforce) might be both more likely to receive a grant and have lower scrap rates to begin with.

Write R/Python/Stata code to estimate a firm-level fixed effects model where you regress the scrap rate (scrap) on the grant indicator (grant). Include the code and the key output (the coefficient table) in your answer.
Interpret the estimated coefficient on the grant variable. What does its sign and magnitude tell you about the effectiveness of the training grants? Be specific about the “within-firm” nature of this interpretation.
After running the model, you could (in principle) retrieve the estimated fixed effect, \(\hat{\alpha}_i\), for each firm. For example, suppose the estimated fixed effect for “Firm A” is -2.5 and for “Firm B” is +1.8. What do these fixed effects conceptually represent? What does it mean that Firm A’s fixed effect is lower than Firm B’s?

Estimating a FD Model

In the previous exercise, you estimated a Fixed Effects (FE) model. Now, we will use an alternative method, the First Differences (FD) model, to address the same problem of unobserved, time-invariant firm characteristics.

Write the R/Python/Stata code to estimate a firm-level First Differences (FD) model where you regress the change in the scrap rate (\(\Delta\text{scrap}\)) on the change in the grant indicator (\(\Delta\text{grant}\)). Include the code and the key output (the coefficient table) in your answer.
Interpret the estimated coefficient on the grant variable from your FD model. How does this interpretation differ slightly from the FE interpretation? Be specific about what the change in the grant variable represents.
The Fixed Effects (FE) model from the previous question yielded an estimated coefficient for grant of -0.75. Your FD model will likely produce a different estimate.

Compare the sign, magnitude, and statistical significance of your FD estimate to the FE estimate.
The lecture notes state that if the strict exogeneity assumption holds, the FE and FD estimates should be similar. Based on your comparison, do you find evidence to support this assumption?
Under what theoretical conditions regarding the error term (\(u_{it}\)) would the FE estimator be considered more efficient (and therefore preferable) than the FD estimator?

Visualizing Unobserved Heterogeneity

The “fixed effects” (\(\alpha_i\)) estimated in an FE model represent the time-invariant, unobserved characteristics of each individual (e.g., innate ability, motivation, family background) that affect their wage. Visualizing the distribution of these effects can reveal the extent of this heterogeneity.

Estimate a Fixed Effects model for lwage using exper, expersq, and union as predictors.
Extract the individual fixed effects from your estimated model.
Create a histogram or density plot to visualize the distribution of these fixed effects.
Interpret the plot: What does the shape and spread of the distribution tell you about the sample? Does it appear that unobserved, time-constant individual differences are an important factor in explaining wage variation?

Efficiency of FD vs. FE Estimators

The lecture notes that the First Difference (FD) estimator can be more efficient than the Fixed Effects (FE) estimator if the idiosyncratic error term, \(\epsilon_{it}\), follows a random walk.

Let the error term be defined as \(\epsilon_{it} = \epsilon_{i,t-1} + \nu_{it}\), where \(\nu_{it}\) is a white noise error.

Show mathematically how the error term in the transformed FD model (\(\Delta \epsilon_{it}\)) becomes serially uncorrelated, while the error term in the transformed FE model (\(\epsilon_{it} - \bar{\epsilon}_i\)) remains serially correlated.

The Mechanics of Transformation

Consider the following small panel dataset for two firms (\(i=A, B\)) over three years (\(t=1, 2, 3\)). The model is \(Y_{it} = \alpha_i + \beta X_{it} + u_{it}\).

Firm (i)	Year (t)	Y	X
A	1	5	2
A	2	7	3
A	3	6	4
B	1	12	6
B	2	10	5
B	3	14	7

Calculate the firm-specific means (\(\bar{y}_A, \bar{x}_A, \bar{y}_B, \bar{x}_B\)). Then, construct the “de-meaned” variables \(\ddot{y}_{it} = y_{it} - \bar{y}_i\) and \(\ddot{x}_{it} = x_{it} - \bar{x}_i\) for all six observations.
Construct the first-differenced variables \(\Delta y_{it} = y_{it} - y_{i,t-1}\) and \(\Delta x_{it} = x_{it} - x_{i,t-1}\). How many observations do you have in your differenced dataset? Now suppose \(\beta=2\) and calculate the residuals for both the de-meaned and the first-differenced model.
Suppose that a firm’s investment in the current year (\(X_{it}\)) is partly determined by an unexpectedly high profit shock from the previous year (which is part of \(u_{i,t-1}\)). Explain why this “feedback effect” violates the strict exogeneity assumption required for both the FE and FD estimators to be unbiased.

Choosing Between FE and FD

A researcher is estimating the impact of a country’s R&D spending (\(X_{it}\)) on its GDP growth (\(y_{it}\)) using a panel dataset. They are concerned about unobserved, time-invariant country characteristics like “innovation culture” (\(\alpha_i\)). They estimate the model using both the Fixed Effects (FE) and First Differences (FD) estimators.

The FE estimate for the effect of R&D is 0.15, and the FD estimate is 0.18. Both are statistically significant. According to the lecture, what does the similarity of these results suggest about the model and its underlying assumptions?
To decide which estimator is more efficient, the researcher follows the procedure from the lecture. They run the FD regression and test for serial correlation in the residuals, \(\Delta \hat{u}_{it}\). They find that the correlation between \(\Delta \hat{u}_{it}\) and \(\Delta \hat{u}_{i,t-1}\) is -0.48. Based on this result, which model (FE or FD) is preferable, and why? Explain the theory behind your choice.
Now, suppose a different researcher ran a similar analysis and found the correlation between \(\Delta \hat{u}_{it}\) and \(\Delta \hat{u}_{i,t-1}\) was -0.05. In this case, which model would be preferable, and why? What does this result imply about the time-series properties of the original error term \(u_{it}\)?