Empirical Economics

Tutorial 6: Binary Outcomes

Recapitulation of the Lecture

Linear Probability Model

Why Special Models for Binary Outcomes?
Many economic questions have a “Yes/No” or 0/1 answer. Our dependent variable, \(y\), can only take two values.
We want to model the probability of an event occurring, \(P(y=1|X)\).
The simplest approach is to run OLS anyway: \[ y_i = \beta_0 + \beta_1 x_{1i} + \dots + \epsilon_i \]
The predicted value, \(\hat{y}_i\), is interpreted as the predicted probability that \(y_i=1\). \[ E[y_i | X_i] = P(y_i=1 | X_i) = \beta_0 + \beta_1 x_{1i} + \dots \]
Interpretation: \(\beta_k\) is the change in the probability that \(y=1\) for a one-unit increase in \(x_k\).

Pros:

Simple Interpretation: Coefficients are direct changes in probability (multiply by 100 to express them in percentage points).
Easy Estimation: It’s just OLS.
Works well with Fixed Effects in panel data.

Cons:

Nonsensical Predictions: Nothing prevents predicted probabilities from being < 0 or > 1.
Inherent Heteroskedasticity: The error variance, \(Var(\epsilon_i | X_i) = p_i(1-p_i)\) with \(p_i = P(y_i=1 | X_i)\), depends on X. This violates the OLS homoskedasticity assumption, so conventional standard errors are biased (fixable with robust SEs).
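A minimal numerical sketch of both cons, using a hypothetical toy dataset (ten observations, \(x = 0, \dots, 9\), with \(y\) switching from 0 to 1 at \(x = 5\)) and plain NumPy instead of a regression package. The fitted LPM line falls below 0 and rises above 1 at the ends of the data, and heteroskedasticity-robust standard errors (here the HC1 sandwich variant, one of several; software defaults differ) are computed next to the conventional ones.

```python
import numpy as np

# Hypothetical toy data: y switches from 0 to 1 at x = 5.
x = np.arange(10, dtype=float)
y = (x >= 5).astype(float)

# LPM = OLS of the 0/1 outcome on a constant and x.
X = np.column_stack([np.ones_like(x), x])
XtX_inv = np.linalg.inv(X.T @ X)
beta = XtX_inv @ X.T @ y
y_hat = X @ beta                 # fitted "probabilities"
resid = y - y_hat

# Conventional SEs assume one common error variance for all observations.
n, k = X.shape
sigma2 = resid @ resid / (n - k)
se_classical = np.sqrt(np.diag(sigma2 * XtX_inv))

# HC1 robust SEs: sandwich (X'X)^-1 [sum_i e_i^2 x_i x_i'] (X'X)^-1, scaled by n/(n-k).
meat = X.T @ (X * resid[:, None] ** 2)
se_robust = np.sqrt(np.diag(n / (n - k) * XtX_inv @ meat @ XtX_inv))

print("beta:", beta.round(3))
print("fitted range:", y_hat.min().round(3), "to", y_hat.max().round(3))
print("classical SE:", se_classical.round(3), " robust SE:", se_robust.round(3))
```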

Probit and Logit

To solve the LPM’s issues, we need a model that constrains the predicted probability to the \([0,1]\) interval. This is achieved using a Cumulative Distribution Function (CDF), which naturally produces an S-shaped curve.
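Latent variable framework: an unobserved index \(y_i^* = \beta_0 + \beta_1 x_i + \epsilon_i\) drives the observed outcome, with \(y_i = 1\) if \(y_i^* > 0\) and \(y_i = 0\) otherwise.

\[ P(y_i=1 | X_i) = P(y_i^* > 0) = P(\epsilon_i > -(\beta_0 + \beta_1 x_i)) = F(\beta_0 + \beta_1 x_i) \]

Here \(F\) is the CDF of \(\epsilon_i\); the last equality uses the symmetry of \(\epsilon_i\) around zero, \(1 - F(-z) = F(z)\).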
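The latent variable story can be checked by simulation. The sketch below assumes hypothetical values \(\beta_0 = -0.5\), \(\beta_1 = 1\), a single evaluation point \(x = 0.8\) and standard normal errors (the Probit case). It generates \(y^*\), sets \(y = 1\) when \(y^* > 0\), and compares the share of ones with \(\Phi(\beta_0 + \beta_1 x)\); the helper normal_cdf is written with math.erf so only NumPy and the standard library are needed.

```python
import math
import numpy as np

rng = np.random.default_rng(0)
b0, b1 = -0.5, 1.0       # hypothetical parameters
x = 0.8                  # evaluate the model at one value of x
n = 200_000

eps = rng.standard_normal(n)      # Probit case: standard normal errors
y_star = b0 + b1 * x + eps        # latent index y*
y = (y_star > 0).astype(int)      # observed binary outcome

def normal_cdf(z):
    """Standard normal CDF Phi(z) via the error function."""
    return 0.5 * (1.0 + math.erf(z / math.sqrt(2.0)))

share = y.mean()                  # simulated P(y = 1 | x)
theory = normal_cdf(b0 + b1 * x)  # F(b0 + b1 x) with F = Phi
print(f"simulated P(y=1) = {share:.4f}, Phi(b0 + b1 x) = {theory:.4f}")
```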

The choice of model depends on the assumed distribution for \(\epsilon_i\):
- Probit Model: Assumes \(\epsilon_i\) follows a Standard Normal distribution.
  - \(P(y=1 | X) = \Phi(\beta_0 + \beta_1 x_i)\), where \(\Phi(\cdot)\) is the Normal CDF.
- Logit Model: Assumes \(\epsilon_i\) follows a Standard Logistic distribution.
  - \(P(y=1 | X) = \Lambda(\beta_0 + \beta_1 x_i) = \frac{e^{\beta_0 + \beta_1 x_i}}{1 + e^{\beta_0 + \beta_1 x_i}}\).
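Both link functions take a few lines of standard-library Python. This sketch (the names probit_cdf and logit_cdf are illustrative, not from any package) tabulates \(\Phi\) and \(\Lambda\) at a few points, showing that both map any index into \((0,1)\) and both equal 0.5 at zero.

```python
import math

def probit_cdf(z: float) -> float:
    """Standard normal CDF Phi(z), written via the error function."""
    return 0.5 * (1.0 + math.erf(z / math.sqrt(2.0)))

def logit_cdf(z: float) -> float:
    """Standard logistic CDF Lambda(z) = e^z / (1 + e^z) = 1 / (1 + e^-z)."""
    return 1.0 / (1.0 + math.exp(-z))

for z in (-3.0, -1.0, 0.0, 1.0, 3.0):
    print(f"z={z:+.1f}  Phi={probit_cdf(z):.3f}  Lambda={logit_cdf(z):.3f}")
```

The standard logistic distribution has variance \(\pi^2/3 \approx 3.29\) rather than 1, so \(\Lambda\) is flatter than \(\Phi\). This is why Logit coefficients are typically larger in absolute value than Probit coefficients estimated on the same data, even though the implied probabilities are usually very similar.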
Estimation: Both are estimated using Maximum Likelihood Estimation (MLE), which finds the \(\beta\) values that make our observed data most probable.
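For the Logit model the log-likelihood is \(\ln L(\beta) = \sum_i [y_i \ln \Lambda(X_i'\beta) + (1-y_i)\ln(1-\Lambda(X_i'\beta))]\), and MLE picks the \(\beta\) that maximises it. A minimal sketch using Newton-Raphson, one standard algorithm for this (statistical software may use other optimisers), on a hypothetical dataset with a single 0/1 regressor \(d\): 2 of 10 observations have \(y = 1\) when \(d = 0\), and 5 of 10 when \(d = 1\).

```python
import numpy as np

def logit_mle(X, y, tol=1e-10, max_iter=100):
    """Maximise the logit log-likelihood by Newton-Raphson."""
    beta = np.zeros(X.shape[1])
    for _ in range(max_iter):
        p = 1.0 / (1.0 + np.exp(-X @ beta))      # Lambda(X beta)
        score = X.T @ (y - p)                     # gradient of ln L
        hessian = -(X.T * (p * (1.0 - p))) @ X    # -X'WX with W = diag(p(1-p))
        step = np.linalg.solve(hessian, score)
        beta = beta - step                        # Newton update
        if np.max(np.abs(step)) < tol:
            break
    return beta

def logit_loglik(X, y, beta):
    p = 1.0 / (1.0 + np.exp(-X @ beta))
    return np.sum(y * np.log(p) + (1.0 - y) * np.log(1.0 - p))

# Hypothetical data: dummy d, 10 obs per group; 2 ones if d = 0, 5 ones if d = 1.
d = np.repeat([0.0, 1.0], 10)
y = np.array([1, 1] + [0] * 8 + [1] * 5 + [0] * 5, dtype=float)
X = np.column_stack([np.ones_like(d), d])

beta_hat = logit_mle(X, y)
print("beta_hat:", beta_hat.round(4))
print("log-likelihood:", round(float(logit_loglik(X, y, beta_hat)), 4))
```

Because a model with a constant and one dummy is saturated, the estimates reproduce the group shares: \(\hat\beta_0 = \ln(0.2/0.8) \approx -1.386\) and \(\hat\beta_1 = \ln(0.5/0.5) - \ln(0.2/0.8) \approx 1.386\).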

Interpretation & Evaluation

In Probit/Logit, the estimated coefficients (\(\hat{\beta}\)) are NOT marginal effects.
They represent the change in the latent variable \(y^*\) for a one-unit change in an \(x\) variable.
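The effect on the probability is non-linear and depends on the values of all X variables. \[ \frac{\partial P(y=1 | X)}{\partial x_k} = f(X'\beta) \cdot \beta_k \] where \(f(\cdot) = F'(\cdot)\) is the Probability Density Function (PDF) of \(\epsilon_i\): \(\phi(\cdot)\) for Probit and \(\lambda(z) = \Lambda(z)(1-\Lambda(z))\) for Logit.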
Practical Solution: Average Marginal Effects (AMEs)
This is the standard for interpretation.
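First, calculate the marginal effect for each observation in the data using its specific X values. Then, take the average of all these individual marginal effects. For a binary regressor, the derivative is usually replaced by the discrete change in the predicted probability when the dummy switches from 0 to 1.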
Interpretation: The AME of \(x_k\) is the average change in the probability of success for a one-unit increase in \(x_k\) across the sample.
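The two-step AME recipe translates directly into code. The sketch assumes a Logit model, so the PDF is \(\lambda(z) = \Lambda(z)(1-\Lambda(z))\) (for Probit, swap in \(\Phi\) and \(\phi\)). For a continuous regressor it averages \(\lambda(X_i'\hat\beta)\,\hat\beta_k\); for a 0/1 regressor it averages the discrete change in predicted probability. The data matrix and coefficients are hypothetical.

```python
import numpy as np

def logistic(z):
    """Logit CDF Lambda(z)."""
    return 1.0 / (1.0 + np.exp(-z))

def ame_continuous(X, beta, k):
    """AME of continuous regressor k: mean over i of lambda(x_i'beta) * beta_k."""
    p = logistic(X @ beta)
    density = p * (1.0 - p)          # logistic PDF lambda(z) = Lambda(z)(1 - Lambda(z))
    return np.mean(density * beta[k])

def ame_binary(X, beta, k):
    """AME of 0/1 regressor k: mean over i of P(y=1 | x_k=1) - P(y=1 | x_k=0)."""
    X1, X0 = X.copy(), X.copy()
    X1[:, k], X0[:, k] = 1.0, 0.0    # switch the dummy on / off for every observation
    return np.mean(logistic(X1 @ beta) - logistic(X0 @ beta))

# Hypothetical data, columns = [constant, x (continuous), d (0/1)].
X = np.array([[1.0, 0.0, 0.0],
              [1.0, 0.0, 1.0],
              [1.0, 0.0, 1.0]])
beta = np.array([np.log(0.25), 1.0, np.log(4.0)])   # hypothetical Logit coefficients

print(round(ame_continuous(X, beta, 1), 4))  # → 0.22
print(round(ame_binary(X, beta, 2), 4))      # → 0.3
```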

Wooclap

Wooclap Code: OFZFSD (Tutorial 6)

Questions

Interpreting Binary Outcome Models

Abadie (2003) is interested in the effect of being eligible for a 401(k) plan on an individual’s decision to save through an Individual Retirement Account (IRA). The idea is to see if eligibility for a workplace retirement plan “crowds out” other forms of retirement savings.¹

Estimated Effects of 401(k) Eligibility on Saving through an IRA
	LPM	Logit	Probit
401(k) Eligibility	0.057***	0.355***	0.206***
	(0.010)	(0.058)	(0.034)
Income (000s)	0.006***	0.033***	0.019***
	(0.000)	(0.001)	(0.001)
Married	-0.017*	-0.072	-0.042
	(0.009)	(0.065)	(0.037)
Male	0.007	0.035	0.021
	(0.011)	(0.074)	(0.042)
Age	0.009***	0.054***	0.031***
	(0.000)	(0.003)	(0.002)
Num.Obs.	9275	9275	9275
R2	0.177	0.156	0.157
Standard errors in parentheses. * p < 0.1, ** p < 0.05, *** p < 0.01
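As a quick check on the stars, the z-statistic \(\hat\beta / \text{SE}\) and a two-sided p-value can be recomputed from the rounded coefficients and standard errors in the table, using the large-sample normal approximation. Income and Age are left out because some of their standard errors round to 0.000, so the ratio cannot be recovered from the printed numbers; all results are approximate because of rounding.

```python
import math

models = ("LPM", "Logit", "Probit")
# (coefficient, standard error) pairs as printed in the table, rounded to 3 decimals.
rows = {
    "401(k) Eligibility": [(0.057, 0.010), (0.355, 0.058), (0.206, 0.034)],
    "Married": [(-0.017, 0.009), (-0.072, 0.065), (-0.042, 0.037)],
    "Male": [(0.007, 0.011), (0.035, 0.074), (0.021, 0.042)],
}

def z_stat(coef, se):
    return coef / se

def two_sided_p(coef, se):
    """Two-sided normal p-value: 2 * (1 - Phi(|z|)) = erfc(|z| / sqrt(2))."""
    return math.erfc(abs(z_stat(coef, se)) / math.sqrt(2.0))

for name, cells in rows.items():
    parts = [f"{m}: z={z_stat(b, s):+.2f}, p={two_sided_p(b, s):.3f}"
             for m, (b, s) in zip(models, cells)]
    print(f"{name:<20}" + "  |  ".join(parts))
```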

Interpreting Binary Outcome Models

Compare the estimated coefficients on 401(k) eligibility (p401k) across the three models. Do they agree on the direction and statistical significance of the effect of 401(k) eligibility on IRA participation?
Compare the estimated coefficients on income (inc) across the three models. Do they agree on the direction and statistical significance of the effect of income on IRA participation?
Interpret the coefficient on p401k from the LPM. What does this coefficient tell you about the effect of 401(k) eligibility on the probability of IRA participation?
Are the R-squared values comparable across the three models? Why or why not?
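Logit and Probit are not fitted by least squares, so their "R2" cannot be a share of explained variance in the OLS sense. The table does not state which fit measure it reports for these columns; one common choice is McFadden's pseudo-R², \(1 - \ln L_{\text{model}} / \ln L_{\text{null}}\), where the null model contains only a constant. A minimal sketch with hypothetical outcomes and fitted probabilities:

```python
import math

def bernoulli_loglik(y, p):
    """Sum of y*ln(p) + (1 - y)*ln(1 - p) over observations."""
    return sum(math.log(pi) if yi == 1 else math.log(1.0 - pi) for yi, pi in zip(y, p))

def mcfadden_r2(y, p_model):
    p_null = sum(y) / len(y)              # constant-only model predicts the sample share
    ll_model = bernoulli_loglik(y, p_model)
    ll_null = bernoulli_loglik(y, [p_null] * len(y))
    return 1.0 - ll_model / ll_null

y = [1, 0, 1, 1]                  # hypothetical outcomes
p_model = [0.9, 0.2, 0.8, 0.7]    # hypothetical fitted probabilities
print(round(mcfadden_r2(y, p_model), 3))  # → 0.596
```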

Model Estimation

For this question, you will use the Bertrand and Mullainathan (2004) dataset, lakisha_aer.dta, which was used for the examples in the lecture.¹

You are interested in the effect of perceived race on the probability of receiving a callback for a job interview. The key variables are call (1 if callback, 0 otherwise) and race (‘b’ for black-sounding name, ‘w’ for white-sounding name).

Estimate a Linear Probability Model (LPM) regressing call on race.
Estimate a Logit model with the same variables.
Estimate a Probit model with the same variables.

For each model, report the estimated coefficient on the race variable and its p-value. Do the models agree on the direction and statistical significance of the effect?
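Because race is the only regressor, each of the three models is saturated and reproduces the two group callback rates exactly. That yields closed-form point estimates that are handy for checking software output: with \(p_0\) and \(p_1\) the shares of \(y=1\) in the two groups, the LPM slope is \(p_1 - p_0\), the Logit slope is \(\ln\frac{p_1}{1-p_1} - \ln\frac{p_0}{1-p_0}\) and the Probit slope is \(\Phi^{-1}(p_1) - \Phi^{-1}(p_0)\). The sketch uses hypothetical data with a generic 0/1 dummy d (it does not read lakisha_aer.dta, whose race variable is coded 'b'/'w' and would first need converting to 0/1) and requires \(0 < p_0, p_1 < 1\). It gives point estimates only; p-values still come from regression software (in Python, for example, the ols, logit and probit functions in statsmodels.formula.api).

```python
import math
from statistics import NormalDist

def fit_single_dummy(d, y):
    """Closed-form (intercept, slope) for LPM, Logit and Probit with one 0/1 regressor.

    With a constant and one dummy the models are saturated, so each reproduces
    the group shares p0 = P(y=1 | d=0) and p1 = P(y=1 | d=1). Requires 0 < p0, p1 < 1.
    """
    y0 = [yi for di, yi in zip(d, y) if di == 0]
    y1 = [yi for di, yi in zip(d, y) if di == 1]
    p0, p1 = sum(y0) / len(y0), sum(y1) / len(y1)

    def log_odds(p):
        return math.log(p / (1.0 - p))

    inv_phi = NormalDist().inv_cdf       # inverse standard normal CDF
    return {
        "LPM": (p0, p1 - p0),
        "Logit": (log_odds(p0), log_odds(p1) - log_odds(p0)),
        "Probit": (inv_phi(p0), inv_phi(p1) - inv_phi(p0)),
    }

# Hypothetical data: 10 observations per group, 2 ones when d = 0, 5 ones when d = 1.
d = [0] * 10 + [1] * 10
y = [1, 1] + [0] * 8 + [1] * 5 + [0] * 5
for model, (b0, b1) in fit_single_dummy(d, y).items():
    print(f"{model:<6} intercept = {b0:+.4f}, slope = {b1:+.4f}")
```

In this saturated case all three models predict \(p_0\) and \(p_1\) for the two groups, so AMEs of the dummy computed as a discrete change also equal \(p_1 - p_0\).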

Marginal Effects vs. Coefficients

The coefficients from Logit and Probit models are not directly interpretable as marginal effects.

Calculate the Average Marginal Effect (AME) of the race variable from your Logit and Probit models in the previous question.
Interpret the AME from the Probit model. What does this number tell you about the difference in callback probabilities between resumes with white-sounding and black-sounding names?
How do the AMEs from the Logit and Probit models compare to each other? How do they compare to the coefficient on race from the LPM in the previous question? Discuss your findings.

The Pros and Cons of Simplicity

The lecture introduced the Linear Probability Model (LPM) as a straightforward way to handle binary outcomes using OLS.

What are the two main advantages of the LPM, particularly concerning estimation and interpretation?
What are its two primary shortcomings, as discussed in the lecture? Explain why each is a problem.
One of these shortcomings can be partially addressed using robust standard errors. Which one is it, and why does this fix work?

The Latent Variable Framework

Both Probit and Logit models are motivated by an underlying latent variable, \(y^*\).

In your own words, explain the concept of a latent variable \(y^*\) and how it relates to the observed binary outcome \(y\).
The lecture states that \(P(y_i=1) = P(\epsilon_i > -X_i'\beta)\). What is the final step needed to get from this expression to the specific functional forms for the Probit and Logit models? What key assumption distinguishes the two models?

Understanding Maximum Likelihood

Probit and Logit models are estimated using Maximum Likelihood Estimation (MLE), not OLS.

Explain the fundamental goal of MLE. How does its objective differ from the objective of OLS (which minimizes the sum of squared residuals)?
Using the “biased coin” example from the lecture (observing 7 heads in 10 flips), explain how you would construct the likelihood function. What value of p (the probability of heads) does MLE tell us is the best estimate, and why is this intuitive?
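The coin example as a grid search: with 7 heads in 10 independent flips the likelihood is \(L(p) = \binom{10}{7} p^7 (1-p)^3\). The binomial constant does not depend on \(p\), so the sketch drops it and maximises the log-likelihood over a fine grid of candidate values.

```python
import math

heads, flips = 7, 10

def log_likelihood(p):
    """ln L(p) = heads*ln(p) + (flips - heads)*ln(1 - p), dropping the binomial constant."""
    return heads * math.log(p) + (flips - heads) * math.log(1.0 - p)

grid = [i / 1000 for i in range(1, 1000)]     # candidate values 0.001, ..., 0.999
p_hat = max(grid, key=log_likelihood)
print(p_hat)  # → 0.7
```

The grid maximiser matches the analytic answer: setting \(d\ln L/dp = 7/p - 3/(1-p) = 0\) gives \(\hat p = 7/10\).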

The Complexity of Interaction Terms

Suppose you were to estimate a Probit model:

\[ P(\text{call}=1 | X) = \Phi(\beta_0 + \beta_1 \text{race}_i + \beta_2 \text{female}_i + \beta_3 (\text{race}_i \times \text{female}_i)) \]

Explain why you cannot simply look at the sign and significance of \(\hat{\beta}_3\) to determine the sign and significance of the interaction effect on the probability of receiving a callback. How does this differ fundamentally from the interpretation of an interaction term in an LPM?
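The interaction effect on the probability is the double difference \([P(1,1) - P(0,1)] - [P(1,0) - P(0,0)]\), where \(P(r,f)\) is the predicted callback probability at race \(= r\) and female \(= f\). The sketch assumes race is coded as a 0/1 dummy and uses hypothetical coefficients with \(\beta_3 = 0\) exactly: the Probit double difference is still nonzero, while the LPM double difference always equals \(\beta_3\). Its standard error likewise has to be computed for the double difference itself (for example by the delta method), not read off \(\hat\beta_3\).

```python
import math

def Phi(z):
    """Standard normal CDF."""
    return 0.5 * (1.0 + math.erf(z / math.sqrt(2.0)))

def probit_interaction_effect(b0, b1, b2, b3):
    """[P(1,1) - P(0,1)] - [P(1,0) - P(0,0)], with P(r, f) = Phi(b0 + b1*r + b2*f + b3*r*f)."""
    def P(r, f):
        return Phi(b0 + b1 * r + b2 * f + b3 * r * f)
    return (P(1, 1) - P(0, 1)) - (P(1, 0) - P(0, 0))

def lpm_interaction_effect(b0, b1, b2, b3):
    """Same double difference for an LPM; algebraically it is exactly b3."""
    def P(r, f):
        return b0 + b1 * r + b2 * f + b3 * r * f
    return (P(1, 1) - P(0, 1)) - (P(1, 0) - P(0, 0))

# Hypothetical coefficients with beta_3 = 0.
print(round(probit_interaction_effect(-1.0, 0.5, 0.5, 0.0), 4))  # → 0.0416
print(lpm_interaction_effect(-1.0, 0.5, 0.5, 0.0))               # → 0.0
```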

Censoring and the Tobit Model

A health economist is studying individual annual out-of-pocket spending on prescription medication. A significant fraction of the sample (30%) has zero expenditure because they did not purchase any medication.

The researcher considers dropping the zero-expenditure observations and running an OLS regression on the remaining positive values. Why would this lead to biased estimates?
The researcher then considers running OLS on the full sample, including the zeros. Why is this also problematic for estimating the relationship between income and medication spending?
Explain why the Tobit model is designed to handle this specific type of data problem.
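The Tobit log-likelihood (left-censored at zero, with \(y_i^* = X_i'\beta + u_i\), \(u_i \sim N(0, \sigma^2)\) and \(y_i = \max(0, y_i^*)\)) treats the two kinds of observations differently: a zero contributes the probability \(P(y_i^* \le 0) = \Phi(-X_i'\beta/\sigma)\), and a positive value contributes the normal density \(\frac{1}{\sigma}\phi((y_i - X_i'\beta)/\sigma)\). A minimal sketch that evaluates it for given fitted indices; software maximises it numerically over \(\beta\) and \(\sigma\).

```python
import math

def norm_cdf(z):
    return 0.5 * (1.0 + math.erf(z / math.sqrt(2.0)))

def norm_logpdf(z):
    return -0.5 * math.log(2.0 * math.pi) - 0.5 * z * z

def tobit_loglik(y, xb, sigma):
    """Log-likelihood of a Tobit model left-censored at zero.

    y: observed outcomes (zeros are censored); xb: fitted indices x_i'beta; sigma: error s.d.
    """
    total = 0.0
    for yi, mi in zip(y, xb):
        if yi <= 0:
            # Censored: probability that the latent y* is at or below zero.
            total += math.log(norm_cdf(-mi / sigma))
        else:
            # Uncensored: normal density of y* at the observed value, (1/sigma) * phi(.)
            total += norm_logpdf((yi - mi) / sigma) - math.log(sigma)
    return total

# Hypothetical check: one zero and one positive outcome, both with index 0 and sigma = 1.
print(round(tobit_loglik([0.0, 1.0], [0.0, 0.0], 1.0), 4))  # → -2.1121
```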

Estimating a Tobit Model

This question uses the Mroz dataset, which contains data on the labor force participation of married women in 1975.¹

The primary variable of interest is hours, representing the number of hours the woman worked in 1975. A significant number of women in the sample did not work, resulting in hours = 0. We want to model the factors that determine hours worked.

Load the Mroz dataset. Create a histogram or frequency table for the hours variable. Based on the distribution you observe, explain precisely why a Tobit model is a more appropriate choice than a standard OLS regression for this research question.
Estimate a Tobit model where hours is the dependent variable. Use the following as independent variables: educ (years of education), exper (actual labor market experience), nwifeinc (non-wife household income, in thousands), and kidslt6 (number of children less than 6 years old). Report the estimated coefficients and their statistical significance.
Focus on the estimated coefficient for kidslt6. Provide a careful interpretation of this coefficient. What does this coefficient tell you about the relationship between having young children and the latent propensity or desired hours of work?
Now estimate an OLS model with the same dependent variable and regressors. Compare your results with the Tobit model. Is this a valid comparison?
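A Tobit coefficient \(\beta_k\) is the effect on the latent \(y^*\). Under the Tobit assumptions, the effect on expected observed hours (zeros included) is \(\partial E[y|x]/\partial x_k = \Phi(x'\beta/\sigma)\,\beta_k\), the latent effect scaled by the probability of a positive outcome, where \(E[y|x] = \Phi(z)\,x'\beta + \sigma\phi(z)\) with \(z = x'\beta/\sigma\). The sketch evaluates this at hypothetical values and checks it against a numerical derivative. Comparing OLS slopes with these scaled effects, rather than with the raw Tobit coefficients, is usually the more like-for-like comparison.

```python
import math

def norm_cdf(z):
    return 0.5 * (1.0 + math.erf(z / math.sqrt(2.0)))

def norm_pdf(z):
    return math.exp(-0.5 * z * z) / math.sqrt(2.0 * math.pi)

def tobit_expected_y(xb, sigma):
    """E[y | x] for a Tobit left-censored at zero, zeros included."""
    z = xb / sigma
    return norm_cdf(z) * xb + sigma * norm_pdf(z)

def tobit_me_unconditional(xb, sigma, beta_k):
    """dE[y | x]/dx_k = Phi(x'beta / sigma) * beta_k."""
    return norm_cdf(xb / sigma) * beta_k

# Hypothetical values: index x'beta, error s.d. sigma, latent coefficient beta_k.
xb, sigma, beta_k = 0.5, 2.0, -1.5
analytic = tobit_me_unconditional(xb, sigma, beta_k)

# Numerical check: raising x_k by h moves the index x'beta by h * beta_k.
h = 1e-4
numeric = (tobit_expected_y(xb + h * beta_k, sigma)
           - tobit_expected_y(xb - h * beta_k, sigma)) / (2.0 * h)
print(f"latent effect {beta_k}, effect on E[y|x] {analytic:.4f}, numerical {numeric:.4f}")
```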

The End