
Lecture 1: The Linear Model I
Empirical Economics
Two central aspects:
Central course objective: to have you understand enough econometric theory, and gain enough hands-on experience, to:
You can submit a request for examination accommodations (such as extra exam time) via the Cases tab in OSIRIS Student.
Do this by Friday September 12 at the latest. We can then arrange everything before your first exams.
Do you already have provisions from the UU? Then you do not need to submit a request via OSIRIS.
First two lectures devoted to the linear model.
Prerequisite knowledge:
This lecture and remaining lectures:
Material: Wooldridge Chapters 1 and 2
Two Distinct Goals: Prediction and Causation
In econometrics, we build models with two primary objectives in mind:
Example: Prediction
Predicting next quarter’s GDP growth using indicators like inflation, consumer confidence, and employment data. The main goal is the accuracy of the GDP forecast (ŷ), not necessarily the isolated impact of each indicator.
The goal of causal inference is to determine the specific impact of one variable on another, holding all other relevant factors constant.
Our focus is on \(\hat{\beta}\): This is the estimated coefficient of an independent variable. It quantifies the direction and magnitude of the relationship between an independent variable and the dependent variable.
Example: Causation
Estimating the effect of an additional year of education (\(X\)) on an individual’s wages (\(Y\)). We are interested in the specific value of \(\beta\) for education, which would tell us the expected increase in wages for one more year of schooling, assuming other factors are held constant.
Example: A Causal Effect
Banerjee and Duflo (2015) examined the causal effect of a comprehensive anti-poverty program.
Can a “big push” program, which provides a combination of a productive asset, training, and support, have a lasting causal impact on the lives of the ultra-poor?
To answer this, the researchers used a Randomized Controlled Trial (RCT) across six countries.
The study found that, even years after the program ended, the treatment group had significantly higher consumption levels and increased income and assets.
Because of the RCT design, the researchers could confidently conclude that the program caused these improvements.
Example: Correlation without Causation
A significant amount of modern finance research focuses on the relationship between a company’s Environmental, Social, and Governance (ESG) scores and its financial performance.
Studies have documented a positive correlation between high ESG scores and strong financial performance.
Companies that score well on environmental and social metrics also tend to be more profitable.
It does not necessarily mean that high ESG scores cause better financial performance. The relationship could be driven by other factors:
Econometrics is the use of statistical methods to:
It’s where economic theory meets real-world data.
Theory proposes relationships (e.g., Law of Demand), but econometrics tells us the magnitude and statistical significance of these relationships.
It allows you to quantify the relationships that you learn about in your other economics courses.
Examples of Economic Data
Cross-sectional data: A survey of 500 individuals in 2023, with data on their wage, education, gender, and age.
Time-series data: Data on Dutch GDP, inflation, and unemployment from 1950 to 2023.
Pooled cross-sections: A random survey of households in 1990, and another different random survey of households in 2020.
Panel data: Tracking the wage, education, and city of residence for the same 500 individuals every year from 2010 to 2020.
The Population Regression Function (PRF) gives the expected value of \(y\) conditional on \(x\):
\[ E(y | x) = \beta_0 + \beta_1 x \]
Of course, not everyone with the same level of education has the same wage. Other factors matter (experience, innate ability, location, luck, etc.).
We capture all these other unobserved factors in a stochastic error term, \(u\).
Our individual-level population model is:
\[ y_i = \beta_0 + \beta_1 x_i + u_i \]
We can’t observe the entire population. We only have a sample of data.
Our goal is to use the sample data to estimate the unknown population parameters \(\beta_0\) and \(\beta_1\).
The Sample Regression Function (SRF) is our estimate of the PRF:
\[ \hat{y} = \hat{\beta}_0 + \hat{\beta}_1 x \]
Example: Sample Data and Regression

Definition: Residual
We define the residual, \(\hat{u}_i\), as the difference between the actual value \(y_i\) and the fitted value \(\hat{y}_i\): \[ \hat{u}_i = y_i - \hat{y}_i = y_i - (\hat{\beta}_0 + \hat{\beta}_1 x_i) \]
Definition: OLS Optimization Problem
\[ \min_{\hat{\beta}_0, \hat{\beta}_1} SSR = \sum_{i=1}^{n} \hat{u}_i^2 = \sum_{i=1}^{n} (y_i - \hat{\beta}_0 - \hat{\beta}_1 x_i)^2 \]
OLS First Order Conditions
Theorem: OLS Estimates for \(\beta_0\) and \(\beta_1\)
\[ \hat{\beta}_1 = \frac{\sum_{i=1}^{n} (x_i - \bar{x})(y_i - \bar{y})}{\sum_{i=1}^{n} (x_i - \bar{x})^2} = \frac{\text{Sample Covariance}(x,y)}{\text{Sample Variance}(x)} \]
\[ \hat{\beta}_0 = \bar{y} - \hat{\beta}_1 \bar{x} \]
where \(\bar{x}\) and \(\bar{y}\) are the sample means of \(x\) and \(y\).
The OLS estimators have some important algebraic properties that come directly from the FOCs:
The sum of the OLS residuals is zero:
\[\sum_{i=1}^{n} \hat{u}_i = 0\]
\[\sum_{i=1}^{n} x_i \hat{u}_i = 0\]
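These closed-form estimates and both first-order-condition properties can be checked numerically. A minimal Python sketch with NumPy on simulated data (the variable names and numbers are illustrative, not the lecture's `dat` set):

```python
import numpy as np

# Simulated sample (illustrative values)
rng = np.random.default_rng(0)
n = 100
x = rng.uniform(8, 20, n)                 # e.g. years of education
y = 1.5 + 1.1 * x + rng.normal(0, 3, n)   # e.g. hourly wage

# Closed-form OLS estimates: sample covariance / sample variance
b1 = np.cov(x, y)[0, 1] / np.var(x, ddof=1)
b0 = y.mean() - b1 * x.mean()

# Residuals and the two first-order-condition properties
u_hat = y - (b0 + b1 * x)
print(abs(u_hat.sum()) < 1e-8)        # True: residuals sum to zero
print(abs((x * u_hat).sum()) < 1e-6)  # True: residuals orthogonal to x
```

Both sums are zero up to floating-point error, exactly as the first-order conditions require.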
slr_model <- lm(wage ~ educ, data = dat)
# The coefficients are:
summary(slr_model)
##
## Call:
## lm(formula = wage ~ educ, data = dat)
##
## Residuals:
## Min 1Q Median 3Q Max
## -6.7201 -2.3878 -0.3926 1.9554 11.6092
##
## Coefficients:
## Estimate Std. Error t value Pr(>|t|)
## (Intercept) 1.7863 2.5164 0.710 0.479
## educ 1.1498 0.1887 6.092 2.19e-08 ***
## ---
## Signif. codes: 0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
##
## Residual standard error: 3.4 on 98 degrees of freedom
## Multiple R-squared: 0.2747, Adjusted R-squared: 0.2673
## F-statistic: 37.11 on 1 and 98 DF, p-value: 2.192e-08

import statsmodels.api as sm
X = r.dat.educ  # the R data frame `dat`, accessed from Python via reticulate's `r` object
y = r.dat.wage
X = sm.add_constant(X)  # add an intercept column
model = sm.OLS(y, X).fit()
print(model.summary())
## OLS Regression Results
## ==============================================================================
## Dep. Variable: wage R-squared: 0.275
## Model: OLS Adj. R-squared: 0.267
## Method: Least Squares F-statistic: 37.11
## Date: Wed, 29 Oct 2025 Prob (F-statistic): 2.19e-08
## Time: 12:56:43 Log-Likelihood: -263.27
## No. Observations: 100 AIC: 530.5
## Df Residuals: 98 BIC: 535.8
## Df Model: 1
## Covariance Type: nonrobust
## ==============================================================================
## coef std err t P>|t| [0.025 0.975]
## ------------------------------------------------------------------------------
## const 1.7863 2.516 0.710 0.479 -3.207 6.780
## educ 1.1498 0.189 6.092 0.000 0.775 1.524
## ==============================================================================
## Omnibus: 8.366 Durbin-Watson: 2.235
## Prob(Omnibus): 0.015 Jarque-Bera (JB): 8.006
## Skew: 0.626 Prob(JB): 0.0183
## Kurtosis: 3.595 Cond. No. 99.2
## ==============================================================================
##
## Notes:
## [1] Standard Errors assume that the covariance matrix of the errors is correctly specified.

The values of the coefficients depend on the units of measurement of \(y\) and \(x\). We’ve used a level-level model (\(y\) and \(x\) are in their natural units).
Suppose we measured wage in cents instead of euros.
Example: Education in Months
From our definition, \(Educ_{years} = \frac{1}{12} Educ_{months}\). Let’s substitute this into the original estimated equation:
\[\begin{align*} \widehat{Wage} &= \hat{\beta}_0 + \hat{\beta}_1 Educ_{years} \\ &= \hat{\beta}_0 + \hat{\beta}_1 \left( \frac{1}{12} Educ_{months} \right) \\ &= \hat{\beta}_0 + \left( \frac{\hat{\beta}_1}{12} \right) Educ_{months} \end{align*}\]
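This rescaling can be checked numerically; a short sketch with simulated data (all numbers illustrative):

```python
import numpy as np

# Simulated data: wage in euros, education in years
rng = np.random.default_rng(1)
educ_years = rng.uniform(8, 20, 200)
wage = 2 + 1.2 * educ_years + rng.normal(0, 3, 200)

# Same regression with education measured in months
educ_months = 12 * educ_years
slope_years = np.polyfit(educ_years, wage, 1)[0]
slope_months = np.polyfit(educ_months, wage, 1)[0]

# The slope scales by 1/12, exactly as the algebra predicts
print(np.isclose(slope_years, 12 * slope_months))  # True
```

Only the slope changes; fitted values, residuals, and \(R^2\) are identical in the two regressions.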
Here, we transform the dependent variable \(y\): \(\log(y) = \beta_0 + \beta_1 x + u\)
Interpretation of \(\beta_1\): A one-unit increase in \(x\) is associated with a \((100 \times \beta_1)\%\) change in \(y\).
Interpretation of \(\beta\) in the Log-Level Model
To see this, take the derivative of the equation with respect to \(x\): \[ \frac{d(\log(y))}{dx} = \beta_1 \]
Recall the calculus rule/approximation: for small changes, \(\Delta \log(y) \approx \frac{\Delta y}{y}\).
For a one-unit change in \(x\) (\(\Delta x = 1\)): \[ \beta_1 = \frac{\Delta \log(y)}{\Delta x} \approx \frac{\Delta y / y}{1} \]
Example: Log-Level Interpretation
In a log-level model, \(\beta_1\) is the proportional change in \(y\).
We multiply by 100 to get a percentage.
If \(\widehat{\log(Wage)} = 1.5 + 0.08 \times Educ\), an additional year of education is associated with an approximate \(0.08 \times 100 = 8\%\) increase in wage.
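The \(100 \times \beta_1\) rule is an approximation based on \(\Delta \log(y) \approx \Delta y / y\); the exact percentage change is \(100 \times (e^{\beta_1} - 1)\). A quick check for the example above:

```python
import math

b1 = 0.08  # slope from log(Wage) = 1.5 + 0.08 * Educ (example above)

approx_pct = 100 * b1                   # approximate % change: 8%
exact_pct = 100 * (math.exp(b1) - 1)    # exact % change
print(approx_pct, round(exact_pct, 2))  # 8.0 8.33
```

For small coefficients the two are close; the gap widens as \(|\beta_1|\) grows.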
Here, we transform the independent variable \(x\): \(y = \beta_0 + \beta_1 \log(x) + u\)
Interpretation of \(\beta_1\): A 1% increase in \(x\) is associated with a \((\beta_1 / 100)\) unit change in \(y\).
Interpretation of \(\beta\) in Level-Log Model
To see this, take the derivative of the equation with respect to \(\log(x)\): \[ \frac{dy}{d(\log(x))} = \beta_1 \]
A change in \(\log(x)\) is approximately the proportional change in \(x\): \(\Delta \log(x) \approx \frac{\Delta x}{x}\).
Example: Level-Log Model
Suppose that the estimated regression model linking advertising expenditure to monthly sales revenue is:
\[ \text{Monthly Sales Revenue} = 50 + 12 \times \log(\text{Monthly Advertising Spending}) \]
In this model, Monthly Sales Revenue (Y) is measured in thousands of euros (Level), and Monthly Advertising Spend (X) is measured in euros (Log).
A 1% increase in Monthly Advertising Spend is associated with a \(12/100 = 0.12\) unit increase in Monthly Sales Revenue. Since revenue is measured in thousands of euros, this is a €120 increase in monthly sales revenue (\(0.12 \times €1{,}000 = €120\)).
Interpretation of \(\beta\) in the Log-Log Model
A 1% increase in \(x\) is associated with a \(\beta_1\%\) change in \(y\). To see this, from the model, we can write: \[ \beta_1 = \frac{d(\log(y))}{d(\log(x))} \]
Using the same approximations as before: \[ \beta_1 \approx \frac{\Delta y / y}{\Delta x / x} = \frac{\%\Delta y}{\%\Delta x} \]
Example: Interpretation of \(\beta_1\) in the Log-Log Model
Suppose we have estimated \(\log(\text{Sales}) = 4.8 - 1.2 \times \log(\text{Price})\) for a product. Then, a 1% increase in price is associated with a 1.2% decrease in sales.
Interpretation of the Quadratic Model
The effect of a change in \(x\) on \(y\) now depends on the level of \(x\).
The marginal effect of \(x\) on \(y\) is the derivative with respect to \(x\): \[ \frac{\Delta y}{\Delta x} \approx \frac{dy}{dx} = \beta_1 + 2 \beta_2 x \]
A one-unit change in \(x\) is associated with a change in \(y\) of approximately \(\beta_1 + 2 \beta_2 x\).
Example: Polynomial Regression
Suppose we have estimated a model for annual income based on a worker’s age:
\(\widehat{Income} = 20,000 + 1,500 \times Age - 20 \times Age^2\).
The effect of an additional year of age for a 20-year-old (i.e., going from age 20 to 21) is approximately: \(1,500 + 2(-20)(20) = 1,500 - 800 = \$700\).
The effect of gaining one more year of experience when a worker is 40 (i.e., going from age 40 to 41) is: \(1,500 + 2(-20)(40) = 1,500 - 1,600 = -\$100\).
This captures the common life-cycle pattern of earnings: income rises with age and experience, but at a decreasing rate, and may eventually begin to decline after a certain point. This demonstrates the diminishing returns to age and experience on income.
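The arithmetic in this example can be reproduced directly:

```python
# Coefficients from the estimated model above
b1, b2 = 1_500, -20

def marginal_effect(age):
    """Approximate change in income from one more year of age: b1 + 2*b2*age."""
    return b1 + 2 * b2 * age

print(marginal_effect(20))   # 700
print(marginal_effect(40))   # -100
print(-b1 / (2 * b2))        # 37.5: the age at which predicted income peaks
```

The turning point \(-\beta_1 / (2\beta_2)\) follows from setting the marginal effect to zero.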
Example: Wage Conditional on Gender Dummy
Consider a regression of wage on a single dummy variable for gender:
\[ Wage_i = \beta_0 + \beta_1 Female_i + u_i \]
where \(Female_i = 1\) if person i is female, and \(Female_i = 0\) if male.
To interpret the coefficients, take the expected wage for each group: \(E[Wage_i \mid Female_i = 0] = \beta_0\) and \(E[Wage_i \mid Female_i = 1] = \beta_0 + \beta_1\).

Example: Wage on Gender (Cont.)
The coefficient \(\beta_1\) represents the difference in the expected outcome between the two groups:
\[ E[Wage_i | Female_i=1] - E[Wage_i | Female_i=0] = (\beta_0 + \beta_1) - \beta_0 = \beta_1 \]
Now, \(\beta_1\) is the average difference in wages between females and males.
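That the OLS coefficients reduce to group means can be checked numerically; a sketch on simulated data (numbers illustrative):

```python
import numpy as np

# Simulated sample
rng = np.random.default_rng(2)
female = rng.integers(0, 2, 500)                  # dummy: 1 = female
wage = 20 - 3 * female + rng.normal(0, 4, 500)

b1, b0 = np.polyfit(female, wage, 1)              # slope, intercept

# Intercept = male group mean; slope = female mean minus male mean
print(np.isclose(b0, wage[female == 0].mean()))                               # True
print(np.isclose(b1, wage[female == 1].mean() - wage[female == 0].mean()))    # True
```

With a single dummy regressor, OLS fits each group's mean exactly, so \(\hat{\beta}_1\) is the sample difference in means.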
| Model Name | Equation | Interpretation of \(\hat{\beta}_1\) |
|---|---|---|
| Level-Level | \(y = \beta_0 + \beta_1 x\) | A 1-unit change in \(x\) leads to a \(\hat{\beta}_1\) unit change in \(y\). |
| Log-Level | \(\log(y) = \beta_0 + \beta_1 x\) | A 1-unit change in \(x\) leads to a \((100 \times \hat{\beta}_1)\%\) change in \(y\). |
| Level-Log | \(y = \beta_0 + \beta_1 \log(x)\) | A 1% change in \(x\) leads to a \((\hat{\beta}_1/100)\) unit change in \(y\). |
| Log-Log | \(\log(y) = \beta_0 + \beta_1 \log(x)\) | A 1% change in \(x\) leads to a \(\hat{\beta}_1\%\) change in \(y\). |
| Polynomial | \(y = \beta_0 + \beta_1 x + \beta_2 x^2 + \dots\) | A 1-unit change in \(x\) leads to a \(\hat{\beta}_1 + 2 \hat{\beta}_2 x\) change in \(y\). |
| Dummies | \(y = \beta_0 + \beta_1 D_i\) | Relative to the reference (\(D_i = 0\)) category, the \(D_i = 1\) category has a \(\hat{\beta}_1\) higher (or lower, if negative) \(y\). |
How well does our estimated line explain the variation in our dependent variable, \(y\)?
We can partition the total variation in \(y\) into two parts: the part explained by the model, and the part that is not explained.
Partition of Variation in \(Y\)
SST (Total Sum of Squares): Total variation in \(y\). \(SST = \sum (y_i - \bar{y})^2\)
SSE (Explained Sum of Squares): Variation explained by the regression. \(SSE = \sum (\hat{y}_i - \bar{y})^2\)
SSR (Sum of Squared Residuals): Unexplained variation. \(SSR = \sum \hat{u}_i^2\)
If the regression equation includes a constant term, it is a mathematical property that SST = SSE + SSR.
Definition: \(R^2\)
The R-squared measures the proportion of the total sample variation in \(y\) that is “explained” by the regression model.
\[ R^2 = \frac{SSE}{SST} = 1 - \frac{SSR}{SST} \]
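The decomposition and both forms of \(R^2\) can be verified numerically; a sketch on simulated data:

```python
import numpy as np

# Simulated data (illustrative)
rng = np.random.default_rng(3)
x = rng.normal(size=100)
y = 1 + 2 * x + rng.normal(size=100)

b1, b0 = np.polyfit(x, y, 1)
y_hat = b0 + b1 * x
u_hat = y - y_hat

SST = ((y - y.mean()) ** 2).sum()      # total variation
SSE = ((y_hat - y.mean()) ** 2).sum()  # explained variation
SSR = (u_hat ** 2).sum()               # unexplained variation

# With an intercept, SST = SSE + SSR, so both R^2 formulas agree
print(np.isclose(SST, SSE + SSR))                 # True
print(np.isclose(SSE / SST, 1 - SSR / SST))       # True
```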
Definition: Unbiasedness
An OLS estimator is considered unbiased if its expected value, across many hypothetical samples, is equal to the true population parameter it is intended to estimate. Mathematically, for a regression coefficient \(\beta\), its OLS estimator \(\hat{\beta}\) is unbiased if:
\[ E[\hat{\beta}] = \beta \]
SLR Assumptions
Theorem: Unbiasedness of OLS
Under assumptions SLR.1 through SLR.4, the OLS estimators are unbiased.
\[ E(\hat{\beta}_0) = \beta_0 \quad \text{and} \quad E(\hat{\beta}_1) = \beta_1 \]
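Unbiasedness is a statement about repeated sampling, which a small Monte Carlo experiment can illustrate; a sketch with arbitrarily chosen true parameters:

```python
import numpy as np

# True parameters (illustrative); the SLR assumptions hold by construction
beta0, beta1 = 1.0, 2.0
n, reps = 50, 5000
rng = np.random.default_rng(4)

slopes = np.empty(reps)
for r in range(reps):
    x = rng.uniform(0, 10, n)
    u = rng.normal(0, 2, n)        # E(u | x) = 0 by construction
    y = beta0 + beta1 * x + u
    slopes[r] = np.polyfit(x, y, 1)[0]

# The average estimate across samples is close to the true beta1 = 2
print(round(slopes.mean(), 2))
```

Any single \(\hat{\beta}_1\) misses the truth, but the estimates are centered on \(\beta_1\) across samples.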
Assumption 4, \(E(u|x) = 0\), is the most important assumption for establishing causality.
When is unbiasedness violated?
An important reason arises when a relevant explanatory variable is excluded from the model, referred to as Omitted Variable Bias.
For OVB to exist, two conditions must be met: (1) the omitted variable must affect the dependent variable (\(\beta_2 \neq 0\)), and (2) the omitted variable must be correlated with the included explanatory variable.
When these conditions hold, the OLS estimator for the included variable’s coefficient mistakenly incorporates the effect of the omitted variable, leading to a biased result.
Omitted Variable Bias Definition
Consider the true regression model: \(y = \beta_0 + \beta_1 x_1 + \beta_2 x_2 + u\), where \(x_2\) is the omitted variable.
Instead, we estimate the simpler model \(y = \alpha_0 + \alpha_1 x_1 + v\). The bias in the estimate of \(\alpha_1\) can be expressed as:
\[ \text{Bias} = E[\hat{\alpha_1}] - \beta_1 = \beta_2 \cdot \delta_1 \]
Where \(\beta_2\) is the effect of the omitted variable \(x_2\) in the true model, and \(\delta_1\) is the slope from the auxiliary regression of \(x_2\) on \(x_1\).
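The bias formula can be illustrated by simulation; a sketch with made-up parameters, where the true \(\beta_2 = 2\) and \(\delta_1 \approx 0.8\):

```python
import numpy as np

# True model (illustrative parameters): y = 1 + 0.5*x1 + 2*x2 + u
rng = np.random.default_rng(5)
n = 100_000
beta1, beta2 = 0.5, 2.0
x1 = rng.normal(size=n)
x2 = 0.8 * x1 + rng.normal(size=n)   # omitted variable, correlated with x1
y = 1.0 + beta1 * x1 + beta2 * x2 + rng.normal(size=n)

alpha1 = np.polyfit(x1, y, 1)[0]     # short regression omitting x2
delta1 = np.polyfit(x1, x2, 1)[0]    # auxiliary regression of x2 on x1

# Bias formula: alpha1 - beta1 = beta2 * delta1 (about 2 * 0.8 = 1.6)
print(round(alpha1 - beta1, 2), round(beta2 * delta1, 2))
```

The short-regression slope absorbs the omitted variable's effect exactly as the formula predicts.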
Examples: OVB
Suppose we estimate \(\text{Wage} = \beta_0 + \beta_1 \text{Education} + u\), but there is an omitted variable \(X_2\), Innate Ability. The resulting coefficient will be an overestimate of the true effect:
(+) × (+) = +.

Examples: OVB
Suppose we estimate the model \(\text{Educational_Outcome} = \beta_0 + \beta_1 \text{Class Size} + u\), with the omitted variable \(X_2\) being a student’s socioeconomic background or level of need. The resulting bias is positive (leading to an underestimation of a negative effect):
(−) × (−) = +.

Examples: OVB
Suppose we estimate \(\text{Profits} = \beta_0 + \beta_1\text{Firm Size} + u\), with the omitted variable \(X_2\) being Quality of Management. The resulting bias is positive, resulting in overestimation of the effect.
(+) × (+) = +.

Theorem: Variance of the OLS Estimator
Under assumptions SLR.1 through SLR.5 (the previous four assumptions plus homoskedasticity), the estimated variance of the OLS slope estimator is:
\[ \widehat{Var}(\hat{\beta}_1) = \frac{\hat{\sigma}^2}{\sum_{i=1}^n (x_i - \bar{x})^2} = \frac{\hat{\sigma}^2}{SST_x} \]
Definition: Standard Error of a Regression
\[ \hat{\sigma} = SER = \sqrt{\frac{SSR}{n-2}} = \sqrt{\frac{\sum \hat{u}_i^2}{n-2}} \]
Definition: Standard Error of a Coefficient
\[ SE(\hat{\beta}_1) = \hat{\sigma} / \sqrt{\sum_{i=1}^n (x_i - \bar{x})^2} \]
To test a hypothesis about a single coefficient (e.g., \(H_0: \beta = 0\)), we ask how many standard errors our estimate \(\hat{\beta}\) is from the hypothesized value: \[ \text{Test Stat} = \frac{\hat{\beta} - \text{Hypothesized Value}}{SE(\hat{\beta})} \]
If we knew the true standard deviation of the errors, \(\sigma\), this statistic would follow a standard Normal distribution.
In practice, we do not know \(\sigma\).
Solution: We replace \(\sigma\) with its sample estimate, \(\hat{\sigma} = \sqrt{\frac{SSR}{n-k-1}}\) (with \(k = 1\) regressor, this is \(\sqrt{SSR/(n-2)}\)).
Definition: Distribution of t-value under \(H_0\)
\[ t = \frac{\hat{\beta}_1 - \beta_1}{se(\hat{\beta}_1)} \sim t_{(n-2)} \]
Example: \(t\)-distribution vs. Normal Distribution

Example: \(t\)-test in Linear Regression
Consider the following regression output:
summary(slr_model)
##
## Call:
## lm(formula = wage ~ educ, data = dat)
##
## Residuals:
## Min 1Q Median 3Q Max
## -6.7201 -2.3878 -0.3926 1.9554 11.6092
##
## Coefficients:
## Estimate Std. Error t value Pr(>|t|)
## (Intercept) 1.7863 2.5164 0.710 0.479
## educ 1.1498 0.1887 6.092 2.19e-08 ***
## ---
## Signif. codes: 0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
##
## Residual standard error: 3.4 on 98 degrees of freedom
## Multiple R-squared: 0.2747, Adjusted R-squared: 0.2673
## F-statistic: 37.11 on 1 and 98 DF, p-value: 2.192e-08

From this output, we can see that the regression standard error is \(\hat{\sigma} = SER = 3.4\), and that the SE of the educ coefficient is 0.19. The two are related: dividing the SER by \(\sqrt{\sum (x_i - \bar{x})^2}\) gives the coefficient's standard error:
We can therefore manually calculate the \(t\)-statistic testing \(H_0: \beta=0\) as:
Finally we can even calculate the two-sided \(p\)-value of observing a test statistic this extreme under the null hypothesis:
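A sketch of both calculations in Python (assuming SciPy is available; the inputs are read off the regression output above):

```python
from scipy import stats

# Values read from the regression output above
beta_hat, se, df = 1.1498, 0.1887, 98

t_stat = beta_hat / se                       # test H0: beta = 0
p_value = 2 * stats.t.sf(abs(t_stat), df)    # two-sided p-value from t(98)

print(round(t_stat, 2))   # 6.09, matching the summary up to rounding
print(p_value < 0.001)    # True: reject H0 at any conventional level
```

The hand-computed \(t\) and \(p\) agree with the `t value` and `Pr(>|t|)` columns in the summary.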