Solutions Tutorial 3

Forecasting Fertility

This question focuses on univariate modeling and prediction. You will model the fertility rate using the FERTIL3.DTA dataset. Import it into R/Python/Stata and:

  1. Fit an Autoregressive, AR(p), model. Use the Akaike information criterion to select the optimal number of lags, p.

The output from R differs slightly from the Python/Stata output due to implementation differences.

library(haven)
url <- 'https://github.com/basm92/ee_website/raw/refs/heads/master/tutorials/datafiles/FERTIL3.DTA'
fertil <- read_dta(url)
# Estimate a number of models (up to 10 lags)
# Using AIC with ar() function
ar_model <- ar(fertil$gfr, aic=TRUE, order.max=10)
ar_model$order  # Optimal order based on AIC
[1] 2

The output from Python differs slightly from the R output due to implementation differences.

import pandas as pd
from statsmodels.tsa.ar_model import ar_select_order

# URL for the Stata data file
url = 'https://github.com/basm92/ee_website/raw/refs/heads/master/tutorials/datafiles/FERTIL3.DTA'

# Read the data using pandas (equivalent to haven::read_dta)
fertil = pd.read_stata(url)

# Select the optimal AR model order using AIC
# This mirrors R's ar(fertil$gfr, aic=TRUE, order.max=10)
selection_result = ar_select_order(fertil['gfr'], maxlag=10, ic='aic')

# Print the largest lag selected by AIC (equivalent to ar_model$order in R)
print(max(selection_result.ar_lags))
7

The output from Stata differs slightly from the R/Python output due to implementation differences.

* 1. Clear any existing data and load the .dta file from the URL
use "https://github.com/basm92/ee_website/raw/refs/heads/master/tutorials/datafiles/FERTIL3.DTA", clear

* 2. Declare the data as a time series, using the 'year' variable as the time index
tsset year

* 3. Run the lag-order selection command for the 'gfr' variable
* This command calculates various criteria (including AIC) for models with 1 up to 10 lags.
* The asterisk (*) in the output table marks the optimal lag for each criterion.
varsoc gfr, maxlag(10)
  2. Using an autoregressive specification with 2 lags, compute (by hand) a forecast two periods into the future.

Since the optimal order is 2, we need 2 observations to make our prediction for the next period:

# Get the necessary ingredients for the forecasts
p <- ar_model$order
coeffs <- ar_model$ar
intercept <- ar_model$x.mean * (1 - sum(coeffs)) # ar() fits the demeaned series, so recover the implied intercept
last_values <- tail(fertil$gfr, p) # Get the last p observations needed for the forecast

# Forecast for period T+1
# Formula: forecast = intercept + (coeff_1 * Y_T) + (coeff_2 * Y_{T-1}) + ...
forecast_1 <- intercept + sum(coeffs * rev(last_values))
forecast_1
[1] 67.30149
# Forecast for period T+2
# Update the values used: the first forecast becomes the most recent value and
# the oldest observation (Y_{T-1}) drops out. The resulting vector is ordered
# most-recent-first, matching the order of coeffs, so no rev() is needed.
new_values <- c(forecast_1, last_values[-1])
forecast_2 <- intercept + sum(coeffs * new_values)
forecast_2
[1] 69.49665
from statsmodels.tsa.ar_model import AutoReg
import pandas as pd

# Step 1: Estimate a model with lag 2
model = AutoReg(fertil['gfr'], lags = 2).fit()

# Step 2: Get the last two observed values required for the forecast
y_T = fertil.iloc[-1]['gfr']      # The last value in the series, Y(t)
y_T_minus_1 = fertil.iloc[-2]['gfr'] # The second to last value, Y(t-1)

# Step 3: Extract the estimated parameters from the fitted model
params = model.params
intercept = params['const']
phi_1 = params['gfr.L1']
phi_2 = params['gfr.L2']

# Step 4: Prediction
# Forecast 1: Prediction for period T+1
forecast_1 = intercept + (phi_1 * y_T) + (phi_2 * y_T_minus_1)
# Forecast 2: Prediction for period T+2
# For this step, the predicted value 'forecast_1' becomes the new Y(t),
# and the previous Y(t) becomes the new Y(t-1).
forecast_2 = intercept + (phi_1 * forecast_1) + (phi_2 * y_T)

print("Manual 'By Hand' Forecasts:")
## Manual 'By Hand' Forecasts:
print(f"  Forecast for T+1: {forecast_1:.4f}")
##   Forecast for T+1: 65.6892
print(f"  Forecast for T+2: {forecast_2:.4f}")
##   Forecast for T+2: 66.1786
print("-" * 40)
## ----------------------------------------
* --- Setup: Load data and set as time series ---
clear
use "https://github.com/basm92/ee_website/raw/refs/heads/master/tutorials/datafiles/FERTIL3.DTA", clear
tsset year

* --- Step 1: Estimate the AR(2) model ---
* We regress gfr on its first and second lags (L.gfr and L2.gfr)
regress gfr L.gfr L2.gfr

* --- Step 2: Extract the estimated parameters into local macros ---
* Stata stores the coefficients in _b[varname] after a regression
local intercept = _b[_cons]
local phi1 = _b[L.gfr]
local phi2 = _b[L2.gfr]

* Display the captured parameters to verify
display ""
display as text "Fitted AR(2) Model Parameters:"
display as result "  Intercept (c):    " `intercept'
display as result "  Lag 1 Coeff (phi1): " `phi1'
display as result "  Lag 2 Coeff (phi2): " `phi2'
display "----------------------------------------"

* --- Step 3: Get the last two observed values required for the forecast ---
* _N refers to the last observation in the dataset
local y_T = gfr[_N]
local y_T_minus_1 = gfr[_N-1]

display as text "Last two observed values used for prediction:"
display as result "  Y(t)   = " `y_T'
display as result "  Y(t-1) = " `y_T_minus_1'
display "----------------------------------------"

* --- Step 4: "By Hand" prediction of 2 periods in the future ---
* Use the 'display' command as a calculator with our stored macros

* Forecast 1: Prediction for period T+1
local forecast_1 = `intercept' + (`phi1' * `y_T') + (`phi2' * `y_T_minus_1')

* Forecast 2: Prediction for period T+2
* Note: We use the calculated `forecast_1` as the new Y(t)
local forecast_2 = `intercept' + (`phi1' * `forecast_1') + (`phi2' * `y_T')

display as text "Manual 'By Hand' Forecasts:"
display as result "  Forecast for T+1: " `forecast_1'
display as result "  Forecast for T+2: " `forecast_2'
  3. Generate a forecast from your fitted model for 10 periods after the end of the dataset.

This can also be done using the predict() function in R:

predict(ar_model, n.ahead = 10)$pred
Time Series:
Start = 73 
End = 82 
Frequency = 1 
 [1] 67.30149 69.49665 71.60180 73.55314 75.34913 76.99964 78.51597 79.90892
 [9] 81.18851 82.36396
predictions = model.forecast(steps=10)
print(predictions)
72    65.689199
73    66.178640
74    66.712472
75    67.241675
76    67.751467
77    68.238000
78    68.700895
79    69.140837
80    69.558819
81    69.955889
dtype: float64
* --- Step 1: Initialize a new forecast model named 'mymodel' ---
forecast create mymodel, replace

* --- Step 2: Estimate the AR(2) regression ---
regress gfr L.gfr L2.gfr

* --- Step 3: Store the estimation results ---
estimates store myar2

* --- Step 4: Add the stored estimates to the forecast model ---
forecast estimates myar2

* --- Step 5: Extend the dataset by the number of periods you want to forecast ---
tsappend, add(10)

* --- Step 6: Solve the forecast: specify the first year to forecast and the number of periods ---
forecast solve, begin(1985) periods(10)

* --- Step 7: View the results ---
* The forecasts are stored in new variables with an f_ prefix (here f_gfr).
list year f_gfr if year >= 1985
  4. What happens to the forecast after a certain number of periods?

The forecast converges towards the unconditional mean of the AR process.
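A quick way to see this in R, reusing the ar_model object fitted above (a sketch): extend the forecast horizon and compare the far-ahead forecast with ar_model$x.mean, the sample mean that ar() uses as its estimate of the unconditional mean.

# Forecasts far into the future approach the estimated unconditional mean
long_run <- predict(ar_model, n.ahead = 100)$pred
tail(long_run, 1)  # forecast 100 periods ahead
ar_model$x.mean    # estimated unconditional mean of the process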

Investigating Autocorrelation

  1. Run a simple OLS regression with the Housing Inventory (inv) as the dependent variable and Population (pop) as the independent variable.
library(haven)
url <- 'https://github.com/basm92/ee_website/raw/refs/heads/master/tutorials/datafiles/HSEINV.DTA'
housing <- haven::read_dta(url)
model <- lm(inv ~ pop, data = housing)
import pandas as pd
import statsmodels.formula.api as smf
url = 'https://github.com/basm92/ee_website/raw/refs/heads/master/tutorials/datafiles/HSEINV.DTA'
housing = pd.read_stata(url)
model = smf.ols("inv ~ pop", data = housing)
* 1. Load the Stata dataset directly from the URL
* The 'clear' option removes any data currently in memory before loading the new file.
use "https://github.com/basm92/ee_website/raw/refs/heads/master/tutorials/datafiles/HSEINV.DTA", clear

* Declare the data as a time series (needed later for estat bgodfrey and the lag operators)
tsset year

* 2. Run the OLS regression
regress inv pop
  2. Report the R-squared value and the t-statistic (or p-value) for the pop coefficient. What would a naive interpretation of these results suggest about the relationship between these two variables?
summary(model)

Call:
lm(formula = inv ~ pop, data = housing)

Residuals:
   Min     1Q Median     3Q    Max 
-50491  -7771   -688   9951  32312 

Coefficients:
              Estimate Std. Error t value Pr(>|t|)    
(Intercept) -6.768e+04  1.703e+04  -3.973 0.000288 ***
pop          8.724e-01  8.531e-02  10.226 1.01e-12 ***
---
Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1

Residual standard error: 16670 on 40 degrees of freedom
Multiple R-squared:  0.7233,    Adjusted R-squared:  0.7164 
F-statistic: 104.6 on 1 and 40 DF,  p-value: 1.011e-12
model.fit().summary()
                            OLS Regression Results
==============================================================================
Dep. Variable:                    inv   R-squared:                       0.723
Model:                            OLS   Adj. R-squared:                  0.716
Method:                 Least Squares   F-statistic:                     104.6
Date:                Wed, 29 Oct 2025   Prob (F-statistic):           1.01e-12
Time:                        12:55:39   Log-Likelihood:                -466.87
No. Observations:                  42   AIC:                             937.7
Df Residuals:                      40   BIC:                             941.2
Df Model:                           1
Covariance Type:            nonrobust
==============================================================================
                 coef    std err          t      P>|t|      [0.025      0.975]
------------------------------------------------------------------------------
Intercept  -6.768e+04    1.7e+04     -3.973      0.000   -1.02e+05   -3.33e+04
pop            0.8724      0.085     10.226      0.000       0.700       1.045
==============================================================================
Omnibus:                        3.352   Durbin-Watson:                   0.833
Prob(Omnibus):                  0.187   Jarque-Bera (JB):                2.238
Skew:                          -0.368   Prob(JB):                        0.327
Kurtosis:                       3.859   Cond. No.                     1.32e+06
==============================================================================

Notes:
[1] Standard Errors assume that the covariance matrix of the errors is correctly specified.
[2] The condition number is large, 1.32e+06. This might indicate that there are
strong multicollinearity or other numerical problems.

In Stata, the regress command above already reports these values directly. The R-squared is about 0.72 and the t-statistic on pop is 10.2 (p-value of roughly 1e-12), so a naive interpretation would suggest a strong, highly significant relationship between population and housing inventory.

  3. Run the Breusch-Godfrey test with order=1 and report the \(p\)-value. What does this suggest?
library(lmtest, quietly=T)
bgtest(inv ~ pop, order=1, data=housing)

    Breusch-Godfrey test for serial correlation of order up to 1

data:  inv ~ pop
LM test = 14.023, df = 1, p-value = 0.0001806
from statsmodels.stats.diagnostic import acorr_breusch_godfrey
bg_test = acorr_breusch_godfrey(model.fit(), nlags=1)
f_statistic = bg_test[2]
f_p_value = bg_test[3]

print(f"F Stat.: {f_statistic}")
F Stat.: 19.54838173116814
print(f"p-val.: {f_p_value}")
p-val.: 7.619583942835907e-05
estat bgodfrey, lags(1)

This suggests that there is serial correlation in the errors. The model is likely misspecified.

  4. Repeat steps 1 and 3, but include the lagged housing inventory, the lagged population, and the two-period lags of both housing inventory and population as independent variables. What do you conclude?
library(dplyr)
model2 <- lm(inv ~ pop + lag(pop) + lag(inv) + lag(pop, 2) + lag(inv, 2), data=housing)
bgtest(model2, order=1, data=housing)

    Breusch-Godfrey test for serial correlation of order up to 1

data:  model2
LM test = 1.9688, df = 1, p-value = 0.1606
housing['lag1pop'] = housing['pop'].shift(1)
housing['lag2pop'] = housing['pop'].shift(2)
housing['lag1inv'] = housing['inv'].shift(1)
housing['lag2inv'] = housing['inv'].shift(2)

model_ext = smf.ols("inv ~ pop + lag1pop + lag2pop + lag1inv + lag2inv", data = housing).fit()
bg_test = acorr_breusch_godfrey(model_ext, nlags=1)
print("F-statistic:", bg_test[2])
F-statistic: 1.7083239777452806
print("p-value:", bg_test[3])
p-value: 0.20023775113163977
* Load the data
use "https://github.com/basm92/ee_website/raw/refs/heads/master/tutorials/datafiles/HSEINV.DTA", clear

* Declare the data as a time series so the lag operators (L., L2.) work
tsset year

* Run the OLS regression including the first and second lags of inv and pop
regress inv pop L.pop L.inv L2.pop L2.inv

* Perform the Breusch-Godfrey test for autocorrelation up to 1 lag
estat bgodfrey, lags(1)

After controlling for the lagged housing inventory and lagged population, there seems to be no autocorrelation in the errors. The inv series thus seems to follow an ARDL(2,2) model.

Derivation of the Unconditional Mean of an AR(1) Process

The task is to prove that for a stationary AR(1) process, \(Y_t = \alpha + \rho Y_{t-1} + u_t\) (with \(|\rho|<1\)), the unconditional mean is \(E(Y_t) = \frac{\alpha}{1-\rho}\).

Proof:

  1. Start with the AR(1) equation: \[ Y_t = \alpha + \rho Y_{t-1} + u_t \]

  2. Take the expected value of both sides: \[ E(Y_t) = E(\alpha + \rho Y_{t-1} + u_t) \]

  3. Use the linearity of the expectation operator: \[ E(Y_t) = E(\alpha) + E(\rho Y_{t-1}) + E(u_t) \]

  4. Apply the properties of the model’s components:

    • The expected value of a constant is the constant itself: \(E(\alpha) = \alpha\).
    • The expected value of the white noise error term is zero: \(E(u_t) = 0\).
    • We can pull the constant \(\rho\) out of the expectation: \(E(\rho Y_{t-1}) = \rho E(Y_{t-1})\).

    Substituting these in gives: \[ E(Y_t) = \alpha + \rho E(Y_{t-1}) \]

  5. Invoke the stationarity assumption: For a stationary process, the mean is constant over time. This implies that the expected value of the series at time \(t\) is the same as at time \(t-1\). Let’s call this constant mean \(\mu\). \[ E(Y_t) = E(Y_{t-1}) = \mu \]

  6. Substitute \(\mu\) into the equation and solve: \[ \mu = \alpha + \rho \mu \] \[ \mu - \rho \mu = \alpha \] \[ \mu(1 - \rho) = \alpha \] \[ \mu = \frac{\alpha}{1 - \rho} \] This completes the proof.
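This result can also be checked with a short simulation (a sketch with illustrative values \(\alpha = 2\) and \(\rho = 0.8\), not taken from any exercise):

# Simulate a long stationary AR(1) and compare its sample mean with alpha / (1 - rho)
set.seed(123)
alpha <- 2
rho <- 0.8
y <- arima.sim(n = 1e5, model = list(ar = rho)) + alpha / (1 - rho)
mean(y)            # sample mean, close to 10
alpha / (1 - rho)  # theoretical unconditional mean: 2 / 0.2 = 10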

Statistical Properties of an MA(1) Process

Consider the MA(1) process: \(Y_t = \mu + u_t + \theta u_{t-1}\), where \(u_t\) is white noise with \(E(u_t)=0\) and \(Var(u_t)=\sigma^2_u\).

1. The Mean: \(E(Y_t)\) \[ \begin{align} E(Y_t) &= E(\mu + u_t + \theta u_{t-1}) \\ &= E(\mu) + E(u_t) + E(\theta u_{t-1}) \\ &= \mu + 0 + \theta E(u_{t-1}) \\ &= \mu + 0 + 0 = \mu \end{align} \] The mean of an MA(1) process is simply \(\mu\).

2. The Variance: \(Var(Y_t) = \gamma_0\) Since variance is unaffected by a constant mean, we look at \(Var(u_t + \theta u_{t-1})\). Because \(u_t\) and \(u_{t-1}\) are from a white noise process, they are uncorrelated. Therefore, the variance of their sum is the sum of their variances. \[ \begin{align} Var(Y_t) &= Var(u_t + \theta u_{t-1}) \\ &= Var(u_t) + Var(\theta u_{t-1}) \\ &= Var(u_t) + \theta^2 Var(u_{t-1}) \\ &= \sigma^2_u + \theta^2 \sigma^2_u \\ &= (1 + \theta^2)\sigma^2_u \end{align} \] The variance of an MA(1) process is \(\gamma_0 = (1 + \theta^2)\sigma^2_u\).

3. The First-Order Autocovariance: \(Cov(Y_t, Y_{t-1}) = \gamma_1\) \[ \begin{align} \gamma_1 &= Cov(Y_t, Y_{t-1}) = E[(Y_t - \mu)(Y_{t-1} - \mu)] \\ &= E[(u_t + \theta u_{t-1})(u_{t-1} + \theta u_{t-2})] \\ &= E[u_t u_{t-1} + \theta u_t u_{t-2} + \theta u_{t-1}^2 + \theta^2 u_{t-1}u_{t-2}] \\ &= E[u_t u_{t-1}] + \theta E[u_t u_{t-2}] + \theta E[u_{t-1}^2] + \theta^2 E[u_{t-1}u_{t-2}] \end{align} \] Using the white noise properties (\(E[u_t u_s] = 0\) for \(t \neq s\) and \(E[u_t^2] = \sigma^2_u\)): \[ \gamma_1 = 0 + \theta(0) + \theta(\sigma^2_u) + \theta^2(0) = \theta \sigma^2_u \]

4. The Second-Order Autocovariance: \(Cov(Y_t, Y_{t-2}) = \gamma_2\) \[ \begin{align} \gamma_2 &= Cov(Y_t, Y_{t-2}) = E[(Y_t - \mu)(Y_{t-2} - \mu)] \\ &= E[(u_t + \theta u_{t-1})(u_{t-2} + \theta u_{t-3})] \\ &= E[u_t u_{t-2} + \theta u_t u_{t-3} + \theta u_{t-1} u_{t-2} + \theta^2 u_{t-1}u_{t-3}] \\ &= E[u_t u_{t-2}] + \theta E[u_t u_{t-3}] + \theta E[u_{t-1} u_{t-2}] + \theta^2 E[u_{t-1}u_{t-3}] \end{align} \] Since all time indices in the cross-products are different, every expected value is zero. \[ \gamma_2 = 0 + 0 + 0 + 0 = 0 \]

5. Conclusion about the “Memory” of an MA(1) Process The result that the autocovariance is zero for all lags of 2 or more (\(\gamma_k = 0\) for \(k \geq 2\)) shows that an MA(1) process has a finite memory of only one period. A random shock at time \(t-2\) (or earlier) has no correlation with the value of the series at time \(t\). The effect of a shock completely vanishes from the process after two periods.
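These moments are easy to verify numerically (a sketch with illustrative values \(\mu = 0\), \(\theta = 0.5\) and \(\sigma^2_u = 1\)):

# Simulate a long MA(1) and compare sample moments with the theoretical ones
set.seed(123)
theta <- 0.5
u <- rnorm(1e5)                    # white noise with sigma^2_u = 1
y <- u + theta * c(0, head(u, -1)) # Y_t = u_t + theta * u_{t-1}  (mu = 0)
var(y)                             # close to (1 + theta^2) * sigma^2_u = 1.25
acf(y, lag.max = 2, type = "covariance", plot = FALSE)
# gamma_1 should be close to theta * sigma^2_u = 0.5, and gamma_2 close to 0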

Proving the Stationarity of a Differenced Random Walk

A random walk is given by \(Y_t = Y_{t-1} + u_t\). The first difference is \(\Delta Y_t = Y_t - Y_{t-1}\).

  1. Express the differenced series in its simplest form: Substitute the definition of the random walk into the differencing equation: \[ \Delta Y_t = (Y_{t-1} + u_t) - Y_{t-1} = u_t \] This shows that the first difference of a random walk is simply a white noise process, \(u_t\).

  2. Check the three conditions for stationarity for \(\Delta Y_t\):

  • Condition 1: Constant Mean The mean of the differenced series is: \[ E(\Delta Y_t) = E(u_t) = 0 \]

    The mean is 0, which is constant for all \(t\).

  • Condition 2: Constant Variance The variance of the differenced series is: \[ Var(\Delta Y_t) = Var(u_t) = \sigma^2_u \]

    The variance is \(\sigma^2_u\), which is constant for all \(t\).

  • Condition 3: Constant Autocovariance The autocovariance of the differenced series at lag \(k > 0\) is: \[ Cov(\Delta Y_t, \Delta Y_{t-k}) = Cov(u_t, u_{t-k}) \]

By the definition of a white noise process, the covariance between error terms at different points in time is zero.

\[ Cov(u_t, u_{t-k}) = 0 \quad \text{for all } k > 0 \]

The autocovariance is always 0 for any positive lag, so it depends only on the lag \(k\) and not on the time \(t\).

Since the differenced series \(\Delta Y_t\) satisfies all three conditions of covariance stationarity, we have shown that differencing a random walk induces stationarity.
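A short simulation illustrates the same point (a sketch; the random walk is built as the cumulative sum of standard normal shocks):

# Differencing a simulated random walk recovers the underlying white noise
set.seed(123)
u <- rnorm(500)
y <- cumsum(u)        # random walk: Y_t = Y_{t-1} + u_t  (with Y_0 = 0)
dy <- diff(y)         # first difference
all.equal(dy, u[-1])  # TRUE: Delta Y_t equals the white noise shock u_t
acf(dy, plot = FALSE) # sample autocorrelations of Delta Y_t are all close to zero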

Calculating the Long-Run Multiplier for a Specific ARDL Model

The ARDL(2,1) model is: \[ Y_t = 10 + 0.5 Y_{t-1} + 0.2 Y_{t-2} + 2.0 X_t - 0.8 X_{t-1} + u_t \]

1. Algebraic Derivation of the LRM

  • In the long-run equilibrium, we assume variables are constant: \(Y_t = Y_{t-1} = Y_{t-2} = Y_{eq}\) and \(X_t = X_{t-1} = X_{eq}\). We also assume \(E(u_t)=0\).
  • Substitute these into the model: \[ Y_{eq} = 10 + 0.5 Y_{eq} + 0.2 Y_{eq} + 2.0 X_{eq} - 0.8 X_{eq} \]
  • Group the \(Y_{eq}\) and \(X_{eq}\) terms: \[ Y_{eq} - 0.5 Y_{eq} - 0.2 Y_{eq} = 10 + (2.0 - 0.8) X_{eq} \]
  • Factor out \(Y_{eq}\): \[ Y_{eq}(1 - 0.5 - 0.2) = 10 + (2.0 - 0.8) X_{eq} \]
  • The general form is \(Y_{eq} = \frac{c}{1-\sum\rho_i} + \frac{\sum\beta_j}{1-\sum\rho_i} X_{eq}\). The LRM is the coefficient on \(X_{eq}\): \[ \text{LRM} = \theta = \frac{\sum\beta_j}{1-\sum\rho_i} \]

2. Numerical Calculation

  • Sum of the coefficients on the explanatory variable (the \(\beta\)’s): \[ \sum\beta_j = 2.0 + (-0.8) = 1.2 \]
  • Sum of the coefficients on the lagged dependent variable (the \(\rho\)’s): \[ \sum\rho_i = 0.5 + 0.2 = 0.7 \]
  • Substitute these values into the LRM formula: \[ \text{LRM} = \frac{1.2}{1 - 0.7} = \frac{1.2}{0.3} = 4.0 \]

3. Interpretation

A permanent one-unit increase in X is associated with a total long-run change of 4.0 units in Y, after all dynamic adjustments are complete.
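The same number can be reproduced in R, both from the formula and by simulating the response of the model to a permanent one-unit increase in X (a sketch using the coefficients given above):

# Long-run multiplier of the ARDL(2,1) model, directly from the formula
betas <- c(2.0, -0.8)         # coefficients on X_t and X_{t-1}
rhos  <- c(0.5, 0.2)          # coefficients on Y_{t-1} and Y_{t-2}
sum(betas) / (1 - sum(rhos))  # 1.2 / 0.3 = 4

# Simulate the dynamic response to a permanent one-unit increase in X
n <- 300
x <- c(rep(0, 100), rep(1, n - 100))  # X steps permanently from 0 to 1 at t = 101
y <- numeric(n)
for (t in 3:n) {
  y[t] <- 10 + 0.5 * y[t - 1] + 0.2 * y[t - 2] + 2.0 * x[t] - 0.8 * x[t - 1]
}
y[n] - y[100]  # long-run change in Y after the shock: approximately 4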

The Random Walk Hypothesis

1. Best Forecast for Tomorrow’s Price: If a stock’s price follows a random walk (\(Y_t = Y_{t-1} + u_t\)), the single best piece of information to forecast its price for tomorrow is today’s price. The mathematical forecast for tomorrow’s price (\(Y_{t+1}\)), given all information up to today, is the conditional expectation \(E(Y_{t+1}|Y_t, Y_{t-1}, ...)\). This simplifies to: \[ E(Y_{t+1}|Y_t) = E(Y_t + u_{t+1}|Y_t) = Y_t + E(u_{t+1}) = Y_t + 0 = Y_t \] Therefore, the best forecast for tomorrow’s value is simply today’s value.

2. “Permanent” Effect of a Shock: A shock \(u_t\) has a permanent effect because it is fully incorporated into the level of the series from that point forward. Consider the price at time \(t\): \(Y_t = Y_{t-1} + u_t\). The price at time \(t+1\) is \(Y_{t+1} = Y_t + u_{t+1}\). The price at time \(t+k\) can be written as:

\[ Y_{t+k} = Y_t + u_{t+1} + u_{t+2} + \dots + u_{t+k} \]

Since the initial shock \(u_t\) is part of the term \(Y_t\), its value is carried forward indefinitely in all future values of the series. Unlike a stationary process where shocks eventually dissipate, in a random walk, a shock permanently shifts the entire future path of the series up or down.
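A small simulation makes the permanence visible (a sketch): two random walks are driven by identical shocks, except that one receives an extra unit shock at a single date; from that date onward the two paths differ by exactly one unit forever.

# A one-time shock to a random walk shifts the entire future path permanently
set.seed(123)
n <- 200
u <- rnorm(n)
extra <- c(rep(0, 99), 1, rep(0, n - 100))  # one extra unit shock at t = 100
y_base    <- cumsum(u)                      # random walk without the extra shock
y_shocked <- cumsum(u + extra)              # random walk with the extra shock
tail(y_shocked - y_base)                    # the gap stays at exactly 1 for every t >= 100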

The Role of the Dickey-Fuller Test

  1. Null Hypothesis: The null hypothesis (\(H_0\)) of the Dickey-Fuller test is that the time series has a unit root. In simpler terms, this means the series is non-stationary. The alternative hypothesis (\(H_A\)) is that the series is stationary.

  2. Interpretation of Tests on Y and X: Failing to reject the null hypothesis for both \(Y_t\) and \(X_t\) indicates that both series are likely non-stationary (specifically, integrated of order 1, or I(1)). This finding confirms the initial suspicion based on the plots and is the classic prerequisite for a spurious regression. When two non-stationary series are regressed on each other, their common underlying trends can create the statistical illusion of a meaningful relationship, even if they are completely unrelated.

  3. Interpretation of the Test on Residuals: This final step is the most critical diagnostic for spurious regression.

    • Why it’s crucial: If \(Y_t\) and \(X_t\) shared a true, long-run economic relationship, we would expect the residuals (\(\hat{\epsilon}_t\)) of the regression to be stationary. This special case is called cointegration, where the deviations from the long-run equilibrium are temporary and mean-reverting.
    • What the result means: By finding that the residuals are also non-stationary (failing to reject the null of a unit root), you have strong evidence that the regression is spurious. This result implies that the error term does not revert to a mean of zero; the deviations from the estimated regression line are themselves persistent and follow a random walk. This confirms that the model has not captured a stable economic equilibrium but has simply fitted one non-stationary trend onto another.
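The diagnostics described above can be illustrated with two completely independent simulated random walks (a sketch; it uses adf.test() from the tseries package, which is an assumption about the toolkit rather than part of the exercise):

# Sketch: spurious regression between two independent random walks
library(tseries)  # for adf.test(); install.packages("tseries") if needed
set.seed(123)
n <- 200
y <- cumsum(rnorm(n))  # random walk 1
x <- cumsum(rnorm(n))  # random walk 2, independent of y

summary(lm(y ~ x))             # typically a "significant" slope and sizable R-squared despite no true relationship
adf.test(y)                    # usually fails to reject the unit-root null for y
adf.test(x)                    # ... and for x
adf.test(residuals(lm(y ~ x))) # residuals typically also fail to reject: no cointegration, so the regression is spurious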

ARDL Model Interpretation

Intuitive Explanation of the LRM Formula: \(\theta = (\beta_0 + \beta_1) / (1 - \rho_1)\)

  • The numerator, \((\beta_0 + \beta_1)\), represents the total immediate and one-period-delayed impact of a one-unit change in X on Y. However, this is not the end of the story. The change in Y itself triggers a feedback loop because of the autoregressive term, \(\rho_1 Y_{t-1}\).

  • The denominator, \((1 - \rho_1)\), can be interpreted as the proportion of Y that is “new” in each period, i.e., the part that is not simply a carry-over from the previous period. Dividing the total initial impact of X by this “new” proportion effectively scales up the initial effect to account for the full, cumulative impact after the feedback mechanism has run its course over all future periods.

Role of the Persistence Factor, \(\rho_1\)

The persistence factor \(\rho_1\) is crucial because it dictates the strength and duration of the feedback effect:

  • If \(\rho_1\) is close to 1 (high persistence): The denominator \((1-\rho_1)\) becomes very small. Dividing by a small number makes the long-run multiplier much larger than the initial impact. This is because a shock to Y dies out very slowly, allowing the feedback effects to accumulate over many periods, leading to a large total change.
  • If \(\rho_1\) is close to 0 (low persistence): The denominator \((1-\rho_1)\) is close to 1. The long-run multiplier will be very close to the initial impact \((\beta_0 + \beta_1)\). This happens because any shock to Y dissipates quickly, so the feedback effects are weak and do not add much to the total effect.
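This is easy to see with numbers (a sketch, reusing the short-run impact \(\beta_0 + \beta_1 = 1.2\) from the ARDL example above):

# How the long-run multiplier scales with the persistence parameter rho_1
short_run <- 1.2  # beta_0 + beta_1
for (rho1 in c(0.1, 0.5, 0.9, 0.99)) {
  cat("rho1 =", rho1, " LRM =", short_run / (1 - rho1), "\n")
}
# The LRM rises from about 1.33 (rho1 = 0.1) to 120 (rho1 = 0.99)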