Empirical Economics

Tutorial 3: Time Series

Recapitulation of the Lecture

Fundamentals of Time Series

Time series data consist of observations of one or more variables recorded in chronological order. Key characteristics:

  • Temporal Ordering: The order of observations matters; the past can influence the future.
  • Serial Correlation (Autocorrelation): Observations are often correlated with their past values. The Autocorrelation Function (ACF) measures this “memory” or persistence.
  • Trends & Seasonality: Data may exhibit long-term movements (trends) or regular periodic patterns (seasonality).

A time series is covariance stationary if its statistical properties are constant over time. This is a crucial assumption for many models.

  1. Constant Mean: \(E(Y_t) = \mu\)
  2. Constant Variance: \(Var(Y_t) = \sigma^2\)
  3. Constant Autocovariance: \(Cov(Y_t, Y_{t-k}) = \gamma_k\) (depends only on the lag \(k\), not on time \(t\)).

Non-stationary data can lead to unreliable and misleading results.

Modeling a Single Time Series & Testing for Stationarity

  • Univariate models describe the behavior of a single time series using its own past.

  • Autoregressive (AR) Model: The current value is a function of past values. An AR(1) model is: \[ Y_t = \alpha + \rho Y_{t-1} + u_t \] Here, \(\rho\) measures the persistence of shocks.

  • Moving Average (MA) Model: The current value is a function of past random shocks (\(u_t\)). An MA(1) model is: \[ Y_t = \mu + u_t + \theta u_{t-1} \] This model has a finite memory of past shocks.

  • If \(|\rho| \ge 1\) in an AR(1) model, the series is non-stationary.

  • A special case where \(\rho=1\) is called a random walk or a unit root process. In this case, shocks have permanent effects.
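The contrast between these processes is easy to see in simulation. The following is a minimal NumPy-only sketch (simulated data, illustrative parameter values): a stationary AR(1) that fluctuates around its unconditional mean, an MA(1) whose memory of shocks is finite, and a random walk in which every shock is permanent.

```python
import numpy as np

rng = np.random.default_rng(0)
T = 5000
u = rng.standard_normal(T)  # white-noise shocks

# Stationary AR(1): Y_t = alpha + rho*Y_{t-1} + u_t with |rho| < 1
alpha, rho = 1.0, 0.5
y_ar = np.zeros(T)
for t in range(1, T):
    y_ar[t] = alpha + rho * y_ar[t - 1] + u[t]

# MA(1): Y_t = mu + u_t + theta*u_{t-1} (memory cuts off after one lag)
mu, theta = 0.0, 0.8
y_ma = mu + u.copy()
y_ma[1:] += theta * u[:-1]

# Random walk (rho = 1, unit root): shocks accumulate permanently
y_rw = np.cumsum(u)

# The stationary AR(1) fluctuates around its unconditional mean alpha/(1-rho)
long_run_mean = alpha / (1 - rho)  # = 2.0 here
```

The random walk wanders without reverting to any fixed level, while the AR(1) series keeps returning to \(\alpha/(1-\rho)\).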

Testing for Unit Roots: The Dickey-Fuller (DF) Test

This test is used to determine whether a series contains a unit root. Under the null hypothesis the test statistic does not follow a standard t-distribution, so it must be compared against Dickey-Fuller critical values.

Model: \(\Delta y_t = \gamma y_{t-1} + \epsilon_t\), where \(\gamma = \rho - 1\).

Hypotheses:

  • Null Hypothesis (\(H_0\)): \(\gamma = 0\) (The series has a unit root and is non-stationary).
  • Alternative Hypothesis (\(H_A\)): \(\gamma < 0\) (The series is stationary).
  • The Augmented Dickey-Fuller (ADF) Test extends the regression with lagged differences of \(y_t\) to account for higher-order serial correlation.
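The mechanics of the DF regression can be sketched directly with OLS. This is a NumPy-only illustration on a simulated random walk (in practice one would use a library routine such as `adfuller` from statsmodels, which also supplies the correct critical values):

```python
import numpy as np

rng = np.random.default_rng(42)
T = 300
y = np.cumsum(rng.standard_normal(T))  # simulated random walk: has a unit root

# Dickey-Fuller regression (no constant): Δy_t = γ y_{t-1} + ε_t
dy = np.diff(y)
ylag = y[:-1]

gamma = (ylag @ dy) / (ylag @ ylag)    # OLS slope
resid = dy - gamma * ylag
se = np.sqrt(resid @ resid / (len(dy) - 1) / (ylag @ ylag))
t_stat = gamma / se

# Compare t_stat to Dickey-Fuller critical values (about -1.95 at 5%
# for the no-constant case), NOT to the usual t-table. Failing to
# reject H0: γ = 0 is evidence of a unit root.
```

For a true random walk, \(\hat{\gamma}\) is close to zero and the test typically fails to reject the null of non-stationarity.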

Relationships Between Time Series & Dynamic Models

The Danger of Spurious Regression:

  • Regressing two unrelated non-stationary time series on each other can produce statistically significant results (high \(R^2\), significant t-statistics) purely because both series share a common trend.

  • This creates a misleading and nonsensical relationship.

  • Solution: Ensure variables are stationary (often by taking first differences) or use models designed for non-stationary data.
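The spurious-regression problem is easy to reproduce. The sketch below (simulated data, NumPy only) regresses one independent random walk on another in levels and then in first differences; the levels regression can show a deceptively large \(R^2\) even though the series are unrelated, while differencing removes the stochastic trends:

```python
import numpy as np

rng = np.random.default_rng(1)
T = 200
# Two completely independent random walks
x = np.cumsum(rng.standard_normal(T))
y = np.cumsum(rng.standard_normal(T))

def ols_r2(X, y):
    """R-squared from an OLS regression of y on X (X includes a constant)."""
    beta = np.linalg.lstsq(X, y, rcond=None)[0]
    resid = y - X @ beta
    return 1 - resid @ resid / np.sum((y - y.mean()) ** 2)

# Levels regression: prone to spurious fit
r2_levels = ols_r2(np.column_stack([np.ones(T), x]), y)

# First-difference regression: stochastic trends removed
dx, dy = np.diff(x), np.diff(y)
r2_diff = ols_r2(np.column_stack([np.ones(T - 1), dx]), dy)
```

Across repeated simulations, `r2_levels` is frequently large while `r2_diff` stays near zero, which is exactly the spurious-regression pattern described above.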

Autoregressive Distributed Lag (ARDL) Models

  • ARDL models are flexible tools that capture complex dynamics by including lags of both the dependent variable (AR part) and independent variable(s) (DL part). An ARDL(p,q) model has the form: \[ Y_t = \alpha + \sum_{i=1}^{p} \rho_i Y_{t-i} + \sum_{j=0}^{q} \beta_j X_{t-j} + u_t \]

  • Interpreting ARDL Coefficients

  • Short-Run (Impact) Multiplier: The immediate effect of a one-unit change in \(X_t\) on \(Y_t\) is given by \(\beta_0\).

  • Long-Run Multiplier (LRM): The total, cumulative effect on \(Y\) after a permanent change in \(X\) has fully worked through the system. It is calculated as: \[ \theta_{LRM} = \frac{\sum_{j=0}^{q} \beta_j}{1 - \sum_{i=1}^{p} \rho_i} \]
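The LRM formula is just arithmetic on the estimated coefficients. A small helper, shown here with hypothetical ARDL(1,1) coefficients chosen for illustration:

```python
def long_run_multiplier(rhos, betas):
    """LRM of an ARDL(p, q): (sum of beta_j) / (1 - sum of rho_i).
    rhos: coefficients on lagged Y; betas: coefficients on X_t, ..., X_{t-q}."""
    return sum(betas) / (1 - sum(rhos))

# Hypothetical ARDL(1,1): rho_1 = 0.5, beta_0 = 2.0, beta_1 = -0.8
theta = long_run_multiplier([0.5], [2.0, -0.8])  # (2.0 - 0.8) / (1 - 0.5) = 2.4
```

Here the impact multiplier is \(\beta_0 = 2.0\), but the total long-run effect of a permanent one-unit change in \(X\) is 2.4.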

Model Selection & Forecasting

How to Choose the Right Model?

  • Adding more variables or lags increases \(R^2\) but risks overfitting (modeling noise instead of the true relationship).
  • We need criteria that balance model fit with model simplicity (parsimony).
  • Information Criteria:
  • Akaike Information Criterion (AIC): \(AIC = 2k - 2\ln(L)\)
  • Bayesian Information Criterion (BIC): \(BIC = k \ln(n) - 2\ln(L)\)
  • For both, \(k\) is the number of parameters and \(L\) is the maximized likelihood. We choose the model with the lowest AIC or BIC value. BIC tends to select simpler models than AIC.
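Lag selection by information criteria can be sketched as follows. This NumPy-only version fits AR(p) models by OLS and computes AIC/BIC from the Gaussian log-likelihood up to an additive constant (a simplification: a careful comparison would fit all candidate models on the same effective sample):

```python
import numpy as np

def ar_information_criteria(y, p):
    """Fit an AR(p) by OLS; return (AIC, BIC) up to an additive constant."""
    T = len(y)
    Y = y[p:]
    # Regressors: constant plus lags 1, ..., p of y
    X = np.column_stack([np.ones(T - p)] +
                        [y[p - j:T - j] for j in range(1, p + 1)])
    beta = np.linalg.lstsq(X, Y, rcond=None)[0]
    ssr = np.sum((Y - X @ beta) ** 2)
    n, k = T - p, p + 1          # effective sample size; intercept + p lags
    aic = 2 * k + n * np.log(ssr / n)
    bic = k * np.log(n) + n * np.log(ssr / n)
    return aic, bic

# Simulate an AR(1) and compare criteria across candidate lag orders
rng = np.random.default_rng(0)
T = 300
y = np.zeros(T)
for t in range(1, T):
    y[t] = 1.0 + 0.5 * y[t - 1] + rng.standard_normal()

for p in (1, 2, 3):
    print(p, ar_information_criteria(y, p))
```

One then picks the \(p\) with the smallest criterion value; because BIC's penalty \(k\ln(n)\) exceeds AIC's \(2k\) for any reasonable sample size, BIC leans toward smaller \(p\).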

Forecasting:

  • Goal: predict future values using all available information up to the present, i.e., compute the conditional expectation \(E[Y_{t+h} \mid I_t]\).
  • Procedure: Forecasting is an iterative process where unknown future values are replaced by their forecasts.
    • AR Models: Forecasts have a long memory and converge slowly to the series mean.
    • MA Models: Forecasts have a short memory, reverting to the mean after \(q\) periods.
  • Evaluation: Forecast accuracy is tested on out-of-sample data using metrics like Mean Squared Error (MSE) or Root Mean Squared Error (RMSE): \(RMSE = \sqrt{\frac{1}{n} \sum_{t=1}^{n} (Y_t - \hat{Y}_t)^2}\).
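The iterative procedure for an AR(1) can be written in a few lines. This sketch uses hypothetical parameter estimates to show how multi-step forecasts are built by substituting earlier forecasts for unknown future values, and how they converge to the unconditional mean:

```python
# Iterative h-step forecasts from an estimated AR(1):
# Yhat_{t+1} = alpha + rho*Y_t, then Yhat_{t+h} = alpha + rho*Yhat_{t+h-1}
alpha_hat, rho_hat = 1.0, 0.5   # hypothetical estimates
y_last = 5.0                     # last observed value

forecasts = []
y_prev = y_last
for h in range(1, 21):
    y_prev = alpha_hat + rho_hat * y_prev  # plug in the previous forecast
    forecasts.append(y_prev)

# Forecasts decay geometrically toward the unconditional mean
# alpha/(1 - rho) = 2.0: the gap shrinks by a factor rho each period.
```

After one step the forecast is \(1 + 0.5 \cdot 5 = 3.5\); by \(h = 20\) it is essentially at the long-run mean of 2.0, illustrating the slow geometric mean reversion of AR forecasts.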

Wooclap

Wooclap Link — Code: OFZFSD (Tutorial 3)

Questions

Forecasting Fertility

This question focuses on univariate modeling and prediction. You will model the fertility rate using the FERTIL3.DTA dataset available here. Import it into R/Python/Stata and:

  1. Fit an Autoregressive, AR(p), model. Use an information criterion (like the AIC or BIC/SIC) to select the optimal number of lags, p.

  2. Using an autoregressive specification with 2 lags, compute (by hand) a forecast two periods into the future.

  3. Generate a forecast from your fitted model for 10 periods after the end of the dataset.

  4. What happens to the forecast after a certain number of periods?

Investigating Autocorrelation

This question tests your ability to identify non-stationary data and avoid spurious regression. You will investigate the relationship between housing inventory (the dependent variable) and population size (the independent variable) from the HSEINV.DTA dataset available here.

  1. Run a simple OLS regression with the Housing Inventory (inv) as the dependent variable and Population (pop) as the independent variable.

  2. Report the R-squared value and the t-statistic (or p-value) for the pop coefficient. What would a naive interpretation of these results suggest about the relationship between these two variables?

  3. Run the Breusch-Godfrey test with order=1 and report the \(p\)-value. What does this suggest?

  4. Repeat steps 1 and 3, but include the first and second lags of both housing inventory and population as additional independent variables. What do you conclude?

Derivation of the Unconditional Mean of an AR(1) Process

The lecture notes mention that for a stationary AR(1) process, \(Y_t = \alpha + \rho Y_{t-1} + u_t\) (with \(|\rho|<1\)), the unconditional mean is \(E(Y_t) = \frac{\alpha}{1-\rho}\).

Prove this result.

Hint: Start by taking the expected value of both sides of the AR(1) equation. Then, use the stationarity assumption, which implies that \(E(Y_t) = E(Y_{t-1}) = \mu\), and solve for \(\mu\).

Statistical Properties of an MA(1) Process

Consider the Moving Average model of order 1, or MA(1): \(Y_t = \mu + u_t + \theta u_{t-1}\) where \(u_t\) is a white noise process with mean 0 and variance \(\sigma^2_u\).

Derive the following properties for the MA(1) process:

  1. The mean: \(E(Y_t)\)
  2. The variance: \(Var(Y_t) = \gamma_0\)
  3. The first-order autocovariance: \(Cov(Y_t, Y_{t-1}) = \gamma_1\)
  4. The second-order autocovariance: \(Cov(Y_t, Y_{t-2}) = \gamma_2\)
  5. Based on your result for (4), what can you conclude about the “memory” of an MA(1) process?

Proving the Stationarity of a Differenced Random Walk

A random walk, \(Y_t = Y_{t-1} + u_t\), is the classic example of a non-stationary process. The lecture suggests that differencing the data can induce stationarity.

Show that the first difference of a random walk, defined as \(\Delta Y_t = Y_t - Y_{t-1}\), is a stationary process.

Hint: To prove stationarity, you must show that \(\Delta Y_t\) satisfies the three conditions: constant mean, constant variance, and constant autocovariance that depends only on the lag.

Calculating the Long-Run Multiplier for a Specific ARDL Model

The lecture provides the general formula for the Long-Run Multiplier (LRM). Now, apply that logic to a specific case. Consider the following ARDL(2,1) model:

\[Y_t = 10 + 0.5 Y_{t-1} + 0.2 Y_{t-2} + 2.0 X_t - 0.8 X_{t-1} + u_t\]

Task: Calculate the Long-Run Multiplier for this model.

  1. First, derive the algebraic expression for the LRM using the equilibrium condition where \(Y_t = Y_{t-1} = Y_{t-2} = Y_{eq}\) and \(X_t = X_{t-1} = X_{eq}\).
  2. Second, substitute the numerical coefficients from the model to find the value of the LRM.
  3. Provide a one-sentence interpretation of the numerical value you calculated.

The Random Walk Hypothesis

The lecture defines a random walk as an AR(1) model with \(\rho=1\). If the daily price of a stock is believed to follow a random walk, what is the single best piece of information you could use to forecast its price for tomorrow?

Explain why a shock (e.g., unexpected good news) on a given day has a “permanent” effect on the future price path of the stock.

The Role of the Dickey-Fuller Test

You estimate a linear model \(Y_t = \alpha + \beta X_t + \epsilon_t\) and find a high R-squared and a statistically significant \(\beta\) coefficient. However, you suspect the relationship might be spurious because plots of both \(Y_t\) and \(X_t\) show strong trends. To investigate this, you use the Dickey-Fuller (or ADF) test.

  1. What is the null hypothesis of this test?
  2. You run the ADF test on both \(Y_t\) and \(X_t\) individually and, in both cases, the p-value is large (e.g., > 0.10). What does this failure to reject the null hypothesis tell you about the nature of these two time series?
  3. Finally, you run the ADF test on the residuals, \(\hat{\epsilon}_t\), from your initial regression. The p-value for this test is also large. Why is this final step crucial, and what does this result strongly suggest about your initial regression model?

ARDL Model Interpretation

In an ARDL(1,1) model, \(Y_t = \alpha + \rho_1 Y_{t-1} + \beta_0 X_t + \beta_1 X_{t-1} + u_t\), the long-run multiplier is given by the formula \(\theta = (\beta_0 + \beta_1) / (1 - \rho_1)\).

Provide an intuitive explanation for why we divide the sum of the X-coefficients by \((1 - \rho_1)\). What role does the persistence factor, \(\rho_1\), play in determining the total long-run effect?