Tutorial 3: Time Series
Time series data consist of observations on one or more variables recorded in chronological order. Key Characteristics:
A time series is covariance stationary if its mean, variance, and autocovariances are constant over time, with each autocovariance depending only on the lag. This is a crucial assumption for many models.
Non-stationary data can lead to unreliable and misleading results.
Univariate models describe the behavior of a single time series using its own past.
Autoregressive (AR) Model: The current value is a function of past values. An AR(1) model is: \[ Y_t = \alpha + \rho Y_{t-1} + u_t \] Here, \(\rho\) measures the persistence of shocks.
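To make the AR(1) equation concrete, the following minimal sketch simulates a stationary AR(1) and compares the sample mean with \(\alpha/(1-\rho)\). The parameter values, the normal shocks, the burn-in length, and the function name `simulate_ar1` are illustrative assumptions, not part of the tutorial material.

```python
import random

def simulate_ar1(alpha, rho, n, sigma=1.0, burn_in=200, seed=0):
    """Simulate n observations of Y_t = alpha + rho * Y_{t-1} + u_t (requires |rho| < 1)."""
    rng = random.Random(seed)
    y = alpha / (1.0 - rho)  # start at the unconditional mean
    path = []
    for t in range(n + burn_in):
        y = alpha + rho * y + rng.gauss(0.0, sigma)  # u_t drawn as N(0, sigma^2)
        if t >= burn_in:  # drop the first burn_in draws
            path.append(y)
    return path

alpha, rho = 2.0, 0.8  # hypothetical values chosen for illustration
series = simulate_ar1(alpha, rho, n=20000)
sample_mean = sum(series) / len(series)
theoretical_mean = alpha / (1.0 - rho)
print(f"sample mean = {sample_mean:.3f}, alpha / (1 - rho) = {theoretical_mean:.3f}")
```

With a larger \(\rho\) the series wanders further from its mean before returning, which is what "persistence" refers to.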
Moving Average (MA) Model: The current value is a function of past random shocks (\(u_t\)). An MA(1) model is: \[ Y_t = \mu + u_t + \theta u_{t-1} \] This model has a finite memory of past shocks.
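The "finite memory" of the MA(1) can be seen by tracing a single one-unit shock through the equation. The sketch below does this directly; the value \(\theta = 0.6\) and the helper name `ma1_impulse_response` are assumptions made for illustration.

```python
def ma1_impulse_response(theta, horizons):
    """Response of Y_{t+h} to a one-unit shock u_t in Y_t = mu + u_t + theta * u_{t-1}."""
    responses = []
    for h in range(horizons):
        shock_now = 1.0 if h == 0 else 0.0   # u_{t+h}: the shock occurs only at h = 0
        shock_prev = 1.0 if h == 1 else 0.0  # u_{t+h-1}: the same shock, seen one period later
        responses.append(shock_now + theta * shock_prev)
    return responses

theta = 0.6  # hypothetical value
print(ma1_impulse_response(theta, 5))  # [1.0, 0.6, 0.0, 0.0, 0.0]
```

After one period the shock has dropped out of the equation entirely, so it has no effect from \(h = 2\) onwards.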
If \(|\rho| \ge 1\) in an AR(1) model, the series is non-stationary.
The special case \(\rho=1\) is called a random walk or a unit root process. In this case, shocks have permanent effects.
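Repeated substitution in the AR(1) equation shows that a one-unit shock \(u_t\) changes \(Y_{t+h}\) by \(\rho^h\). The short sketch below tabulates this for a stationary case and for a random walk; the value \(\rho = 0.5\) and the function name are illustrative assumptions.

```python
def ar1_impulse_response(rho, horizons):
    """Effect of a one-unit shock u_t on Y_{t+h} in an AR(1): rho ** h."""
    return [rho ** h for h in range(horizons)]

stationary = ar1_impulse_response(0.5, 6)
random_walk = ar1_impulse_response(1.0, 6)
print(stationary)   # [1.0, 0.5, 0.25, 0.125, 0.0625, 0.03125]
print(random_walk)  # [1.0, 1.0, 1.0, 1.0, 1.0, 1.0]
```

With \(|\rho| < 1\) the effect of the shock dies out geometrically; with \(\rho = 1\) it never decays.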
Testing for Unit Roots: The Dickey-Fuller (DF) Test
This test is used to determine if a series is stationary.
Model: \(\Delta y_t = \gamma y_{t-1} + \epsilon_t\), where \(\gamma = \rho - 1\).
Hypotheses: \(H_0: \gamma = 0\) (unit root, non-stationary) against \(H_1: \gamma < 0\) (stationary).
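As a rough illustration of the mechanics, the sketch below estimates the Dickey-Fuller regression by OLS (no constant, no trend) on two simulated series and compares the t-statistic on \(\gamma\) with an approximate 5% critical value of \(-1.95\) for this specification. The null of a unit root (\(\gamma = 0\)) is rejected only when the statistic falls below that value. The simulated data, seed, sample size, and helper names are assumptions; in practice a library routine for the (augmented) test would normally be used.

```python
import math
import random

def dickey_fuller_t(y):
    """t-statistic on gamma in dy_t = gamma * y_{t-1} + e_t (no constant, no trend)."""
    lagged = y[:-1]
    diff = [y[t] - y[t - 1] for t in range(1, len(y))]
    sxx = sum(x * x for x in lagged)
    gamma_hat = sum(x * d for x, d in zip(lagged, diff)) / sxx
    residuals = [d - gamma_hat * x for x, d in zip(lagged, diff)]
    s2 = sum(e * e for e in residuals) / (len(diff) - 1)  # residual variance
    return gamma_hat / math.sqrt(s2 / sxx)

def simulate_ar1(rho, n, seed):
    """Simulate y_t = rho * y_{t-1} + e_t with standard normal shocks."""
    rng = random.Random(seed)
    y, path = 0.0, []
    for _ in range(n):
        y = rho * y + rng.gauss(0.0, 1.0)
        path.append(y)
    return path

DF_CRITICAL_5PCT = -1.95  # approximate value; standard normal tables do not apply here

for label, rho in [("stationary AR(1), rho = 0.5", 0.5), ("random walk, rho = 1", 1.0)]:
    t_stat = dickey_fuller_t(simulate_ar1(rho, n=500, seed=42))
    decision = "reject H0 (unit root)" if t_stat < DF_CRITICAL_5PCT else "fail to reject H0"
    print(f"{label}: DF t = {t_stat:.2f} -> {decision}")
```

The critical value depends on whether a constant or trend is included in the test regression, so the number used here applies only to the model as written above.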
The Danger of Spurious Regression:
Regressing one non-stationary time series on another, unrelated one can produce seemingly significant results (high \(R^2\), significant t-statistics) purely because both series trend over time.
This creates a misleading and nonsensical relationship.
Solution: Ensure variables are stationary (often by taking first differences) or use models designed for non-stationary data.
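A small Monte Carlo experiment illustrates both the problem and the fix. The sketch below repeatedly regresses one simulated random walk on another, independent one and records how often the slope looks "significant" at the usual 5% level, first in levels and then in first differences. The sample size, number of replications, seed, and function names are arbitrary choices made for illustration.

```python
import math
import random

def random_walk(n, rng):
    """Simulate Y_t = Y_{t-1} + u_t with standard normal shocks."""
    y, path = 0.0, []
    for _ in range(n):
        y += rng.gauss(0.0, 1.0)
        path.append(y)
    return path

def slope_t_stat(x, y):
    """t-statistic on the slope in an OLS regression of y on a constant and x."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    sxx = sum((a - mx) ** 2 for a in x)
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    beta = sxy / sxx
    ssr = sum((b - my - beta * (a - mx)) ** 2 for a, b in zip(x, y))
    return beta / math.sqrt(ssr / (n - 2) / sxx)

def rejection_rate(use_differences, reps=300, n=100, seed=1):
    """Share of replications with |t| > 1.96 although x and y are unrelated."""
    rng = random.Random(seed)
    rejections = 0
    for _ in range(reps):
        x, y = random_walk(n, rng), random_walk(n, rng)  # independent by construction
        if use_differences:
            x = [x[t] - x[t - 1] for t in range(1, n)]
            y = [y[t] - y[t - 1] for t in range(1, n)]
        if abs(slope_t_stat(x, y)) > 1.96:
            rejections += 1
    return rejections / reps

levels_rate = rejection_rate(use_differences=False)
diff_rate = rejection_rate(use_differences=True)
print(f"share of 'significant' slopes in levels:      {levels_rate:.2f}")
print(f"share of 'significant' slopes in differences: {diff_rate:.2f}")
```

If the test were reliable, both shares would be close to 0.05, since the two series are unrelated in every replication.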
Autoregressive Distributed Lag (ARDL) Models
ARDL models are flexible tools that capture complex dynamics by including lags of both the dependent variable (AR part) and independent variable(s) (DL part). An ARDL(p,q) model has the form: \[ Y_t = \alpha + \sum_{i=1}^{p} \rho_i Y_{t-i} + \sum_{j=0}^{q} \beta_j X_{t-j} + u_t \]
Interpreting ARDL Coefficients
Short-Run (Impact) Multiplier: The immediate effect of a one-unit change in \(X_t\) on \(Y_t\) is given by \(\beta_0\).
Long-Run Multiplier (LRM): The total, cumulative effect on \(Y\) after a permanent change in \(X\) has fully worked through the system. It is calculated as: \[ \theta_{LRM} = \frac{\sum_{j=0}^{q} \beta_j}{1 - \sum_{i=1}^{p} \rho_i} \]
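The two multipliers can be checked numerically. The sketch below applies the LRM formula to a hypothetical ARDL(1,1) with \(\rho_1 = 0.5\), \(\beta_0 = 1.0\), \(\beta_1 = 0.5\), and then traces the response of \(Y\) to a permanent one-unit increase in \(X\) by iterating the model with shocks set to zero. The coefficients and function names are assumptions chosen for illustration.

```python
def long_run_multiplier(rhos, betas):
    """LRM = sum(beta_j) / (1 - sum(rho_i)); requires sum(rho_i) != 1."""
    return sum(betas) / (1.0 - sum(rhos))

def simulate_permanent_change(rhos, betas, periods=200):
    """Change in Y, relative to its old level, after X rises permanently by one unit at t = 0."""
    y_path = []
    for t in range(periods):
        # AR part: lagged responses of Y that already exist
        ar_part = sum(rho * y_path[t - i] for i, rho in enumerate(rhos, start=1) if t - i >= 0)
        # DL part: X_{t-j} equals 1 once t - j >= 0, and 0 before the change
        dl_part = sum(beta for j, beta in enumerate(betas) if t - j >= 0)
        y_path.append(ar_part + dl_part)
    return y_path

rhos, betas = [0.5], [1.0, 0.5]  # hypothetical ARDL(1,1) coefficients
lrm = long_run_multiplier(rhos, betas)
path = simulate_permanent_change(rhos, betas)
print(f"impact multiplier (beta_0):   {path[0]:.2f}")
print(f"LRM from the formula:         {lrm:.2f}")
print(f"Y response after 200 periods: {path[-1]:.2f}")
```

The simulated response starts at \(\beta_0\) and settles at the value given by the formula, provided the autoregressive part is stable.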
How to Choose the Right Model?
Forecasting:
Wooclap Link Wooclap Code: OFZFSD (Tutorial 3)
This question focuses on univariate modeling and prediction. You will model the fertility rate using the FERTIL3.DTA dataset available here. Import it into R/Python/Stata and complete the following steps:
Fit an Autoregressive, AR(p), model. Use an information criterion (like the AIC or BIC/SIC) to select the optimal number of lags, p.
Using the autoregressive specification with 2 lags, compute (by hand) a forecast two periods into the future.
Generate a forecast from your fitted model for 10 periods after the end of the dataset.
What happens to the forecast after a certain number of periods?
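The recursion behind a multi-step forecast can be sketched as follows: future shocks are set to their expected value of zero, and each forecast is fed back in as if it were an observation. The coefficients and the last two observations below are hypothetical placeholders, not estimates from FERTIL3.DTA, and `iterate_ar2_forecast` is an illustrative helper name.

```python
def iterate_ar2_forecast(alpha, rho1, rho2, y_last, y_prev, steps):
    """Iterated forecasts from Y_t = alpha + rho1 * Y_{t-1} + rho2 * Y_{t-2} + u_t.

    Future shocks are replaced by zero, and earlier forecasts stand in for
    observations that are not yet available.
    """
    history = [y_prev, y_last]
    forecasts = []
    for _ in range(steps):
        y_hat = alpha + rho1 * history[-1] + rho2 * history[-2]
        forecasts.append(y_hat)
        history.append(y_hat)  # the forecast becomes an input for the next step
    return forecasts

# Hypothetical coefficients and data, NOT estimates from FERTIL3.DTA.
alpha, rho1, rho2 = 5.0, 0.6, 0.2
forecasts = iterate_ar2_forecast(alpha, rho1, rho2, y_last=30.0, y_prev=32.0, steps=10)
for h, value in enumerate(forecasts, start=1):
    print(f"h = {h:2d}: {value:.3f}")
```

The same function can be reused with your own estimated coefficients and the last two observed values to check the by-hand calculation.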
This question tests your ability to identify non-stationary data and avoid spurious regression. You will investigate the relationship between housing inventory (the dependent variable) and population size (the independent variable) from the HSEINV.DTA dataset available here.
Run a simple OLS regression with the Housing Inventory (inv) as the dependent variable and Population (pop) as the independent variable.
Report the R-squared value and the t-statistic (or p-value) for the pop coefficient. What would a naive interpretation of these results suggest about the relationship between these two variables?
Run the Breusch-Godfrey test with order=1 and report the \(p\)-value. What does this suggest?
Repeat steps 1 and 3, but add the first and second lags of both housing inventory and population as independent variables. What do you conclude?
The lecture notes mention that for a stationary AR(1) process, \(Y_t = \alpha + \rho Y_{t-1} + u_t\) (with \(|\rho|<1\)), the unconditional mean is \(E(Y_t) = \frac{\alpha}{1-\rho}\).
Prove this result.
Hint: Start by taking the expected value of both sides of the AR(1) equation. Then, use the stationarity assumption, which implies that \(E(Y_t) = E(Y_{t-1}) = \mu\), and solve for \(\mu\).
Consider the Moving Average model of order 1, or MA(1): \(Y_t = \mu + u_t + \theta u_{t-1}\) where \(u_t\) is a white noise process with mean 0 and variance \(\sigma^2_u\).
Derive the following properties for the MA(1) process:
A random walk, \(Y_t = Y_{t-1} + u_t\), is the classic example of a non-stationary process. The lecture suggests that differencing the data can induce stationarity.
Show that the first difference of a random walk, defined as \(\Delta Y_t = Y_t - Y_{t-1}\), is a stationary process.
Hint: To prove stationarity, you must show that \(\Delta Y_t\) satisfies the three conditions: constant mean, constant variance, and constant autocovariance that depends only on the lag.
The lecture provides the general formula for the Long-Run Multiplier (LRM). Now, apply that logic to a specific case. Consider the following ARDL(2,1) model:
\[Y_t = 10 + 0.5 Y_{t-1} + 0.2 Y_{t-2} + 2.0 X_t - 0.8 X_{t-1} + u_t\]
Task: Calculate the Long-Run Multiplier for this model.
The lecture defines a random walk as an AR(1) model with \(\rho=1\). If the daily price of a stock is believed to follow a random walk, what is the single best piece of information you could use to forecast its price for tomorrow?
Explain why a shock (e.g., unexpected good news) on a given day has a “permanent” effect on the future price path of the stock.
You estimate a linear model \(Y_t = \alpha + \beta X_t + \epsilon_t\) and find a high R-squared and a statistically significant \(\beta\) coefficient. However, you suspect the relationship might be spurious because plots of both \(Y_t\) and \(X_t\) show strong trends. To investigate this, you use the Dickey-Fuller (or ADF) test.
In an ARDL(1,1) model, \(Y_t = \alpha + \rho_1 Y_{t-1} + \beta_0 X_t + \beta_1 X_{t-1} + u_t\), the long-run multiplier is given by the formula \(\theta = (\beta_0 + \beta_1) / (1 - \rho_1)\).
Provide an intuitive explanation for why we divide the sum of the X-coefficients by \((1 - \rho_1)\). What role does the persistence factor, \(\rho_1\), play in determining the total long-run effect?
Empirical Economics: Tutorial - Time Series