A recurring comment in referee reports is that local projections “require stationarity,” or that the researcher must “stationarize” (difference/detrend) the variables before computing impulse responses. In a time-series LP setting, that objection is typically misdirected. The relevant issue is not whether the series are stationary as a prerequisite for estimation, but whether the inference procedure remains valid when outcomes and controls are highly persistent, possibly close to a unit root.
The modern answer is lag-augmented local projections. The key point is that their justification is not “because simulations look good,” but because lag augmentation changes the regression geometry in a way that restores standard asymptotics under high persistence. This is the central message of José Luis Montiel Olea and Mikkel Plagborg-Møller (2021, Econometrica).
What follows is a time-series-only, pedagogical explanation—together with a link to the older “lag-augmented VAR” logic (Toda–Yamamoto; Dolado–Lütkepohl), which is conceptually the same stabilization device applied in a different context.
1. The local projection object and where persistence enters
A local projection at horizon $h$ estimates

$$
y_{t+h} = \beta_h\, x_t + \gamma_h' z_t + \xi_{t,h},
$$

where $x_t$ is the shock (or treatment) and $z_t$ contains controls, usually lags of $y$ and $x$ and other predetermined covariates. This is the Jordà (2005) framework: the impulse responses $\beta_h$ are obtained from a sequence of direct regressions rather than by iterating a fully specified dynamic system.
Persistence matters for inference because, when $y$ (and potentially $x$ and elements of $z_t$) is close to a unit root, the usual normal approximation for the t-statistic on $\beta_h$ at long horizons can fail non-uniformly as persistence approaches 1. The “stationarize everything” reflex is an attempt to avoid that inferential fragility, but it changes the estimand and is often unnecessary once one uses an inference approach that is valid under persistence.
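To fix ideas, here is a minimal numerical sketch of the horizon-$h$ direct regression, under an assumed AR(1) DGP in which the shock $x_t$ is observed directly (the DGP, names, and lag length are all illustrative):

```python
import numpy as np

rng = np.random.default_rng(0)

# Illustrative DGP: persistent AR(1) outcome driven by an observed shock x_t.
T, rho = 500, 0.95
x = rng.standard_normal(T)
y = np.zeros(T)
for t in range(1, T):
    y[t] = rho * y[t - 1] + x[t]

def local_projection(y, x, h, p=1):
    """LP of y_{t+h} on x_t with p lags of y and x as controls; returns beta_h."""
    T = len(y)
    ts = np.arange(p, T - h)
    Y = y[ts + h]
    cols = [np.ones(len(ts)), x[ts]]
    for j in range(1, p + 1):
        cols += [y[ts - j], x[ts - j]]
    X = np.column_stack(cols)
    beta = np.linalg.lstsq(X, Y, rcond=None)[0]
    return beta[1]  # coefficient on x_t = impulse response at horizon h

irf = [local_projection(y, x, h) for h in range(5)]
```

In this toy DGP the true response at horizon $h$ is $\rho^h$, and the sequence of direct regressions recovers it horizon by horizon, up to sampling noise.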
2. The key regression-geometry step (FWL): what lag augmentation really changes
Write the LP coefficient using Frisch–Waugh–Lovell residualization. Let $\tilde{x}_t$ be the residual from projecting $x_t$ on the controls $z_t$, and $\tilde{y}_{t+h}$ the residual from projecting $y_{t+h}$ on the same controls. Then

$$
\hat{\beta}_h = \frac{\sum_t \tilde{x}_t\, \tilde{y}_{t+h}}{\sum_t \tilde{x}_t^2}.
$$
This identity is the conceptual doorway: inference is governed by the behavior of the effective regressor $\tilde{x}_t$. Lag augmentation is a way to design $z_t$ so that $\tilde{x}_t$ behaves like an “innovation-type” object rather than a persistent state variable.
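The FWL identity can be verified numerically. The sketch below uses a generic static regression with made-up coefficients, since the identity is purely algebraic and does not depend on the time-series structure:

```python
import numpy as np

rng = np.random.default_rng(1)
T = 300
Z = np.column_stack([np.ones(T), rng.standard_normal((T, 2))])  # controls (incl. constant)
x = Z @ np.array([0.5, 1.0, -0.3]) + rng.standard_normal(T)
y = 2.0 * x + Z @ np.array([0.2, -1.0, 0.7]) + rng.standard_normal(T)

# Coefficient on x from the full regression of y on [x, Z]
b_full = np.linalg.lstsq(np.column_stack([x, Z]), y, rcond=None)[0][0]

def resid(v, Z):
    """Residual from projecting v on the columns of Z."""
    return v - Z @ np.linalg.lstsq(Z, v, rcond=None)[0]

# FWL: residualize both variables on Z, then run the simple regression
x_til, y_til = resid(x, Z), resid(y, Z)
b_fwl = (x_til @ y_til) / (x_til @ x_til)
```

The two estimates coincide to machine precision, which is why the behavior of the residualized regressor fully determines the behavior of the coefficient.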
3. The AR(1) thought experiment: why one extra lag makes the regressor innovation-like
Consider the canonical high-persistence data-generating process

$$
y_t = \rho\, y_{t-1} + u_t,
$$

with $u_t$ an i.i.d. innovation and $\rho$ close to (or at) 1. Iterating forward,

$$
y_{t+h} = \rho^h y_t + \sum_{j=1}^{h} \rho^{h-j} u_{t+j}.
$$
If you use a persistent regressor (like $y_t$, or a shock proxy tightly related to the state of the system), then as $\rho \to 1$ the regression uses near-integrated variation and standard approximations can become unstable.
Now look at what happens if the control set includes the extra lag $y_{t-1}$. Consider the residual of $y_t$ after projecting it on $y_{t-1}$:

$$
\tilde{y}_t = y_t - \hat{\rho}\, y_{t-1}.
$$

In an AR(1), $y_t - \rho\, y_{t-1}$ is essentially $u_t$, so

$$
\tilde{y}_t \approx u_t.
$$
This is the central intuition: adding one extra lag pushes the identifying variation from the persistent level $y_t$ to its innovation $u_t$. Innovations remain well behaved even if the level has a unit root. Montiel Olea and Plagborg-Møller formalize this idea in a general framework and show that, with lag augmentation, the t-statistic for LP coefficients admits a standard normal approximation uniformly over stationary and non-stationary environments.
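A small simulation illustrates this under an exact unit root (illustrative, not from the paper): the level is maximally persistent, yet the residual from projecting $y_t$ on $y_{t-1}$ is essentially the innovation $u_t$.

```python
import numpy as np

rng = np.random.default_rng(2)
T = 2000
u = rng.standard_normal(T)
y = np.cumsum(u)  # exact unit root: y_t = y_{t-1} + u_t

# Project y_t on a constant and y_{t-1}; the residual recovers the innovation u_t
Z = np.column_stack([np.ones(T - 1), y[:-1]])
coef = np.linalg.lstsq(Z, y[1:], rcond=None)[0]
y_til = y[1:] - Z @ coef

# The level is near-perfectly autocorrelated; the residualized series is innovation-like
corr_with_u = np.corrcoef(y_til, u[1:])[0, 1]
acf1_level = np.corrcoef(y[1:-1], y[2:])[0, 1]
acf1_resid = np.corrcoef(y_til[:-1], y_til[1:])[0, 1]
```

The level has first-order autocorrelation near 1, while the residual is almost perfectly correlated with $u_t$ and has autocorrelation near 0: the identifying variation has moved from the state to the innovation.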
4. Why this is not “just simulations”: the score becomes martingale-like
A second piece of intuition concerns standard errors. LP residuals $\xi_{t,h}$ are generally serially correlated because horizons overlap. A common belief is therefore “you must use HAC.” The Econometrica contribution emphasizes that for inference, what matters is the dependence structure of the score, i.e., (effective regressor) × (residual), not the serial correlation of the residual alone.
In the AR(1) example, the $h$-step forecast error is

$$
\xi_{t,h} = \sum_{j=1}^{h} \rho^{h-j} u_{t+j}.
$$

With lag augmentation, the effective regressor behaves like $u_t$. Hence the score behaves like

$$
u_t\, \xi_{t,h} = u_t \sum_{j=1}^{h} \rho^{h-j} u_{t+j}.
$$
Under the paper’s innovation/exogeneity condition (informally: $u_t$ is mean-independent of future innovations), the score $u_t\, \xi_{t,h}$ has a martingale-difference flavor, which allows a martingale CLT and yields standard inference with appropriately constructed robust standard errors. This is a theory result about asymptotics under high persistence, not a numerical artifact.
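A quick simulated check of this claim, with illustrative parameter values: the overlapping LP residual is strongly autocorrelated, while the score built from the innovation-like regressor is approximately not.

```python
import numpy as np

rng = np.random.default_rng(3)
T, rho, h = 5000, 0.97, 8
u = rng.standard_normal(T)

# h-step forecast error xi_{t,h} = sum_{j=1}^{h} rho**(h-j) * u_{t+j}
xi = np.zeros(T - h)
for j in range(1, h + 1):
    xi += rho ** (h - j) * u[j : T - h + j]

score = u[: T - h] * xi  # effective regressor (innovation) times LP residual

def acf1(v):
    """First-order sample autocorrelation."""
    return np.corrcoef(v[:-1], v[1:])[0, 1]

# Overlapping horizons make the residual strongly autocorrelated,
# but the score inherits the unpredictability of u_t.
acf1_xi, acf1_score = acf1(xi), acf1(score)
```

Since $u_t$ is independent of the future innovations entering $\xi_{t,h}$ and $\xi_{t+1,h}$, products of the score at different dates have mean zero, which is the martingale-difference structure the theory exploits.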
5. Two time-series variants: “add one lag of y” vs. “add one lag of the full control vector”
This distinction is practically important and often conflated.
5.1 Augmenting only y-lags (minimal, targeted)
Suppose your baseline LP controls include $p$ lags of $y$ and $x$. A “y-only” lag augmentation adds one extra lag of $y$ (so you control for $y_{t-p-1}$ in addition to $y_{t-1}, \dots, y_{t-p}$) but leaves the rest of the control vector unchanged.
Conceptually, this targets the most direct persistence channel: it strengthens the projection space so that the part of the regressor correlated with the persistent state of $y$ gets stripped away more effectively. In settings where $x_t$ is already close to an innovation (or is externally identified) and the main inferential concern is the persistence of $y$, this can be enough.
5.2 Augmenting the full control vector (more conservative, system-wide)
A “full-vector” lag augmentation adds one extra lag to all lagged controls (all predetermined covariates included in $z_t$). In time series, this means that whatever you treated as predictive information for $y_{t+h}$ at baseline, you include one additional lag of that entire information set.
Conceptually, this is more conservative because it changes both residualizations: it changes the effective regressor and the effective dependent variable by conditioning on a richer predictive history. This is particularly relevant when multiple covariates in $z_t$ are persistent and correlated with $x_t$. The additional lags help ensure that the identifying variation in $\tilde{x}_t$ is “new information at $t$” rather than lingering low-frequency components forecastable from period $t-1$ and earlier.
Both variants share the same fundamental logic: over-control the short-run dynamics so the object used for inference behaves like an innovation. The difference is whether you treat persistence as primarily a feature of the outcome or as a feature of the entire conditioning information set.
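As a schematic illustration of the two control sets, the sketch below builds the time-$t$ regressor vector for a hypothetical baseline with $p$ lags of $y$, $x$, and one extra covariate $w$ (the function and variable names are illustrative, not from any package):

```python
import numpy as np

def lp_controls(y, x, w, t, p, variant="y_only"):
    """Assemble the LP control vector at time t for a baseline with p lags
    of y, x, and an additional covariate w.

    variant="y_only": one extra lag of y only (lags 1..p+1 of y, 1..p of x and w).
    variant="full":   one extra lag of the whole conditioning set (lags 1..p+1 of all).
    """
    qy = p + 1                                   # y is always augmented
    qo = p + 1 if variant == "full" else p       # others augmented only in "full"
    ylags = [y[t - j] for j in range(1, qy + 1)]
    xlags = [x[t - j] for j in range(1, qo + 1)]
    wlags = [w[t - j] for j in range(1, qo + 1)]
    return np.array([1.0] + ylags + xlags + wlags)

s = np.arange(10.0)
c_min = lp_controls(s, s, s, t=5, p=2, variant="y_only")   # 1 + 3 + 2 + 2 = 8 regressors
c_full = lp_controls(s, s, s, t=5, p=2, variant="full")    # 1 + 3 + 3 + 3 = 10 regressors
```

The only difference between the variants is how much of the conditioning set receives the extra lag; the estimating equation itself is unchanged.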
6. Why the connection to lag-augmented VAR inference is natural
The same philosophy is well known from the “lag-augmented VAR” testing literature. Hiro Y. Toda and Taku Yamamoto (1995) show that one can estimate a VAR in levels with extra lags and conduct Wald tests on the original lag coefficients while retaining standard asymptotics even when variables may be integrated or cointegrated. Similarly, Juan J. Dolado and Helmut Lütkepohl (1996) propose fitting an augmented VAR and testing restrictions on the non-augmented lags to recover chi-square limiting behavior.
This is exactly the same stabilization idea as lag-augmented LPs: add redundant dynamics to protect inference from uncertainty about persistence. In VAR testing, the object is a Wald statistic; in LPs, the object is a horizon-specific coefficient and its t-statistic. The reason both work is that the extra lags reallocate variation so that the statistic’s asymptotic approximation becomes stable in environments where naive approximations can be fragile.
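For concreteness, here is a single-equation sketch of the Toda–Yamamoto device under an assumed bivariate DGP (all values illustrative): the true system is a VAR(1) with a unit-root regressor, we fit $p + d_{\max} = 2$ lags in levels, and we test only the coefficient on the non-augmented lag, a one-restriction Wald (t) test.

```python
import numpy as np

rng = np.random.default_rng(5)
T = 400
e1, e2 = rng.standard_normal(T), rng.standard_normal(T)
y2 = np.cumsum(e2)  # unit-root "cause" variable
y1 = np.zeros(T)
for t in range(1, T):
    y1[t] = 0.5 * y1[t - 1] + 0.3 * y2[t - 1] + e1[t]

# Toda-Yamamoto: true lag order p = 1, possible integration order d_max = 1,
# so estimate the levels equation with p + d_max = 2 lags...
Y = y1[2:]
X = np.column_stack([np.ones(T - 2), y1[1:-1], y2[1:-1], y1[:-2], y2[:-2]])
beta = np.linalg.lstsq(X, Y, rcond=None)[0]
res = Y - X @ beta
s2 = res @ res / (len(Y) - X.shape[1])
se = np.sqrt(s2 * np.diag(np.linalg.inv(X.T @ X)))

# ...and test only the non-augmented lag coefficient (here, on y2_{t-1})
t_stat = beta[2] / se[2]
```

The augmentation lags ($y_{1,t-2}$, $y_{2,t-2}$) are redundant under the true VAR(1), but their presence is what restores standard asymptotics for the test on the first lag despite the unit root in $y_2$.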
7. Practical implication for blog-level guidance
If a referee says “stationarize your variables,” the constructive econometric translation is: “please address inference under persistence.” Lag-augmented local projections address that concern directly, without mechanically changing the estimand via differencing. This does not remove the need to argue identification, choose controls thoughtfully, or justify horizons; it simply clarifies that stationarity is not the correct precondition for LP estimation, and that persistence concerns are handled at the level of inference by a principled device.
References
Jordà, Ò. (2005). “Estimation and Inference of Impulse Responses by Local Projections.” American Economic Review, 95(1), 161–182.
Montiel Olea, J. L., and Plagborg-Møller, M. (2021). “Local Projection Inference Is Simpler and More Robust Than You Think.” Econometrica, 89(4), 1789–1823.
Toda, H. Y., and Yamamoto, T. (1995). “Statistical Inference in Vector Autoregressions with Possibly Integrated Processes.” Journal of Econometrics, 66(1–2), 225–250.
Dolado, J. J., and Lütkepohl, H. (1996). “Making Wald Tests Work for Cointegrated VAR Systems.” Econometric Reviews, 15(4), 369–386.