A recurring comment in referee reports is that local projections “require stationarity,” or that the researcher must “stationarize” (difference/detrend) the variables before computing impulse responses. In a time-series LP setting, that objection is typically misdirected. The relevant issue is not whether the series are stationary as a prerequisite for estimation, but whether the inference procedure remains valid when outcomes and controls are highly persistent, possibly close to a unit root.
The modern answer is lag-augmented local projections. The key point is that their justification is not “because simulations look good,” but because lag augmentation changes the regression geometry in a way that restores standard asymptotics under high persistence. This is the central message of José Luis Montiel Olea and Mikkel Plagborg-Møller (2021, Econometrica).
What follows is a time-series-only, pedagogical explanation—together with a link to the older “lag-augmented VAR” logic (Toda–Yamamoto; Dolado–Lütkepohl), which is conceptually the same stabilization device applied in a different context.
1. The local projection object and where persistence enters
A local projection at horizon $h$ estimates

$$
y_{t+h} = \beta_h\, x_t + \gamma_h' z_t + \xi_{t,h},
$$

where $x_t$ is the shock (or treatment) and $z_t$ contains controls, usually lags of $y$ and $x$ and other predetermined covariates. This is the Jordà (2005) framework: the impulse responses $\beta_h$ are obtained from a sequence of direct regressions rather than by iterating a fully specified dynamic system.
Persistence matters for inference because, when $y$ (and potentially $x$ and elements of $z_t$) is close to a unit root, the usual normal approximation for the t-statistic on $\beta_h$ at long horizons can fail non-uniformly as persistence approaches 1. The “stationarize everything” reflex is an attempt to avoid that inferential fragility, but it changes the estimand and is often unnecessary once one uses an inference approach that is valid under persistence.
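To fix ideas, here is a minimal numerical sketch of the horizon-$h$ direct regression, under an assumed AR(1) DGP in which the shock $x_t$ is observed directly (the DGP, names, and lag length are all illustrative):

```python
import numpy as np

rng = np.random.default_rng(0)

# Illustrative DGP: persistent AR(1) outcome driven by an observed shock x_t.
T, rho = 500, 0.95
x = rng.standard_normal(T)
y = np.zeros(T)
for t in range(1, T):
    y[t] = rho * y[t - 1] + x[t]

def local_projection(y, x, h, p=1):
    """LP of y_{t+h} on x_t with p lags of y and x as controls; returns beta_h."""
    T = len(y)
    ts = np.arange(p, T - h)
    Y = y[ts + h]
    cols = [np.ones(len(ts)), x[ts]]
    for j in range(1, p + 1):
        cols += [y[ts - j], x[ts - j]]
    X = np.column_stack(cols)
    beta = np.linalg.lstsq(X, Y, rcond=None)[0]
    return beta[1]  # coefficient on x_t = impulse response at horizon h

irf = [local_projection(y, x, h) for h in range(5)]
```

In this toy DGP the true response at horizon $h$ is $\rho^h$, and the sequence of direct regressions recovers it horizon by horizon, up to sampling noise.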
2. The key regression-geometry step (FWL): what lag augmentation really changes
Write the LP coefficient using Frisch–Waugh–Lovell residualization. Let $\tilde{x}_t$ be the residual from projecting $x_t$ on the controls $z_t$, and $\tilde{y}_{t+h}$ the residual from projecting $y_{t+h}$ on the same controls. Then

$$
\hat{\beta}_h = \frac{\sum_t \tilde{x}_t\, \tilde{y}_{t+h}}{\sum_t \tilde{x}_t^2}.
$$
This identity is the conceptual doorway: inference is governed by the behavior of the effective regressor $\tilde{x}_t$. Lag augmentation is a way to design $z_t$ so that $\tilde{x}_t$ behaves like an “innovation-type” object rather than a persistent state variable.
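The FWL identity can be verified numerically. The sketch below uses a generic static regression with made-up coefficients, since the identity is purely algebraic and does not depend on the time-series structure:

```python
import numpy as np

rng = np.random.default_rng(1)
T = 300
Z = np.column_stack([np.ones(T), rng.standard_normal((T, 2))])  # controls (incl. constant)
x = Z @ np.array([0.5, 1.0, -0.3]) + rng.standard_normal(T)
y = 2.0 * x + Z @ np.array([0.2, -1.0, 0.7]) + rng.standard_normal(T)

# Coefficient on x from the full regression of y on [x, Z]
b_full = np.linalg.lstsq(np.column_stack([x, Z]), y, rcond=None)[0][0]

def resid(v, Z):
    """Residual from projecting v on the columns of Z."""
    return v - Z @ np.linalg.lstsq(Z, v, rcond=None)[0]

# FWL: residualize both variables on Z, then run the simple regression
x_til, y_til = resid(x, Z), resid(y, Z)
b_fwl = (x_til @ y_til) / (x_til @ x_til)
```

The two estimates coincide to machine precision, which is why the behavior of the residualized regressor fully determines the behavior of the coefficient.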
3. The AR(1) thought experiment: why one extra lag makes the regressor innovation-like
Consider the canonical high-persistence data-generating process

$$
y_t = \rho\, y_{t-1} + u_t,
$$

with $u_t$ an i.i.d. innovation and $\rho$ close to (or at) 1. Iterating forward,

$$
y_{t+h} = \rho^h y_t + \sum_{j=1}^{h} \rho^{h-j} u_{t+j}.
$$
If you use a persistent regressor (like $y_t$, or a shock proxy tightly related to the state of the system), then as $\rho \to 1$ the regression uses near-integrated variation and standard approximations can become unstable.
Now look at what happens if the control set includes the extra lag $y_{t-1}$. Consider the residual of $y_t$ after projecting it on $y_{t-1}$:

$$
\tilde{y}_t = y_t - \hat{\rho}\, y_{t-1}.
$$

In an AR(1), $y_t - \rho\, y_{t-1}$ is essentially $u_t$, so

$$
\tilde{y}_t \approx u_t.
$$
This is the central intuition: adding one extra lag pushes the identifying variation from the persistent level $y_t$ to its innovation $u_t$. Innovations remain well behaved even if the level has a unit root. Montiel Olea and Plagborg-Møller formalize this idea in a general framework and show that, with lag augmentation, the t-statistic for LP coefficients admits a standard normal approximation uniformly over stationary and non-stationary environments.
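A small simulation illustrates this under an exact unit root (illustrative, not from the paper): the level is maximally persistent, yet the residual from projecting $y_t$ on $y_{t-1}$ is essentially the innovation $u_t$.

```python
import numpy as np

rng = np.random.default_rng(2)
T = 2000
u = rng.standard_normal(T)
y = np.cumsum(u)  # exact unit root: y_t = y_{t-1} + u_t

# Project y_t on a constant and y_{t-1}; the residual recovers the innovation u_t
Z = np.column_stack([np.ones(T - 1), y[:-1]])
coef = np.linalg.lstsq(Z, y[1:], rcond=None)[0]
y_til = y[1:] - Z @ coef

# The level is near-perfectly autocorrelated; the residualized series is innovation-like
corr_with_u = np.corrcoef(y_til, u[1:])[0, 1]
acf1_level = np.corrcoef(y[1:-1], y[2:])[0, 1]
acf1_resid = np.corrcoef(y_til[:-1], y_til[1:])[0, 1]
```

The level has first-order autocorrelation near 1, while the residual is almost perfectly correlated with $u_t$ and has autocorrelation near 0: the identifying variation has moved from the state to the innovation.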
4. Why this is not “just simulations”: the score becomes martingale-like
A second piece of intuition concerns standard errors. LP residuals $\xi_{t,h}$ are generally serially correlated because horizons overlap. A common belief is therefore “you must use HAC.” The Econometrica contribution emphasizes that for inference, what matters is the dependence structure of the score, i.e., (effective regressor) × (residual), not the serial correlation of the residual alone.
In the AR(1) example, the $h$-step forecast error is

$$
\xi_{t,h} = \sum_{j=1}^{h} \rho^{h-j} u_{t+j}.
$$

With lag augmentation, the effective regressor behaves like $u_t$. Hence the score behaves like

$$
u_t\, \xi_{t,h} = u_t \sum_{j=1}^{h} \rho^{h-j} u_{t+j}.
$$
Under the paper’s innovation/exogeneity condition (informally: $u_t$ is mean-independent of future innovations), the score $u_t\, \xi_{t,h}$ has a martingale-difference flavor, which allows a martingale CLT and yields standard inference with appropriately constructed robust standard errors. This is a theory result about asymptotics under high persistence, not a numerical artifact.
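A quick simulated check of this claim, with illustrative parameter values: the overlapping LP residual is strongly autocorrelated, while the score built from the innovation-like regressor is approximately not.

```python
import numpy as np

rng = np.random.default_rng(3)
T, rho, h = 5000, 0.97, 8
u = rng.standard_normal(T)

# h-step forecast error xi_{t,h} = sum_{j=1}^{h} rho**(h-j) * u_{t+j}
xi = np.zeros(T - h)
for j in range(1, h + 1):
    xi += rho ** (h - j) * u[j : T - h + j]

score = u[: T - h] * xi  # effective regressor (innovation) times LP residual

def acf1(v):
    """First-order sample autocorrelation."""
    return np.corrcoef(v[:-1], v[1:])[0, 1]

# Overlapping horizons make the residual strongly autocorrelated,
# but the score inherits the unpredictability of u_t.
acf1_xi, acf1_score = acf1(xi), acf1(score)
```

Since $u_t$ is independent of the future innovations entering $\xi_{t,h}$ and $\xi_{t+1,h}$, products of the score at different dates have mean zero, which is the martingale-difference structure the theory exploits.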
5. Two time-series variants: “add one lag of y” vs. “add one lag of the full control vector”
This distinction is practically important and often conflated.
5.1 Augmenting only y-lags (minimal, targeted)
Suppose your baseline LP controls include $p$ lags of $y$ and $x$. A “y-only” lag augmentation adds one extra lag of $y$ (so you control for $y_{t-p-1}$ in addition to $y_{t-1}, \dots, y_{t-p}$) but leaves the rest of the control vector unchanged.
Conceptually, this targets the most direct persistence channel: it strengthens the projection space so that the part of the regressor correlated with the persistent state of $y$ gets stripped away more effectively. In settings where $x_t$ is already close to an innovation (or is externally identified) and the main inferential concern is the persistence of $y$, this can be enough.
5.2 Augmenting the full control vector (more conservative, system-wide)
A “full-vector” lag augmentation adds one extra lag to all lagged controls (all predetermined covariates included in $z_t$). In time series, this means that whatever you treated as predictive information for $y_{t+h}$ at baseline, you include one additional lag of that entire information set.
Conceptually, this is more conservative because it changes both residualizations: it changes the effective regressor and the effective dependent variable by conditioning on a richer predictive history. This is particularly relevant when multiple covariates in $z_t$ are persistent and correlated with $x_t$. The additional lags help ensure that the identifying variation in $\tilde{x}_t$ is “new information at $t$” rather than lingering low-frequency components forecastable from period $t-1$ and earlier.
Both variants share the same fundamental logic: over-control the short-run dynamics so the object used for inference behaves like an innovation. The difference is whether you treat persistence as primarily a feature of the outcome or as a feature of the entire conditioning information set.
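As a schematic illustration of the two control sets, the sketch below builds the time-$t$ regressor vector for a hypothetical baseline with $p$ lags of $y$, $x$, and one extra covariate $w$ (the function and variable names are illustrative, not from any package):

```python
import numpy as np

def lp_controls(y, x, w, t, p, variant="y_only"):
    """Assemble the LP control vector at time t for a baseline with p lags
    of y, x, and an additional covariate w.

    variant="y_only": one extra lag of y only (lags 1..p+1 of y, 1..p of x and w).
    variant="full":   one extra lag of the whole conditioning set (lags 1..p+1 of all).
    """
    qy = p + 1                                   # y is always augmented
    qo = p + 1 if variant == "full" else p       # others augmented only in "full"
    ylags = [y[t - j] for j in range(1, qy + 1)]
    xlags = [x[t - j] for j in range(1, qo + 1)]
    wlags = [w[t - j] for j in range(1, qo + 1)]
    return np.array([1.0] + ylags + xlags + wlags)

s = np.arange(10.0)
c_min = lp_controls(s, s, s, t=5, p=2, variant="y_only")   # 1 + 3 + 2 + 2 = 8 regressors
c_full = lp_controls(s, s, s, t=5, p=2, variant="full")    # 1 + 3 + 3 + 3 = 10 regressors
```

The only difference between the variants is how much of the conditioning set receives the extra lag; the estimating equation itself is unchanged.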
6. Why the connection to lag-augmented VAR inference is natural
The same philosophy is well known from the “lag-augmented VAR” testing literature. Hiro Y. Toda and Taku Yamamoto (1995) show that one can estimate a VAR in levels with extra lags and conduct Wald tests on the original lag coefficients while retaining standard asymptotics even when variables may be integrated or cointegrated. Similarly, Juan J. Dolado and Helmut Lütkepohl (1996) propose fitting an augmented VAR and testing restrictions on the non-augmented lags to recover chi-square limiting behavior.
This is exactly the same stabilization idea as lag-augmented LPs: add redundant dynamics to protect inference from uncertainty about persistence. In VAR testing, the object is a Wald statistic; in LPs, the object is a horizon-specific coefficient and its t-statistic. The reason both work is that the extra lags reallocate variation so that the statistic’s asymptotic approximation becomes stable in environments where naive approximations can be fragile.
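For concreteness, here is a single-equation sketch of the Toda–Yamamoto device under an assumed bivariate DGP (all values illustrative): the true system is a VAR(1) with a unit-root regressor, we fit $p + d_{\max} = 2$ lags in levels, and we test only the coefficient on the non-augmented lag, a one-restriction Wald (t) test.

```python
import numpy as np

rng = np.random.default_rng(5)
T = 400
e1, e2 = rng.standard_normal(T), rng.standard_normal(T)
y2 = np.cumsum(e2)  # unit-root "cause" variable
y1 = np.zeros(T)
for t in range(1, T):
    y1[t] = 0.5 * y1[t - 1] + 0.3 * y2[t - 1] + e1[t]

# Toda-Yamamoto: true lag order p = 1, possible integration order d_max = 1,
# so estimate the levels equation with p + d_max = 2 lags...
Y = y1[2:]
X = np.column_stack([np.ones(T - 2), y1[1:-1], y2[1:-1], y1[:-2], y2[:-2]])
beta = np.linalg.lstsq(X, Y, rcond=None)[0]
res = Y - X @ beta
s2 = res @ res / (len(Y) - X.shape[1])
se = np.sqrt(s2 * np.diag(np.linalg.inv(X.T @ X)))

# ...and test only the non-augmented lag coefficient (here, on y2_{t-1})
t_stat = beta[2] / se[2]
```

The augmentation lags ($y_{1,t-2}$, $y_{2,t-2}$) are redundant under the true VAR(1), but their presence is what restores standard asymptotics for the test on the first lag despite the unit root in $y_2$.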
7. Practical implication for blog-level guidance
If a referee says “stationarize your variables,” the constructive econometric translation is: “please address inference under persistence.” Lag-augmented local projections address that concern directly, without mechanically changing the estimand via differencing. This does not remove the need to argue identification, choose controls thoughtfully, or justify horizons; it simply clarifies that stationarity is not the correct precondition for LP estimation, and that persistence concerns are handled at the level of inference by a principled device.
References
Jordà, Ò. (2005). “Estimation and Inference of Impulse Responses by Local Projections.” American Economic Review, 95(1), 161–182.
Montiel Olea, J. L., and Plagborg-Møller, M. (2021). “Local Projection Inference Is Simpler and More Robust Than You Think.” Econometrica, 89(4), 1789–1823.
Toda, H. Y., and Yamamoto, T. (1995). “Statistical Inference in Vector Autoregressions with Possibly Integrated Processes.” Journal of Econometrics, 66(1–2), 225–250.
Dolado, J. J., and Lütkepohl, H. (1996). “Making Wald Tests Work for Cointegrated VAR Systems.” Econometric Reviews, 15(4), 369–386.