Can Local Projections Have Short-Run Restrictions?

Before diving in, allow me to share that this is the first EconMacro post I have written on a plane. Many people seem puzzled by causal identification with local projections, which is why I am writing this post today.

A common reaction to local projections is: “I understand how to impose short-run restrictions in a VAR, but how can I impose them in an LP?” The answer is simple but important: a short-run restriction is not a restriction on the VAR estimator. It is a restriction on the structural shock.

Main takeaway. Once the shock has been identified—for example by a recursive Cholesky ordering—the same shock can be used in a VAR, a Bayesian VAR, or a local projection. The VAR and the LP differ in how they estimate dynamics after impact. They need not differ in the short-run identification of the shock.

This distinction matters in empirical macro. Suppose a paper identifies geopolitical risk shocks in a panel VAR by ordering geopolitical risk first. Or suppose it separates geopolitical acts from geopolitical threats by ordering acts before threats. It may look as if this logic belongs only to VARs. It does not. The same recursive restriction can be used in a local projection, either by constructing the Cholesky shock first or by including the right contemporaneous controls.

1. Where the restriction lives

Start with a one-lag reduced-form VAR:

\[ y_t = A y_{t-1} + u_t . \]

Here \(y_t\) is an \(n \times 1\) vector of endogenous variables, \(A\) is an \(n \times n\) matrix of autoregressive coefficients, and \(u_t\) is the vector of reduced-form residuals. These residuals are generally correlated across equations:

\[ \mathbb{E}(u_t u_t') = \Sigma_u . \]

The reduced-form residuals are not structural shocks. To identify structural shocks, write

\[ u_t = P \varepsilon_t, \qquad \mathbb{E}(\varepsilon_t \varepsilon_t') = I_n . \]

The matrix \(P\) is the contemporaneous impact matrix. Since \(u_t = P\varepsilon_t\), it must satisfy

\[ \Sigma_u = \mathbb{E}(u_tu_t') = P\,\mathbb{E}(\varepsilon_t\varepsilon_t')\,P' = PP' . \]

A recursive Cholesky identification chooses \(P\) to be lower triangular under a chosen ordering. The ordering determines which variables can move on impact in response to each structural shock.

The short-run restriction is in \(P\), not in \(A\).
The matrix \(P\) tells us what happens on impact. The matrix \(A\) tells us how the system propagates after impact.

If \(e_j\) is the unit vector selecting structural shock \(j\), then the impact effect of that shock is the \(j\)-th column of \(P\):

\[ IRF_0(j) = Pe_j . \]

The VAR response at horizon \(h\) is then

\[ IRF_h^{VAR}(j) = A^h Pe_j . \]

This formula separates identification and propagation. The term \(Pe_j\) is the short-run identified impact. The term \(A^h\) is the VAR’s dynamic propagation.
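The separation between \(P\) and \(A^h\) can be sketched numerically. The snippet below is only an illustration: the matrices `A` and `Sigma_u` are made-up numbers, and `var_irf` is a hypothetical helper name, not something from a particular package.

```python
import numpy as np

# Hypothetical numbers chosen only for illustration.
A = np.array([[0.5, 0.1],
              [0.2, 0.3]])          # reduced-form VAR(1) coefficients
Sigma_u = np.array([[1.0, 0.5],
                    [0.5, 2.0]])    # covariance of reduced-form residuals

# Identification: lower-triangular P with P @ P.T equal to Sigma_u.
P = np.linalg.cholesky(Sigma_u)

def var_irf(A, P, j, horizons):
    """Return an array whose row h is A^h P e_j."""
    n = A.shape[0]
    e_j = np.zeros(n)
    e_j[j] = 1.0
    out = np.empty((horizons + 1, n))
    response = P @ e_j              # impact response (horizon 0)
    for h in range(horizons + 1):
        out[h] = response
        response = A @ response     # propagation: one more power of A
    return out

# Responses of both variables to the second structural shock.
irf_shock2 = var_irf(A, P, j=1, horizons=4)
```

The first entry of `irf_shock2[0]` is zero by construction: under this ordering the second shock cannot move the first variable on impact, whatever `A` is.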

2. What the LP does differently

A local projection does not impose the VAR recursion. At each horizon \(h\), it estimates a regression such as

\[ y_{t+h} = \alpha_h + B_h y_{t-1} + \theta_{h,j}\varepsilon_{j,t} + v_{t+h,h}. \]

The LP coefficient \(\theta_{h,j}\) is the estimated response at horizon \(h\) to structural shock \(j\). The LP does not require that

\[ \theta_{h,j}=A^hPe_j . \]

Instead, it estimates \(\theta_{h,j}\) directly. But nothing prevents the LP from using the same recursively identified shock \(\varepsilon_{j,t}\) that the VAR uses. This is the key point.

VAR: identify the shock with \(P\), then propagate it with \(A^h\).
LP: identify the same shock with \(P\), then estimate each horizon directly.

Thus, an LP can have the same short-run restrictions as a VAR. It simply does not impose the same dynamic restrictions across horizons.

3. A two-variable example

Consider two variables, \(x_t\) and \(z_t\), ordered as \(x\) first and \(z\) second. The reduced-form residual vector is

\[ u_t = \begin{bmatrix} u_t^x \\ u_t^z \end{bmatrix}. \]

The Cholesky system is

\[ \begin{bmatrix} u_t^x \\ u_t^z \end{bmatrix} = \begin{bmatrix} p_{xx} & 0 \\ p_{zx} & p_{zz} \end{bmatrix} \begin{bmatrix} \varepsilon_t^x \\ \varepsilon_t^z \end{bmatrix}. \]

Expanding the two equations gives

\[ \begin{aligned} u_t^x &= p_{xx}\varepsilon_t^x, \\ u_t^z &= p_{zx}\varepsilon_t^x + p_{zz}\varepsilon_t^z . \end{aligned} \]

The shock to the first variable can move both variables on impact. The shock to the second variable cannot move the first variable on impact. That is the short-run restriction.

Solving for the second structural shock gives

\[ \varepsilon_t^z = \frac{ u_t^z - \left(p_{zx}/p_{xx}\right)u_t^x }{ p_{zz} }. \]

Equivalently, since \(p_{zx}/p_{xx}\) is the regression coefficient of \(u_t^z\) on \(u_t^x\), the second shock is

\[ \varepsilon_t^z = \frac{ u_t^z - \operatorname{Proj}(u_t^z \mid u_t^x) }{ p_{zz} }. \]

In words, the shock to the second variable is the innovation in the second variable after removing the part explained by the contemporaneous innovation in the first variable.

This is not a VAR-only operation. It is residualization. And residualization can be done before an LP just as easily as before a VAR.
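A small numerical check makes the residualization point concrete. This is a sketch under assumptions: the residuals are simulated from a made-up `P_true`, and the covariance is computed without demeaning so that it lines up with a regression without a constant.

```python
import numpy as np

rng = np.random.default_rng(0)
T = 5000

# Simulated reduced-form residuals u_t = P eps_t (illustrative values).
P_true = np.array([[1.0, 0.0],
                   [0.8, 0.5]])
eps = rng.standard_normal((T, 2))
u = eps @ P_true.T

# Route 1: Cholesky factor of the sample covariance, then eps = P^{-1} u.
Sigma_hat = u.T @ u / T
P_hat = np.linalg.cholesky(Sigma_hat)
eps_chol = np.linalg.solve(P_hat, u.T).T

# Route 2: regress u^z on u^x, keep the residual, scale to unit variance.
ux, uz = u[:, 0], u[:, 1]
b = (ux @ uz) / (ux @ ux)             # regression coefficient p_zx / p_xx
resid = uz - b * ux
eps_z_resid = resid / np.sqrt(resid @ resid / T)

# The two constructions of the second shock coincide.
same_shock = np.allclose(eps_chol[:, 1], eps_z_resid)
```

Nothing in the second route refers to a VAR propagation step; it is an ordinary regression residual, rescaled.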

4. Two equivalent ways to use the restriction in an LP

Approach 1: Construct the Cholesky shock and put it in the LP

First estimate the reduced-form residuals \(u_t\). Then compute the Cholesky factor \(P\). Then recover structural shocks:

\[ \varepsilon_t = P^{-1}u_t . \]

Use the relevant component \(\varepsilon_{j,t}\) in the local projection:

\[ y_{t+h} = \alpha_h + B_h y_{t-1} + \theta_{h,j}\varepsilon_{j,t} + v_{t+h,h}. \]

This is the cleanest conceptual implementation: the VAR and LP are fed the same structural shock.
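Approach 1 can be sketched in a few lines of NumPy. Everything here is an assumption for illustration: the data are simulated from a made-up bivariate VAR(1), and `lp_on_shock` is a hypothetical helper, not a library function.

```python
import numpy as np

rng = np.random.default_rng(1)

# Simulate a bivariate VAR(1); all numbers are illustrative.
A = np.array([[0.5, 0.1],
              [0.2, 0.3]])
P = np.array([[1.0, 0.0],
              [0.8, 0.5]])
T = 2000
y = np.zeros((T, 2))
for t in range(1, T):
    y[t] = A @ y[t - 1] + P @ rng.standard_normal(2)

# Step 1: reduced-form residuals from an OLS VAR(1) with a constant.
Y = y[1:]                                        # y_t
X = np.column_stack([np.ones(T - 1), y[:-1]])    # [1, y_{t-1}]
coef, *_ = np.linalg.lstsq(X, Y, rcond=None)
u_hat = Y - X @ coef

# Step 2: Cholesky factor of the residual covariance.
Sigma_hat = u_hat.T @ u_hat / len(u_hat)
P_hat = np.linalg.cholesky(Sigma_hat)

# Step 3: structural shocks eps_t = P^{-1} u_t.
eps_hat = np.linalg.solve(P_hat, u_hat.T).T

# Step 4: local projection of y_{t+h} on [1, y_{t-1}, eps_{j,t}].
def lp_on_shock(Y, X, shock, horizons):
    n_obs = len(Y)
    theta = np.empty((horizons + 1, Y.shape[1]))
    for h in range(horizons + 1):
        lhs = Y[h:]                                           # y_{t+h}
        rhs = np.column_stack([X[:n_obs - h], shock[:n_obs - h]])
        b, *_ = np.linalg.lstsq(rhs, lhs, rcond=None)
        theta[h] = b[-1]                                      # coefficient on the shock
    return theta

theta_lp = lp_on_shock(Y, X, eps_hat[:, 1], horizons=4)
```

At horizon 0 the LP coefficient `theta_lp[0]` reproduces the second column of `P_hat`, so the first variable does not move on impact. Later horizons are estimated freely rather than computed from powers of the estimated transition matrix.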

Approach 2: Include the earlier-ordered contemporaneous variables as controls

Suppose the shock variable is ordered after some variables collected in \(w_t\). Then estimate:

\[ y_{t+h} = \alpha_h + \theta_h x_t + \delta_h' w_t + \Gamma_h'X_{t-1} + v_{t+h,h}. \]

Here \(x_t\) is the shock variable, \(w_t\) contains the contemporaneous variables ordered before it, and \(X_{t-1}\) contains lag controls.

The coefficient on \(x_t\) is estimated using only the component of \(x_t\) that is orthogonal to \(w_t\) and the lags. When \(X_{t-1}\) contains the same lags as the VAR, that is exactly the recursive residualization behind the Cholesky shock, up to the scale normalization.

Important. Do not include contemporaneous variables ordered after the shock variable if the goal is to reproduce the recursive short-run restriction. Those variables are allowed to respond to the shock on impact. Controlling for them would generally change the object being estimated.
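Approach 2 can be sketched the same way. The example below assumes a simulated two-variable system in which \(w_t\) is ordered first and the shock variable \(x_t\) second; the parameter values and the helper name `lp_with_controls` are illustrative assumptions.

```python
import numpy as np

rng = np.random.default_rng(2)

# Simulated VAR(1): column 0 is w (ordered first), column 1 is x (ordered second).
A = np.array([[0.5, 0.1],
              [0.2, 0.3]])
P = np.array([[1.0, 0.0],
              [0.8, 0.5]])
T = 2000
y = np.zeros((T, 2))
for t in range(1, T):
    y[t] = A @ y[t - 1] + P @ rng.standard_normal(2)

def lp_with_controls(y, horizons):
    """LP of y_{t+h} on x_t, controlling for w_t, a constant, and y_{t-1}."""
    T = len(y)
    theta = np.empty((horizons + 1, y.shape[1]))
    for h in range(horizons + 1):
        t_idx = np.arange(1, T - h)
        lhs = y[t_idx + h]
        rhs = np.column_stack([
            y[t_idx, 1],              # x_t: the shock variable
            y[t_idx, 0],              # w_t: contemporaneous control ordered before x
            np.ones(len(t_idx)),      # constant
            y[t_idx - 1],             # lag controls
        ])
        b, *_ = np.linalg.lstsq(rhs, lhs, rcond=None)
        theta[h] = b[0]               # coefficient on x_t
    return theta

theta = lp_with_controls(y, horizons=4)
```

At horizon 0 the response of \(w\) to the \(x\) shock is zero and the response of \(x\) to itself is one. That is the recursive restriction, here with a unit-shock normalization rather than a one-standard-deviation normalization.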

5. Why Frisch–Waugh–Lovell is the bridge

The Frisch–Waugh–Lovell theorem says that the coefficient on a regressor in a multiple regression is the same coefficient one obtains after residualizing both the outcome and that regressor with respect to the controls.

Consider the regression

\[ y = X_1\beta_1 + X_2\beta_2 + e . \]

Define the residual-maker matrix for the controls \(X_2\):

\[ M_2 = I - X_2(X_2'X_2)^{-1}X_2' . \]

Then the coefficient on \(X_1\) in the full regression is

\[ \widehat{\beta}_1 = (X_1'M_2X_1)^{-1}X_1'M_2y . \]

This is the coefficient from regressing residualized \(y\) on residualized \(X_1\).

Apply this to the LP:

\[ y_{t+h} = \alpha_h + \theta_h x_t + \delta_h'w_t + \Gamma_h'X_{t-1} + v_{t+h,h}. \]

By Frisch–Waugh–Lovell, the coefficient \(\theta_h\) is estimated from the residualized component of \(x_t\):

\[ \widetilde{x}_t = x_t - \operatorname{Proj} \left( x_t \mid w_t, X_{t-1} \right). \]

So “include contemporaneous controls” is not just a casual regression adjustment. Algebraically, it means “use only the residualized innovation in the shock variable.” When the controls correspond to the variables ordered before the shock, this residualized innovation is the LP version of the Cholesky shock.
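The Frisch–Waugh–Lovell algebra is easy to verify numerically. The sketch below uses simulated data, with `controls` standing in for a constant, \(w_t\), and \(X_{t-1}\); the coefficients and the helper `residualize` are assumptions made for the illustration.

```python
import numpy as np

rng = np.random.default_rng(3)
n = 500

# Stand-in for [1, w_t, X_{t-1}]: a constant and three simulated controls.
controls = np.column_stack([np.ones(n), rng.standard_normal((n, 3))])
x = controls @ np.array([0.2, 0.5, -0.3, 0.1]) + rng.standard_normal(n)
y = 1.5 * x + controls @ np.array([1.0, -0.4, 0.2, 0.7]) + rng.standard_normal(n)

# Full regression: y on x and the controls together.
full, *_ = np.linalg.lstsq(np.column_stack([x, controls]), y, rcond=None)
theta_full = full[0]

def residualize(v, Z):
    """Return the residual from an OLS regression of v on Z."""
    b, *_ = np.linalg.lstsq(Z, v, rcond=None)
    return v - Z @ b

# FWL: residualize both y and x on the controls, then run a simple regression.
x_tilde = residualize(x, controls)
y_tilde = residualize(y, controls)
theta_fwl = (x_tilde @ y_tilde) / (x_tilde @ x_tilde)

# Residualizing only x gives the same coefficient, because x_tilde is
# orthogonal to the controls.
theta_x_only = (x_tilde @ y) / (x_tilde @ x_tilde)
```

All three numbers agree to machine precision, which is why adding the contemporaneous controls and constructing the residualized innovation by hand lead to the same LP coefficient.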

6. Application: geopolitical acts versus threats

Suppose the empirical model separates geopolitical risk into two components:

GPA: geopolitical acts, such as the outbreak or escalation of conflict.
GPT: geopolitical threats, meaning concerns, risks, and uncertainty about adverse geopolitical events.

Now order

\[ GPA_t \quad \text{first}, \qquad GPT_t \quad \text{second}. \]

This allows acts to raise threats on impact. But it identifies a threat shock as the component of threats not explained by contemporaneous acts.

The Cholesky system for the first two residuals is

\[ \begin{bmatrix} u_t^{GPA} \\ u_t^{GPT} \end{bmatrix} = \begin{bmatrix} p_{aa} & 0 \\ p_{qa} & p_{qq} \end{bmatrix} \begin{bmatrix} \varepsilon_t^{GPA} \\ \varepsilon_t^{GPT} \end{bmatrix}. \]

The threat shock is

\[ \varepsilon_t^{GPT} = \frac{ u_t^{GPT} - \operatorname{Proj} \left( u_t^{GPT} \mid u_t^{GPA} \right) }{ p_{qq} }. \]

An LP can implement this in either of two ways. It can use the constructed shock \(\varepsilon_t^{GPT}\). Or it can estimate the LP with contemporaneous \(GPA_t\) as a control:

\[ y_{t+h} = \alpha_h + \theta_h^{GPT}GPT_t + \delta_h GPA_t + \Gamma_h'X_{t-1} + v_{t+h,h}. \]

By Frisch–Waugh–Lovell, the coefficient on \(GPT_t\) is estimated using the part of threats orthogonal to contemporaneous acts and lagged controls. That is precisely the recursive restriction.

7. Application: global versus country-specific geopolitical risk

Now suppose the model separates geopolitical risk into:

Global GPR: worldwide geopolitical risk.
Country GPR: geopolitical risk exposure for a specific country.

Order

\[ GPR_t^{Global} \quad \text{first}, \qquad GPR_{i,t}^{Country} \quad \text{second}. \]

This lets a global shock move country-level GPR on impact. But it identifies a country-specific shock as the part of country GPR not explained by contemporaneous global GPR.

The country-specific shock is

\[ \varepsilon_{i,t}^{Country} = \frac{ u_{i,t}^{Country} - \operatorname{Proj} \left( u_{i,t}^{Country} \mid u_t^{Global} \right) }{ p_{cc} }. \]

The LP version can be written as

\[ y_{i,t+h} = \alpha_{i,h} + \theta_h^{Country}GPR_{i,t}^{Country} + \delta_h GPR_t^{Global} + \Gamma_h'X_{i,t-1} + v_{i,t+h,h}. \]

Including contemporaneous global GPR means the coefficient on country GPR is identified from the component orthogonal to the global shock. That is the same short-run restriction used by the recursive VAR.
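A panel version of this LP can be sketched with simulated data. Everything in the snippet is hypothetical: the panel dimensions, the data-generating process, the assumed impact effect `beta_true`, and the helper `panel_lp`. Country fixed effects are handled with dummy variables, and the lag controls are one lag each of the outcome, country GPR, and global GPR.

```python
import numpy as np

rng = np.random.default_rng(5)
N, T = 8, 400          # hypothetical panel dimensions
beta_true = 0.6        # assumed impact effect of country GPR on the outcome

g = np.zeros(T)                     # global GPR
c = np.zeros((N, T))                # country GPR
yv = np.zeros((N, T))               # outcome
alpha = rng.standard_normal(N)      # country fixed effects
for t in range(1, T):
    g[t] = 0.5 * g[t - 1] + rng.standard_normal()
    # Country GPR loads on contemporaneous global GPR plus its own innovation.
    c[:, t] = 0.4 * c[:, t - 1] + 0.7 * g[t] + rng.standard_normal(N)
    yv[:, t] = (alpha + 0.3 * yv[:, t - 1] + beta_true * c[:, t]
                - 0.5 * g[t] + rng.standard_normal(N))

def panel_lp(h):
    """Pooled LP at horizon h with country dummies; returns the country-GPR coefficient."""
    rows, lhs = [], []
    t_idx = np.arange(1, T - h)
    for i in range(N):
        dummies = np.zeros((len(t_idx), N))
        dummies[:, i] = 1.0                          # fixed effect alpha_{i,h}
        rows.append(np.column_stack([
            c[i, t_idx],                             # country GPR (shock variable)
            g[t_idx],                                # contemporaneous global GPR control
            yv[i, t_idx - 1], c[i, t_idx - 1], g[t_idx - 1],   # lag controls
            dummies,
        ]))
        lhs.append(yv[i, t_idx + h])
    b, *_ = np.linalg.lstsq(np.vstack(rows), np.concatenate(lhs), rcond=None)
    return b[0]

theta_country = [panel_lp(h) for h in range(4)]
```

Because global GPR enters as a contemporaneous control, the coefficient on country GPR is identified from country-level movements that global GPR does not explain. Inference (for example, standard errors robust to serial and cross-sectional correlation) is deliberately left out of this sketch.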

8. What is equivalent, and what is not

It is useful to be precise about the word “equivalent.” The VAR and the LP are equivalent in the shock-identification step if they use the same residualization, ordering, controls, and normalization.

They are not necessarily numerically identical in finite samples. The VAR estimates a transition matrix and uses it to impose cross-horizon structure:

\[ IRF_h^{VAR}(j) = A^hPe_j . \]

The LP estimates each horizon separately:

\[ IRF_h^{LP}(j) = \theta_{h,j}. \]

In population, under a correctly specified VAR data-generating process and with the same shock, the LP coefficient recovers the same impulse response. In finite samples, estimates can differ because the estimators impose different structure. The VAR is more parametric across horizons. The LP is more direct and flexible.
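A simulation gives a feel for this statement. The sketch below assumes a correctly specified bivariate VAR(1) with made-up parameters and a long sample, then compares the VAR-implied responses with LP estimates that use the same Cholesky shock.

```python
import numpy as np

rng = np.random.default_rng(4)

# Illustrative data-generating process: a correctly specified VAR(1).
A = np.array([[0.5, 0.1],
              [0.2, 0.3]])
P = np.array([[1.0, 0.0],
              [0.8, 0.5]])
T, H, j = 20000, 4, 1
y = np.zeros((T, 2))
shocks = rng.standard_normal((T, 2))
for t in range(1, T):
    y[t] = A @ y[t - 1] + P @ shocks[t]

# Reduced-form VAR(1) by OLS, then the Cholesky shock.
Y = y[1:]
X = np.column_stack([np.ones(T - 1), y[:-1]])
coef, *_ = np.linalg.lstsq(X, Y, rcond=None)
A_hat = coef[1:].T
u_hat = Y - X @ coef
P_hat = np.linalg.cholesky(u_hat.T @ u_hat / len(u_hat))
eps_j = np.linalg.solve(P_hat, u_hat.T).T[:, j]

irf_var = np.empty((H + 1, 2))
irf_lp = np.empty((H + 1, 2))
response = P_hat[:, j].copy()
n_obs = len(Y)
for h in range(H + 1):
    # VAR: propagate the impact response with powers of A_hat.
    irf_var[h] = response
    response = A_hat @ response
    # LP: regress y_{t+h} on [1, y_{t-1}, eps_{j,t}] at each horizon separately.
    rhs = np.column_stack([X[:n_obs - h], eps_j[:n_obs - h]])
    b, *_ = np.linalg.lstsq(rhs, Y[h:], rcond=None)
    irf_lp[h] = b[-1]

max_gap = np.abs(irf_var - irf_lp).max()
```

With a sample this long, `max_gap` should be small. The two sets of responses coincide on impact and differ only by sampling noise afterwards. In short samples or under misspecification the gap can be larger, which is the finite-sample caveat above.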

Normalization matters. If the LP uses the normalized structural shock \(\varepsilon_{j,t}\), its coefficient is the response to a one-standard-deviation structural shock. If the LP uses an unnormalized residualized innovation, the coefficient has the same identifying variation but a different scale.
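To see the scale difference explicitly, suppose the residualized innovation is \(\widetilde{x}_t = p_{jj}\varepsilon_{j,t}\), as it is in population when the controls match the recursive ordering. A short derivation, in the notation used above, gives

\[ \theta_h^{\text{unnormalized}} = \frac{\operatorname{Cov}(y_{t+h}, \widetilde{x}_t)}{\operatorname{Var}(\widetilde{x}_t)} = \frac{p_{jj}\operatorname{Cov}(y_{t+h}, \varepsilon_{j,t})}{p_{jj}^2} = \frac{\theta_{h,j}}{p_{jj}} . \]

So the unnormalized coefficient is the response to a one-unit innovation, and multiplying it by \(p_{jj}\) recovers the response to a one-standard-deviation structural shock.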

9. The bottom line

The misconception is that Cholesky restrictions are “VAR restrictions.” More accurately, they are restrictions on the contemporaneous relationship between reduced-form residuals and structural shocks:

\[ u_t = P\varepsilon_t . \]

Once the structural shock has been defined, an econometrician has choices. A VAR propagates the shock using the estimated transition matrix. A local projection estimates the effect of the same shock directly at each horizon.

Local projections can have short-run restrictions because the restrictions define the shock, not the estimator.

This is the clean way to explain the equivalence. In the VAR, the short-run restriction is imposed through the impact matrix \(P\). In the LP, the same restriction is imposed either by using the Cholesky shock directly or by controlling for the contemporaneous variables ordered before the shock variable. Frisch–Waugh–Lovell tells us why the latter works: controlling for those variables is algebraically equivalent to using the residualized innovation.

For geopolitical risk applications, this means that the LP analogue of a recursive VAR is straightforward. A threat shock is the threats innovation orthogonal to contemporaneous acts. A country-specific GPR shock is the country GPR innovation orthogonal to contemporaneous global GPR. The VAR and the LP can therefore share the same short-run identification while differing in how they estimate the subsequent dynamic response.

References

Caldara, D., Conlisk, S., Iacoviello, M., and Penn, M. (2026). “Do geopolitical risks raise or lower inflation?” Journal of International Economics, 159, 104188.

Frisch, R., and Waugh, F. V. (1933). “Partial time regressions as compared with individual trends.” Econometrica.

Lovell, M. C. (1963). “Seasonal adjustment of economic time series and multiple regression analysis.” Journal of the American Statistical Association.

Plagborg-Møller, M., and Wolf, C. K. (2021). “Local projections and VARs estimate the same impulse responses.” Econometrica, 89(2), 955–980.
