A common problem in applied microeconometrics arises when the dependent variable is observed at a very disaggregated level, while the main explanatory variable varies at a more aggregated level. This is the Moulton problem. It matters especially in micro trade regressions that combine firm-level customs data with country-level, destination-level, or destination-month-level shocks.
The issue is not that micro data are problematic. On the contrary, highly disaggregated data are valuable because they allow the econometrician to control for firm-product heterogeneity, product composition, destination characteristics, and time effects. The issue arises when the key variable of interest is common to many micro observations.
Consider a setting where exports are observed at the firm-product-destination-month level, while the political relationship index varies only at the destination-month level. In that case, all firm-product observations exporting to the same destination in the same month receive the same value of the geopolitical variable.
1. A baseline micro trade regression
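Suppose that the estimating equation is:

$$
\ln \text{Exports}_{ikcmt} = \beta \, \text{PRI}_{c,mt-1} + \gamma \, \tau_{kc,t-1} + \alpha_{ik} + \delta_{m} + \lambda_{t} + \mu_{c} + u_{ikcmt}
$$

Here, $\text{Exports}_{ikcmt}$ denotes exports by firm $i$ of product $k$ to destination country $c$ in month-year $mt$. The variable $\text{PRI}_{c,mt-1}$ is a lagged bilateral political relationship index. The variable $\tau_{kc,t-1}$ captures tariffs at the product-destination-year level. The fixed effects $\alpha_{ik}$, $\delta_{m}$, $\lambda_{t}$, and $\mu_{c}$ absorb firm-product heterogeneity, monthly seasonality, common yearly shocks, and destination-specific time-invariant characteristics, respectively. The term $u_{ikcmt}$ is the residual.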
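The key feature is that the political relationship index does not vary at the firm-product level. It varies at the destination-month level:

$$
\text{PRI}_{ikc,mt-1} = \text{PRI}_{c,mt-1} \quad \text{for all firms } i \text{ and products } k
$$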
This equality is the source of the Moulton concern. If there are thousands of firm-product observations for the same destination in the same month, they all share the same value of the geopolitical variable. These observations are not independent pieces of information about the effect of the political relationship index. They are repeated micro observations attached to the same aggregate shock.
2. Why the problem appears in the coefficient formula
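After residualizing the dependent variable and the regressor with respect to the controls and fixed effects, the coefficient can be written schematically as:

$$
\hat{\beta} = \frac{\sum_{i,k,c,mt} \tilde{x}_{ikcmt} \, \tilde{y}_{ikcmt}}{\sum_{i,k,c,mt} \tilde{x}_{ikcmt}^{2}} = \beta + \frac{\sum_{i,k,c,mt} \tilde{x}_{ikcmt} \, u_{ikcmt}}{\sum_{i,k,c,mt} \tilde{x}_{ikcmt}^{2}}
$$

where $\tilde{y}_{ikcmt}$ and $\tilde{x}_{ikcmt}$ denote the residualized dependent variable and the residualized political relationship index. Because the residualized political relationship index is common to all firm-product observations within the same destination-month cell, so that $\tilde{x}_{ikcmt} = \tilde{x}_{cmt}$, the numerator of the sampling-error term can be rearranged as:

$$
\sum_{i,k,c,mt} \tilde{x}_{ikcmt} \, u_{ikcmt} = \sum_{c,mt} \tilde{x}_{cmt} \sum_{(i,k) \in (c,mt)} u_{ikcmt}
$$

This expression shows that the relevant residual object for inference is not each individual firm-product residual in isolation. It is the sum of residuals inside each destination-month cell:

$$
S_{cmt} = \sum_{(i,k) \in (c,mt)} u_{ikcmt}
$$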
Therefore, the uncertainty around the coefficient on the political relationship index depends on the variance of these destination-month residual sums. If residuals are correlated within a destination-month cell, the standard error computed under the assumption of independent micro observations may be too small.
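The link between the cell sums and the standard error can be made explicit with a short derivation. This is a sketch under simplifying assumptions that are not stated in the text: the residualized regressor $\tilde{x}_{cmt}$ is treated as fixed, residuals are uncorrelated across destination-month cells, and within a cell they share a common variance $\sigma^2$ and a common pairwise correlation $\rho_u$. The symbol $n_{cmt}$ (the number of firm-product observations in a cell) is introduced here for illustration.

```latex
% Sampling error written in terms of cell sums S_{cmt}
\hat{\beta} - \beta
  = \frac{\sum_{c,mt} \tilde{x}_{cmt} \, S_{cmt}}{\sum_{c,mt} n_{cmt} \, \tilde{x}_{cmt}^{2}}

% With independent cells, the variance depends only on Var(S_{cmt})
\operatorname{Var}(\hat{\beta})
  = \frac{\sum_{c,mt} \tilde{x}_{cmt}^{2} \, \operatorname{Var}(S_{cmt})}
         {\left( \sum_{c,mt} n_{cmt} \, \tilde{x}_{cmt}^{2} \right)^{2}}

% Equicorrelated residuals inside a cell
\operatorname{Var}(S_{cmt})
  = n_{cmt} \, \sigma^{2} \left[ 1 + (n_{cmt} - 1) \, \rho_u \right]

% Independent micro observations correspond to rho_u = 0
\operatorname{Var}_{0}(S_{cmt}) = n_{cmt} \, \sigma^{2}
```

With equal cell sizes $n_{cmt} = \bar{n}$, the ratio of the two variances is $1 + (\bar{n} - 1)\rho_u$, which is the form the Moulton factor takes when the regressor is constant within cells.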
3. The Moulton variance inflation factor
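The classical Moulton approximation expresses the variance inflation factor as:

$$
\text{VIF} = \frac{\operatorname{Var}(\hat{\beta})}{\operatorname{Var}_{0}(\hat{\beta})} \approx 1 + (\bar{n} - 1) \, \rho_x \, \rho_u
$$

In this formula, $\operatorname{Var}_{0}(\hat{\beta})$ is the variance computed under independent micro observations, $\rho_x$ is the within-cluster correlation of the regressor, $\rho_u$ is the within-cluster correlation of the residuals, and $\bar{n}$ is the average number of micro observations per aggregate cluster.

In the setting considered here, the regressor $\text{PRI}_{c,mt-1}$ is constant within each destination-month cell. Therefore:

$$
\rho_x = 1
$$

The formula becomes:

$$
\text{VIF} \approx 1 + (\bar{n} - 1) \, \rho_u
$$

The standard-error inflation factor is:

$$
\sqrt{\text{VIF}} \approx \sqrt{1 + (\bar{n} - 1) \, \rho_u}
$$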
This formula explains why the Moulton problem can be severe. Even a small residual correlation can have a large effect when the number of micro observations per aggregate cell is large.
4. How large can the problem be?
In a firm-product-destination-month trade dataset, the number of micro observations can be very large. Suppose that a regression contains about 17.5 million observations and that the geopolitical variable is observed for 12 destinations over 84 months. The number of destination-month geopolitical cells is then:

$$
G = 12 \times 84 = 1{,}008
$$

The average number of micro observations attached to each destination-month geopolitical value is:

$$
\bar{n} = \frac{17{,}500{,}000}{1{,}008} \approx 17{,}361
$$

The Moulton factor is therefore:

$$
\text{VIF} \approx 1 + (17{,}361 - 1) \, \rho_u = 1 + 17{,}360 \, \rho_u
$$
The following table illustrates the sensitivity of the standard-error inflation factor to different values of the residual correlation within destination-month cells.
| Residual correlation | Variance inflation factor | Standard-error inflation |
|---|---|---|
| 0.00001 | 1.17 | 1.08 |
| 0.00005 | 1.87 | 1.37 |
| 0.00010 | 2.74 | 1.65 |
| 0.00050 | 9.68 | 3.11 |
| 0.00100 | 18.36 | 4.28 |
If the residual correlation is only 0.00001, the standard error increases by about 8 percent. That is small. But if the residual correlation is 0.00010, the standard error increases by about 65 percent. If it reaches 0.001, the standard error is more than 4 times larger.
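The table can be recomputed with a few lines of Stata. This is a minimal sketch that hard-codes the illustrative figures from the text (17.5 million observations, 12 destinations, 84 months) and assumes the regressor is constant within cells, so that the factor is 1 + (n̄ − 1)ρu. The scalar names are chosen for this illustration only.

```stata
* Illustrative inputs taken from the numerical example in the text
scalar N_obs  = 17500000
scalar G_cell = 12 * 84
scalar nbar_x = scalar(N_obs) / scalar(G_cell)

display "Average observations per cell = " %9.0f scalar(nbar_x)

* Variance and standard-error inflation for each residual correlation
foreach rho in 0.00001 0.00005 0.00010 0.00050 0.00100 {
    scalar vif_x = 1 + (scalar(nbar_x) - 1) * `rho'
    display "rho_u = " %7.5f `rho' ///
        "   VIF = " %6.2f scalar(vif_x) ///
        "   SE inflation = " %5.2f sqrt(scalar(vif_x))
}
```

Replacing the hard-coded inputs with the exact sample size and number of non-empty cells from the estimation sample gives the corresponding factors for the actual regression.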
5. Why HS6 data may mitigate the problem
There is an important counterargument. When the data are observed at the firm-HS6-destination-month level, the residual correlation within a destination-month cell may be very small. This is because observations in the same destination-month cell are highly heterogeneous. They include different firms, different products, different sectors, different contracts, and different market positions.
A textile exporter and an electronics exporter selling to the same destination in the same month may have very different residual shocks. After firm-product fixed effects, time effects, destination effects, and tariff controls, much of the remaining variation may be idiosyncratic. This pushes the residual correlation $\rho_u$ toward zero.
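A useful way to see this is to decompose the residual as:

$$
u_{ikcmt} = a_{cmt} + \varepsilon_{ikcmt}
$$

where $a_{cmt}$ is the common destination-month residual component, with variance $\sigma_a^2$, and $\varepsilon_{ikcmt}$ is the idiosyncratic firm-product residual component, with variance $\sigma_\varepsilon^2$. The within-destination-month residual correlation is approximately:

$$
\rho_u \approx \frac{\sigma_a^2}{\sigma_a^2 + \sigma_\varepsilon^2}
$$

If the idiosyncratic firm-product component dominates the common destination-month component, then:

$$
\sigma_\varepsilon^2 \gg \sigma_a^2 \quad \Rightarrow \quad \rho_u \approx 0
$$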
This is why very disaggregated HS6 data can mitigate the Moulton problem. The more heterogeneous the micro observations are, the smaller the common residual component may be.
6. Why the issue cannot be dismissed
The problem is that “small” is not enough. The residual correlation must be small relative to the average number of observations per aggregate cell. In the example above, the Moulton factor depends on:

$$
(\bar{n} - 1) \, \rho_u \approx 17{,}360 \, \rho_u
$$
Hence, even a correlation that appears tiny in usual statistical terms can produce meaningful standard-error inflation. The correct question is not simply whether residual correlation is small. The correct question is whether it is small enough to offset the large number of micro observations attached to the same geopolitical shock.
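A worked threshold makes the point concrete. The calculation below is a sketch that assumes a regressor constant within cells, so that the variance inflation factor is $1 + (\bar{n} - 1)\rho_u$, with $\bar{n} \approx 17{,}361$ from the numerical example. The tolerated inflation $k = 1.10$ is an arbitrary illustrative choice, and the last line assumes the residual splits into independent common and idiosyncratic components with standard deviations $\sigma_a$ and $\sigma_\varepsilon$.

```latex
% Residual correlation that inflates the standard error by a factor k
\sqrt{1 + (\bar{n} - 1) \, \rho_u^{*}} = k
\quad \Longrightarrow \quad
\rho_u^{*} = \frac{k^{2} - 1}{\bar{n} - 1}

% Example: a 10 percent inflation (k = 1.10) with \bar{n} - 1 = 17,360
\rho_u^{*} = \frac{1.21 - 1}{17{,}360} \approx 1.2 \times 10^{-5}

% For small rho_u, rho_u \approx \sigma_a^2 / \sigma_\varepsilon^2, so
\frac{\sigma_a}{\sigma_\varepsilon} \approx \sqrt{\rho_u^{*}} \approx 0.0035
```

Under these assumptions, a common destination-month component whose standard deviation is well below 1 percent of the idiosyncratic one is already enough to move the standard error by 10 percent.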
There are also economic reasons why the residual correlation may not be exactly 0. A destination-month shock could affect many exporters at the same time. Examples include destination-specific demand shocks, port congestion, customs delays, exchange-rate movements, changes in inspection intensity, diplomatic news not fully captured by the index, or common expectations about the destination market.
These shocks can generate:

$$
\operatorname{Corr}(u_{ikcmt}, u_{jlcmt}) \neq 0
$$

for different firm-product observations $(i,k) \neq (j,l)$ within the same destination-month cell.
7. Fixed effects and clustering solve different problems
A common misunderstanding is to think that rich fixed effects eliminate the need for appropriate clustering. They do not. Fixed effects address omitted-variable bias by absorbing systematic components of the outcome. Clustering addresses inference by allowing residuals to be correlated within a chosen group.
For example, firm-product fixed effects absorb permanent firm-product heterogeneity. Month and year fixed effects absorb common time patterns. Destination fixed effects absorb time-invariant destination characteristics. These controls improve identification. But they do not mechanically create independent variation in PRI across firm-product observations inside the same destination-month cell.
Therefore, even with rich fixed effects, the level of variation of the treatment remains crucial for inference.
8. The practical solution
The empirical solution is not necessarily to replace firm-product clustering mechanically with destination-month clustering. The two clustering dimensions capture different sources of dependence. Firm-product clustering allows serial correlation within a firm-product export relationship. Destination-month clustering allows common shocks across all firm-products exposed to the same geopolitical variable in the same destination-month cell.
For this reason, the most informative robustness exercise is to compare three specifications: clustering at the firm-product level, clustering at the destination-month level, and two-way clustering at both levels.
```stata
* Create the destination-month cluster
capture drop imfcode_period
egen imfcode_period = group(imfcode period)

* 1. Baseline: firm-product clustering
reghdfe l_exports Llpri tariff, ///
    absorb(id_hs6 month year imfcode) ///
    vce(cluster id_hs6)
scalar se_firmprod = _se[Llpri]
scalar t_firmprod = _b[Llpri] / _se[Llpri]

* 2. Destination-month clustering
reghdfe l_exports Llpri tariff, ///
    absorb(id_hs6 month year imfcode) ///
    vce(cluster imfcode_period)
scalar se_destmonth = _se[Llpri]
scalar t_destmonth = _b[Llpri] / _se[Llpri]

* 3. Two-way clustering
reghdfe l_exports Llpri tariff, ///
    absorb(id_hs6 month year imfcode) ///
    vce(cluster id_hs6 imfcode_period)
scalar se_twoway = _se[Llpri]
scalar t_twoway = _b[Llpri] / _se[Llpri]

* Compare the standard errors across clustering choices
display "Destination-month / firm-product SE ratio = " ///
    se_destmonth / se_firmprod
display "Two-way / firm-product SE ratio = " ///
    se_twoway / se_firmprod
```
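The interpretation is direct. If:

$$
\frac{\widehat{\text{SE}}_{\text{dest-month}}}{\widehat{\text{SE}}_{\text{firm-product}}} \approx 1
$$

then the Moulton distortion is empirically small. That would support the idea that the HS6 disaggregation and rich fixed effects leave little residual correlation within destination-month cells.
But if:

$$
\frac{\widehat{\text{SE}}_{\text{dest-month}}}{\widehat{\text{SE}}_{\text{firm-product}}} \gg 1
$$

then clustering only at the firm-product level may understate the uncertainty around the geopolitical coefficient.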
The two-way clustered standard error is especially informative because it allows both forms of dependence: persistence within firm-product relationships and common shocks within destination-month cells.
9. Computing the exact average cell size
The numerical illustration above uses the approximate number of destination-month cells. In practice, the exact average cell size should be computed from the estimation sample. This matters because lagging the geopolitical variable, missing observations, and sample restrictions may change the number of non-empty destination-month cells.
```stata
* Create the destination-month identifier
capture drop imfcode_period
egen imfcode_period = group(imfcode period)

* Run the exact regression
reghdfe l_exports Llpri tariff, ///
    absorb(id_hs6 month year imfcode) ///
    vce(cluster id_hs6)

* Count observations in the estimation sample
count if e(sample)
scalar N = r(N)

* Count non-empty destination-month cells
capture drop tag_ct
egen tag_ct = tag(imfcode_period) if e(sample)
count if tag_ct == 1
scalar G = r(N)

* Average cell size
scalar nbar = N / G
display "N = " N
display "Destination-month cells = " G
display "Average observations per destination-month cell = " nbar
```
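Once the average cell size is known, it can be combined with an estimate of the residual correlation to obtain an implied Moulton factor for the actual sample. The sketch below is one possible way to do this, not a procedure taken from the source: it saves the `reghdfe` residuals with the `residuals()` option and uses Stata's `loneway` to estimate the intraclass correlation of those residuals within destination-month cells. The names `uhat`, `nbar_s`, `rho_u`, and `vif_hat` are introduced for illustration, and the calculation assumes that the regressor is constant within cells and that a single equicorrelation parameter describes the within-cell dependence.

```stata
* Destination-month identifier (same definition as in the text)
capture drop imfcode_period
egen imfcode_period = group(imfcode period)

* Re-run the regression and save the residuals in uhat
capture drop uhat
reghdfe l_exports Llpri tariff, ///
    absorb(id_hs6 month year imfcode) ///
    vce(cluster id_hs6) residuals(uhat)

* Average cell size in the estimation sample
quietly count if e(sample)
scalar N_s = r(N)
capture drop tag_ct
egen tag_ct = tag(imfcode_period) if e(sample)
quietly count if tag_ct == 1
scalar nbar_s = scalar(N_s) / r(N)

* Intraclass correlation of residuals within destination-month cells
loneway uhat imfcode_period if e(sample)
scalar rho_u = r(rho)

* Implied variance and standard-error inflation, assuming rho_x = 1
scalar vif_hat = 1 + (scalar(nbar_s) - 1) * scalar(rho_u)
display "Estimated rho_u = " scalar(rho_u)
display "Implied VIF = " scalar(vif_hat)
display "Implied SE inflation = " sqrt(scalar(vif_hat))
```

The implied factor is only a rough diagnostic. Comparing it with the ratio of destination-month-clustered to firm-product-clustered standard errors indicates whether the simple equicorrelation approximation is a reasonable description of the data.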
10. Conclusion
The Moulton problem is not only a textbook issue. It appears naturally in modern applied work combining micro data with aggregate shocks. In micro trade regressions, a geopolitical variable may vary at the destination-month level while exports are observed at the firm-product-destination-month level. This creates a mismatch between the apparent number of observations and the effective level of identifying variation.
Highly disaggregated HS6 data can mitigate the problem because residual correlation within destination-month cells may be very small. However, when each geopolitical realization is replicated across many thousands of micro observations, even a tiny common residual component can matter for inference.
The appropriate conclusion is therefore balanced. The Moulton problem is structurally relevant when the regressor varies at a more aggregated level than the dependent variable. Its empirical magnitude, however, depends on the residual correlation within the aggregate cells. The best practice is to report a robustness check clustering at the level of variation of the key regressor and, preferably, a two-way clustering specification that also preserves the firm-product dependence structure.
References
- Abadie, A., Athey, S., Imbens, G. W., and Wooldridge, J. M. (2023). “When Should You Adjust Standard Errors for Clustering?” Quarterly Journal of Economics, 138(1), 1–35.
- Angrist, J. D., and Pischke, J.-S. (2009). Mostly Harmless Econometrics: An Empiricist’s Companion. Princeton University Press.
- Bertrand, M., Duflo, E., and Mullainathan, S. (2004). “How Much Should We Trust Differences-in-Differences Estimates?” Quarterly Journal of Economics, 119(1), 249–275.
- Cameron, A. C., Gelbach, J. B., and Miller, D. L. (2011). “Robust Inference With Multiway Clustering.” Journal of Business & Economic Statistics, 29(2), 238–249.
- Cameron, A. C., and Miller, D. L. (2015). “A Practitioner’s Guide to Cluster-Robust Inference.” Journal of Human Resources, 50(2), 317–372.
- Moulton, B. R. (1986). “Random Group Effects and the Precision of Regression Estimates.” Journal of Econometrics, 32(3), 385–397.
- Moulton, B. R. (1990). “An Illustration of a Pitfall in Estimating the Effects of Aggregate Variables on Micro Units.” Review of Economics and Statistics, 72(2), 334–338.
- Saadaoui, J., Strauss-Kahn, V., and Creel, J. (2026). “How Geopolitics Influence Chinese Firms’ Exports: Firm-Level Evidence of ‘Friendtrading’ under Extreme Events.” Working paper.