# Overall Summary — Lu & Yu (2015, AEJ:Applied)
### "Trade Liberalization and Markup Dispersion: Evidence from China's WTO Accession"

## One-paragraph thesis

China's 2001 WTO accession forced industries with pre-WTO tariffs above ~10% to cut to the WTO ceiling, generating sharp cross-industry heterogeneity in trade liberalization. Lu & Yu exploit this by running a difference-in-differences of the form `y_{it} = α_i + β · Tariff_{i,2001} · Post02_t + X'_{it} γ + λ_t + ε_{it}` on an industry-year panel of 154–155 three-digit CIC manufacturing industries over 1998–2005, where the outcome `y_{it}` is the log of the Theil index of within-industry firm markup dispersion. Their headline finding is `β̂ = −0.322 (SE 0.104)`: industries that had higher pre-WTO tariffs, and therefore saw larger mechanical tariff cuts under the WTO ceiling, experienced significant declines in markup dispersion after 2001, consistent with trade liberalization reducing product-market misallocation. They push through a full robustness pipeline (pre-trends visual, Gentzkow pretrend interactions, anticipation test, concurrent-reform controls, Topalova and processing-trader placebos, alternative dispersion measures, 4-digit industry definition, two-period collapse, non-exporters subsample). In the full-sample log(Theil) specifications tabulated below, the coefficient stays between roughly −0.28 and −0.32; the alternative dispersion measures give smaller coefficients (about −0.15) that remain significant.

## ★ The TWFE specification (deep dive) ★

### Equation (1), the workhorse
```
y_{it} = α_i + β · Tariff_{i,2001} · Post02_t + X'_{it} γ + λ_t + ε_{it}
```
- **Unit:** `i` = 3-digit CIC industry (164 manufacturing industries, minus Tobacco and a few with missing covariates, → 154–155 in regressions)
- **Time:** `t` = year, 1998–2005 (8 years)
- **Panel size:** N = 1,232 to 1,235 industry-year observations
- **Outcome `y_{it}`:** log of the Theil index of firm markup dispersion within industry i at year t, computed from firm-level markups via eq. (12) after dropping the top and bottom 2.5% of markups within each (industry, year) cell to kill outliers
- **Regressor of interest:** `Tariff_{i,2001} × Post02_t` where `Tariff_{i,2001}` is the industry's pre-WTO 2001 output tariff in decimal units (e.g. 0.17 for a 17% tariff), and `Post02_t = 1{t ≥ 2002}`
- **Fixed effects:** `α_i` (industry) + `λ_t` (year)
- **Time-varying industry controls `X_{it}`:** Ellison-Glaeser agglomeration index, log mean fixed assets (entry-barrier proxy), log number of firms (entry-barrier proxy)
- **Gentzkow-style pre-period-characteristic interactions:** `SOE share_{i,2001} × Post02_t`, `avg wage per worker_{i,2001} × Post02_t`, `export intensity_{i,2001} × Post02_t` — added in Column 3 of Table 1 to flexibly absorb post-WTO trends generated by pre-WTO differences in the three significant predictors of the 2001 tariff level
- **Standard errors:** clustered at the 3-digit industry level (154+ clusters, well above the roughly 50-cluster rule of thumb usually attributed to Bertrand, Duflo & Mullainathan 2004 for trusting clustered SEs)
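The specification above can be sketched in a few lines. This is a minimal illustration on a synthetic balanced panel, not the paper's code or data: it drops the controls `X_{it}`, absorbs the industry and year fixed effects by two-way demeaning (which is exact only for a balanced panel), and all simulated numbers and names (`twfe_slope`, `true_beta`, the noise scale) are assumptions made for the example.

```python
import random

def two_way_demean(values):
    """Subtract unit and period means and add back the grand mean (balanced panel)."""
    n_units, n_periods = len(values), len(values[0])
    grand = sum(sum(row) for row in values) / (n_units * n_periods)
    unit_mean = [sum(row) / n_periods for row in values]
    period_mean = [sum(values[i][t] for i in range(n_units)) / n_units
                   for t in range(n_periods)]
    return [[values[i][t] - unit_mean[i] - period_mean[t] + grand
             for t in range(n_periods)] for i in range(n_units)]

def twfe_slope(y, x):
    """OLS slope of y on x after absorbing unit and period fixed effects."""
    y_dm, x_dm = two_way_demean(y), two_way_demean(x)
    num = sum(a * b for ry, rx in zip(y_dm, x_dm) for a, b in zip(ry, rx))
    den = sum(b * b for rx in x_dm for b in rx)
    return num / den

# Synthetic panel (every number here is hypothetical): 154 industries, 1998-2005.
random.seed(0)
years = list(range(1998, 2006))
true_beta = -0.322
tariff_2001 = [random.uniform(0.0, 0.6) for _ in range(154)]
alpha = [random.gauss(0.0, 1.0) for _ in tariff_2001]      # industry effects
lam = {yr: 0.05 * (yr - 1998) for yr in years}             # year effects

# Regressor of interest: Tariff_{i,2001} x Post02_t
x = [[d * (yr >= 2002) for yr in years] for d in tariff_2001]
y = [[alpha[i] + lam[yr] + true_beta * x[i][t] + random.gauss(0.0, 0.05)
      for t, yr in enumerate(years)] for i in range(len(tariff_2001))]

beta_hat = twfe_slope(y, x)
print(round(beta_hat, 3))
```

With real data one would add the controls and cluster the standard errors by industry, which this sketch does not attempt.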

### What β actually measures

Because `Post02_t` flips from 0 to 1 at t = 2002 and `Tariff_{i,2001}` is a time-invariant industry characteristic, `β · Tariff_{i,2001}` is the **predicted post-minus-pre difference in log(Theil)** for industry i, relative to an industry with `Tariff_{i,2001} = 0`. Read as a semi-elasticity: for a typical mid-treatment industry with a 2001 tariff of 17%, the implied post-WTO effect on log(Theil) is `−0.322 × 0.17 ≈ −0.055` log points, i.e. roughly a 5.5% fall in the Theil index. For a high-tariff industry at the 75th percentile of pre-WTO tariffs (~25%), it is `−0.322 × 0.25 ≈ −0.08` log points, or roughly 8%.
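The arithmetic can be checked directly. The sketch below uses the headline coefficient quoted above and also converts log points into an exact percent change via `exp(·) − 1`, which comes out slightly smaller in magnitude than the log-point approximation (about −5.3% and −7.7%). The function name `implied_effect` is just an illustrative helper.

```python
import math

BETA_HAT = -0.322  # headline coefficient quoted in the text

def implied_effect(tariff_2001, beta=BETA_HAT):
    """Return (log-point change, exact percent change) in the Theil index."""
    log_points = beta * tariff_2001
    pct_change = 100.0 * (math.exp(log_points) - 1.0)
    return log_points, pct_change

for tariff in (0.17, 0.25):
    lp, pct = implied_effect(tariff)
    print(f"tariff={tariff:.2f}: {lp:.4f} log points, {pct:.2f}% change in Theil")
```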

### The identifying assumption (equation 2)
```
E[ε_{it} | Tariff_{i,2001} · Post02_t, α_i, X_{it}, λ_t] = E[ε_{it} | α_i, X_{it}, λ_t]
```

In words: **conditional on industry FE, year FE, and the controls, the interaction is uncorrelated with the error**, i.e. high-tariff and low-tariff industries would have had the same post-WTO trend in markup dispersion had there been no WTO accession. Footnote 4 notes the `X_{it}` need NOT be exogenous — this is just a standard DiD, not a structural model.

### Why `Tariff_{i,2001}` and not `Tariff_{it}`

This is the most interesting specification choice in the paper. Two reasons:
1. **Anticipation.** The phase-out schedule was announced in early 2002. Annual tariffs `Tariff_{it}` mix "realized" and "expected" liberalization in a messy way. Using the fixed 2001 value captures the total (realized + anticipated) effect in one coefficient. Citation: Liu & Trefler (2011).
2. **Endogeneity of annual tariffs.** `Tariff_{2004}` and `Tariff_{2005}` could plausibly respond to firm behavior during the implementation period (e.g., if an industry reshaped itself, the government could petition for exemptions). The 2001 value is a pre-determined industry characteristic, so it's weakly exogenous conditional on the accession shock.

Appendix Table 1 shows the results are similar (but weaker) using `Tariff_{it}` or 1997–2001 averages or 1997 values. The paper commits to `Tariff_{i,2001}` as the main spec and uses the others as robustness.

## ★ Dose: detailed anatomy ★

### What "dose" means in Lu & Yu

The paper does not use the word "dose"; that is vocabulary from Callaway, Goodman-Bacon & Sant'Anna (CBS). But the continuous regressor `Tariff_{i,2001}` plays exactly the role of a dose:
- **Units:** decimal fraction (0.17 = 17% pre-WTO tariff)
- **Support in the data:** approximately [0, 0.60] at the 3-digit CIC level (see Figure 2 X-axis)
- **Distribution:** heavily concentrated in the 0.05–0.25 range, with a long right tail of high-protection industries (mostly agriculture-adjacent manufacturing like sugar, tobacco, some chemicals)
- **Sample mean:** ~0.17 (17% unweighted average) in 2001, per Figure 1
- **Sample median:** used to define the "high-tariff" vs "low-tariff" binary split in Figure 4 — the visual identification

### The "dose → cut" mechanism

Figure 2 is the clincher for the dose interpretation. It plots 2001 tariff (X) against 2001–2005 change in tariff (Y) across 3-digit industries. The relationship is strongly linear with a positive slope of roughly 0.4: **an industry with a 10 percentage point higher 2001 tariff experienced, on average, a 4 percentage point larger mechanical cut after WTO.**

This is because:
- WTO ceilings are roughly uniform across manufacturing products (8.9% target by 2004 for manufacturing average)
- Pre-WTO Chinese tariffs varied widely (0% to 60%+)
- Industries already below the WTO ceiling had nothing to cut → realized change ≈ 0
- Industries far above the ceiling had to cut a lot → realized change proportional to (initial tariff − ceiling)

So `Tariff_{i,2001}` is a **nearly mechanical predictor of the realized cut** for all industries with tariff > ceiling. It functions as an intent-to-treat dose.

### What CBS would call this

In Callaway-Goodman-Bacon-Sant'Anna language, `D_i = Tariff_{i,2001}` is the continuous treatment dose. Industries with `D_i ≤ 0.10` (roughly the WTO ceiling) are approximately untreated — dose ≈ 0 in effect because the mandated cut was ~0. Industries with `D_i > 0.10` are in the "treated" region, with dose proportional to how far above the ceiling they started.

The teaching code (`examples/China-WTO/wto_example.R`) makes this CBS mapping explicit by defining `dose = pmax(tariff_2001 - 0.1, 0)`. This gives:
- `dose = 0` for industries with pre-WTO tariff ≤ 10%, creating the CBS-style untreated group
- `dose = tariff_2001 - 0.1` for industries with pre-WTO tariff > 10%, the mechanical cut size
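For readers who do not use R, a Python analogue of that `pmax` expression might look like the following. The example tariffs are made up, and the 10% ceiling is the approximation used in these notes rather than a number taken from the tariff schedule.

```python
WTO_CEILING = 0.10  # approximate ceiling used in the text (assumption)

def dose(tariff_2001, ceiling=WTO_CEILING):
    """Excess of the 2001 tariff over the ceiling, floored at zero."""
    return max(tariff_2001 - ceiling, 0.0)

example_tariffs = [0.03, 0.10, 0.17, 0.25, 0.60]   # hypothetical industries
doses = [round(dose(t), 2) for t in example_tariffs]
untreated = [t for t in example_tariffs if dose(t) == 0.0]

print(doses)
print(untreated)
```

Industries at or below the ceiling all collapse to `dose = 0`, which is what gives the comparison group a sharp definition.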

Under this re-parameterization, the CBS `ATT(d|d)` formula from Theorem 3.1 says:
```
ATT(d|d) = E[ΔY | D = d] − E[ΔY | D = 0]
```
which in Lu & Yu's context is: "the effect of being in a dose-d industry is the mean change in log(Theil) for that industry minus the mean change for untreated (≤10%-tariff) industries." This is literally what the teaching code computes before fitting linear, spline, and bins functional forms to `ΔY − count_trend_estimate` on dose.
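A rough sketch of that contrast, using dose bins in place of exact dose values: the mean change in the outcome within each positive-dose bin, minus the mean change among `dose == 0` units. The data are simulated, the bin edges are arbitrary, and `att_by_bin` is an illustrative helper, not a function from the teaching code.

```python
import random
from statistics import mean

def att_by_bin(delta_y, doses, bin_edges):
    """Mean change in each positive-dose bin minus the mean change at dose == 0."""
    baseline = mean(dy for dy, d in zip(delta_y, doses) if d == 0.0)
    out = {}
    for lo, hi in zip(bin_edges[:-1], bin_edges[1:]):
        cell = [dy for dy, d in zip(delta_y, doses) if lo < d <= hi]
        if cell:                      # skip empty bins
            out[(lo, hi)] = mean(cell) - baseline
    return out

# Simulated industries (hypothetical numbers): a common trend plus a linear dose effect.
random.seed(1)
tariffs = [random.uniform(0.0, 0.6) for _ in range(2000)]
doses = [max(t - 0.10, 0.0) for t in tariffs]
common_trend, true_slope = 0.08, -0.3
delta_y = [common_trend + true_slope * d + random.gauss(0.0, 0.02) for d in doses]

att = att_by_bin(delta_y, doses, [0.0, 0.1, 0.2, 0.3, 0.4, 0.5])
for (lo, hi), value in att.items():
    print(f"dose in ({lo:.1f}, {hi:.1f}]: estimate {value:+.3f}")
```

Subtracting the untreated mean removes the common trend, so each bin's estimate should sit near `true_slope` times the bin's average dose.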

**Lu & Yu's main spec is this exercise under the restriction that `ATT(d|d) = β · d`** — i.e., the linear parametric case of CBS Section 4.1.

### Dose limitations (what CBS would critique)

- The paper uses `Tariff_{i,2001}` directly, not `max(Tariff_{i,2001} − 0.1, 0)`, so the "untreated" group in the main regression is not sharply defined. In effect the regression includes low-tariff industries in the post-minus-pre contrast, and the coefficient is interpreted as if moving from a zero tariff to any positive tariff produces an effect proportional to the tariff. A CBS reader would want to see the Theorem 3.4 decomposition of this β^{twfe}: it likely has the "negative weights" problem from CBS §3.3, where below-mean-dose industries are effectively acting as the comparison group.
- The linear-in-dose assumption is strong. CBS §4.1 (eqs. 4.1–4.5) advocates for parametric (quadratic), spline-based, or saturated-binned estimators to let the `ATT(d|d)` curve bend. The teaching code fits all three and lets the reader eyeball the discrepancy.
- The linearity restriction means the paper doesn't distinguish `ATT(d|d)` (under plain parallel trends) from `ATT(d)` (under strong parallel trends). The coefficient `β` is agnostic about which causal target it's aiming at.

## ★ Outcome: detailed anatomy ★

### From firm markups to industry-year dispersion

The construction is a three-step pipeline:

1. **Estimate firm markups `μ_{fit}`** using De Loecker & Warzynski (2012):
   ```
   μ_{fit} = θ^m_{fit} · (α^m_{fit})^{-1}
   ```
   where `θ^m_{fit}` is the firm's output elasticity of intermediate materials (recovered from an estimated translog production function) and `α^m_{fit} = (p^m · M) / (P · Q)` is its materials cost share in revenue. Logic: under perfect competition price = marginal cost, so the elasticity equals the revenue share. Any wedge between them is the markup. Holds under Cournot, Bertrand, monopolistic competition.

2. **Production function:** quantity-based translog with firm-specific input-price control function (Q–TL–IP):
   ```
   q_{fit} = β_l l + β_k k + β_m m + β_ll l² + β_kk k² + β_mm m²
           + β_lk lk + β_km km + β_lm lm + β_lkm lkm
           + B(p, ms, e; β̃) + ω_{fit} + ε_{fit}
   ```
   Estimated separately for each 2-digit industry via Ackerberg-Caves-Frazer (2006) two-step GMM, with a materials-demand proxy for productivity (in the style of Levinsohn-Petrin rather than the Olley-Pakes investment proxy) and the De Loecker et al. (2014) input-price control function (`B(·)` as a function of output price, market share, and exporter status, interacted with deflated inputs). Output is measured in physical quantity (from the merged product-ASIF sample) to avoid omitted-price bias. Elasticities are estimated on single-product producers and then applied to multi-product firms, assuming the same technology.

3. **Aggregate to industry-year Theil dispersion:**
   ```
   Theil_{it} = (1/n_{it}) · Σ_{f=1}^{n_{it}} (μ_{fit}/μ̄_{it}) · log(μ_{fit}/μ̄_{it})
   ```
   where `μ̄_{it}` is the mean markup in industry i at year t and `n_{it}` is the firm count. **Drop top 2.5% and bottom 2.5% of firm markups within each (industry, year) cell** before computing dispersion to protect against outliers.

   `y_{it} = log(Theil_{it})` is the dependent variable in the main regression.
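The three steps can be strung together in a short sketch. It assumes the translog coefficients are already estimated (the GMM step is not reproduced), ignores the `B(·)` control-function terms when differentiating with respect to `m`, and implements the 2.5% trimming by simple rank truncation. All of these are simplifying assumptions, and the function names and simulated markups are illustrative only.

```python
import math
import random

def materials_elasticity(l, k, m, b):
    """d q / d m for the translog above (control-function terms ignored)."""
    return (b["m"] + 2.0 * b["mm"] * m + b["km"] * k + b["lm"] * l
            + b["lkm"] * l * k)

def markup(theta_m, materials_cost, revenue):
    """Output elasticity of materials divided by the materials revenue share."""
    return theta_m / (materials_cost / revenue)

def trim(values, tail=0.025):
    """Drop the lowest and highest `tail` share of observations in a cell."""
    ordered = sorted(values)
    cut = int(len(ordered) * tail)
    return ordered[cut:len(ordered) - cut]

def theil(values):
    """Theil index: mean of (x / mean) * log(x / mean)."""
    mu_bar = sum(values) / len(values)
    return sum((v / mu_bar) * math.log(v / mu_bar) for v in values) / len(values)

# Hypothetical firm: elasticity 0.9, materials cost 60, revenue 100.
example_markup = markup(0.9, 60.0, 100.0)

# Hypothetical (industry, year) cell with 400 simulated firm markups.
random.seed(3)
cell_markups = [math.exp(random.gauss(0.2, 0.3)) for _ in range(400)]
y_it = math.log(theil(trim(cell_markups)))   # dependent variable for this cell
print(round(example_markup, 2), round(y_it, 3))
```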

### Alternative dispersion measures (robustness)

All four alternative measures enter as logs. Table 3 Cols 1–3 report the first three; MLD is in Appendix Table 1:
- **Gini index:** `β = −0.145***`
- **Coefficient of variation (CV = sd / mean):** `β = −0.164***`
- **Relative mean deviation (RMD = mean absolute deviation / mean):** `β = −0.152***`
- **Mean log deviation (MLD, in Appendix Table 1):** similar

Theil is the main measure because it's decomposable and statistically testable (Cowell 1995). All four alternatives deliver the same sign and significance as Theil, so the result is not an artifact of the Theil choice.
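For reference, the four alternative measures can be written compactly as below. These follow the textbook definitions (and the CV and RMD definitions given in the list above); whether the paper uses sample or population moments is not stated in these notes, so the population versions here are an assumption.

```python
import math

def gini(values):
    """Gini index via the sorted-rank formula."""
    x = sorted(values)
    n, total = len(x), sum(x)
    weighted = sum((rank + 1) * v for rank, v in enumerate(x))
    return 2.0 * weighted / (n * total) - (n + 1.0) / n

def coef_variation(values):
    """Population standard deviation divided by the mean."""
    n = len(values)
    mu = sum(values) / n
    return math.sqrt(sum((v - mu) ** 2 for v in values) / n) / mu

def relative_mean_deviation(values):
    """Mean absolute deviation from the mean, divided by the mean."""
    n = len(values)
    mu = sum(values) / n
    return sum(abs(v - mu) for v in values) / (n * mu)

def mean_log_deviation(values):
    """Mean of log(mean / x); zero when all values are equal."""
    n = len(values)
    mu = sum(values) / n
    return sum(math.log(mu / v) for v in values) / n

sample = [1.0, 3.0]   # toy markups, for illustration only
for fn in (gini, coef_variation, relative_mean_deviation, mean_log_deviation):
    print(fn.__name__, round(fn(sample), 4))
```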

### Outcome decomposition (Table 5 mechanism analysis)

Markup = price / marginal cost. A drop in markup dispersion can come from price compression, cost compression, or both. Alongside the markup regression, Lu & Yu run three further dispersion regressions (TFP, price, marginal cost):

| Outcome | `β̂` | SE |
|---|---|---|
| Theil(markup) | −0.293\*\*\* | 0.104 |
| Theil(TFP) | **−1.206\*\*\*** | 0.435 |
| Theil(price) | −0.949\*\* | 0.398 |
| Theil(marginal cost) | −0.961\*\* | 0.405 |

**Productivity dispersion falls by the most**, implying that marginal-cost variation is a big part of the story. Price dispersion and marginal cost dispersion each fall by roughly three times the markup dispersion effect (−0.949 and −0.961 versus −0.293), so both channels operate. The price and marginal cost regressions are restricted to the single-product firm subsample (818 obs over 147 industries), which is where the paper can cleanly separate price from the markup ratio.

### Distributional shape (Table 4 quantile regressions)

Instead of using a dispersion statistic, run the main DiD with the markup at specific quantiles (p5, p25, p50, p75, p95) as the outcome:

| Quantile | `β̂` | SE | Significant? |
|---|---|---|---|
| p5 | **+0.046\*** | 0.025 | Yes at 10% |
| p25 | +0.029 | 0.020 | No |
| p50 | +0.014 | 0.022 | No |
| p75 | −0.006 | 0.023 | No |
| p95 | −0.028 | 0.048 | No |
| Mean | +0.013 | 0.020 | No |

**The "flattening" story is really a "low-end lift" story.** Only the p5 coefficient is significant (at the 10% level), and it's positive. The mean is insignificant. Point estimates decline monotonically across quantiles (from +0.046 at p5 to −0.028 at p95), but none above p5 is individually distinguishable from zero. Honest summary: the paper shows strong evidence that markup dispersion compresses; the within-distribution story is more tentative and mostly driven by the p5 rising.
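The significance column can be sanity-checked from the reported coefficients and standard errors. The sketch below uses a normal critical value for a two-sided 10% test, which is an approximation: the paper's own inference rests on clustered standard errors, and the table entries are rounded.

```python
from statistics import NormalDist

# (coefficient, standard error) pairs copied from the table above.
TABLE4 = {
    "p5": (0.046, 0.025),
    "p25": (0.029, 0.020),
    "p50": (0.014, 0.022),
    "p75": (-0.006, 0.023),
    "p95": (-0.028, 0.048),
    "mean": (0.013, 0.020),
}

CRIT_10 = NormalDist().inv_cdf(0.95)   # two-sided 10% critical value, about 1.645

t_stats = {name: b / se for name, (b, se) in TABLE4.items()}
significant = [name for name, t in t_stats.items() if abs(t) > CRIT_10]

for name, t in t_stats.items():
    print(f"{name:>4}: t = {t:+.2f}")
print("significant at 10%:", significant)
```

Only p5 clears the 10% threshold under this approximation, which matches the table's last column.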

## Main findings (tabulated)

| Finding | Coefficient | SE | Source |
|---|---|---|---|
| Main effect: log(Theil) on `Tariff × Post` | **−0.322\*\*\*** | 0.104 | Table 1 Col 1 |
| With time-varying controls | **−0.307\*\*\*** | 0.103 | Table 1 Col 2 |
| With Gentzkow pretrend interactions | **−0.313\*\*\*** | 0.101 | Table 1 Col 3 |
| Anticipation placebo (`Tariff × 1{t=2001}`) | 0.075 (insig) | 0.128 | Table 2 Col 1 |
| Topalova pre-WTO placebo | 0.002 (insig) | 0.001 | Table 2 Col 4 |
| Processing-trader placebo | −0.076 (insig) | 0.460 | Table 2 Col 5 |
| Gini | **−0.145\*\*\*** | 0.052 | Table 3 Col 1 |
| 4-digit industry | **−0.277\*\*\*** | 0.080 | Table 3 Col 4 |
| Two-period collapse | **−0.313\*\*\*** | 0.101 | Table 3 Col 9 |
| Imports response (Poisson PML) | **+0.021\*\*\*** | 0.000 | Table 4 Col 1 |
| Log price dispersion | **−0.949\*\*** | 0.398 | Table 5 Col 3 |
| Log marginal cost dispersion | **−0.961\*\*** | 0.405 | Table 5 Col 4 |
| Entry/exit firms | **−0.335\*\*** | 0.194 | Table 6 Col 2 |
| Surviving firms | −0.157 (insig) | 0.103 | Table 6 Col 1 |
| Coastal | **−0.262\*\*\*** | 0.117 | Table 6 Col 5 |
| Inland | **−0.359\*\*** | 0.124 | Table 6 Col 6 |

## Mapping Lu & Yu into CBS language

| CBS concept | Lu & Yu implementation |
|---|---|
| Unit `i` | 3-digit CIC industry |
| Two-period DiD (`t=1`, `t=2`) | 1998–2001 vs 2002–2005 (collapsed in Table 3 Col 9; panel in Table 1) |
| Dose `D_i` | `Tariff_{i,2001}` (decimal) |
| Approximate `D = 0` group | Industries with `Tariff_{i,2001}` close to or below the 10% WTO ceiling |
| `D > 0` group | Industries with `Tariff_{i,2001} > 0.1` |
| Outcome `Y_{it}` | `log(Theil(μ)_{it})` |
| Parallel trends assumption | Figure 4 + Topalova pre-period placebo (Table 2 Col 4) |
| Anticipation / SUTVA check | Anticipation placebo (Table 2 Col 1) + processing trader placebo (Table 2 Col 5) |
| `ATT(d\|d)` estimator functional form | Linear parametric (eq. 4.2 in CBS language) — not spline, not sieve |
| `ACRT(d)` interpretation | Implicit in `β` — but the paper doesn't make the distinction |
| SPT vs PT | Paper implicitly assumes SPT when treating `β · d` as a dose-response curve |

## What the paper does NOT do (through a CBS lens)

1. **No functional-form exploration.** The spec commits to linearity in dose. CBS §4.1 says this is a strong assumption — the teaching code tests it by fitting linear, B-spline, and binned versions side by side.
2. **No `ATT(d|d)` vs `ATT(d)` distinction.** The paper doesn't separate "the effect of dose d on the industries that actually received dose d" from "the effect of dose d averaged over all treated industries."
3. **No event study.** The paper shows a pre-post plot (Figure 4) but doesn't run a true CBS-style event study with `ATT^{es}_loc(e)` coefficients for each year.
4. **No decomposition of β^{twfe}.** CBS Theorem 3.4 says the linear TWFE coefficient with a continuous dose is a weighted integral of `ATT(d|d)` values with potentially negative weights. The paper doesn't check whether this is a problem — probably because in 2015 nobody was asking this question in print yet.
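To make item 3 concrete, here is one simple way to get year-by-year coefficients: regress each industry's long difference `y_{it} − y_{i,2001}` on the dose, separately for every year. In a balanced panel without controls this should coincide with a TWFE regression on `Tariff × year` dummies with 2001 omitted. It is a plain dose-interacted event study on simulated data, not the CBS `ATT^{es}_loc(e)` estimator, and the phase-in path and noise scale are invented for the example.

```python
import random

def slope(x, y):
    """OLS slope of y on x with an intercept."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    return sxy / sxx

def event_study(panel, dose, years, base_year=2001):
    """Slope of (y_it - y_i,base) on dose_i for every year except the base year."""
    b = years.index(base_year)
    return {yr: slope(dose, [row[t] - row[b] for row in panel])
            for t, yr in enumerate(years) if yr != base_year}

# Simulated panel (hypothetical): no pre-trend, effect phases in after 2001.
random.seed(2)
years = list(range(1998, 2006))
phase_in = {2002: -0.1, 2003: -0.2, 2004: -0.3, 2005: -0.4}
tariff_2001 = [random.uniform(0.0, 0.6) for _ in range(154)]

panel = []
for d in tariff_2001:
    alpha_i = random.gauss(0.0, 1.0)                      # industry effect
    panel.append([alpha_i + 0.05 * (yr - 1998)            # common year trend
                  + phase_in.get(yr, 0.0) * d             # dose effect by year
                  + random.gauss(0.0, 0.03) for yr in years])

coefs = event_study(panel, tariff_2001, years)
for yr in sorted(coefs):
    print(yr, f"{coefs[yr]:+.3f}")
```

Pre-2001 coefficients near zero are the pattern one would hope to see under parallel trends; the post-2001 coefficients trace out how the effect builds.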

None of these are criticisms of the paper on its own terms. They are exactly the things the Substack post is going to foreground as "what CBS lets you check that Lu & Yu couldn't." Lu & Yu is the perfect teaching example because:
- The natural experiment is clean
- The dose variable is clean and interpretable
- The linear-in-dose spec is both simple to explain and vulnerable to CBS critique
- The teaching code already reconstructs it from scratch in R and Stata
- The data is publicly released at the industry-year level (`AEJ_ind_DID_3-digit.dta`) so a reader can replicate everything

## Mechanism summary

1. **Pre-WTO tariffs varied widely** across industries (Figure 1, Figure 2).
2. **WTO ceilings forced high-tariff industries to cut disproportionately** (Figure 2).
3. **Imports grew more in high-tariff product categories** (Table 4 Col 1: Poisson PML on HS-6 imports) → import competition channel confirmed.
4. **Within-industry firm markup dispersion declined** (Table 1) — the headline effect.
5. **Decomposition:** both productivity dispersion and price dispersion fall (Table 5). Both the cost side and the price side of the markup compress.
6. **Distributional channel:** most of the compression comes from **lifting low markups up** (Table 4 Col 2, p5 coefficient) rather than pushing high markups down.
7. **Firm-margin channel:** the effect is driven by **entry and exit**, not by adjustments among surviving firms (Table 6 Cols 1–2). Consistent with new firms replacing less-efficient incumbents.
8. **Diminishing-returns pattern:** effects are larger for SOEs and inland cities (less competitive pre-WTO) than for non-SOEs and coastal cities (already more competitive pre-WTO).

Overall: Chinese manufacturing was already reasonably competitive by 2001, and the WTO accession shock acted like a "polish" on the markup distribution — tightening it, especially at the lower end and in less competitive pre-WTO environments, primarily through entry/exit churn rather than through incumbent behavior change.

## Contribution to the literature

- **New trade theory (Arkolakis-Costinot-Donaldson-Rodríguez-Clare 2012):** ACDR predict that under Pareto productivity, intensive and extensive margins of trade cancel → no markup dispersion response. Lu & Yu find they don't cancel in the data → Pareto assumption is wrong, or ACDR's assumptions don't hold for China.
- **Misallocation literature (Hsieh-Klenow 2009):** provides a concrete mechanism by which trade can reduce misallocation — markup compression — separate from the productive efficiency channel.
- **China-specific trade work:** contrasts with Brandt et al. (2012), who found positive effects of tariff cuts on *mean* markups using revenue-Cobb-Douglas production functions. Lu & Yu find the mean is insignificant but the dispersion falls, using quantity-translog production functions with input-price correction. The production function estimation choice matters (footnote 16).
- **Pro-competitive effects of trade:** joins a literature (de Blas-Russ 2012, Holmes-Hsu-Lee 2014, Edmond-Midrigan-Xu 2014) arguing that allocative efficiency gains from trade are real and separately measurable from productive efficiency gains, but does not calculate welfare itself — that requires a structural model, left to Hsu-Lu-Wu 2014.

## Summary one-liner for the Substack

**Lu & Yu (2015) is a linear-in-dose TWFE DiD applied to China's WTO tariff shock at the 3-digit industry level. The dose is the pre-WTO (2001) tariff, the outcome is log(Theil) of within-industry firm markup dispersion, and the punchline is that industries facing bigger mandated tariff cuts saw larger declines in markup dispersion: about 5.5% for an industry at the 17% mean tariff, relative to a zero-tariff benchmark. This is a pure allocative efficiency gain that appears on top of the productivity gains Brandt et al. documented using the same data. It's the perfect CBS teaching case because (a) the dose is mechanically clean, (b) the pre-WTO period has observable parallel trends, (c) the linear specification is simple to explain, and (d) it maps directly onto CBS's §4.1 linear `ATT(d|d)` estimator, which CBS then teaches you how to generalize to splines, bins, and the CCK sieve.**