AN-059: Variance decomposition of +7 pp into within-firm and between-firm components

**Variance decomposition of the headline +7 pp: ≈ 100 % within-firm (102 % at the point estimate), ≈ 0 % between-firm sponsor selection (−2 %).** Spec A (cand FE only, headline replicate): β = +7.85 pp (SE 1.24, p ≈ 2.6 × 10⁻¹⁰). Spec B (cand FE + firm FE, the within-firm sponsor effect averaged across all 426 firms in the analysis sample): β = +7.98 pp (SE 1.25, p ≈ 1.9 × 10⁻¹⁰). The between-firm component (A − B) is **−0.13 pp**, about a tenth of either spec's standard error. The +7 pp is not 'sponsors hire firms that systematically over-state'; it is 'firms slant when paid by candidates, regardless of which firm'. Same firm, same pollster style, different customer → +8 pp. The firm-level slant-for-hire selection hypothesis (2-4 pp prior in the docs/thinking.md residual decomposition) is ruled out at headline scale. AN-016's within-firm β dispersion (sd 10.3) is still real; AN-018's size-discipline (small firms slant more among the 31 firms with ≥ 5 sponsored polls) still holds; AN-059's decomposition says sponsors do not preferentially load on the high-slant firms in the universe at large.

Hypothesis: H1: Self-sponsored polls overstate the sponsoring candidate
Confidence: green
Type: robustness

Design

Sample: 27,919 candidate-poll rows from build/assemble/cand_poll.parquet after dropping rows missing error / log_sample_size / days_to_election / pollster_cnpj / muni_id and restricting to candidates appearing in ≥ 2 polls. 5,164 candidates, 426 firms (pollster_cnpj), 490 sponsored_by==1.
Specification: Spec A: error ~ sponsored + opponent_sponsored + log_sample_size + days_to_election + days² | candidate FE; cluster-robust SE at muni. Spec B: same regressors, adds firm FE (pollster_cnpj) as a second absorbed effect via PanelOLS.other_effects. The within-firm sponsor coefficient (Spec B) is what remains after firm-level baselines are absorbed; the difference β_A − β_B is the between-firm (selection) contribution.
Comparator: each firm's own non-sponsored polls (firm FE constructs this implicitly)
Cluster: muni
Weights: none

Script: source/analysis/an-059-firm-fe-decomp.py
Target: build/table/an-059-firm-fe-decomp.csv
Status: interpreted · 2026-06-14
Created: 2026-06-14

Question

AN-016 / AN-017 / AN-018 established that within-firm β varies wildly across firms (sd 10.3 pp, range [−11, +35], 31 firms with ≥ 5 sponsored polls). AN-018 found that firm SIZE explains most of that cross-firm dispersion among those 31 firms.

What was still missing — and what the residual-decomposition entry added to docs/thinking.md on 2026-06-14 flagged as the load-bearing diagnostic — is the explicit decomposition of the HEADLINE +7 pp into:

- within-firm sponsor effect (any given firm tilts by this much on its sponsored polls relative to its own non-sponsored polls),
- between-firm sponsor selection (sponsors disproportionately hire firms whose baselines are already higher).

If between-firm is large, the mechanism story is sponsor selection of firms — methodology choices are a downstream story but the load-bearing lever is which firm gets hired. If between-firm is small, the +7 pp is genuine within-firm methodology slant — the same firm produces honest polls for media and tilted polls for sponsors.

Design

source/analysis/an-059-firm-fe-decomp.py:

1. Load build/assemble/cand_poll.parquet (31,186 cand-poll rows).
2. Apply the headline sample filter (drop NA on the core controls, restrict to candidates with ≥ 2 polls). N = 27,919.
3. Fit two nested PanelOLS specs with the same regressors (sponsored, opponent_sponsored, log_sample_size, days_to_election, days²) and muni-clustered SE:
   - Spec A: within-candidate FE only (the headline structure).
   - Spec B: within-candidate FE + firm FE (pollster_cnpj).
4. Read β_sponsored from each spec. The between-firm component is β_A − β_B.
5. Tertile sensitivity: split firms into thirds by row count, run the headline cand-FE spec separately within each tertile, compare against AN-018.
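
The sample filter in the second step can be sketched in a few lines. This is a minimal illustration, not the code in an-059-firm-fe-decomp.py: the `cand_id` key and the toy rows are hypothetical, and it assumes the ≥ 2 polls count is taken after the missing-value drop.

```python
from collections import Counter

# Columns the note lists as required for the headline sample.
CORE_COLS = ("error", "log_sample_size", "days_to_election",
             "pollster_cnpj", "muni_id")


def headline_filter(rows, cand_key="cand_id", min_polls=2):
    """Drop rows missing any core column, then keep candidates with >= min_polls rows."""
    complete = [r for r in rows if all(r.get(c) is not None for c in CORE_COLS)]
    polls_per_cand = Counter(r[cand_key] for r in complete)
    return [r for r in complete if polls_per_cand[r[cand_key]] >= min_polls]


def row(cand, error=1.0):
    """Build a toy candidate-poll row (hypothetical values, not from cand_poll.parquet)."""
    return {"cand_id": cand, "error": error, "log_sample_size": 6.0,
            "days_to_election": 30, "pollster_cnpj": "f1", "muni_id": "m1"}


# c1 has two complete rows; c2 loses one row to a missing error; c3 has a single row.
rows = [row("c1"), row("c1"), row("c2"), row("c2", error=None), row("c3")]
kept = headline_filter(rows)
print(len(kept), sorted({r["cand_id"] for r in kept}))
```

Counting polls after the drop matters: a candidate with two rows, one of them incomplete, would otherwise survive as a singleton that candidate FE absorbs entirely.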

Results

Headline decomposition

| Spec | β_sponsored | SE | t | p |
|---|---|---|---|---|
| A: cand FE only (headline) | +7.85 pp | 1.24 | +6.32 | 2.6 × 10⁻¹⁰ |
| B: cand FE + firm FE | +7.98 pp | 1.25 | +6.37 | 1.9 × 10⁻¹⁰ |
| Between-firm (A − B) | −0.13 pp | | | |
| Within-firm share | 102 % | | | |
| Between-firm share | −2 % | | | |

The headline replicate (Spec A) lands at +7.85 pp on this strict sample, consistent with the +7-8 pp range across specs in the headline analysis. Adding firm FE BARELY MOVES the coefficient — if anything, the within-firm effect is slightly larger than the headline. The sponsor effect is essentially all within-firm.
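
The share rows follow directly from the two coefficients. The sketch below reproduces that arithmetic from the rounded table values; the p-value helper uses a normal approximation, which is an assumption here, so its output lands near the reported p-values rather than matching them digit for digit.

```python
import math

beta_a, se_a = 7.85, 1.24   # Spec A: cand FE only (from the table)
beta_b, se_b = 7.98, 1.25   # Spec B: cand FE + firm FE (from the table)

between = beta_a - beta_b            # between-firm (selection) component, in pp
within_share = beta_b / beta_a       # share of the headline that is within-firm
between_share = between / beta_a     # share attributable to firm selection


def two_sided_p(beta, se):
    """Two-sided p-value for beta / se under a normal approximation."""
    return math.erfc(abs(beta / se) / math.sqrt(2))


print(f"between-firm component: {between:+.2f} pp")
print(f"within-firm share:      {within_share:.0%}")
print(f"between-firm share:     {between_share:.0%}")
print(f"p (Spec A, approx):     {two_sided_p(beta_a, se_a):.1e}")
print(f"p (Spec B, approx):     {two_sided_p(beta_b, se_b):.1e}")
```

The within-firm share exceeds 100 % only because β_B is marginally larger than β_A; the shares still sum to 100 %.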

Tertile sensitivity (all 426 firms, not AN-018's 31)

| firm-size tertile | n firms | n rows | β_sponsored (pp) | SE | p |
|---|---|---|---|---|---|
| small | 342 | 8,301 | +6.10 | 2.43 | 0.012 |
| medium | 58 | 8,487 | +8.76 | 2.76 | 0.001 |
| large | 13 | 8,548 | +2.82 | 2.16 | 0.192 |

Medium-tier firms show the largest slant (+8.76 pp). Large firms slant least (+2.82 pp, not distinguishable from zero at p = 0.192), consistent with AN-018's reputation-discipline story. The pattern is compatible with AN-018: that analysis ranked only the 31 firms with ≥ 5 sponsored polls and found that small firms slant more, whereas the tertiles here use ALL 426 firms by total row count, so the "small" bucket includes many firms with only 1-2 polls. The two analyses partition firms differently; both findings hold.
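
The near-equal row counts per tertile suggest the split is made on the cumulative row distribution rather than on firm counts. The note does not spell out the rule, so the following is one plausible sketch under that assumption: firms are sorted by row count and bucketed by where their midpoint falls in cumulative rows. The firm identifiers are toy values.

```python
from collections import Counter

LABELS = ("small", "medium", "large")


def size_tertiles(firm_ids):
    """Map each firm to a size tertile so that tertiles hold roughly equal row counts."""
    rows_per_firm = Counter(firm_ids)
    total = sum(rows_per_firm.values())
    assignment, cum = {}, 0
    # Sort by (row count, firm id) so ties are broken deterministically.
    for firm, n in sorted(rows_per_firm.items(), key=lambda kv: (kv[1], kv[0])):
        # Bucket by the firm's midpoint in the cumulative row distribution,
        # using integer arithmetic: floor(3 * (cum + n/2) / total).
        bucket = min(2, 3 * (2 * cum + n) // (2 * total))
        assignment[firm] = LABELS[bucket]
        cum += n
    return assignment


# Toy data: ten one-poll firms, three mid-sized firms, one large firm (30 rows total).
firm_ids = ([f"s{i}" for i in range(10)]
            + ["m1"] * 3 + ["m2"] * 3 + ["m3"] * 4
            + ["big"] * 10)
tiers = size_tertiles(firm_ids)
print(Counter(tiers.values()))
```

As in the table, a row-based cut puts many firms in the small bucket and very few in the large one, which is why it is not directly comparable to a ranking restricted to firms with ≥ 5 sponsored polls.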

Interpretation

The +7 pp is within-firm, not between-firm

The clean reading: pollsters slant when paid by candidates and don't slant (or slant less) when paid by media or by themselves. Same firm, same methodology PDF style, same back-office. Customer identity flips the output.

This rules out the firm-level slant-for-hire SELECTION hypothesis at headline scale. Sponsors don't disproportionately concentrate on firms with systematically higher baseline error rates. If they did, β_A would substantially exceed β_B; instead, β_A ≈ β_B (within 0.13 pp).

It does NOT rule out sponsor selection on which firms do candidate work. Sponsors clearly cluster on a subset of firms: the 490 sponsored rows are spread unevenly across the 426 firms, and only 31 firms have ≥ 5 sponsored polls. But the firms they pick aren't systematically "high-baseline-error" firms; they're just firms willing to take candidate contracts.
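
The β_A versus β_B logic can be checked on noise-free synthetic data. The sketch below is a toy stand-in for the PanelOLS fits, not the analysis code: it absorbs fixed effects by alternating demeaning and uses a single regressor. The firm baselines (0 and 4 pp) and the within-firm effect (8 pp) are illustrative assumptions. When sponsored polls sit only at the high-baseline firm, the cand-FE-only slope is inflated; when sponsorship is spread evenly across firms, the two slopes coincide.

```python
from collections import defaultdict

FIRM_BASELINE = {"L": 0.0, "H": 4.0}  # illustrative firm-level error baselines (pp)
TAU = 8.0                             # illustrative within-firm sponsor effect (pp)


def demean(values, groups):
    """Subtract each group's mean from its members."""
    sums, counts = defaultdict(float), defaultdict(int)
    for v, g in zip(values, groups):
        sums[g] += v
        counts[g] += 1
    return [v - sums[g] / counts[g] for v, g in zip(values, groups)]


def absorb(values, group_lists, sweeps=50):
    """Absorb one or more fixed effects by repeated alternating demeaning."""
    out = list(values)
    for _ in range(sweeps):
        for groups in group_lists:
            out = demean(out, groups)
    return out


def fe_slope(y, x, *group_lists):
    """OLS slope of y on x after partialling out the given fixed effects."""
    yt, xt = absorb(y, group_lists), absorb(x, group_lists)
    return sum(a * b for a, b in zip(xt, yt)) / sum(b * b for b in xt)


def simulate(sponsored_pattern, n_cand=5):
    """Each candidate gets four polls (firms L, L, H, H) with the given sponsorship flags."""
    y, x, cand, firm = [], [], [], []
    for c in range(n_cand):
        for f, s in zip(("L", "L", "H", "H"), sponsored_pattern):
            y.append(c + FIRM_BASELINE[f] + TAU * s)  # cand effect + firm baseline + slant
            x.append(float(s))
            cand.append(c)
            firm.append(f)
    beta_a = fe_slope(y, x, cand)        # analogue of Spec A: candidate FE only
    beta_b = fe_slope(y, x, cand, firm)  # analogue of Spec B: candidate FE + firm FE
    return beta_a, beta_b


a_sel, b_sel = simulate((0, 0, 0, 1))    # sponsors hire only the high-baseline firm
a_flat, b_flat = simulate((0, 1, 0, 1))  # sponsorship spread evenly across firms
print(f"selection:    A - B = {a_sel - b_sel:+.2f} pp")
print(f"no selection: A - B = {a_flat - b_flat:+.2f} pp")
```

In both scenarios the firm-FE slope recovers the assumed 8 pp effect; only the gap between the two slopes responds to which firms get hired, which is the quantity AN-059 reads off.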

Reconciling with AN-016 and AN-018

AN-016's within-firm β dispersion (sd 10.3 pp across 31 firms) is real. AN-018's size-discipline story (small firms +12, large firms −1 within the 31) is real. These describe heterogeneity in the within-firm slant across firms, not the variance decomposition of the headline.

AN-059 averages across all 426 firms, weighted by sample size. The weighted average happens to land at +7.98 pp ≈ +7.85 pp because medium-tier firms (which carry most of the sample) slant near that average. Small firms slant more but contribute fewer polls; large firms slant less and contribute proportionally fewer than their size would suggest.

Implication for the residual decomposition

The docs/thinking.md 2026-06-14 residual decomposition (after AN-051, AN-056, AN-057) attributed:

- Structural levers tested: 1-5 pp explained
- Unexplained residual: 2-6 pp
- Hypothesis #2 (firm-level slant-for-hire selection): 2-4 pp prior

AN-059 zeroes out hypothesis #2. The residual is now even more firmly in the within-firm-methodology zone. The five remaining untested categories — sophisticated fabrication, wave selection, sample frame contamination, interviewer scripting, strategic timing — must collectively explain a larger share than the prior allocation suggested.

The next test priority shifts accordingly:

1. ~~Firm-level decomposition~~: done; result is "no selection".
2. Wave-selection test: candidate-sponsored polls within firm × muni × month vs pollster-self polls in the same firm × muni × month.
3. Sophisticated fabrication forensics: AN-013 v2 with stronger tests.
4. Strategic timing × news events: needs event database.

Follow-ups

Wave-selection test as AN-060. For each (firm × muni × month) bucket, count polls filed by sponsorship type. Within a firm's filing calendar in a race, do candidate-sponsored waves cluster on specific weeks (e.g. post-rally), or are they distributed uniformly within the firm's filing dates? This is the natural next read on the residual.
Universe-scale weighting extraction is even less attractive now. With firm selection zeroed out, the residual 2-6 pp is structurally inaccessible to extraction-from-registration-text. Scaling poll_weighting to universe gives tighter CIs on the AN-057 (+0.04 p) signal but doesn't address the residual.
Update paper's mechanism narrative. The headline is now cleaner: "the +7 pp is within-firm methodology slant, not sponsor firm selection". This is a sharper claim than the inventory in §5 currently makes.
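
For the wave-selection follow-up, the first step is the bucket count itself. A minimal sketch, assuming each poll carries a filing date as an ISO string and a sponsorship label; the `filed` and `sponsor_type` field names and the toy records are hypothetical, not columns confirmed in cand_poll.parquet.

```python
from collections import Counter, defaultdict


def bucket_counts(polls):
    """Count polls per (firm, muni, month) bucket, split by sponsorship type."""
    buckets = defaultdict(Counter)
    for p in polls:
        month = p["filed"][:7]  # "YYYY-MM" prefix of an ISO date string
        buckets[(p["pollster_cnpj"], p["muni_id"], month)][p["sponsor_type"]] += 1
    return buckets


def comparable_buckets(buckets):
    """Keep buckets that contain both candidate-sponsored and pollster-self polls."""
    return {k: c for k, c in buckets.items() if c["candidate"] and c["self"]}


# Toy records (hypothetical values).
polls = [
    {"pollster_cnpj": "f1", "muni_id": "m1", "filed": "2024-09-03", "sponsor_type": "candidate"},
    {"pollster_cnpj": "f1", "muni_id": "m1", "filed": "2024-09-20", "sponsor_type": "self"},
    {"pollster_cnpj": "f1", "muni_id": "m1", "filed": "2024-10-01", "sponsor_type": "candidate"},
    {"pollster_cnpj": "f2", "muni_id": "m1", "filed": "2024-09-10", "sponsor_type": "self"},
]
comparable = comparable_buckets(bucket_counts(polls))
print(list(comparable))
```

Only buckets with both sponsorship types support the within firm × muni × month comparison, so the number of comparable buckets is an early check on whether the test has enough data behind it.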