title: Heterogeneity analysis (first pass)
status: first-pass (2026-06-02)

Heterogeneity analysis — first pass

Battery of heterogeneity regressions on the all-Brazil sample, mapping each cut back to a specific theory in docs/theory.md and docs/thinking.md.

All specs absorb candidate FE via linearmodels.PanelOLS, add pollster (institute) FE as a second absorbed dimension where the sample permits, and include structured methodology controls (log sample size, days-to-election, days²). SEs are cluster-robust at the race (muni) level. Baseline = full sample, β_self = +8.00 pp (SE 1.33, p<10⁻⁹). This is slightly above the headline +7.6 because the analysis table now carries more rows (~31k vs 30.5k) after the heterogeneity joins.
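To make "candidate FE absorbed" concrete, here is a minimal stdlib-only sketch of the within transformation on toy data. It is not the linearmodels.PanelOLS call behind the results. The rows, the names `self_sponsored` and `y`, and the single-regressor setup are assumptions for illustration, and the sketch leaves out the pollster FE, the methodology controls, and the clustered SEs.

```python
from collections import defaultdict


def within_beta(rows):
    """One-regressor within estimator: demean x and y by candidate, then OLS."""
    sums = defaultdict(lambda: [0.0, 0.0, 0])  # candidate -> [sum_x, sum_y, n]
    for cand, x, y in rows:
        acc = sums[cand]
        acc[0] += x
        acc[1] += y
        acc[2] += 1
    num = den = 0.0
    for cand, x, y in rows:
        sum_x, sum_y, n = sums[cand]
        x_dm = x - sum_x / n  # deviation from the candidate mean
        y_dm = y - sum_y / n
        num += x_dm * y_dm
        den += x_dm * x_dm
    return num / den


def pooled_gap(rows):
    """Naive difference in mean y between self-sponsored and other polls."""
    ones = [y for _, x, y in rows if x == 1]
    zeros = [y for _, x, y in rows if x == 0]
    return sum(ones) / len(ones) - sum(zeros) / len(zeros)


# Hypothetical toy rows: (candidate, self_sponsored, y).
# Each candidate has its own level; self-sponsored polls add 8 points.
rows = [
    ("A", 0, 30.0), ("A", 0, 30.0), ("A", 1, 38.0),
    ("B", 0, 12.0), ("B", 1, 20.0), ("B", 1, 20.0),
]
beta = within_beta(rows)
naive = pooled_gap(rows)
```

On these toy rows the pooled gap mixes candidate levels with the self-sponsorship effect, while the within estimator recovers the 8-point effect built into the data.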

Outputs:

1. Final-rank position (coordination vs bandwagon)

β by candidate's final-rank position in their race (all munis):

| Rank | β | SE | p |
|---|---|---|---|
| 1 | +7.55 | 1.73 | <0.001 |
| 2 | +9.30 | 1.59 | <0.001 |
| 3 | +7.76 | 4.50 | 0.084 |
| 4 | +8.87 | 4.87 | 0.069 |
| 5+ | +3.36 | 3.89 | 0.387 |

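The point estimate peaks at rank 2 across all munis (+9.30), though rank 4 (+8.87) is close and imprecisely estimated. For clearly hopeless (rank 5+) candidates the estimate is small and not distinguishable from zero (+3.36, p=0.387), consistent with the "no return to slant from a candidate no one believes is viable" prediction.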

2. The discriminating test — final-rank × runoff-eligibility

The cleanest test of coordination vs bandwagon (from docs/theory.md). Coordination predicts peak β at rank 2 in small munis (plurality, M+1=2) and rank 3 in runoff-eligible munis (M+1=3). Bandwagon predicts peak at rank 2 regardless of muni size.
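The two predictions can be written down as a small lookup, which makes explicit where they diverge. This is a sketch only: the function name and the `theory` strings are hypothetical, and the 200k threshold is the cutoff used in this section.

```python
RUNOFF_THRESHOLD = 200_000  # registered voters; cutoff used in this section


def predicted_peak_rank(registered_voters, theory):
    """Final-rank position at which each theory expects the largest beta."""
    if theory == "bandwagon":
        # Bandwagon: peak at rank 2 regardless of muni size.
        return 2
    if theory == "coordination":
        # M = number of candidates who advance or win:
        # 1 under plurality, 2 when a runoff is possible.
        m = 2 if registered_voters >= RUNOFF_THRESHOLD else 1
        return m + 1  # Cox's M+1 rule
    raise ValueError(f"unknown theory: {theory}")


small_coord = predicted_peak_rank(50_000, "coordination")
runoff_coord = predicted_peak_rank(500_000, "coordination")
runoff_band = predicted_peak_rank(500_000, "bandwagon")
```

The theories agree in small munis and disagree only in runoff-eligible ones, which is why the runoff sample carries the discriminating power.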

Small munis (< 200k registered voters):

| Rank | β | SE | p |
|---|---|---|---|
| 1 | +7.67 | 2.27 | <0.001 |
| 2 | +10.14 | 2.05 | <0.001 |
| 3 | +6.10 | 6.22 | 0.327 |
| 4 | +6.73 | 6.27 | 0.283 |

Runoff munis (≥ 200k registered voters):

| Rank | β | SE | p |
|---|---|---|---|
| 1 | +7.33 | 2.94 | 0.013 |
| 2 | +7.41 | 3.34 | 0.027 |
| 3 | +9.43 | 3.74 | 0.012 |
| 4 | +6.34 | 3.09 | 0.040 |

This is the headline qualitative result: the position of peak β shifts from rank 2 in small munis to rank 3 in runoff-eligible munis, which is the direction the coordination theory and Cox's M+1 rule predict. Bandwagon would predict rank 2 in both. SEs are wide in the runoff sample (n_self=83 vs 558 in small munis), so the rank-2-vs-rank-3 gap there is not statistically distinguishable.
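A rough way to see how wide the runoff-sample gap is: compare the rank-3 and rank-2 estimates from the runoff table with a two-sample z-test. The sketch below assumes the two estimates are independent (zero covariance), which is probably not true for coefficients from a shared specification. A Wald test on the fitted covariance matrix would be the proper version. The helper name is hypothetical.

```python
import math


def diff_z_test(b1, se1, b2, se2):
    """Approximate z-test for b1 - b2, ASSUMING the estimates are independent."""
    diff = b1 - b2
    se = math.sqrt(se1 ** 2 + se2 ** 2)   # no covariance term (assumption)
    z = diff / se
    p = math.erfc(abs(z) / math.sqrt(2))  # two-sided normal p-value
    return diff, se, z, p


# Runoff munis: rank 3 (+9.43, SE 3.74) vs rank 2 (+7.41, SE 3.34)
diff, se, z, p = diff_z_test(9.43, 3.74, 7.41, 3.34)
```

Under that independence assumption the gap of about 2 pp comes with a standard error of about 5 pp, so the ordering of ranks 2 and 3 in the runoff sample rests on the point estimates alone.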

On this evidence coordination fits better than bandwagon as a voter-side mechanism, but the support is suggestive rather than conclusive.

3. First-time candidate (quality cue)

| Slice | β | SE | p |
|---|---|---|---|
| Non-first-time (baseline) | +7.08 | 1.55 | <0.001 |
| Interaction (× first-time) | +1.77 | 2.48 | 0.474 |
| Implied β (first_time=1) | +8.85 | | |
| Implied β (first_time=0) | +7.08 | | |

The quality-cue prediction: β should be larger for lesser-known (first-time) candidates because voters have weaker priors to anchor on. The direction is as predicted (+1.77 toward first-time), but the interaction is not statistically significant (p=0.47). Weak support; a clean test likely needs sharper experience measures (e.g., prior elected office, years-in-office).

4. Verifiable disclosure (days-to-election interaction)

The verifiable-disclosure prediction (docs/thinking.md): slant has a future cost because the election eventually verifies the poll. β should shrink as the election approaches.

| Days from election | Implied β |
|---|---|
| 7 | +7.00 |
| 30 | +7.85 |
| 90 | +10.07 |
| 180 | +13.41 |

The slope coefficient on β × days is +0.037 pp per day (SE 0.019, p=0.054), marginally significant. Direction matches the prediction: polls 6 months out are slanted by ~13 pp, polls in the final week by ~7 pp. The implied shrinkage between 180 and 7 days is about 6.4 pp, roughly half of the 180-day β and about 80% of the baseline +8.00 pp, so a substantial fraction of the bias goes away as election day approaches.
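The implied-β column follows from the linear interaction. The sketch below reproduces it from the reported slope, anchoring the line at the 7-day table value because the intercept is not reported here. That anchoring is an assumption, and the results match the table only up to rounding of the slope.

```python
SLOPE_PP_PER_DAY = 0.037  # reported coefficient on beta x days
BETA_AT_7_DAYS = 7.00     # anchor taken from the table (assumption: the
                          # main-effect intercept is not reported)


def implied_beta(days_to_election):
    """Implied beta (pp) at a given distance from election day."""
    return BETA_AT_7_DAYS + SLOPE_PP_PER_DAY * (days_to_election - 7)


implied = {d: round(implied_beta(d), 2) for d in (7, 30, 90, 180)}
shrinkage = implied_beta(180) - implied_beta(7)  # pp lost from 180 to 7 days
```

Comparing `shrinkage` with `implied_beta(180)` or with the baseline β gives the share of slant that disappears over the last six months of the campaign.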

Verifiable-disclosure theory: passes, marginally significant.

5. Sponsor route (who's slanting harder?)

β by which route maps the sponsor to the candidate. The interesting finding: candidates who pay with their own CPF show a much larger point estimate than those paying via committee or party, though the cell is small.

| Route | β | SE | p | n_self |
|---|---|---|---|---|
| CPF (candidate's own CPF) | +19.12 | 6.89 | 0.006 | 18 |
| Committee (CNPJ) | +8.73 | 1.56 | <0.001 | 429 |
| Party (CNPJ) | +7.70 | 2.35 | 0.001 | 42 |
| Party-name parse | +6.33 | 2.59 | 0.015 | 152 |

The CPF cell is small (n_self=18), so the +19 estimate has wide intervals. It is distinguishable from zero (p=0.006), but the reported SEs do not separate it from the other routes: against the committee route the gap is 10.4 pp with an approximate SE of 7.1 if the two estimates are treated as independent (z ≈ 1.5). Possible reading: candidates with the strongest personal stake (paying out-of-pocket rather than via a committee/party) push the hardest on slant. Alternative reading: the CPF cases are systematically different (e.g., smaller-budget candidates with tighter incentives, or candidates with weaker formal-committee structure).

6. Race competitiveness

β by tertile of race margin (top1 − top2 final share):

| Margin tertile | β | SE | p | n_self |
|---|---|---|---|---|
| Tight (smallest margin) | +10.07 | 1.79 | <0.001 | 292 |
| Mid | +8.19 | 2.46 | <0.001 | 170 |
| Wide (largest margin) | +4.39 | 2.63 | 0.094 | 179 |

Strong monotonic gradient in the point estimates: slant is 2.3× larger in tight races than in landslides. This is consistent with both coordination (the M+1 cutoff matters more in tight races) and bandwagon (more room to flip apparent leader identity). The cut does not discriminate between the two, but it is reassuring that the design captures the right object.
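For reference, a tertile split on race margin can be sketched with the standard library as below. The margins are made-up values, and the unit over which the cut points are computed (races, polls, or candidate rows) is an assumption; the actual pipeline may cut differently.

```python
import statistics


def margin_tertiles(margins):
    """Label each margin Tight / Mid / Wide using tertile cut points."""
    low_cut, high_cut = statistics.quantiles(margins, n=3)
    labels = []
    for m in margins:
        if m <= low_cut:
            labels.append("Tight")
        elif m <= high_cut:
            labels.append("Mid")
        else:
            labels.append("Wide")
    return labels


# Hypothetical margins (top1 - top2 final share, in pp), one per race.
margins = [12.0, 0.5, 30.0, 3.0, 8.0, 45.0, 1.5, 20.0, 5.0]
labels = margin_tertiles(margins)
```

If the cut points are computed over races, the number of self-sponsored polls per tertile need not be equal, which would be compatible with the unequal n_self counts in the table.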

7. Per-pollster β

build/table/per_pollster_beta.csv — 33 institutes with ≥5 self-sponsored polls.

Most-slanted institutes (top 5 by β):

| Institute | β | SE | n_self |
|---|---|---|---|
| INTENCAO INSTITUTO DE PESQUISA LTDA | +30.25 | 13.25 | 5 |
| OPINAR PESQUISAS LTDA. | +8.92 | 1.33 | 9 |
| I. M. MENDONCA | +4.73 | 0.00 | 5 |
| INSTITUTO GERAIS LTDA | +3.01 | 0.03 | 9 |
| PROMIDIA PESQUISA DE OPINIAO PUBLICA E MARKETING LTDA | +2.09 | 0.38 | 10 |

Negatively-loaded institutes (β < 0):

| Institute | β | SE | n_self |
|---|---|---|---|
| EVA FRANCIELI DE SOUZA PEREIRA | −9.43 | 2.33 | 20 |
| AR7 PESQUISAS INTELIGENTES LTDA | −3.99 | 1.10 | 9 |
| INSTITUTO PARANA DE PESQUISAS | −10.50 | 9.77 | 34 |

Notable institutes near zero:

Pollster heterogeneity is substantial. The headline +8 pp average masks a wide distribution: a handful of small firms show very large effects, while several mid-volume firms (Verita, IIP, Census) have β close to zero on this metric. This matters for interpretation and for any publication discussion: the result is not about "Brazilian pollsters" as a class.

Caveats:

Summary of heterogeneity findings

| Test | Direction predicted | Direction observed | Significant? |
|---|---|---|---|
| Rank-position × runoff-eligibility | Peak shifts rank 2 → rank 3 at 200k cutoff (coordination) | Yes, peak at rank 2 (small) → rank 3 (runoff) | Suggestive (runoff cell has wide SEs) |
| First-time candidate | β larger (quality cue) | β larger by +1.77 | No (p=0.47) |
| Days-to-election | β decreases as election approaches (verifiable disclosure) | β falls by 0.037 pp per day closer to the election | Marginal (p=0.054) |
| Race competitiveness | β larger in tight races | +5.7 pp tight vs wide | Tight and mid tertiles p<0.001; wide p=0.094 |

Most informative finding for theory discrimination: the rank × runoff-eligibility heterogeneity goes in the direction coordination predicts and bandwagon does not, although the rank-2-vs-rank-3 gap in the runoff sample is not statistically distinguishable. Combined with the +6.7 pp within-candidate-trajectory placebo (docs/briefs/all_brazil_analysis.md), the project has two independent pieces of evidence pointing toward the coordination mechanism.

Most striking standalone finding: per-pollster β ranges from +30 (Intencao) to −10 (Parana Pesquisas), with several mid-volume firms near zero. Both extremes are imprecisely estimated (SE 13.25 and 9.77). Pollster identity matters: the headline average is not the firm-level story.

Open follow-ups