AN-001: Do 2024 mayoral polls overstate the sponsoring candidate?

Within-candidate FE on 30,555 candidate-poll rows (568 of them self-sponsored) gives β = +7.75 pp (p<0.001): large, robust to specification, and robust to the renormalization choice. The opponent-sponsored coefficient is −1.93 pp (p=0.030), so the bias is sender-specific, not a generic house effect.

Hypothesis: H1: Self-sponsored polls overstate the sponsoring candidate
Confidence: green
Type: causal

Design

Sample: estimulado-non-aggregate-match2
Specification: error ~ sponsored_by + opponent_sponsored + log(sample_size) + days_to_election + days_to_election² | candidate FE + pollster FE; cluster-robust SE at the municipality (muni) level
Comparator: all
Cluster: muni
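
Written out as an equation, the specification above reads roughly as follows. The indices (c for candidate, p for poll, j(p) for the pollster that ran poll p), the shorthand days for days_to_election, and the symbols γ, α, δ and ε are notation assumed here for illustration; they are not taken from the source script.

```latex
\mathrm{error}_{cp}
  = \beta_{\mathrm{self}}\,\mathrm{sponsored\_by}_{cp}
  + \beta_{\mathrm{opp}}\,\mathrm{opponent\_sponsored}_{cp}
  + \gamma_1 \log(\mathrm{sample\_size}_{p})
  + \gamma_2\,\mathrm{days}_{p}
  + \gamma_3\,\mathrm{days}_{p}^{2}
  + \alpha_{c} + \delta_{j(p)} + \varepsilon_{cp}
```

Here α_c is the candidate fixed effect and δ_{j(p)} the pollster fixed effect, with standard errors clustered by municipality.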

Script: source/analysis/regressions.py
Target: build/table/regressions.csv
Commit: 2548d50
Status: interpreted · 2026-06-02
Created: 2026-06-02

Results

Table: Headline within-candidate FE (Spec 2, all-Brazil, n = 30,555 candidate-poll rows)

Coefficient	β (pp)	SE (pp)	p
`sponsored_by` (self-sponsored)	+7.75	1.34	<0.001
`opponent_sponsored`	−1.93	0.89	0.030

(from build/table/regressions.csv)
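
As a quick consistency check on the table above, the reported p-values can be approximated from β and SE alone. This is a minimal sketch that assumes a standard normal reference distribution; the source script may use a t distribution with cluster-based degrees of freedom, so small differences are expected. The helper name `two_sided_p` is hypothetical.

```python
import math


def two_sided_p(beta: float, se: float) -> float:
    """Two-sided p-value for beta / se under a standard normal approximation."""
    z = abs(beta / se)
    # For a standard normal Z, P(|Z| > z) = erfc(z / sqrt(2)).
    return math.erfc(z / math.sqrt(2.0))


# Values copied from the headline table (percentage points).
p_self = two_sided_p(7.75, 1.34)
p_opp = two_sided_p(-1.93, 0.89)

print(f"sponsored_by:       z = {7.75 / 1.34:.2f}, p = {p_self:.2e}")
print(f"opponent_sponsored: z = {-1.93 / 0.89:.2f}, p = {p_opp:.3f}")
```

Under this approximation the opponent-sponsored p-value lands close to the reported 0.030, and the self-sponsored p-value is far below 0.001.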

Table: Spec ladder (same sample)

Spec	β (sponsored_by, pp)	SE (pp)	p
naive (no FE)	+7.57	0.88	<0.001
Spec 1 (candidate + pollster FE)	+7.60	1.34	<0.001
Spec 2 (+ methodology controls)	+7.75	1.34	<0.001
Spec 2 WLS (weighted by sample size)	+7.83	1.37	<0.001

(from build/table/regressions.csv)
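
The stability of the ladder can be summarised with two numbers: the spread of the point estimates and the shift from the naive spec to Spec 1. The sketch below only re-does arithmetic on the values in the table above; the dictionary keys are hypothetical labels, not column names from regressions.csv.

```python
# (beta, SE) for sponsored_by, copied from the spec ladder, in percentage points.
ladder = {
    "naive": (7.57, 0.88),
    "spec1_fe": (7.60, 1.34),
    "spec2_controls": (7.75, 1.34),
    "spec2_wls": (7.83, 1.37),
}

betas = [beta for beta, _ in ladder.values()]
spread = max(betas) - min(betas)                       # widest gap across specs
smallest_se = min(se for _, se in ladder.values())     # tightest standard error
fe_shift = ladder["spec1_fe"][0] - ladder["naive"][0]  # effect of adding FE

print(f"spread across specs: {spread:.2f} pp")
print(f"naive -> Spec 1 shift: {fe_shift:.2f} pp")
print(f"spread below smallest SE: {spread < smallest_se}")
```

The whole ladder moves by less than the smallest reported standard error, which is the sense in which the estimate is stable across specs.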

Symmetry test (Spec 2): β_self − β_opp = 7.75 − (−1.93) ≈ +9.7 pp, consistent with sender-specific bias rather than a generic pollster house effect.
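
The gap between the two coefficients can be reproduced from the headline table. The sketch below also attaches a rough standard error to the gap, under the loud assumption that the two estimates have zero covariance. The note does not report that covariance, so this is illustrative only and is not the test the analysis script runs; a proper test would use the full cluster-robust covariance matrix from the fitted model.

```python
import math

# Spec 2 estimates copied from the headline table (percentage points).
beta_self, se_self = 7.75, 1.34
beta_opp, se_opp = -1.93, 0.89

# Gap between self-sponsored and opponent-sponsored coefficients.
diff = beta_self - beta_opp

# ASSUMPTION: Cov(beta_self, beta_opp) = 0, which the source does not report.
# With a nonzero covariance the true SE of the gap would differ.
se_diff_zero_cov = math.hypot(se_self, se_opp)
z_diff = diff / se_diff_zero_cov

print(f"beta_self - beta_opp = {diff:.2f} pp")
print(f"SE (zero-covariance assumption) = {se_diff_zero_cov:.2f} pp, z = {z_diff:.1f}")
```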

Sample composition:

568 self-sponsored candidate-poll rows (across 793 polls, after the Route A/B/C/D sponsor→candidate join)
1,048 opponent-sponsored
8,431 unique candidates / 2,942 unique races
448 unique pollsters

Interpretation

Headline: β ≈ +7–8 pp is the project's headline empirical finding. Polls commissioned by a Brazilian mayoral candidate overstate that candidate's eventual vote share by roughly 7 to 8 percentage points on average, relative to polls of the same candidate paid for by others.
Selection: Moving from the naive spec to candidate and pollster FE barely changes the estimate (+7.57 → +7.60), so candidates who self-sponsor are not strongly selected on standing within the all-Brazil sample. The SP-only prototype behaved differently: there FE moved β from +2.17 to +7.64, because SP's small sample exaggerated the selection problem.
Sign-flip: The opponent-sponsored coefficient (β_opp = −1.93 pp, p = 0.030) is significant at the 5% level and runs the opposite way, which supports the sender-specific reading: opponent-sponsored polls understate a given candidate by about 2 pp.

Confidence rationale (green). The four specs in the ladder, all estimated on the same sample, converge on β ≈ +7–8 pp; the all-Brazil sample is large (30,555 candidate-poll rows, 568 of them self-sponsored, covering 8,431 candidates and 448 pollsters); FE does not move the point estimate; and the opponent-sponsored sign-flip rules out a generic house effect. Residual uncertainty is about how the bias is produced (Channel A declared-methodology vs Channel B residual/fabrication), not whether it exists. That is a follow-up question, not a threat to the headline.

Follow-ups

The timing-controlled robustness specs (3a, 3b, 3c) are documented separately under AN-002.
The Channel A vs Channel B decomposition (does the +7 pp come from declared methodology choices or from residual / fabrication?) is still gated on the poll_methodology LLM extractor — see pipelines/politica/docs/todo.md.
The headline magnitude is stable; the main residual uncertainty is how the bias arises, not whether it exists.