Within-candidate FE on 568 self-sponsored candidate-poll rows gives β = +7.75 pp (p < 0.001): a large effect, robust to specification and to the renormalization choice. The opponent-sponsored coefficient is −1.93 pp (p = 0.030), so the bias is sender-specific rather than a generic house effect.
Results
Table: Headline within-candidate FE (Spec 2, all-Brazil, n = 30,555 candidate-poll rows)
| Coefficient | β (pp) | SE | p |
|---|---|---|---|
| sponsored_by (self-sponsored) | +7.75 | 1.34 | <0.001 |
| opponent_sponsored | −1.93 | 0.89 | 0.030 |
(from build/table/regressions.csv)
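As a quick consistency check on the table above, the reported p-values can be approximately recovered from β and SE alone. This is a minimal sketch that assumes a standard normal reference distribution; the actual regressions may use a t distribution or clustered standard errors, so small discrepancies are expected. The helper name `two_sided_p` is hypothetical and not part of the project code.

```python
import math

def two_sided_p(beta, se):
    """Return (z, p) for beta/se under a standard normal approximation."""
    z = beta / se
    # erfc(|z| / sqrt(2)) equals the two-sided normal tail probability.
    return z, math.erfc(abs(z) / math.sqrt(2))

# Point estimates and standard errors copied from the headline table.
reported = {
    "sponsored_by": (7.75, 1.34),
    "opponent_sponsored": (-1.93, 0.89),
}

for name, (beta, se) in reported.items():
    z, p = two_sided_p(beta, se)
    print(f"{name}: z = {z:.2f}, p = {p:.3g}")
```

Under this approximation the opponent-sponsored row lands close to the reported p = 0.030, and the self-sponsored row is far below 0.001.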
Table: Spec ladder (same sample)
| Spec | β (sponsored_by) | SE | p |
|---|---|---|---|
| naive (no FE) | +7.57 | 0.88 | <0.001 |
| Spec 1 (candidate + pollster FE) | +7.60 | 1.34 | <0.001 |
| Spec 2 (+ methodology controls) | +7.75 | 1.34 | <0.001 |
| Spec 2 WLS (weighted by sample size) | +7.83 | 1.37 | <0.001 |
(from build/table/regressions.csv)
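The difference between the naive rung and the fixed-effects rungs can be illustrated with a toy within-candidate estimator. This is a sketch only: the data are synthetic, the function names are hypothetical, and it applies a single candidate fixed effect by demeaning, without the pollster FE or methodology controls used in Spec 1 and Spec 2. The toy data are deliberately built with strong selection (the weaker candidate self-sponsors more often), so the naive slope and the within slope diverge, unlike in the all-Brazil sample where the two nearly coincide.

```python
from collections import defaultdict

def ols_slope(x, y):
    """Pooled (naive) OLS slope of y on x with an intercept."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    sxy = sum((xi - mx) * (yi - my) for xi, yi in zip(x, y))
    sxx = sum((xi - mx) ** 2 for xi in x)
    return sxy / sxx

def within_slope(groups, x, y):
    """Slope after demeaning x and y within each group (one-way FE)."""
    totals = defaultdict(lambda: [0.0, 0.0, 0])
    for g, xi, yi in zip(groups, x, y):
        t = totals[g]
        t[0] += xi
        t[1] += yi
        t[2] += 1
    xd = [xi - totals[g][0] / totals[g][2] for g, xi in zip(groups, x)]
    yd = [yi - totals[g][1] / totals[g][2] for g, yi in zip(groups, y)]
    return sum(a * b for a, b in zip(xd, yd)) / sum(a * a for a in xd)

# Synthetic rows: (candidate, self_sponsored flag, polled share in pp).
# Each candidate has a baseline share; self-sponsored polls add exactly 5 pp.
rows = [
    ("A", 1, 35.0), ("A", 0, 30.0), ("A", 0, 30.0),
    ("B", 1, 15.0), ("B", 1, 15.0), ("B", 0, 10.0),
    ("C", 0, 20.0), ("C", 0, 20.0),
]
cand = [r[0] for r in rows]
x = [float(r[1]) for r in rows]
y = [r[2] for r in rows]

naive = ols_slope(x, y)
fe = within_slope(cand, x, y)
print(f"naive slope = {naive:.3f}, within-candidate slope = {fe:.3f}")
```

Candidates whose polls never vary in sponsorship (candidate C here) drop out of the within estimator, which is why only rows with within-candidate variation identify β.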
Symmetry test (Spec 2): β_self − β_opp = 7.75 − (−1.93) ≈ +9.7 pp, which points to sender-specific bias rather than a generic pollster house effect.
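The arithmetic behind the symmetry gap can be sketched as follows. The point estimate uses only the reported coefficients. The standard error of the gap additionally needs the covariance between the two coefficient estimates, which is not reported here, so `cov_ab = 0.0` below is a placeholder assumption and the resulting SE is illustrative rather than a result of the analysis. The function name `se_of_difference` is hypothetical.

```python
import math

# Reported Spec 2 coefficients and standard errors (pp).
beta_self, se_self = 7.75, 1.34
beta_opp, se_opp = -1.93, 0.89

# Gap between self-sponsored and opponent-sponsored effects.
gap = beta_self - beta_opp

def se_of_difference(se_a, se_b, cov_ab=0.0):
    """SE of (a - b): sqrt(Var(a) + Var(b) - 2 * Cov(a, b))."""
    return math.sqrt(se_a ** 2 + se_b ** 2 - 2.0 * cov_ab)

# ASSUMPTION: zero covariance between the two estimates (not reported).
se_gap = se_of_difference(se_self, se_opp)

print(f"gap = {gap:.2f} pp, illustrative SE = {se_gap:.2f} (cov assumed 0)")
```

A formal test would take the covariance from the fitted model's coefficient covariance matrix instead of assuming it away.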
Sample composition:
- 568 self-sponsored candidate-poll rows (across 793 polls, after the Route A/B/C/D sponsor→candidate join)
- 1,048 opponent-sponsored
- 8,431 unique candidates / 2,942 unique races
- 448 unique pollsters
Interpretation
- Headline: β ≈ +7–8 pp is the project's headline empirical finding. Polls commissioned by a Brazilian mayoral candidate overstate that candidate's eventual vote share by about 7–8 percentage points on average, relative to polls of the same candidate paid for by others.
- Selection: The naive → FE transition adds almost nothing (+7.57 → +7.60), meaning candidates who self-sponsor are not strongly selected on standing within the all-Brazil sample (unlike the SP-only prototype, where the FE moved β from +2.17 to +7.64 because SP's small sample exaggerated the selection problem).
- Sign-flip: The opponent-sponsored coefficient (β_opp = −1.93) is significant and runs the opposite way, mechanically supporting the sender-specific reading: opponent-sponsored polls understate a given candidate by about 2 pp.
Confidence rationale (green). All four specifications in the ladder give β ≈ +7–8 pp; the all-Brazil sample is large (30,555 candidate-poll rows, 568 of them self-sponsored, across 8,431 candidates and 448 pollsters); adding FE does not move the point estimate; and the opponent-sponsored sign-flip rules out a generic house effect. Residual uncertainty concerns how the bias is produced (Channel A declared-methodology vs Channel B residual/fabrication), not whether it exists, and that is a follow-up question rather than a threat to the headline.
Follow-ups
- The timing-controlled robustness specs (3a, 3b, 3c) are documented separately under AN-002.
- The Channel A vs Channel B decomposition (does the +7 pp come from declared methodology choices or from residual / fabrication?) is still gated on the `poll_methodology` LLM extractor; see `pipelines/politica/docs/todo.md`.
- Headline magnitude is stable; the main residual uncertainty is how rather than whether.