Decisions

2026-06-02 — Promote from idea to project

Decision: Move research/ideas/poll-sponsor-bias/ to projects/poll-sponsor-bias/ with the canonical project structure.

Reason: The design's empirical core is settled. Three independent specs (within-candidate FE, race × week FE, descriptive within-candidate jump) converge on β ≈ +6 to +7 pp, and the pre-poll trajectory placebo (n=132, t = 5.21) decisively rules out the "candidate commissions when leading" alternative. The next 2–3 months of work (Channel A vs B decomposition, theory framing, write-up) will benefit from a project structure (paper/, decisions.md, formal todo).

Alternatives considered:

Implications: The project's first todo is the poll_methodology LLM extractor (queued in pipelines/politica/docs/todo.md); Channel A vs B is the project's main unfinished empirical lever.
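
Sketch of what "within-candidate FE" means mechanically, for reference. This is a minimal illustration that assumes a single regressor (sponsored_by) and no other controls. The function name, the field names, and the toy rows are hypothetical and are not the project's estimation code.

```python
from collections import defaultdict


def within_candidate_beta(rows):
    """Within (fixed-effects) slope of error on sponsored_by.

    rows: iterable of dicts with hypothetical keys
    'candidate_id', 'sponsored_by' (0/1) and 'error' (in pp).
    """
    by_candidate = defaultdict(list)
    for r in rows:
        by_candidate[r["candidate_id"]].append(r)

    num = 0.0
    den = 0.0
    for group in by_candidate.values():
        # Demean both variables within each candidate.
        x_bar = sum(r["sponsored_by"] for r in group) / len(group)
        y_bar = sum(r["error"] for r in group) / len(group)
        for r in group:
            dx = r["sponsored_by"] - x_bar
            num += dx * (r["error"] - y_bar)
            den += dx * dx
    if den == 0:
        raise ValueError("no within-candidate variation in sponsored_by")
    return num / den


# Made-up rows, only to show the mechanics (not project data).
toy_rows = [
    {"candidate_id": "A", "sponsored_by": 0, "error": 1.0},
    {"candidate_id": "A", "sponsored_by": 1, "error": 8.0},
    {"candidate_id": "B", "sponsored_by": 0, "error": -3.0},
    {"candidate_id": "B", "sponsored_by": 1, "error": 3.0},
    # Candidate C never sponsors a poll, so it adds nothing to the slope.
    {"candidate_id": "C", "sponsored_by": 0, "error": 10.0},
    {"candidate_id": "C", "sponsored_by": 0, "error": 12.0},
]
beta = within_candidate_beta(toy_rows)
```

Only candidates observed in both sponsored and non-sponsored polls contribute to the slope, which is why the comparison is "within candidate".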


2026-06-02 — Outcome variable: poll percent renormalized within scenario

Decision: Compute error = poll_percent_normalized - 100 * final_share where poll_percent_normalized = 100 * percent / sum(percent over non-aggregate candidates in scenario), and final_share = candidate_votes / sum(candidate_votes within muni).

Reason: The final-share denominator is valid candidate votes (it excludes Branco/Nulo). The poll percent's natural denominator includes "Don't know / Branco / Nulo", which would mechanically bias the error negative if left unnormalized. Renormalization puts both quantities on a common denominator.

Alternatives considered:

Implications: Error scale interpretation: error = +7 pp means the poll overstates the candidate's valid-vote share by 7 percentage points. Matches the headline finding's magnitude across all specs.
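
Worked sketch of the error definition above. The formula follows the Decision text; the function name, the dict keys (candidate, percent, is_aggregate), and the toy numbers are assumptions made for illustration.

```python
def renormalized_errors(scenario_rows, muni_votes):
    """Poll error in pp after renormalizing poll percents within a scenario.

    scenario_rows: list of dicts with hypothetical keys 'candidate',
        'percent', 'is_aggregate' (True for Don't know / Branco / Nulo).
    muni_votes: dict candidate -> valid votes in the muni.
    """
    named = [r for r in scenario_rows if not r["is_aggregate"]]
    poll_total = sum(r["percent"] for r in named)
    vote_total = sum(muni_votes.values())

    errors = {}
    for r in named:
        poll_percent_normalized = 100 * r["percent"] / poll_total
        final_share = muni_votes[r["candidate"]] / vote_total
        errors[r["candidate"]] = poll_percent_normalized - 100 * final_share
    return errors


# Toy scenario (made-up numbers): 25% of respondents are in the aggregate row.
scenario = [
    {"candidate": "A", "percent": 45.0, "is_aggregate": False},
    {"candidate": "B", "percent": 30.0, "is_aggregate": False},
    {"candidate": "DK/Branco/Nulo", "percent": 25.0, "is_aggregate": True},
]
votes = {"A": 550, "B": 450}

errors = renormalized_errors(scenario, votes)
# Raw (unnormalized) errors for comparison: negative for both candidates here.
raw = {r["candidate"]: r["percent"] - 100 * votes[r["candidate"]] / sum(votes.values())
       for r in scenario if not r["is_aggregate"]}
```

In this toy case the raw errors are negative for both candidates (about -10 and -15 pp), while the renormalized errors sum to zero (about +5 and -5 pp), which is the mechanical bias the renormalization removes.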


2026-06-02 — Clean comparator: media + pollster-self sponsored only

Decision: For the timing-controlled specs (3a/3b/3c), restrict the regression sample to candidate-poll rows where the poll is either (a) sponsored by this candidate (treatment) or (b) sponsored exclusively by independent media or pollster-self (control). Drop opponent-sponsored rows and rows in mixed/other-firm polls.

Reason: The "candidate commissions when leading" concern can only be addressed by comparing against polls that are themselves independent of the candidate's signal. Mixed-sponsor polls and polls sponsored by other-candidate committees are biased in unknown directions and would contaminate the comparator.

Alternatives considered:

Implications: The clean-comparator spec drops 9,102 rows (opponent-sponsored or mixed/other), reducing the sample from 30,555 to 21,453 but identifying β on the cleaner comparison. Headline β estimates are stable (~+6 to +8) across the choice.
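
Sketch of the clean-comparator filter. The sponsor labels ("candidate", "media", "pollster_self", "other_firm"), the row layout, and the function names are hypothetical. The sketch also assumes that a row counts as treatment whenever the row's own candidate is among the poll's sponsors, which the Decision text does not spell out for co-sponsored polls.

```python
INDEPENDENT_KINDS = {"media", "pollster_self"}


def classify_row(row):
    """Return 'treatment', 'control', or None (row is dropped).

    row: dict with hypothetical keys 'candidate_id' and 'sponsors',
    where 'sponsors' is a list of {'kind': ..., 'candidate_id': ...}.
    """
    sponsors = row["sponsors"]
    own = any(
        s["kind"] == "candidate" and s["candidate_id"] == row["candidate_id"]
        for s in sponsors
    )
    if own:
        return "treatment"  # (a) sponsored by this candidate
    if sponsors and all(s["kind"] in INDEPENDENT_KINDS for s in sponsors):
        return "control"  # (b) exclusively media / pollster-self
    return None  # opponent-sponsored, mixed, or other-firm


def clean_comparator_sample(rows):
    """Keep treatment and control rows, tagging each with its arm."""
    sample = []
    for row in rows:
        arm = classify_row(row)
        if arm is not None:
            sample.append({**row, "arm": arm})
    return sample


# Made-up rows for illustration.
media = {"kind": "media", "candidate_id": None}
self_funded = {"kind": "pollster_self", "candidate_id": None}
other = {"kind": "other_firm", "candidate_id": None}
cand1 = {"kind": "candidate", "candidate_id": 1}

rows = [
    {"candidate_id": 1, "sponsors": [cand1]},            # treatment
    {"candidate_id": 2, "sponsors": [cand1]},            # opponent-sponsored
    {"candidate_id": 1, "sponsors": [media]},            # control
    {"candidate_id": 1, "sponsors": [media, other]},     # mixed
    {"candidate_id": 2, "sponsors": [media, self_funded]},  # control
]
sample = clean_comparator_sample(rows)
arms = [r["arm"] for r in sample]
```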


2026-06-02 — Routes A+B+C+D for sponsor → candidate

Decision: Match sponsor CPF / CNPJ to candidate via four routes:

Reason: The committee-CNPJ name parse (Route B) catches the bulk of the obvious cases. Routes C+D extend coverage to party-directorate sponsorship, which maps to a single candidate under the 1:1 electoral-law constraint (each party fields at most one mayoral candidate per muni).

Alternatives considered:

Implications: 568 candidate-poll rows with sponsored_by=1 across 793 polls (vs the SP-only 22 with just A+B). The all-Brazil sample is large enough to identify the clean-comparator + race × week FE spec on 60 cells / 409 rows.
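
Sketch of how the 1:1 constraint makes party-directorate sponsorship attributable to one candidate. This is not the actual implementation of Routes C or D, which this entry does not describe; the function names and the keys (muni, party, candidate_id) are hypothetical.

```python
def build_party_lookup(mayoral_candidates):
    """Map (muni, party) -> candidate_id, enforcing the 1:1 constraint."""
    lookup = {}
    for c in mayoral_candidates:
        key = (c["muni"], c["party"])
        if key in lookup:
            # Would violate "at most one mayoral candidate per party per muni".
            raise ValueError(f"more than one mayoral candidate for {key}")
        lookup[key] = c["candidate_id"]
    return lookup


def candidate_for_party_sponsor(lookup, muni, party):
    """Candidate a party-directorate sponsor maps to, or None if no match."""
    return lookup.get((muni, party))


# Made-up registry rows for illustration.
registry = [
    {"muni": "M1", "party": "P1", "candidate_id": 101},
    {"muni": "M1", "party": "P2", "candidate_id": 102},
    {"muni": "M2", "party": "P1", "candidate_id": 201},
]
lookup = build_party_lookup(registry)
matched = candidate_for_party_sponsor(lookup, "M1", "P2")
unmatched = candidate_for_party_sponsor(lookup, "M2", "P2")
```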


2026-06-02 — match_score ≥ 2 cutoff for the regression sample

Decision: Restrict the regression sample to candidate-poll rows with match_score ≥ 2 (multi-token or stronger match between poll candidate name and the TSE registry).

Reason: Score-1 matches are single-token (e.g., "Luiz" matching "Luiz Roberto" by first name only), so they mix legitimate matches with false positives. Matches at score 2 (multi-token) and above are reliable. The nome_urna patch (score 4) accounts for most matches; the score-3 (substring) and score-2 (multi-token) tails are also clean.

Alternatives considered:

Implications: Drops 1,419 candidate-poll rows in the SP-style filter (most are mis-matches that would have introduced noise).
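
Sketch of the cutoff applied as a filter, with a tally of what is dropped at each score level. The score levels follow the Reason text; the function name, the row layout, and the toy rows are assumptions.

```python
from collections import Counter

MATCH_SCORE_CUTOFF = 2  # keep multi-token (2), substring (3), nome_urna (4)


def apply_match_cutoff(rows, cutoff=MATCH_SCORE_CUTOFF):
    """Split rows into the kept sample and a count of dropped rows by score."""
    kept = [r for r in rows if r["match_score"] >= cutoff]
    dropped_by_score = Counter(
        r["match_score"] for r in rows if r["match_score"] < cutoff
    )
    return kept, dropped_by_score


# Made-up rows for illustration.
rows = [
    {"poll_name": "Luiz", "match_score": 1},          # single-token, dropped
    {"poll_name": "Luiz Roberto", "match_score": 2},  # multi-token, kept
    {"poll_name": "Dr. Luiz Roberto", "match_score": 3},
    {"poll_name": "Luizinho", "match_score": 4},
    {"poll_name": "Ana", "match_score": 1},
]
kept, dropped_by_score = apply_match_cutoff(rows)
```

Keeping the dropped-row tally alongside the filter makes it easy to report how many rows each cutoff choice removes.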