Methods
Within-candidate FE for sponsor bias
Identification strategy
Unit of observation: candidate-poll (one row per (politico_id, protocol) pair in estimulado scenarios). Estimating equation:
$$
\text{error}_{c,p} = \beta \cdot \text{SponsoredBy}_{c,p} + \gamma \cdot \text{OpponentSponsored}_{c,p} + \lambda_{\text{pollster}} + \mu_{c \times \text{race}} + f(\text{days\_to\_election}) + \varepsilon_{c,p}
$$
where:
- error_{c,p} = poll_percent_{c,p} - 100 * final_share_c. Poll percent is renormalized within (protocol × scenario_label) over the non-aggregate candidate set, so its denominator matches final_share = candidate_votes / sum(candidate_votes within muni).
- SponsoredBy_{c,p} = 1 iff candidate c is linked to a sponsor of poll p via Routes A+B+C+D (CPF, committee CNPJ, party CNPJ, or party name).
- OpponentSponsored_{c,p} = 1 iff some other candidate in the same race is linked to a sponsor of poll p.
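A minimal pandas sketch of the error construction defined above. The helper add_poll_error and the columns raw_percent, is_aggregate, candidate_votes and muni are hypothetical stand-ins for the real schema; the logic only mirrors the stated definitions: renormalize within (protocol × scenario_label) over non-aggregate rows, compute final_share within the municipality, and take the difference.

```python
import pandas as pd

def add_poll_error(polls: pd.DataFrame, results: pd.DataFrame) -> pd.DataFrame:
    """Attach error = poll_percent - 100 * final_share to each candidate-poll row.

    Hypothetical columns:
      polls:   protocol, scenario_label, politico_id, muni, raw_percent, is_aggregate
      results: politico_id, muni, candidate_votes
    """
    # Keep real candidates only (drop aggregate rows such as blank/undecided).
    cand = polls.loc[~polls["is_aggregate"]].copy()
    # Renormalize so each (protocol, scenario_label) sums to 100.
    totals = cand.groupby(["protocol", "scenario_label"])["raw_percent"].transform("sum")
    cand["poll_percent"] = 100 * cand["raw_percent"] / totals

    # final_share = candidate_votes / sum(candidate_votes within muni)
    res = results.copy()
    muni_total = res.groupby("muni")["candidate_votes"].transform("sum")
    res["final_share"] = res["candidate_votes"] / muni_total

    out = cand.merge(res[["politico_id", "muni", "final_share"]],
                     on=["politico_id", "muni"], how="left")
    out["error"] = out["poll_percent"] - 100 * out["final_share"]
    return out

# Toy usage (invented numbers): two candidates at 40/40 plus a 20-point aggregate row.
polls = pd.DataFrame({
    "protocol": ["P1"] * 3,
    "scenario_label": ["s1"] * 3,
    "politico_id": [1, 2, -1],
    "muni": ["M"] * 3,
    "raw_percent": [40.0, 40.0, 20.0],
    "is_aggregate": [False, False, True],
})
results = pd.DataFrame({"politico_id": [1, 2], "muni": ["M", "M"], "candidate_votes": [600, 400]})
out = add_poll_error(polls, results)
```

In the toy data the renormalized poll reads 50/50 against final shares of 60/40, so candidate 1 is understated by 10 points and candidate 2 overstated by 10.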
The within-candidate FE (μ_(c × race)) removes each candidate's
average error level, so β is identified from the same candidate observed in polls
that candidate sponsored versus polls sponsored by others. The pollster FE
(λ_pollster) separates a generically rosy firm from a firm that is rosy
specifically for its client.
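A small numpy simulation (all parameters invented) of why the within-candidate FE matters: when candidates with a higher average error are more likely to sponsor polls, a pooled regression of error on SponsoredBy loads those level differences onto β, while demeaning within candidate recovers the true sponsor gap. The helpers slope and within are hypothetical names for illustration.

```python
import numpy as np

rng = np.random.default_rng(0)
n_cand, n_polls, beta_true = 200, 20, 5.0

cand = np.repeat(np.arange(n_cand), n_polls)                # candidate index per row
alpha = rng.normal(0.0, 5.0, n_cand)                        # candidate-level average error
p_sponsor = 1.0 / (1.0 + np.exp(-alpha / 2.0))              # high-alpha candidates sponsor more often
d = (rng.random(cand.size) < p_sponsor[cand]).astype(float)  # SponsoredBy indicator
y = alpha[cand] + beta_true * d + rng.normal(0.0, 3.0, cand.size)  # poll error

def slope(x, v):
    """OLS slope of v on x with an intercept."""
    x_c, v_c = x - x.mean(), v - v.mean()
    return (x_c @ v_c) / (x_c @ x_c)

def within(v, g):
    """Subtract each group's mean (same slope as including group dummies, by FWL)."""
    return v - (np.bincount(g, weights=v) / np.bincount(g))[g]

beta_pooled = slope(d, y)                                # picks up selection on alpha
beta_within = slope(within(d, cand), within(y, cand))    # compares each candidate with itself
```

Candidates who are always or never sponsors have zero within-candidate variation in d, so they drop out of the within slope automatically.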
Estimand
- Parameter: β, the within-candidate average error gap between self-sponsored and non-self-sponsored polls.
- Interpretation: percentage-point overstatement of the sponsoring candidate's vote share in polls they commissioned, relative to polls of the same candidate commissioned by others (or by independent media).
Key assumptions
- Within-candidate FE removes the candidate's average standing → β is not confounded by selection into which candidates sponsor polls at all.
- Pollster FE removes the firm's average house effect → β identifies the client-specific component, not generic firm slant.
- Timing controls (days_to_election; race × month / race × week FE in specs 3b/3c) absorb time-varying race-level shocks. Without them, candidates who commission polls when they privately believe they are leading could generate a spurious β > 0; with them, β is identified from polls fielded in the same race within the same time window. Candidate-specific swings inside that window are not absorbed by these FE; the pre-poll trajectory placebo targets them.
Specification ladder
- Spec 1: pollster + candidate FE only
- Spec 2: + structured methodology controls (log(sample_size), days_to_election, days_to_election²)
- Spec 3a: clean comparator (drop opponent-sponsored rows; keep self-sponsored treatment rows plus media- or pollster-self-sponsored rows only) + candidate FE
- Spec 3b: clean comparator + candidate FE + race × month FE
- Spec 3c: clean comparator + candidate FE + race × week FE (the strict spec: identifies off polls in the same race within the same week)
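A sketch of how one rung of the ladder could be assembled in pandas, assuming hypothetical columns field_date, election_date, sample_size, sponsored_by, opponent_sponsored and sponsor_type (with invented labels "media" and "pollster_self"). It builds the Spec 2 controls, applies the clean-comparator filter for 3a to 3c, and creates a race × time key for 3b (calendar month) or 3c (ISO week, which starts on Monday; the project may define weeks differently).

```python
import numpy as np
import pandas as pd

def build_spec_frame(df: pd.DataFrame, spec: str) -> pd.DataFrame:
    """Estimation sample plus control and FE-key columns for one spec (sketch)."""
    out = df.copy()
    # Spec 2 methodology controls.
    out["days_to_election"] = (out["election_date"] - out["field_date"]).dt.days
    out["days_to_election_sq"] = out["days_to_election"] ** 2
    out["log_sample_size"] = np.log(out["sample_size"])

    if spec in {"3a", "3b", "3c"}:
        # Clean comparator: self-sponsored rows plus media / pollster-self rows,
        # never rows sponsored by an opponent.
        keep = (out["opponent_sponsored"] == 0) & (
            (out["sponsored_by"] == 1)
            | out["sponsor_type"].isin(["media", "pollster_self"])
        )
        out = out.loc[keep].copy()

    if spec == "3b":  # race x calendar month
        out["race_time"] = out["race"].astype(str) + "|" + out["field_date"].dt.strftime("%Y-%m")
    elif spec == "3c":  # race x ISO week
        iso = out["field_date"].dt.isocalendar()
        out["race_time"] = (
            out["race"].astype(str) + "|" + iso["year"].astype(str)
            + "-W" + iso["week"].astype(str).str.zfill(2)
        )
    return out

# Toy usage (invented rows): the third row is opponent-sponsored.
df = pd.DataFrame({
    "politico_id": [1, 1, 2],
    "race": ["R1"] * 3,
    "field_date": pd.to_datetime(["2024-09-30", "2024-10-02", "2024-10-02"]),
    "election_date": pd.to_datetime(["2024-10-06"] * 3),
    "sample_size": [400, 600, 600],
    "sponsored_by": [1, 0, 0],
    "opponent_sponsored": [0, 0, 1],
    "sponsor_type": ["candidate", "media", "candidate"],
})
s3b = build_spec_frame(df, "3b")
s3c = build_spec_frame(df, "3c")
```

The two retained polls fall in different calendar months but in the same ISO week, which is the kind of pair the strict 3c spec compares.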
Symmetric test
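- Sign test: compare β_self (the coefficient on SponsoredBy) with β_opp (the coefficient on OpponentSponsored). If sponsorship operates on the sponsor's own candidate rather than as a generic pollster house effect, expect β_self > 0 and β_opp < 0: because renormalized shares sum to 100 within a scenario, inflating the sponsor mechanically deflates its rivals. Empirical: β_self ≈ +7.7 pp, β_opp ≈ -1.9 pp.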
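A toy check (invented numbers) of the adding-up logic behind the expected negative β_opp, assuming the scenario lists every candidate who received votes. Renormalized poll shares sum to 100 and final shares sum to 1 over the same set, so the errors within a scenario sum to zero: any overstatement of the sponsor must reappear as understatement of rivals. The helper renormalize is a hypothetical name.

```python
import numpy as np

def renormalize(raw):
    """Rescale raw poll percentages so they sum to 100 (non-aggregate candidates only)."""
    raw = np.asarray(raw, dtype=float)
    return 100.0 * raw / raw.sum()

final_share = np.array([0.40, 0.35, 0.25])   # invented final vote shares, summing to 1
inflated = renormalize([48.0, 35.0, 25.0])   # sponsor (index 0) overstated by 8 raw points
err = inflated - 100.0 * final_share         # error_{c,p} for each candidate

# err sums to zero: the sponsor's positive error is offset by the rivals' negative errors.
```

If the scenario omits minor candidates, the errors no longer sum exactly to zero, but the direction of the offset is the same.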
Pre-poll trajectory placebo
- For each self-sponsored poll, find the most recent independent poll fielded before it in the same race and compute the within-candidate jump (error_self - error_indep_pre).
- If "self-sponsor when leading" is the explanation, the preceding independent poll should already show the candidate high (both polls measure the same private peak), so the jump should be ≈ 0.
- Observed: mean jump +6.7 pp (t = 5.2, n = 132, median time gap 10 days). A median gap of 10 days is too short for genuine momentum to plausibly explain a jump of that magnitude.
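One way to do the pairing in pandas is `pd.merge_asof` with `direction="backward"` and `allow_exact_matches=False`, matched by race and candidate. This is a sketch under assumptions: the columns race, politico_id, field_date, error, sponsored_by and is_independent, and the helpers pre_poll_jumps and summarize, are hypothetical.

```python
import numpy as np
import pandas as pd

def pre_poll_jumps(df: pd.DataFrame) -> pd.DataFrame:
    """Pair each self-sponsored row with the same candidate's most recent
    earlier independent poll in the same race, and compute the error jump."""
    self_rows = df.loc[df["sponsored_by"] == 1].sort_values("field_date")
    indep = (
        df.loc[df["is_independent"] == 1, ["race", "politico_id", "field_date", "error"]]
        .rename(columns={"field_date": "indep_date", "error": "error_indep_pre"})
        .sort_values("indep_date")
    )
    paired = pd.merge_asof(
        self_rows, indep,
        left_on="field_date", right_on="indep_date",
        by=["race", "politico_id"],
        direction="backward",        # latest independent poll...
        allow_exact_matches=False,   # ...fielded strictly before the self-sponsored one
    )
    paired = paired.dropna(subset=["error_indep_pre"]).copy()
    paired["jump"] = paired["error"] - paired["error_indep_pre"]
    paired["gap_days"] = (paired["field_date"] - paired["indep_date"]).dt.days
    return paired

def summarize(paired: pd.DataFrame) -> dict:
    """Mean jump, one-sample t statistic against zero, and median time gap."""
    j = paired["jump"]
    n = len(j)
    return {
        "n": n,
        "mean_jump": j.mean(),
        "t": j.mean() / (j.std(ddof=1) / np.sqrt(n)),
        "median_gap_days": paired["gap_days"].median(),
    }

# Toy usage (invented rows): candidate 1 commissions a poll on 2024-09-20.
df = pd.DataFrame({
    "race": ["R"] * 4,
    "politico_id": [1, 1, 1, 2],
    "field_date": pd.to_datetime(["2024-09-01", "2024-09-10", "2024-09-20", "2024-09-20"]),
    "error": [1.0, 2.0, 9.0, -3.0],
    "sponsored_by": [0, 0, 1, 0],
    "is_independent": [1, 1, 0, 1],
})
paired = pre_poll_jumps(df)
```

In the toy data the self-sponsored poll is paired with the 2024-09-10 independent poll, giving a jump of 7 points over a 10-day gap.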
Robustness
- Match-score sensitivity: relaxing match_score ≥ 2 to match_score ≥ 1 (admitting single-token matches) does not move the headline estimate.
- Sample-size weighting (WLS): β stable.
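- Scenario type: estimulado (prompted: the candidate list is shown) is the primary scenario; espontaneo (unprompted) and votos_validos (valid votes, excluding blank and null) are robustness scenarios with the same design.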
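A sketch of the sample-size-weighted variant with one set of candidate FE, assuming weights proportional to sample_size (the sampling variance of a poll proportion scales roughly as 1/n). By the weighted Frisch-Waugh-Lovell theorem, WLS with group dummies equals WLS on variables demeaned by each group's *weighted* mean. The function weighted_within_beta is a hypothetical helper, not the project's code.

```python
import numpy as np

def weighted_within_beta(y, d, groups, w):
    """WLS slope of y on d with one set of group fixed effects (e.g. candidate)."""
    y, d, w = (np.asarray(a, dtype=float) for a in (y, d, w))
    # Encode arbitrary group labels as 0..G-1 for bincount.
    g = np.unique(np.asarray(groups), return_inverse=True)[1].ravel()
    sw = np.bincount(g, weights=w)
    # Subtract each group's weighted mean from y and d.
    y_t = y - (np.bincount(g, weights=w * y) / sw)[g]
    d_t = d - (np.bincount(g, weights=w * d) / sw)[g]
    return np.sum(w * d_t * y_t) / np.sum(w * d_t * d_t)

# Toy usage (invented, noiseless): error = candidate level + 3 * SponsoredBy.
groups = np.array([0, 0, 1, 1, 1])
d = np.array([1, 0, 1, 0, 0])
y = np.array([10.0, 10.0, -2.0, -2.0, -2.0]) + 3.0 * d
sample_size = np.array([400, 600, 1000, 800, 1200])
beta_wls = weighted_within_beta(y, d, groups, sample_size)
beta_ols = weighted_within_beta(y, d, groups, np.ones(5))
```

With noiseless data both weightings return the same slope; on real polls they differ only in how much each poll counts, which is what the stability check compares.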
Open extensions (queued)
- Channel A vs B decomposition: add the LLM-extracted methodology features (coverage_class, is_quota_sample, population_reference, etc.) on top of the structured methodology controls. The shrinkage in β isolates Channel A (design-driven slant); the residual is Channel B (fabrication / interviewer effect). See docs/todo.md § Mechanism decomposition.
- Sponsor-type LLM refinement: an LLM pass on the 16% "other/unknown" contratante (contracting sponsor) pool to lift the treatment count.
Estimation
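Cluster-robust SEs at the race (muni) level. Candidate FE are absorbed via
linearmodels.PanelOLS within-demeaning (8,431 entities; a Patsy
C(entity) term would materialize a dense dummy matrix that runs out of memory).
drop_absorbed=True drops regressors that the fixed effects absorb completely instead of raising an error; candidates with no within-variation in treatment remain in the sample but supply no within-candidate variation for β.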