Methods

Identification strategy

Unit of observation: candidate-poll (one row per (politico_id, protocol) pair in estimulado scenarios). Estimating equation:

error_{c,p} = β · SponsoredBy_{c,p}
            + γ · OpponentSponsored_{c,p}
            + λ_pollster + μ_(c × race) + f(days_to_election) + ε

where:

error_{c,p} = poll_percent_{c,p} - 100 * final_share_c. Poll percent is renormalized within (protocol × scenario_label) over the non-aggregate candidate set, so the denominator matches final_share = candidate_votes / sum(candidate_votes within muni).
SponsoredBy_{c,p} = 1 iff candidate c is linked to a sponsor of poll p via Routes A+B+C+D (CPF, committee CNPJ, party CNPJ, or party name).
OpponentSponsored_{c,p} = 1 iff some other candidate in the same race is linked to a sponsor of poll p.
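
The error definition above can be sketched in pandas. This is a minimal illustration, not the project's pipeline: the column names raw_percent, is_aggregate, and muni, and the helper compute_errors, are hypothetical stand-ins, and the toy numbers are made up.

```python
import pandas as pd


def compute_errors(polls: pd.DataFrame, results: pd.DataFrame) -> pd.DataFrame:
    """Renormalize poll percents over named candidates and compute error in pp.

    polls:   one row per (protocol, scenario_label, politico_id), with
             raw_percent and is_aggregate (hypothetical column names).
    results: one row per politico_id with muni and candidate_votes.
    """
    # Drop aggregate rows (blank/null/undecided) so the denominator
    # covers the named candidates only.
    cands = polls.loc[~polls["is_aggregate"]].copy()
    denom = cands.groupby(["protocol", "scenario_label"])["raw_percent"].transform("sum")
    cands["poll_percent"] = 100 * cands["raw_percent"] / denom

    # final_share = candidate_votes / sum(candidate_votes within muni)
    res = results.copy()
    muni_total = res.groupby("muni")["candidate_votes"].transform("sum")
    res["final_share"] = res["candidate_votes"] / muni_total

    out = cands.merge(res[["politico_id", "final_share"]], on="politico_id", how="inner")
    out["error"] = out["poll_percent"] - 100 * out["final_share"]
    return out


# Toy data: two named candidates plus one aggregate row in a single poll.
polls = pd.DataFrame({
    "protocol": ["P1", "P1", "P1"],
    "scenario_label": ["s1", "s1", "s1"],
    "politico_id": [1, 2, 0],  # 0 marks the aggregate row (illustrative)
    "raw_percent": [40.0, 40.0, 20.0],
    "is_aggregate": [False, False, True],
})
results = pd.DataFrame({
    "politico_id": [1, 2],
    "muni": ["M", "M"],
    "candidate_votes": [600, 400],
})
errors = compute_errors(polls, results)
print(errors[["politico_id", "poll_percent", "final_share", "error"]])
```

In the toy poll each named candidate is renormalized from 40 to 50 percent, so a candidate who finished with 60 percent of the vote gets an error of -10 pp and the one who finished with 40 percent gets +10 pp.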

The within-candidate FE (μ_(c × race)) strips out each candidate's average error level, so β is identified from the same candidate observed in polls they sponsored versus polls sponsored by others. The pollster FE (λ_pollster) separates a firm that is generically rosy from a firm that is rosy specifically for its client.

Estimand

parameter: β, the within-candidate average error gap between self-sponsored and non-self-sponsored polls
interpretation: percentage-point overstatement of the sponsoring candidate's vote share in polls they commissioned, relative to polls of the same candidate commissioned by others (or by independent media)

Key assumptions

Within-candidate FE removes the candidate's average standing → β unconfounded by selection into who sponsors at all.
Pollster FE removes the firm's average house effect → β identifies the client-specific component, not generic firm slant.
Timing controls (days_to_election, plus race × month / race × week FE in specs 3b/3c) absorb time-varying race-level shocks. Without them, candidates who commission polls when they privately believe they are leading could generate a spurious β > 0; with them, β is identified from polls fielded in the same race within the same time window.

Specification ladder

Spec 1: pollster + candidate FE only
Spec 2: + structured methodology controls (log(sample_size), days_to_election, days_to_election²)
Spec 3a: clean comparator (drop opponent-sponsored rows; keep treatment + media/pollster-self sponsored only) + candidate FE
Spec 3b: clean comparator + candidate FE + race × month FE
Spec 3c: clean comparator + candidate FE + race × week FE (the strict spec: identifies off polls in the same race within the same week)
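
A rough sketch of how the clean-comparator sample and the race × month / race × week keys for specs 3a-3c might be built. The column names sponsor_type, field_date, and race_id, the sponsor-type labels, and the helper clean_comparator are assumptions made for illustration; the source does not specify them.

```python
import pandas as pd

# Hypothetical labels for the sponsor types kept as the comparison group.
CLEAN_SPONSOR_TYPES = {"media", "pollster_self"}


def clean_comparator(df: pd.DataFrame) -> pd.DataFrame:
    """Keep self-sponsored rows plus media / pollster-self sponsored rows."""
    not_opponent = df["opponent_sponsored"] == 0       # drop opponent-sponsored rows
    treated = df["sponsored_by"] == 1                  # treatment rows
    clean_control = df["sponsor_type"].isin(CLEAN_SPONSOR_TYPES)
    out = df.loc[not_opponent & (treated | clean_control)].copy()

    # Keys for the race x month (3b) and race x week (3c) fixed effects.
    month = out["field_date"].dt.to_period("M").astype(str)
    week = out["field_date"].dt.to_period("W").astype(str)
    out["race_month"] = out["race_id"].astype(str) + "_" + month
    out["race_week"] = out["race_id"].astype(str) + "_" + week
    return out


toy = pd.DataFrame({
    "race_id": ["R1", "R1", "R1", "R1"],
    "sponsored_by": [1, 0, 0, 0],
    "opponent_sponsored": [0, 1, 0, 0],
    "sponsor_type": ["candidate", "candidate", "media", "other"],
    "field_date": pd.to_datetime(["2024-09-02", "2024-09-03", "2024-09-04", "2024-09-20"]),
})
clean = clean_comparator(toy)
print(clean[["sponsored_by", "sponsor_type", "race_month", "race_week"]])
```

In the toy frame only the self-sponsored row and the media row survive, and both fall in the same race-week cell, which is the kind of comparison the strict spec relies on.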

Symmetric test

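Sign prediction: compare β_self (β in the estimating equation) with β_opp (γ in the estimating equation). If sponsorship operates on the sponsor's own candidate (rather than as a generic pollster house effect), β_self > 0 and β_opp < 0. Empirical: β_self ≈ +7.7, β_opp ≈ -1.9, a gap of about 9.6 pp.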

Pre-poll trajectory placebo

For each self-sponsored poll, look at the most recent INDEPENDENT poll fielded before it in the same race. Compute the within-candidate jump (error_self - error_indep_pre).
If "self-sponsor when leading" is the explanation, the preceding independent poll should already show the candidate high, since both polls measure the same private peak, and the jump should be ~0.
Observed: mean jump +6.7 pp (t = 5.2, n=132, median time gap 10 days). A gap this short leaves too little time for genuine momentum to plausibly explain a jump of that magnitude.
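
The matching step can be sketched with pandas.merge_asof. This is an illustration under assumptions: field_date and is_independent are hypothetical column names, politico_id is taken to identify a candidate within a single race, and the t-statistic below is a plain one-sample t that ignores clustering, which may differ from how the reported t was computed.

```python
import numpy as np
import pandas as pd


def pre_poll_jumps(df: pd.DataFrame) -> pd.DataFrame:
    """Pair each self-sponsored poll with the latest earlier independent poll."""
    df = df.sort_values("field_date")  # merge_asof needs both sides sorted on the key
    cols = ["politico_id", "field_date", "error"]
    self_polls = df.loc[df["sponsored_by"] == 1, cols]
    indep = df.loc[df["is_independent"], cols].rename(
        columns={"field_date": "indep_date", "error": "error_indep_pre"}
    )
    merged = pd.merge_asof(
        self_polls, indep,
        left_on="field_date", right_on="indep_date",
        by="politico_id",              # same candidate, hence same race
        direction="backward",          # most recent poll before the self-sponsored one
        allow_exact_matches=False,     # strictly earlier, not same-day
    )
    merged = merged.dropna(subset=["error_indep_pre"])  # no earlier independent poll
    merged["jump"] = merged["error"] - merged["error_indep_pre"]
    merged["gap_days"] = (merged["field_date"] - merged["indep_date"]).dt.days
    return merged


def one_sample_t(x) -> float:
    """Naive t-statistic for H0: mean jump = 0 (no clustering adjustment)."""
    x = np.asarray(x, dtype=float)
    return float(x.mean() / (x.std(ddof=1) / np.sqrt(len(x))))


toy = pd.DataFrame({
    "politico_id": [1, 1, 1, 2, 2, 3, 3],
    "field_date": pd.to_datetime([
        "2024-09-01", "2024-09-05", "2024-09-10",
        "2024-09-08", "2024-09-12",
        "2024-09-01", "2024-09-11",
    ]),
    "sponsored_by": [0, 0, 1, 1, 0, 0, 1],
    "is_independent": [True, True, False, False, True, True, False],
    "error": [2.0, 3.0, 9.0, 5.0, 1.0, 1.0, 9.0],
})
jumps = pre_poll_jumps(toy)
print(jumps[["politico_id", "jump", "gap_days"]])
print(one_sample_t(jumps["jump"]))
```

Candidate 2 drops out of the toy example because their only independent poll was fielded after the self-sponsored one.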

Robustness

Match-score sensitivity: relaxing match_score ≥ 2 to ≥ 1 (single-token matches) does not move the headline β.
Sample-size weighting (WLS): β stable.
Scenario type: estimulado is the primary scenario; espontaneo and votos_validos are robustness scenarios with the same design.

Open extensions (queued)

Channel A vs B decomposition: add the LLM-extracted methodology features (coverage_class, is_quota_sample, population_reference, etc.) on top of the structured methodology controls. The amount by which β shrinks is attributed to Channel A (design-driven slant); the residual is Channel B (fabrication / interviewer effect). See docs/todo.md § Mechanism decomposition.
Sponsor-type LLM refinement: LLM pass on the 16% "other/unknown" contratante pool to lift the treatment count.

Estimation

Cluster-robust SEs at the race (muni) level. Candidate FE are absorbed via linearmodels.PanelOLS within-demeaning: with 8,431 entities, Patsy's C(entity) would materialize a dense dummy matrix and run out of memory. drop_absorbed=True drops regressors that are fully absorbed by the fixed effects, which covers candidates with no within-variation.
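
To illustrate why within-demeaning avoids the dense dummy matrix, the sketch below compares the two routes on synthetic data using plain NumPy and pandas. It is a swapped-in illustration, not the linearmodels call used in the project: it handles the candidate FE only (no pollster FE, no clustering), the helper names are hypothetical, and the true coefficient of 5.0 is made up.

```python
import numpy as np
import pandas as pd


def within_beta(df: pd.DataFrame, y: str, x: str, entity: str) -> float:
    """Slope from regressing entity-demeaned y on entity-demeaned x."""
    yd = df[y] - df.groupby(entity)[y].transform("mean")
    xd = df[x] - df.groupby(entity)[x].transform("mean")
    # Entities with no variation in x have xd == 0 and contribute nothing.
    return float((xd * yd).sum() / (xd ** 2).sum())


def dummy_beta(df: pd.DataFrame, y: str, x: str, entity: str) -> float:
    """Same slope via explicit entity dummies (dense; does not scale)."""
    dummies = pd.get_dummies(df[entity], dtype=float).to_numpy()
    X = np.column_stack([df[x].to_numpy(dtype=float), dummies])
    coef, *_ = np.linalg.lstsq(X, df[y].to_numpy(dtype=float), rcond=None)
    return float(coef[0])


# Synthetic panel: 50 candidates, 6 polls each, true sponsorship effect 5.0.
rng = np.random.default_rng(0)
n_entities, polls_each = 50, 6
entity = np.repeat(np.arange(n_entities), polls_each)
sponsored = rng.integers(0, 2, size=entity.size)
level = rng.normal(0, 5, size=n_entities)[entity]   # candidate-specific level
error = 5.0 * sponsored + level + rng.normal(0, 1, size=entity.size)
toy = pd.DataFrame({"politico_id": entity, "sponsored_by": sponsored, "error": error})

b_within = within_beta(toy, "error", "sponsored_by", "politico_id")
b_dummy = dummy_beta(toy, "error", "sponsored_by", "politico_id")
print(b_within, b_dummy)
```

The two estimates coincide up to floating-point error (the Frisch-Waugh-Lovell result), but the demeaned route never builds an n × 8,431 dummy matrix.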
