title: Working hypothesis
status: living document
last_updated: 2026-06-02

Working hypothesis

Synthesis of where we currently stand. The atomic claims live in findings/, the formal predictions in hypotheses/, the frameworks in theory.md, the regressions in analyses/. This document knits them into the causal account we currently believe; it should be edited whenever a new analysis materially changes the story.

TL;DR

The +7–8 pp poll sponsor bias is a real, identified, within-candidate, within-race effect (AN-001, AN-003). Crude Channel B (per-row fabrication) is ruled out (AN-013).

What remains in the candidate set is Channel A (Bayesian persuasion through disclosure-compliant methodology choices), alongside sophisticated Channel B variants that the digit forensics don't reach and design dimensions that registration doesn't surface. Documented Channel A levers quantitatively underexplain the +7 pp: the residual decomposition in thinking.md puts a generous upper bound of 1–5 pp out of 7 on attributed levers, leaving 2–6 pp unattributed. The single concretely quantified lever (population frame, +4.6 pp point estimate) has SE 3.7 / p = 0.21 on the LLM subset. Wide CIs keep open scenarios in which Channel A explains most of the effect, but in point-estimate terms Channel A is unlikely to be the main explanation. Documented concrete-design differences (population frame, coverage class, census-setor cluster usage) are directionally right but individually too small to add up to +7 pp. The loudest within-pair contrast is the sponsored-vs-independent opacity of methodology documentation (lower audit %, less complete description, more coverage deferral), which is not itself a mechanism. The size mismatch and the open probe agenda live in source-of-bias.md.

Partisan bairro/setor selection finds no empirical support, and the measured tilt runs in the opposite direction (AN-032, preliminary). Coverage deferral specifically is now ruled out in both directions: sponsors do not disproportionately defer (AN-024), and deferral does not amplify sponsor bias within candidate (AN-033). This narrows the concrete residual levers to population frame, coverage class, and census-setor usage.

The slant is sharply concentrated in small, low-volume pollster firms; the large-volume Brazilian polling industry shows β ≈ 0 within candidate (AN-016, AN-018). Reputation-by-volume explains the firm-level discipline gradient (theory.md § Pollster reputation), and media filtering gives it its bite: big-firm polls reach voters, so their reputation cost from a visible miss is highest (AN-025). The bias survives the disclosure regime because voters cannot in practice parse the 1,600-char sampling plan or run the regression that would back out the firm-conditional discount (theory.md § Why bias survives voter discounting).
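The "wide CIs" point about the population-frame lever is easiest to see with the arithmetic written out. This is a minimal sketch that assumes a normal approximation; the analyses may use a t-distribution or clustered SEs, which would move the numbers slightly. With +4.6 pp and SE 3.7 it gives p ≈ 0.21 and a 95 % interval of roughly [−2.7, +11.9] pp, whose upper end exceeds the whole +7 pp headline.

```python
import math


def normal_ci_and_p(estimate, se, z_crit=1.96):
    """Two-sided normal-approximation p-value and 95% CI for a point estimate."""
    z = estimate / se
    p = math.erfc(abs(z) / math.sqrt(2))  # two-sided tail probability of N(0, 1)
    return p, (estimate - z_crit * se, estimate + z_crit * se)


# Population-frame lever on the LLM subset (point estimate and SE from the TL;DR).
p, (lo, hi) = normal_ci_and_p(4.6, 3.7)
print(f"p = {p:.2f}, 95% CI = [{lo:.1f}, {hi:.1f}] pp")

# Unattributed residual out of a ~7 pp headline, for the 1-5 pp attributed bound.
headline = 7.0
for attributed in (1.0, 5.0):
    print(f"attributed {attributed:.0f} pp -> unattributed {headline - attributed:.0f} pp")
```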

The chain of evidence

1. The headline effect is real

Spec | β (pp) | Source
Spec 1 (candidate FE only) | +7.60 | AN-001
Spec 2 (candidate + pollster FE + methodology controls) | +7.98 | AN-001
Spec 3c (race × week FE, strict clean comparator) | +7.94 | AN-002
Pre-poll trajectory placebo (within-candidate jump, t = 5.2) | +6.7 | AN-003
Raw poll-percent (no renormalization), conservative bound | +5.10 | AN-010 K2, AN-014

The opponent-sponsored mirror runs in the predicted direction: polls commissioned by a candidate's rival show that candidate lower (β_opp ≈ −1.9 pp; findings/opponent-sponsored-mirror.md). The headline is therefore a sender-specific effect (findings/sender-specific.md), not a generic pollster house effect.
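The gap between the renormalized specs and the raw poll-percent bound in the table above, and the K2 "sponsors list fewer candidates" concern, both hinge on renormalization. The sketch below assumes, hypothetically, that renormalization rescales listed candidates' shares to sum to 100 after dropping undecided and blank/null responses; the pipeline's actual rule is not spelled out here. It shows how a shorter candidate list can raise a candidate's renormalized share with unchanged raw support.

```python
def renormalize(shares, exclude=("undecided", "blank/null")):
    """Rescale candidate shares to sum to 100 after dropping non-candidate responses.

    The excluded categories are an assumption for illustration, not the pipeline's rule.
    """
    kept = {k: v for k, v in shares.items() if k not in exclude}
    total = sum(kept.values())
    return {k: 100.0 * v / total for k, v in kept.items()}


# Hypothetical poll: A has 30 % raw support.
full_list = {"A": 30.0, "B": 25.0, "C": 15.0, "undecided": 20.0, "blank/null": 10.0}
# Same raw support for A, but the questionnaire omits C and (by assumption)
# C's supporters answer undecided or blank/null instead.
short_list = {"A": 30.0, "B": 25.0, "undecided": 30.0, "blank/null": 15.0}

print(f"A renormalized, full list:  {renormalize(full_list)['A']:.1f}")
print(f"A renormalized, short list: {renormalize(short_list)['A']:.1f}")
```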

2. The effect survives every robustness check we ran

Critique | Closed by
Comparator contamination, route false positives, FE selection | AN-010 K1–K5
FE-structure permutation chance, single-firm or -state dominance | AN-011
Thin-cluster CRVE under-coverage, sample-size weighting, week-window brittleness | AN-012
Channel B per-row fabrication (digit forensics) | AN-013
K2 "sponsors list fewer candidates" hypothesis | AN-014
Differential LLM extraction quality (script proxies + within-firm) | AN-015, AN-016
Party-level heterogeneity (joint Wald test) | AN-009

The robustness brief at briefs/robustness.md collects all eleven sections in one place.
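The "FE-structure permutation chance" critique (AN-011) asks whether a β this large could arise by chance given the fixed-effects structure. The exact AN-011 procedure is not reproduced here; the sketch below shows one standard version, assuming the test shuffles sponsored labels within candidate (so each candidate keeps its number of sponsored polls) and compares the observed within-candidate β to the shuffled distribution.

```python
import numpy as np


def demeaned_beta(groups, x, y):
    """Slope of y on x after demeaning both within groups (candidate fixed effects)."""
    groups = np.asarray(groups)
    x = np.asarray(x, dtype=float)
    y = np.asarray(y, dtype=float)
    x_dm, y_dm = x.copy(), y.copy()
    for g in np.unique(groups):
        m = groups == g
        x_dm[m] -= x[m].mean()
        y_dm[m] -= y[m].mean()
    return float(x_dm @ y_dm / (x_dm @ x_dm))


def within_candidate_permutation_p(groups, sponsored, deviation, n_perm=2000, seed=0):
    """Observed beta and the share of within-candidate shuffles with |beta| >= |observed|."""
    rng = np.random.default_rng(seed)
    groups = np.asarray(groups)
    sponsored = np.asarray(sponsored)
    observed = demeaned_beta(groups, sponsored, deviation)
    blocks = [np.flatnonzero(groups == g) for g in np.unique(groups)]
    hits = 0
    for _ in range(n_perm):
        shuffled = sponsored.copy()
        for idx in blocks:  # permute labels only inside each candidate
            shuffled[idx] = rng.permutation(sponsored[idx])
        # Small tolerance so numerically tied permutations count as hits.
        if abs(demeaned_beta(groups, shuffled, deviation)) >= abs(observed) - 1e-12:
            hits += 1
    # Add-one correction so the p-value is never exactly zero.
    return observed, (hits + 1) / (n_perm + 1)
```

Usage: `within_candidate_permutation_p(candidate_ids, sponsored_flags, deviations)`, where `deviation` is a hypothetical per-poll outcome (for example, poll share minus realized vote share). Restricting shuffles to within candidate is the design choice that makes the reference distribution respect the candidate FE rather than a pooled comparison.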

3. Crude Channel B is ruled out; documented Channel A levers underexplain the headline

Crude per-candidate post-fielding tampering is falsified by within-sponsored-poll digit symmetry (AN-013); sophisticated Channel B variants remain open. The Channel A evidence splits into (i) concrete design-choice differences (population frame, coverage class, census-setor cluster usage), which are directionally right but small in measured magnitude, and (ii) opacity differences (audit %, methodology completeness), which are statistically loud but not a mechanism by themselves. The documented design levers do not yet add up quantitatively to +7–8 pp, and the opacity signal, though it is the loudest within-pair contrast, cannot stand in for a mechanism.

Partisan bairro/stronghold selection finds no empirical support, and the measured tilt runs in the opposite direction (AN-032). Coverage deferral specifically is ruled out as a Channel-A lever: sponsors do not disproportionately defer (AN-024), and deferral does not amplify sponsor bias within candidate (AN-033). Mode of data collection is also ruled out: in the curated-pairs sample, sponsored polls never use phone mode (0/244) and over-use the gold-standard in-person mode (95 % vs 89 %), the opposite of the cheap-mode-slant prior (AN-041). Interviewer-side opacity is ruled out too: sponsored polls describe interviewer training more often (84.4 % vs 72.5 %, McNemar p = 0.002) and the supervisor role more often (92.6 % vs 87.3 %, p = 0.08), and bias does not co-vary with which side describes them (AN-042). This refines the opacity reading from "blanket" to selective disclosure: sponsored polls under-document sample-shape dimensions (coverage, audit) but over-document visible-rigor dimensions (interviewers, supervisors). Nonresponse handling cannot be tested at this layer, because registration PDFs uniformly omit the field (AN-043: 100 % not_specified, and the vocabulary is virtually absent in a free-text grep); testing this lever requires a separate LLM extraction over the post-fielding relatório PDFs.

The candidate-name rotation contrast surfaced in AN-051 (sponsored 5.4 % vs independent 26.1 %, McNemar p ≈ 4 × 10⁻⁸) was initially read as a sharp within-pair Channel-A signal, but the two natural follow-ups walked it back. AN-052 shows the contrast vanishes within firm (firm-FE LPM coef +0.025 pp, p = 0.37; 19 of 20 both-side firms have identical rotation rates), and AN-053 shows that sponsored polls place the sponsor's own candidate later in the list, not earlier (McNemar on first-position rate p = 0.001, in the opposite direction from the hypothesis). The rotation finding is a firm-tier composition signal (sponsors choose low-discipline firms that don't document rotation as a firm-level practice) and folds into the existing AN-018 firm-size discipline gradient. The within-poll cherry-picking hypothesis is refuted: sponsored polls run fewer scenarios than independent ones (Wilcoxon p = 0.002). The remaining concrete within-firm Channel-A levers are population frame, coverage class, and census-setor usage. The canonical lever inventory, the size-mismatch problem, and the open agenda for closing it live in source-of-bias.md; this section deliberately stays brief so the synthesis doesn't drift from the canonical doc.
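Several contrasts above (AN-042, AN-051, AN-053) are McNemar tests on sponsored/independent pairs. The marginal percentages alone do not determine the p-value: McNemar uses only the discordant pairs, those where exactly one side of the pair has the attribute. A minimal sketch of the exact version follows; the analyses may have used the asymptotic chi-square form instead, and the discordant counts in the usage line are hypothetical, not taken from the data.

```python
from math import comb


def mcnemar_exact(b, c):
    """Exact two-sided McNemar test from the two discordant-pair counts.

    b: pairs where only the sponsored poll has the attribute.
    c: pairs where only the independent poll has it.
    Under H0 the discordant pairs split 50/50, so min(b, c) is Binomial(b + c, 0.5).
    """
    n = b + c
    k = min(b, c)
    tail = sum(comb(n, i) for i in range(k + 1)) / 2 ** n
    return min(1.0, 2 * tail)


# Hypothetical discordant counts, for illustration only.
print(mcnemar_exact(30, 10))
```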

4. The slant is concentrated in low-volume firms

Within-firm β refit on 31 firms with ≥ 5 self-sponsored polls (AN-016):

Volume tertile | n firms | median total polls | mean β (pp) | n significant (p < 0.05)
Small | 12 | 13 | +11.98 | 9 / 12
Medium | 10 | 41 | +8.64 | 8 / 10
Large | 9 | 118 | −0.93 | 2 / 9

The within-firm β range is [−10.9, +35.2] across the 31 firms — a 46-pp spread that cannot be a data-quality artifact (PDF style and LLM extraction template are constant within each firm by construction). The dispersion is real cross-firm sponsor-behavior heterogeneity.
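Both the per-firm β column above and the per-firm scorecard proposed in the policy section reduce to one within-candidate estimate per firm with an interval. A minimal sketch follows, assuming candidate fixed effects via within-candidate demeaning and a conventional homoskedastic SE with a normal critical value; the AN-016 refit may use clustered SEs or t-based intervals, which would matter for small firms. Argument names are hypothetical.

```python
import numpy as np


def firm_beta_with_ci(candidate_ids, sponsored, deviation, z_crit=1.96):
    """Within-candidate sponsor beta for one firm, with a conventional 95% CI.

    Demeaning within candidate gives the candidate-FE slope (Frisch-Waugh-Lovell).
    The SE is the homoskedastic OLS one with df = n - (#candidates) - 1; the interval
    uses a normal critical value for simplicity.
    """
    cand = np.asarray(candidate_ids)
    x = np.asarray(sponsored, dtype=float)
    y = np.asarray(deviation, dtype=float)
    x_dm, y_dm = x.copy(), y.copy()
    for c in np.unique(cand):
        m = cand == c
        x_dm[m] -= x[m].mean()
        y_dm[m] -= y[m].mean()
    sxx = x_dm @ x_dm
    beta = float(x_dm @ y_dm / sxx)
    resid = y_dm - beta * x_dm
    dof = len(y) - len(np.unique(cand)) - 1
    se = float(np.sqrt(resid @ resid / dof / sxx))
    return beta, (beta - z_crit * se, beta + z_crit * se)
```

A firm would count toward the "n significant" column when its interval excludes zero. Grouping the resulting β values by volume tertile and averaging reproduces the shape of the table, not its exact numbers.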

Reputation by volume dominates customer-mix sorting (AN-017, AN-018). Joint regression of per-firm β on log(volume) and candidate-share gives δ_log_n_total = −7.09 (p = 0.0002, R² = 0.40 under WLS); the candidate-share coefficient becomes statistically insignificant.
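The joint regression can be sketched as weighted least squares of per-firm β on an intercept, log total volume, and candidate share. This is a minimal version under stated assumptions: the weights are whatever AN-018 uses (plausibly inverse variances of the per-firm β, but that is a guess), and the function and argument names are hypothetical.

```python
import numpy as np


def wls(y, X, w):
    """Weighted least squares: minimize sum_i w_i * (y_i - X_i b)^2."""
    y = np.asarray(y, dtype=float)
    X = np.asarray(X, dtype=float)
    sw = np.sqrt(np.asarray(w, dtype=float))
    # Scaling rows by sqrt(w) turns WLS into ordinary least squares.
    coef, *_ = np.linalg.lstsq(X * sw[:, None], y * sw, rcond=None)
    return coef


def firm_slope_regression(firm_beta, n_total, cand_share, weights):
    """Return [intercept, delta_log_n_total, delta_cand_share]."""
    X = np.column_stack([np.ones(len(firm_beta)), np.log(n_total), cand_share])
    return wls(firm_beta, X, weights)
```

Assuming the log is natural, δ_log_n_total = −7.09 implies that doubling a firm's total volume lowers its expected within-firm β by about 7.09 × ln 2 ≈ 4.9 pp.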

theory.md § "Pollster reputation: volume vs customer mix" formalizes the mechanism: a firm with more aggregate output under its name has more reputation at stake from a single visible miss, so self-discipline is the equilibrium for big firms.

5. Media filtering gives the reputation gradient its bite

If voters and donors see polls only through media coverage, and media preferentially covers polls from trusted (high-volume) firms, then big firms face sharper reputation stakes precisely because their polls reach the consumer. AN-025 tests this within race on the runoff-eligible municipality panel (≥ 200k eligible voters, i.e. aptos; 172 munis, 889 race × firm cells):

Spec | δ (log-volume coef) | p
log(1 + Google News hits) | +0.053 | 0.092
Any-hit LPM (≥1 hit) | +0.019 | 0.045

Doubling firm volume raises the probability of any media coverage of the firm's polls in that race by about 1.3 pp, with race FE absorbing every race-level confound. Descriptive tier contrast: large-volume firms have any-hit coverage on 80 % of cells versus 60 % for small / medium firms. The structural feedback loop closes:

pollster reputation (volume) → media filter → voter information set → demand for credible pollsters → reputation discipline
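The "+1.3 pp per doubling" reading follows from the any-hit LPM coefficient if log-volume is the natural log: doubling adds ln 2 ≈ 0.693 to the regressor. A minimal check, which also applies the same arithmetic to the log(1 + hits) spec (there the coefficient approximates a proportional change in 1 + hits):

```python
import math


def doubling_effect(coef_on_log):
    """Change in the outcome implied by doubling a natural-log regressor."""
    return coef_on_log * math.log(2)


lpm = doubling_effect(0.019)       # any-hit LPM: probability units
log_hits = doubling_effect(0.053)  # log(1 + hits) spec: log points
print(f"any-hit probability: +{100 * lpm:.1f} pp")
print(f"(1 + hits): about +{100 * (math.exp(log_hits) - 1):.1f}%")
```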

6. Why voters don't discount it away

theory.md § "Why bias survives voter discounting" lays out the five sub-mechanisms that survive scrutiny once the obvious resolutions (design opacity, rank-not-level, short-horizon sponsors) are eliminated:

  1. Rational inattention — acquiring a precise estimate of sponsor-conditional bias is costly; per-voter benefit is tiny.
  2. Anecdotal updating is biased — small samples, salience-weighted, unlikely to match the academic regression estimate.
  3. Channel-A heterogeneity — per-poll bias depends on the specific design choices, so a constant discount factor over-corrects honest sponsored polls and under-corrects heavily-designed ones.
  4. The marginal pivotal voter is less informed than the average voter, and sponsors care about the voter who tips the race.
  5. Pollster long-run reputation is multi-dimensional — firms can maintain final-week-headline-poll accuracy while slanting on commissioned polls.
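Sub-mechanism 3 can be made concrete with a toy calculation. The per-poll bias values below are hypothetical, chosen only to show the sign pattern: a voter who subtracts one constant discount (here, the average bias) leaves a negative residual on honest sponsored polls (over-corrected) and a positive one on heavily designed polls (under-corrected).

```python
import numpy as np

# Hypothetical per-poll sponsor bias (pp) for five sponsored polls, from honest
# to heavily designed. Illustrative numbers only, not estimates from the data.
true_bias = np.array([0.0, 1.0, 4.0, 12.0, 18.0])

# A voter applies one constant discount equal to the average bias...
discount = true_bias.mean()

# ...so each corrected reading is off by (true_bias - discount).
residual = true_bias - discount
for b, r in zip(true_bias, residual):
    label = "over-corrected" if r < 0 else "under-corrected" if r > 0 else "exact"
    print(f"bias {b:5.1f} pp -> residual {r:+5.1f} pp ({label})")
```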

7. Demand-side selection: who pays for slant, when?

A parallel line of work documents the selection into self-sponsorship (AN-026, AN-027, AN-028, AN-029, AN-030). The signal running through these analyses: candidates near the viability threshold are most likely to commission self-sponsored polls (findings/coordination-shift.md, findings/tight-race-amplification.md). This is consistent with the voter-side coordination/bandwagon mechanisms in theory.md: a +7 pp inflation moves a candidate from "competing" to "viable," which is exactly where the strategic-voting cost of being seen as non-viable is highest.

The bias-amplification side splits differently than the selection side. AN-045 decomposes the headline by final_rank × race_margin on the AN-001 panel: the largest tight-race sponsor-bias amplification sits at rank 1, not rank 2 (sponsor effect: rank-1 +4.80 pp non-tight → +12.20 pp tight, vs rank-2 +9.54 pp → +6.81 pp). Rank-2 over-commissioning (AN-027) and rank-1 over-statement (AN-045) occupy two different cells of the rank × margin grid: selection and bias amplification operate on different sides of the demand-side story. Bandwagon-style "manufacture inevitability" incentives on rank-1-in-tight fit the AN-045 pattern; coordination-demand at rank-2-in-tight fits the AN-027 pattern. The two are mechanically consistent (both predict +7–12 pp deviations in close races) but live in non-overlapping cells.
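The rank × margin decomposition amounts to estimating a sponsor effect separately in each (rank, tight) cell. The sketch below computes only the raw sponsored-minus-independent mean deviation per cell; AN-045 works on the AN-001 panel with its fixed effects, so this is a simplified illustration, and the row layout and precomputed `tight` flag are hypothetical.

```python
from collections import defaultdict
from statistics import mean


def sponsor_effect_by_cell(rows):
    """Raw sponsored-minus-independent mean deviation in each (rank, tight) cell.

    rows: iterable of (rank, tight, sponsored, deviation) tuples, where rank is
    1, 2, or 3 (3 meaning '3 or lower') and tight is a precomputed bool.
    Cells lacking either sponsored or independent polls are skipped.
    """
    cells = defaultdict(lambda: {True: [], False: []})
    for rank, tight, sponsored, deviation in rows:
        cells[(rank, tight)][bool(sponsored)].append(deviation)
    effects = {}
    for key, groups in cells.items():
        if groups[True] and groups[False]:
            effects[key] = mean(groups[True]) - mean(groups[False])
    return effects
```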

Sharpening with rank-at-commission (AN-050) cleans up the AN-040 / AN-045 ex-post-rank artifact and surfaces a new viability-grab pattern. AN-050 replaces final_rank with the candidate's rank in the most recent prior neutral poll (rac below): (i) the bandwagon-leader signal at rac1-tight survives, but at half the magnitude (+6.59 pp vs +12.20 pp); (ii) the AN-040 rank-2 over-statement vanishes under ex-ante rank (rac2 sponsor effect: −0.46 pp non-tight, +4.20 pp tight), so it was a Simpson's-paradox-style aggregation across heterogeneous within-candidate trajectories; (iii) a thin-cell surprise: rac3+ candidates in tight races show a sponsor effect of +18.64 pp (n = 14 sponsored), suggesting a viability-grab mechanism in which trailing candidates in close races commission polls to push themselves into the top-2 coordination zone. Confidence is yellow (descriptive); thin-cell triage and date-anchor robustness are queued.
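Rank-at-commission is a lookup: for a sponsored poll, find the most recent neutral poll in the same race fielded strictly before it and read off the candidate's rank there. A minimal sketch under hypothetical choices: one `fielded` date per poll (the anchor date that the queued date-anchor robustness check would vary), a fallback to earlier neutral polls when the latest one omits the candidate, and tied shares sharing the better rank.

```python
from bisect import bisect_left
from dataclasses import dataclass
from datetime import date


@dataclass
class NeutralPoll:
    fielded: date
    shares: dict  # candidate -> vote share (%) in this independent poll


def rank_at_commission(neutral_polls, candidate, sponsored_date):
    """Candidate's rank in the latest neutral poll fielded strictly before sponsored_date.

    neutral_polls must belong to one race and be sorted by fielded date.
    Falls back to earlier polls if the latest omits the candidate; returns None
    if no prior neutral poll lists the candidate.
    """
    dates = [p.fielded for p in neutral_polls]
    i = bisect_left(dates, sponsored_date)  # polls [0, i) are strictly earlier
    for poll in reversed(neutral_polls[:i]):
        if candidate in poll.shares:
            ordered = sorted(poll.shares.values(), reverse=True)
            # index() finds the first equal share, so ties get the better rank.
            return 1 + ordered.index(poll.shares[candidate])
    return None
```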

What this implies for policy

paper.tex § Policy implications develops the disclosure-based response: sponsor-aware consumer-side disclosure (media republication must surface sponsor identity with the same prominence as the headline number) and per-firm scorecards (TSE publishes each firm's within-candidate β with 95 % CI). The findings rule out punitive remedies — license revocation, fines on biased firms, criminalizing methodology choices — because:

What's still uncertain (open questions)

Question | Status
Channel-A regression decomposition at full magnitude | Blocked on the LLM methodology extractor's full-universe (~14k poll) batch run. The paired-pair design (paper § Extension in development) is the preliminary read; full-universe regression is in production.
Sophisticated Channel-B (preserves digit distributions, proportional rescaling, pre-publication quota reweighting) | Not ruled out by AN-013. Will be the residual after Channel-A controls land.
Partisan bairro/setor at seção resolution | AN-032 preliminary; the cleanest test requires IBGE setor shapefiles + a spatial join, not in the local pipeline.
Cross-cycle replication (2020 mayoral, 2022 federal) | Not yet run; would tighten the firm-volume slope and the customer-mix sorting result.
Hand-validation of LLM extraction quality | Queued in todo.md § Data-quality validation. AN-013/AN-015/AN-016 are script-based proxies; the hand-validation is the gold-standard direct test.
Whether the media filter (AN-025) reflects firm name distinguishability (query noise) or real reputation | Re-scrape at higher cap + distinctive-names robustness parked as the AN-025 follow-ups.

Document hygiene

This document is living. Edit it whenever an AN result materially changes the story or a new line of evidence enters the chain. Each section should hyperlink to its supporting analyses; if a section ages and no longer reflects current beliefs, mark the section "(legacy)" rather than deleting — the version history of the document is its own record of how the story converged.