title: Working hypothesis
status: living document
last_updated: 2026-06-02
Working hypothesis
Synthesis of where we currently stand. The atomic claims live in
findings/, the formal predictions in
hypotheses/, the frameworks in theory.md,
the regressions in analyses/. This document knits them
into the causal account we currently believe; it should be edited
whenever a new analysis materially changes the story.
TL;DR
The +7–8 pp poll sponsor bias is a real, identified, within-candidate
within-race effect (AN-001,
AN-003). Crude
Channel B (per-row fabrication) is ruled out
(AN-013). What remains in the
candidate set is Channel A (Bayesian persuasion through
disclosure-compliant methodology choices) alongside sophisticated
Channel B variants the digit forensics don't reach and design
dimensions registration doesn't surface — but documented Channel
A levers quantitatively underexplain the +7 pp: the residual
decomposition in thinking.md puts the generous
upper bound on attributed levers at 1–5 pp out of 7, leaving 2–6
pp unattributed. The single concretely quantified lever
(population frame, +4.6 pp point estimate) has SE 3.7 / p = 0.21
on the LLM subset; wide CIs keep open scenarios where Channel A
explains most, but in point-estimate terms Channel A is unlikely
to be the main explanation. Documented concrete-design differences
(population frame, coverage class, census-setor cluster usage) are
directionally right but individually too small to add up to +7 pp;
the loudest within-pair contrast is sponsored-vs-independent
opacity of methodology documentation (lower audit %, less
complete description, more coverage deferral), which is not itself
a mechanism. The size mismatch and the open probe agenda live in
source-of-bias.md. Partisan bairro/setor
selection finds no empirical support: the observed pattern runs in
the opposite direction (AN-032, preliminary).
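
The arithmetic behind the size mismatch can be checked with a minimal sketch. It recomputes the two-sided p-value for the population-frame lever from its point estimate and SE, and the unattributed range from the attributed bounds. The normal approximation is an assumption made here for illustration (the underlying analysis may use a t reference distribution or clustered inference), and the function name is hypothetical.

```python
import math


def two_sided_p(estimate: float, se: float) -> float:
    """Two-sided p-value for estimate/se under a normal approximation (assumed)."""
    z = abs(estimate / se)
    # erfc(z / sqrt(2)) equals 2 * (1 - Phi(z)) for the standard normal CDF Phi.
    return math.erfc(z / math.sqrt(2.0))


# Population-frame lever as reported above: +4.6 pp, SE 3.7.
p_frame = two_sided_p(4.6, 3.7)

# Residual range: headline minus the generous bounds on attributed levers.
headline_pp = 7.0
attributed_low_pp, attributed_high_pp = 1.0, 5.0
residual_pp = (headline_pp - attributed_high_pp, headline_pp - attributed_low_pp)

print(round(p_frame, 2))
print(residual_pp)
```

Under these assumptions the p-value rounds to the reported 0.21 and the residual range matches the 2–6 pp stated above.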
Coverage deferral specifically is now ruled out in both directions:
sponsors do not disproportionately defer (AN-024)
and deferral does not amplify sponsor bias within candidate
(AN-033), tightening
the residual to population frame, coverage class, and census-setor
usage.
The slant is sharply concentrated in small, low-volume pollster
firms; the large-volume Brazilian polling industry shows β ≈ 0
within candidate (AN-016,
AN-018). Reputation-by-
volume explains the firm-level discipline gradient
(theory.md § Pollster reputation),
and media filtering gives it its bite — big-firm polls reach
voters, so their reputation cost from a visible miss is highest
(AN-025). The bias
survives the disclosure regime because voters cannot in practice
parse the 1,600-char sampling plan or run the regression that would
back out the firm-conditional discount
(theory.md § Why bias survives voter discounting).
The chain of evidence
1. The headline effect is real
| Spec | β | Source |
|---|---|---|
| Spec 1 (candidate FE only) | +7.60 | AN-001 |
| Spec 2 (candidate + pollster FE + methodology controls) | +7.98 | AN-001 |
| Spec 3c (race × week FE, strict clean comparator) | +7.94 | AN-002 |
| Pre-poll trajectory placebo (within-candidate +6.7 pp jump, t = 5.2) | — | AN-003 |
| Raw poll-percent (no renormalization), conservative bound | +5.10 | AN-010 K2, AN-014 |
The opponent-sponsored mirror runs in the predicted direction (β_opp ≈ −1.9 pp; findings/opponent-sponsored-mirror.md). The headline is a sender-specific effect (findings/sender-specific.md), not a generic pollster house effect.
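
To make "within-candidate" concrete, here is a minimal sketch of a candidate-fixed-effects slope with a single regressor, in the spirit of Spec 1. It is not the AN-001 code: the function name, the row layout, and the toy numbers are all hypothetical, and it omits the pollster FE, methodology controls, and clustered standard errors of the fuller specs.

```python
from collections import defaultdict


def within_candidate_beta(rows):
    """Candidate-FE slope on a 0/1 sponsorship dummy.

    rows: iterable of (candidate_id, sponsored, vote_share_pp).
    Demeans both variables within candidate, then runs pooled OLS
    on the demeaned data (the standard within estimator).
    """
    by_candidate = defaultdict(list)
    for candidate, sponsored, share in rows:
        by_candidate[candidate].append((float(sponsored), float(share)))

    num = 0.0
    den = 0.0
    for obs in by_candidate.values():
        mean_x = sum(x for x, _ in obs) / len(obs)
        mean_y = sum(y for _, y in obs) / len(obs)
        for x, y in obs:
            num += (x - mean_x) * (y - mean_y)
            den += (x - mean_x) ** 2
    if den == 0.0:
        raise ValueError("no within-candidate variation in sponsorship")
    return num / den


# Toy data (hypothetical): two candidates with very different baseline levels.
toy_rows = [
    ("A", 0, 30.0), ("A", 0, 32.0), ("A", 1, 38.0),
    ("B", 0, 10.0), ("B", 1, 18.0), ("B", 1, 16.0),
]
beta = within_candidate_beta(toy_rows)
print(beta)
```

In the toy data each candidate's sponsored polls sit 7 pp above that candidate's own unsponsored polls, so the estimator returns 7 regardless of the 20-pp gap in baseline levels between the two candidates.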
2. The effect survives every robustness check we ran
| Critique | Closed by |
|---|---|
| Comparator contamination, route false positives, FE selection | AN-010 K1–K5 |
| FE-structure permutation chance, single-firm or -state dominance | AN-011 |
| Thin-cluster CRVE under-coverage, sample-size weighting, week-window brittleness | AN-012 |
| Channel B per-row fabrication (digit forensics) | AN-013 |
| K2 "sponsors list fewer candidates" hypothesis | AN-014 |
| Differential LLM extraction quality (script proxies + within-firm) | AN-015, AN-016 |
| Party-level heterogeneity (joint Wald test) | AN-009 |
The robustness brief at briefs/robustness.md
collects all eleven sections in one place.
3. Crude Channel B is ruled out; documented Channel A levers underexplain the headline
Crude per-candidate post-fielding tampering is falsified by
within-sponsored-poll digit symmetry (AN-013);
sophisticated Channel B variants remain open. The Channel A evidence
splits into (i) concrete design-choice differences (population
frame, coverage class, census-setor cluster usage — directionally
right but small in measured magnitude), and (ii) opacity
differences (audit %, methodology completeness — statistically loud
but not a mechanism by themselves). The documented design levers do
not yet add up quantitatively to +7–8 pp; the opacity signal is the
loudest within-pair contrast but cannot be treated as a mechanism on
its own. Partisan bairro/stronghold selection finds no empirical
support: the observed pattern runs in the opposite direction
(AN-032). Coverage
deferral specifically is ruled out as a Channel-A lever —
sponsors do not disproportionately defer
(AN-024) and
deferral does not amplify sponsor bias within-candidate
(AN-033); mode of
data collection is also ruled out — sponsored polls never use
phone mode (0/244) and over-use the gold-standard in-person mode (95 %
vs 89 %) in the curated-pairs sample, the opposite of the
cheap-mode-slant prior
(AN-041). Interviewer-side
opacity is also ruled out: sponsored polls describe interviewer
training more (84.4 % vs 72.5 %, McNemar p=0.002) and supervisor
role more (92.6 % vs 87.3 %, p=0.08) — and bias does not co-vary with
which side describes (AN-042).
This refines the opacity reading from "blanket" to selective
disclosure — sponsored polls under-document sample-shape dimensions
(coverage, audit) but over-document visible-rigor dimensions
(interviewers, supervisors). Nonresponse handling cannot be tested
at this layer — registration PDFs uniformly omit the field
(AN-043: 100 %
not_specified, vocabulary virtually absent in free-text grep), and
testing this lever requires a separate LLM extraction over the
post-fielding relatório PDFs. The candidate-name rotation contrast surfaced in
AN-051 (sponsored
5.4 % vs independent 26.1 %, McNemar p ≈ 4 × 10⁻⁸) was initially read as a
sharp within-pair Channel-A signal, but the two natural follow-ups
walked it back: AN-052
shows the contrast vanishes within firm (firm-FE LPM coef +0.025 pp,
p = 0.37; 19 of 20 both-side firms have identical rotation rates),
and AN-053 shows
that the sponsor's own candidate is actually listed later on the
sponsored side of the pair than on the independent side (McNemar on
first-position rate p = 0.001, opposite to the predicted direction).
The rotation finding is a firm-tier composition
signal (sponsors choose low-discipline firms that don't document
rotation as a firm-level practice) and folds into the existing
AN-018 firm-size
discipline gradient. The within-poll cherry-picking hypothesis is
refuted (sponsored polls have fewer scenarios than independent ones,
Wilcoxon p = 0.002). The
remaining concrete within-firm Channel-A levers are population
frame, coverage class, and census-setor usage. The
canonical lever inventory, the size-mismatch problem, and the open
agenda for closing it live in source-of-bias.md;
this section deliberately stays brief so the synthesis doesn't drift
from the canonical doc.
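
Several of the within-pair contrasts above are reported as McNemar tests. As a reminder of what that test uses, here is a minimal sketch of the exact (binomial) version, which depends only on the discordant pairs. Whether the analyses use the exact or the chi-square variant is not stated here, so treat this as an illustration; the function name and the counts in the usage lines are hypothetical.

```python
from math import comb


def mcnemar_exact(b: int, c: int) -> float:
    """Two-sided exact McNemar p-value.

    b: pairs where only the sponsored poll has the attribute.
    c: pairs where only the independent poll has the attribute.
    Concordant pairs carry no information and are ignored.
    """
    n = b + c
    if n == 0:
        return 1.0
    k = min(b, c)
    # Under H0 each discordant pair falls on either side with probability 1/2.
    tail = sum(comb(n, i) for i in range(k + 1)) / 2 ** n
    return min(1.0, 2.0 * tail)


# Hypothetical discordant counts, for illustration only.
p_balanced = mcnemar_exact(5, 5)
p_one_sided = mcnemar_exact(10, 0)
print(p_balanced, p_one_sided)
```

A balanced split of discordant pairs gives p = 1, while ten discordant pairs all on one side give p = 2 / 2¹⁰ ≈ 0.002.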
4. The slant is concentrated in low-volume firms
Within-firm β refit on 31 firms with ≥ 5 self-sponsored polls (AN-016):
| Tertile | n firms | median total polls | mean β | n significant (p < 0.05) |
|---|---|---|---|---|
| Small | 12 | 13 | +11.98 | 9 / 12 |
| Medium | 10 | 41 | +8.64 | 8 / 10 |
| Large | 9 | 118 | −0.93 | 2 / 9 |
The within-firm β range is [−10.9, +35.2] across the 31 firms — a 46-pp spread that cannot be a data-quality artifact (PDF style and LLM extraction template are constant within each firm by construction). The dispersion is real cross-firm sponsor-behavior heterogeneity.
Reputation by volume dominates customer-mix sorting (AN-017, AN-018). Joint regression of per-firm β on log(volume) and candidate-share gives δ_log_n_total = −7.09 (p = 0.0002, R² = 0.40 under WLS); the candidate-share coefficient becomes statistically insignificant.
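
The shape of that joint regression can be sketched as follows: a weighted least squares fit of per-firm β on an intercept, log(volume), and candidate-share. This is an illustration, not the AN-018 code. The weights (total polls per firm) are an assumption, since the weighting scheme is not stated here, and the five firms are synthetic, noise-free data constructed so that the true log-volume slope is −7.0.

```python
import math


def solve_linear(a, b):
    """Solve a @ x = b by Gauss-Jordan elimination with partial pivoting."""
    n = len(b)
    m = [row[:] + [rhs] for row, rhs in zip(a, b)]
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(m[r][col]))
        if abs(m[pivot][col]) < 1e-12:
            raise ValueError("singular normal equations")
        m[col], m[pivot] = m[pivot], m[col]
        for r in range(n):
            if r != col:
                f = m[r][col] / m[col][col]
                m[r] = [v - f * p for v, p in zip(m[r], m[col])]
    return [m[i][n] / m[i][i] for i in range(n)]


def wls(X, y, w):
    """Weighted least squares: minimise sum_i w_i * (y_i - x_i . coef)^2."""
    k = len(X[0])
    xtwx = [[sum(wi * xi[r] * xi[c] for xi, wi in zip(X, w))
             for c in range(k)] for r in range(k)]
    xtwy = [sum(wi * xi[r] * yi for xi, yi, wi in zip(X, y, w))
            for r in range(k)]
    return solve_linear(xtwx, xtwy)


# Synthetic firms (hypothetical): (total polls, candidate-client share).
firms = [(8, 0.9), (15, 0.7), (40, 0.8), (120, 0.3), (300, 0.5)]
X = [[1.0, math.log(n), share] for n, share in firms]
# Noise-free outcome built with known coefficients, so the fit should recover them.
y = [30.0 - 7.0 * math.log(n) + 2.0 * share for n, share in firms]
w = [float(n) for n, _ in firms]  # assumed weights: firm volume

intercept, slope_log_volume, slope_share = wls(X, y, w)
print(slope_log_volume, slope_share)
```

Because the synthetic outcome is noise-free, the fit recovers the planted coefficients exactly up to floating-point error; with real per-firm β estimates the same code would return the weighted slope but not its standard error or p-value.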
theory.md § "Pollster reputation: volume vs customer mix"
formalizes the mechanism: a firm with more aggregate output under
its name has more reputation at stake from a single visible miss, so
self-discipline is the equilibrium for big firms.
5. Media filtering gives the reputation gradient its bite
If voters and donors see polls only through media coverage, and media preferentially covers polls from trusted (high-volume) firms, then big firms face sharper reputation stakes precisely because their polls reach the consumer. AN-025 tests this within race on the runoff-eligible muni panel (aptos ≥ 200k, 172 munis, 889 race × firm cells):
| Spec | δ (log-volume coef) | p |
|---|---|---|
| log(1 + Google News hits) | +0.053 | 0.092 |
| Any-hit LPM (≥1 hit) | +0.019 | 0.045 |
Doubling firm volume → +1.3 pp higher probability of any media coverage of the firm's polls in that race, with race FE absorbing every race-level confound. Descriptive tier contrast: large-volume firms have any-hit coverage on 80 % of cells versus 60 % for small / medium. The structural feedback loop closes:
pollster reputation (volume) → media filter → voter information set → demand for credible pollsters → reputation discipline
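
The "doubling firm volume" figure above follows from the any-hit LPM coefficient, as this short check shows. It assumes the log-volume regressor is a natural log, which is not stated explicitly here.

```python
import math

# Any-hit LPM coefficient on log(volume), from the AN-025 table.
delta_any_hit = 0.019

# Doubling volume raises log(volume) by ln(2), assuming natural logs.
doubling_effect_pp = 100.0 * delta_any_hit * math.log(2.0)

print(round(doubling_effect_pp, 1))
```

Under that assumption the effect is about 1.3 pp, matching the text.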
6. Why voters don't discount it away
theory.md § "Why bias survives voter discounting"
lays out the five sub-mechanisms that survive scrutiny once the
obvious resolutions (design opacity, rank-not-level, short-horizon
sponsors) are eliminated:
- Rational inattention — acquiring a precise estimate of sponsor-conditional bias is costly; per-voter benefit is tiny.
- Anecdotal updating is biased — small samples, salience-weighted, unlikely to match the academic regression estimate.
- Channel-A heterogeneity — per-poll bias depends on the specific design choices, so a constant discount factor over-corrects honest sponsored polls and under-corrects heavily-designed ones.
- Marginal pivotal voter is less informed than the average voter; sponsors care about the voter who tips the race.
- Pollster long-run reputation is multi-dimensional — firms can maintain final-week-headline-poll accuracy while slanting on commissioned polls.
7. Demand-side selection: who pays for slant, when?
A parallel line of work documents the selection into self-sponsorship
(AN-026,
AN-027,
AN-028,
AN-029,
AN-030). The signal
running through these analyses: candidates near the viability
threshold are most likely to commission self-sponsored polls
(findings/coordination-shift.md,
findings/tight-race-amplification.md).
This is consistent with the voter-side coordination/bandwagon
mechanisms in theory.md: a +7 pp inflation moves a
candidate from "competing" to "viable," which is exactly where the
strategic-voting cost of being seen as non-viable is highest.
The bias-amplification side splits differently than the selection
side. AN-045 decomposes the headline by final_rank × race_margin
on the AN-001 panel: the largest tight-race sponsor-bias amplification
sits at rank 1, not rank 2 (sponsor effect: rank-1 +4.80pp
non-tight → +12.20pp tight, vs rank-2 +9.54 → +6.81). Rank-2
over-commissioning (AN-027) and rank-1 over-statement (AN-045)
are two different cells of the rank × margin manifold — selection
and bias amplification mechanisms operate on different sides of the
demand-side story. Bandwagon-style "manufacture inevitability"
incentives on rank-1-in-tight fit the AN-045 pattern; coordination-
demand at rank-2-in-tight fits the AN-027 pattern. The two are
mechanically consistent (both predict +7-12pp deviations in close
races) but live in non-overlapping cells.
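
The asymmetry between the two ranks is easiest to see as a tight-minus-non-tight difference per rank. The sketch below only rearranges the four AN-045 cell estimates quoted above; the dictionary layout and function name are illustrative, and no standard errors are attached because none are quoted here.

```python
# Sponsor effect in pp by (final_rank, race margin), as quoted from AN-045.
sponsor_effect_pp = {
    (1, "non_tight"): 4.80,
    (1, "tight"): 12.20,
    (2, "non_tight"): 9.54,
    (2, "tight"): 6.81,
}


def tight_race_amplification(rank: int) -> float:
    """Tight-race sponsor effect minus non-tight sponsor effect for one rank."""
    return sponsor_effect_pp[(rank, "tight")] - sponsor_effect_pp[(rank, "non_tight")]


amp_rank1 = tight_race_amplification(1)
amp_rank2 = tight_race_amplification(2)
print(round(amp_rank1, 2), round(amp_rank2, 2))
```

The sponsor effect rises by about 7.4 pp in tight races at rank 1 and falls by about 2.7 pp at rank 2, which is the sense in which the amplification sits at rank 1.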
Sharpening with rank-at-commission (AN-050) cleans up the AN-040 /
AN-045 ex-post-rank artifact and surfaces a new viability-grab
pattern. Replacing final_rank with the candidate's rank in the
most recent prior neutral poll
(AN-050):
(i) the bandwagon-leader signal at rac1-tight (rac = rank at commission) survives but at roughly half
the magnitude (+6.59pp vs +12.20pp); (ii) the AN-040 rank-2
over-statement vanishes under ex-ante rank (rac2 sponsor effect:
−0.46 non-tight, +4.20 tight) — it was Simpson's-style aggregation
across heterogeneous within-candidate trajectories; (iii) a thin-cell
surprise: rac3+ candidates in tight races show sponsor effect
+18.64pp (n=14 sponsored), suggesting a viability-grab mechanism
where trailing candidates in close races commission polls to push
themselves into the top-2 coordination zone. Confidence is yellow
(descriptive only); thin-cell triage and date-anchor robustness checks are queued.
What this implies for policy
paper.tex § Policy implications develops the disclosure-based response: sponsor-aware consumer-side disclosure (media republication must surface sponsor identity with the same prominence as the headline number) and per-firm scorecards (TSE publishes each firm's within-candidate β with 95 % CI). The findings rule out punitive remedies — license revocation, fines on biased firms, criminalizing methodology choices — because:
- The slant is sender-specific (the same firm is unbiased for media clients and slanted for candidate clients), so punishing the firm misallocates blame from the campaign that demanded the slant.
- Crude per-candidate tampering is ruled out
(AN-013), so for the share
of the bias that flows through declared methodology there is no
crude fraudulent act to penalize. But the unattributed 2–6 pp
residual (per
thinking.md § "Residual decomposition") keeps sophisticated Channel B variants and unobservable design dimensions in the candidate set, so the enforcement angle is not foreclosed for that share.
- Reputation-by-volume already disciplines the big-name industry at the firm-aggregate level without regulatory force (AN-018). In the low-volume specialist segment where the slant concentrates (AN-016: within-firm β range −10.9 to +35.2, small-tertile mean +11.98), and for the share of the bias that flows through unobservable channels, the marginal value of regulation lies in both information and enforcement, in proportion to how much of the +7 pp ends up attributed to disclosure-compliant Channel A versus the residual.
What's still uncertain (open questions)
| Question | Status |
|---|---|
| Channel-A regression decomposition at full magnitude | Blocked on the LLM methodology extractor's full-universe (~14k poll) batch run. The paired-pair design (paper § Extension in development) is the preliminary read; full-universe regression is in production. |
| Sophisticated Channel-B (preserves digit distributions, proportional rescaling, pre-publication quota reweighting) | Not ruled out by AN-013. Will be the residual after Channel-A controls land. |
| Partisan bairro/setor at seção resolution | AN-032 preliminary; the cleanest test requires IBGE setor shapefiles + a spatial join, not in the local pipeline. |
| Cross-cycle replication (2020 mayoral, 2022 federal) | Not yet run; would tighten the firm-volume slope and the customer-mix sorting result. |
| Hand-validation of LLM extraction quality | Queued in todo.md § Data-quality validation. AN-013/AN-015/AN-016 are script-based proxies; the hand-validation is the gold-standard direct test. |
| Whether the media filter (AN-025) reflects firm name distinguishability (query noise) or real reputation | Re-scrape at higher cap + distinctive-names robustness parked as the AN-025 follow-ups. |
Document hygiene
This document is living. Edit it whenever an AN result materially changes the story or a new line of evidence enters the chain. Each section should hyperlink to its supporting analyses; if a section ages and no longer reflects current beliefs, mark the section "(legacy)" rather than deleting — the version history of the document is its own record of how the story converged.