title: "Source of the bias: what mechanism carries the +7 pp gap?"
status: living document
last_updated: 2026-06-02

Source of the bias

Synthesis of the supply-side mechanism evidence: what about poll production carries the +7–8 pp sponsor effect? This doc is the canonical home for the lever inventory and the running stocktake of the channel-decomposition question. The broader story (voter side, selection into sponsorship, policy implications) lives in working-hypothesis.md; the theoretical framework lives in theory.md § Bayesian persuasion.

TL;DR — honest framing

We have ruled out crude per-row fabrication (AN-013), partisan bairro/stronghold selection (AN-032, preliminary), and the obvious data-quality alternatives (AN-015, AN-016). What remains in the candidate set is Channel A (design-driven Bayesian persuasion through disclosure-compliant methodology choices) alongside sophisticated Channel B variants the digit forensics don't reach and design dimensions registration doesn't surface — but the documented Channel A levers quantitatively underexplain the +7 pp headline.

The residual decomposition in thinking.md § "Residual decomposition of the +7 pp" puts the generous upper bound on documented Channel A levers at 1–5 pp out of the +7 pp: scenario rotation 1–2 pp (since walked back to firm-tier composition by AN-052), population-reference mismatch 0–1 pp, weighting (ponderação) 0–2 pp. That leaves 2–6 pp of the headline unattributed. Wide CIs on the single quantified lever (population frame: point estimate +4.6 pp, SE 3.7, p = 0.21, n = 354) leave open scenarios where Channel A still explains most of the +7 pp, but the point-estimate reading is that Channel A is unlikely to be the main explanation.

The within-pair evidence splits into two qualitatively different findings:

Concrete design-choice differences (real methodology levers that mechanically tilt the sample): population reference frame, coverage class, census-setor cluster usage. Directionally right but individually small and not adding up to +7 pp at point estimates.
Opacity differences (sponsored polls are less documented, not differently designed): audit %, methodology completeness, coverage deferral. The statistically loudest within-pair contrasts — but opacity is the absence of documentation, not a mechanism by itself.

The honest read: documented Channel A levers are directionally consistent but quantitatively underexplain the +7 pp; the unattributed residual could live in Channel A's wide upper bounds, in sophisticated Channel B variants, or in design dimensions the registration regime doesn't surface. Until a specific design lever is shown to carry the slant quantitatively, or the queued probes below close the gap, the substantive answer is provisional and Channel A is one candidate, not the leading one.

The two-channel framing

The headline β = +7–8 pp can in principle decompose into two mechanisms (per theory.md § Bayesian persuasion):

Channel A — Bayesian persuasion through declared design. The sponsor chooses a pollster who chooses a disclosure-compliant methodology whose realized demographics mechanically favor the candidate. The slant is in the methodology, fully filed at registration.
Channel B — residual / fabrication. The realized numbers deviate from what the disclosed methodology should produce. The slant cannot be traced through any registered field.

The policy interpretations differ: Channel A → constrain methodology choices; Channel B → fraud / enforcement regime. Below, the evidence rules out crude Channel B (digit forensics) but does not adjudicate decisively between Channel A and sophisticated/unobservable variants — documented Channel A levers are directionally right but quantitatively small, sophisticated Channel B variants remain untested, and the unattributed 2–6 pp residual leaves the enforcement-regime interpretation in the candidate set at non-trivial weight. Caveats explicit at the lever level below.

What we have ruled out

Mechanism	Status	Evidence
Crude post-fielding tampering of sponsor's row	Ruled out	AN-013: within-sponsored-poll digit symmetry between sponsor's row and other-candidate rows; A vs B z-test p > 0.6 on every digit metric
Sponsors list fewer candidates → renormalization inflates share	Ruled out	AN-014: 98.6 % of the K2 raw-vs-renormalized gap is mechanical multiplicative scaling, not a denominator design choice
Differential LLM extraction quality	Ruled out (3 fronts)	AN-015 (proxies), AN-016 (within-firm 46 pp β spread under fixed PDF style)
Partisan bairro/setor selection (cherry-pick sponsor's strongholds)	Ruled out (preliminary)	AN-032: within-pair contrast = −0.0029 (t = −5.3) on the 22-pair muni-rich subset, direction opposite the prediction
Sophisticated Channel B (preserves digit distributions, proportional rescaling, pre-publication quota reweighting)	NOT ruled out	AN-013's digit forensics catch crude tampering only; the structured Channel A controls (once the full-universe extractor runs) will quantify the residual
Mode substitution (sponsored swap in-person for phone/online to reach a cheaper, less-representative sample)	Ruled out (preliminary)	AN-041: 0/244 sponsored polls use phone mode vs 10 % independent — sponsored side over-represents in-person. χ² p = 0.0003 in the opposite direction of the cheap-mode-slant prior.
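
To make the "A vs B z-test on a digit metric" in the AN-013 row concrete, here is a minimal sketch of a pooled two-proportion z-test. It is not AN-013's actual code: the choice of metric (share of reported percentages ending in 0 or 5) and all four counts are hypothetical, picked only to illustrate what a symmetric, non-rejecting result looks like.

```python
import math


def two_proportion_z(hits_a, n_a, hits_b, n_b):
    """Pooled two-proportion z-test; returns (z, two-sided p-value)."""
    p_a, p_b = hits_a / n_a, hits_b / n_b
    pooled = (hits_a + hits_b) / (n_a + n_b)
    se = math.sqrt(pooled * (1 - pooled) * (1 / n_a + 1 / n_b))
    z = (p_a - p_b) / se
    p_value = math.erfc(abs(z) / math.sqrt(2))  # normal reference distribution
    return z, p_value


# HYPOTHETICAL counts (not from AN-013):
# group A = sponsor's own row, group B = other-candidate rows in the same polls.
# "hits" = rows whose reported percentage ends in 0 or 5.
sponsor_hits, sponsor_rows = 52, 244
other_hits, other_rows = 150, 732

z, p_value = two_proportion_z(sponsor_hits, sponsor_rows, other_hits, other_rows)
print(f"z = {z:.2f}, two-sided p = {p_value:.2f}")
```

A test like this only detects tampering that distorts the digit distribution of the sponsor's row relative to the other rows, which is why the sophisticated Channel B variants in the table stay open.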

What we have as evidence for Channel A

Concrete design-choice differences (real, but small)

These are sponsored-vs-independent contrasts on specific methodology levers the pollster declares at registration. They correspond to mechanically tilting the realized sample.

Lever	Direction	Magnitude	Source
Population reference frame (mixed vs canonical TSE-eligible)	sp > ind on `mixed`	9 of 20 differing pairs sponsored-side `mixed`; 0 reverse — perfectly one-sided asymmetry	findings-paired
Coverage class (urban-only / specific-neighborhoods)	sp > ind on slant-permissive coverage	+2 pp (12 % sp vs 10 % ind on n=200 methodology subset); 10 + 5 = 15 of 49 differing-coverage biased pairs in the right direction	AN-019, findings-paired
Census-setor as cluster frame (used / not)	sp > ind	39 % of biased pairs disagree on this vs 26 % well-behaved baseline (+13 pp gap)	findings-paired
Quota variable mix (which variables declared)	mixed	not decisive on the n=200 subset	AN-022
~~Candidate-name rotation in the questionnaire~~ — demoted to firm-tier signal	sp < ind marginally, null within firm	AN-051 marginal 5.4 % vs 26.1 % (McNemar p ≈ 4 × 10⁻⁸) is a between-firm composition effect (AN-052 firm-FE LPM coef +0.025 pp, p = 0.37; 19 of 20 both-side firms have identical rotation rates). AN-053's direct candidate-position test refutes the priming-exploitation reading (sp lists candidate later; McNemar p = 0.001 in WRONG direction).	AN-051, AN-052, AN-053

These are real findings, and they are in the predicted direction. The combined regression preview (paper.tex § Extension in development) gives a sponsored × mixed_population interaction of +4.6 pp (SE 3.7, p = 0.21): directionally consistent, but statistically underpowered at the current n = 354 extracted protocols. Quantifying how much of the +7 pp each lever carries is blocked on the full-universe LLM methodology extraction (queued in todo.md § Mechanism decomposition).
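
As a quick check on why this estimate counts as underpowered, the sketch below recomputes the z-statistic, two-sided p-value and 95 % interval from the reported point estimate and standard error. It assumes a normal reference distribution (the regression itself may use a t distribution, which at this sample size should differ only marginally); the variable names are illustrative.

```python
import math

estimate_pp = 4.6   # sponsored x mixed_population interaction, from the text
std_error_pp = 3.7  # its standard error, from the text


def two_sided_p(z):
    """Two-sided p-value under a standard normal reference distribution."""
    return math.erfc(abs(z) / math.sqrt(2))


def normal_ci(estimate, se, z_crit=1.96):
    """Symmetric confidence interval; z_crit=1.96 gives roughly 95 %."""
    return estimate - z_crit * se, estimate + z_crit * se


z = estimate_pp / std_error_pp
p = two_sided_p(z)
ci_low, ci_high = normal_ci(estimate_pp, std_error_pp)
print(f"z = {z:.2f}, p = {p:.2f}, 95% CI = ({ci_low:.1f}, {ci_high:.1f}) pp")
```

Under these assumptions the p-value rounds to the reported 0.21, and the interval runs from roughly −2.7 to +11.9 pp, so it covers both zero and the full +7 pp headline.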

Opacity differences (loud, selective — not a mechanism by themselves)

These are measurement-quality signals about how documented the methodology is. They do not by themselves describe a tilt. Importantly, the opacity gap is not blanket — it is field-specific ("selective disclosure"). Sponsored polls under-document the dimensions that determine who is in the realized sample, but over-document the dimensions that signal craftwork.

Signal	Direction (sp vs ind)	Magnitude	Source
Audit %	sp less documented	KS-test rejection	AN-021
Methodology completeness index	sp less documented	t-test rejection on n=200	AN-022
Coverage deferred at registration	sp less documented (more deferred)	Universe-scale χ² with sharp odds ratio	AN-024
Interviewer training described	sp more documented (+11.9 pp)	McNemar p=0.002 on 244 pairs	AN-042
Supervisor role described	sp more documented (+5.3 pp)	McNemar p=0.08 on 244 pairs	AN-042
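
The McNemar rows above compare a yes/no field across matched pairs, so only the discordant pairs carry information. A minimal sketch of the exact (binomial) version follows. The discordant counts are hypothetical: the source reports only the net gap and the p-value, so `b` and `c` are chosen to give a net gap of 29 pairs (11.9 % of 244) and are not expected to reproduce AN-042's p-value.

```python
from math import comb


def mcnemar_exact(b, c):
    """Two-sided exact McNemar p-value from the two discordant-pair counts.

    b: pairs where only the sponsored side documents the field
    c: pairs where only the independent side documents the field
    Under the null, the smaller count is Binomial(b + c, 0.5).
    """
    n = b + c
    k = min(b, c)
    tail = sum(comb(n, i) for i in range(k + 1)) / 2 ** n
    return min(1.0, 2 * tail)


# HYPOTHETICAL discordant counts; only b - c = 29 is tied to the reported gap.
b, c = 59, 30
n_pairs = 244  # pair count reported for AN-042

net_gap_pp = 100 * (b - c) / n_pairs
p_value = mcnemar_exact(b, c)
print(f"net gap = {net_gap_pp:.1f} pp, exact McNemar p = {p_value:.4f}")
```

Concordant pairs (both sides document, or neither does) drop out of the test entirely, which is why a modest net gap can still be statistically sharp.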

These tell us that sponsored polls are systematically less-documented on the sample-shape dimensions and more documented on the visible-rigor dimensions — a strategic-disclosure pattern, not a blanket opacity. The bias contrast in AN-042 does not co-vary with which side describes interviewer training (MW p = 0.58), so interviewer-side documentation is not a slant carrier. The selective-disclosure pattern is consistent with two hypotheses:

1. Mechanism is hidden inside opacity: the lack of documentation IS where the slant happens. Sponsored polls don't describe their methodology because describing it would surface choices that wouldn't pass scrutiny, OR because the methodology is ad-hoc / non-standard in ways the boilerplate templates can't capture.
2. Opacity is a quality proxy, mechanism is elsewhere: sponsored polls being less-documented may be a cost-saving or discipline pattern that correlates with bias without carrying it; the real mechanism is in a lever we haven't extracted yet.

Without distinguishing (1) from (2), reporting opacity as a "Channel A finding" would overclaim. The honest read leaves the question open.

The size-mismatch problem

Note (2026-06-02). The size-mismatch problem and the ruled-out levers below are now documented in paper/paper.tex §5 "Extension in development" in the two paragraphs "Size mismatch and ruled-out levers" and "Selective disclosure". This section is the research-doc canonical version; the paper writeup is the external-audience version of the same content.

If the population-frame lever moves β by +4.6 pp on the LLM-extracted subset (the only currently quantified concrete design lever), and coverage-class shifts are +2 pp on the subset, the documented design levers collectively account for less than half of the +7–8 pp pooled β. The remainder lives in one or more of:

The same levers measured at the full-universe scale (where power is sharper)
Other design levers we haven't yet contrasted at quantitative scale
Sophisticated Channel B variants below the digit-forensics resolution
Genuinely the opacity gap: sponsored polls are slanted because the documentation regime doesn't compel them to describe the slanting steps

Until that gap is closed, the working answer is "documented Channel A levers are directionally consistent but quantitatively underexplain the +7 pp; the unattributed 2–6 pp residual could live in Channel A's wide upper bounds, in sophisticated Channel B variants the digit forensics don't reach, or in design dimensions the registration regime doesn't surface." In point-estimate terms, Channel A is unlikely to be the main explanation; in upper-bound terms it remains in the candidate set.
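
The 1–5 pp and 2–6 pp figures follow from simple interval arithmetic on the per-lever bounds quoted in the TL;DR. The sketch below spells that out, assuming the lever contributions are additive and taking +7 pp as the headline; the names are illustrative rather than taken from thinking.md.

```python
# Per-lever bounds in percentage points, as quoted in the TL;DR.
LEVER_BOUNDS_PP = {
    "scenario rotation": (1.0, 2.0),
    "population-reference mismatch": (0.0, 1.0),
    "weighting (ponderação)": (0.0, 2.0),
}
HEADLINE_PP = 7.0  # lower end of the +7-8 pp pooled sponsor effect


def channel_a_bounds(levers):
    """Sum per-lever lower and upper bounds (ASSUMES contributions add up)."""
    low = sum(lo for lo, _ in levers.values())
    high = sum(hi for _, hi in levers.values())
    return low, high


def unattributed_range(headline, low, high):
    """Residual left once the documented levers are subtracted."""
    return headline - high, headline - low


low, high = channel_a_bounds(LEVER_BOUNDS_PP)
res_low, res_high = unattributed_range(HEADLINE_PP, low, high)
print(f"documented Channel A levers: {low:g}-{high:g} pp")
print(f"unattributed residual: {res_low:g}-{res_high:g} pp")
```

If scenario rotation is dropped following the AN-052 walk-back, the same arithmetic widens the residual, which is one reason the point-estimate reading leans away from Channel A.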

Candidate mechanisms left to probe

In rough order of (a) testability with what we have and (b) plausible mechanistic bite. Each of these would help close the size-mismatch problem if positive evidence shows up; if all return null, the "opacity is genuinely the answer" reading becomes well-earned.

1. ~~Within-poll scenario selection~~: refuted by AN-051 on the 241-pair sample. Sponsored polls file fewer vote-intention scenarios on average (3.73 vs 4.49, Wilcoxon p = 0.002), not more, the opposite direction of the cherry-pick prior. Only 6 pairs have sponsored ≥ 8 scenarios and their mean bias is below the overall mean.
2. Question / name-order priming: ABSORBED into the firm-tier discipline gradient. AN-051 landed a sharp marginal contrast (sp 5.4 % vs ind 26.1 % rotation documented; McNemar p ≈ 4 × 10⁻⁸), but three follow-ups jointly walked it back: AN-052, where the contrast vanishes within firm (firm-FE LPM p = 0.37); AN-053, where sponsored polls list the sponsor's candidate later, not earlier (McNemar p = 0.001 in the wrong direction); and AN-054, where the firm-level rotation rate is 3.7 % / 15.2 % / 19.4 % across small / medium / large firm-volume tertiles on the broad 124-firm panel, a monotonic 5× gradient matching AN-018's discipline curve. The lever is sponsor choice of low-volume, low-discipline firms (AN-018 mechanism), not within-firm sponsor-side manipulation of questionnaire design.
3. ~~Non-response / undecided handling~~: null-by-data-design on the methodology extraction (AN-043). The structured nonresponse_handling field is 100 % not_specified on all 488 pair-sides, and the diagnostic free-text grep confirms the vocabulary is virtually absent (5 + 2 hits out of 488 on the undecided/refusal vocabulary family). The reason is structural: TSE registration PDFs are pre-fielding planning documents, while nonresponse handling is a post-fielding analytical choice. Testing this lever requires extending the LLM pipeline to ingest the post-fielding relatório PDFs (a separate document class). Escalated to "needs relatório extraction pipeline".
4. Weighting / post-stratification: extract the actual weight structure from DS_PLANO_AMOSTRAL. Sponsored polls might apply less aggressive demographic weighting (so non-representative samples don't get corrected back to population norms).
5. ~~Mode / data-collection device~~: ruled out by AN-041 on the 244-pair curated sample. Sponsored polls over-use in-person mode (95 % vs 89 %) and never use phone (0 vs 24); the χ² rejects mode independence at p = 0.0003 in the opposite direction of the cheap-mode-slant prior. The mode-substitution lever does not carry sponsor bias. (Mode contributes to opacity via a 5 % sponsored-side "not_specified" rate vs 1 % independent, but that is the opacity gap, not a design substitution.)
6. ~~Interviewer training / supervision detail~~: ruled out by AN-042 on the 244-pair sample. Sponsored polls describe interviewer training more (84.4 % vs 72.5 %, McNemar p=0.002) and supervisor role slightly more (92.6 % vs 87.3 %, p=0.08). Bias contrast does not co-vary with which side describes (MW p ≈ 0.58 both fields), so interviewer-side documentation is not a slant carrier. This refutes blanket opacity and forces the selective-disclosure reading folded into the opacity-differences table above.

Each of (1)–(5) is testable with existing data + an additional LLM extraction pass. (6) requires either an interview-protocol experiment we cannot run or a structural argument that the not-documented part of poll production is where the bias necessarily lives.
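
The firm-tier reading of the rotation contrast in the probe list above rests on a composition effect: a marginal sponsored-vs-independent gap can appear even when every firm treats both sides identically. The sketch below shows the logic with two firms. All counts are hypothetical and chosen only so that within-firm rates match exactly; they are not the AN-051 or AN-052 data.

```python
# HYPOTHETICAL counts per firm and side: (polls, polls documenting rotation).
FIRMS = {
    "low_volume_firm": {"sponsored": (80, 4), "independent": (20, 1)},
    "high_volume_firm": {"sponsored": (20, 5), "independent": (80, 20)},
}


def rate(polls, documented):
    """Share of polls that document candidate-name rotation."""
    return documented / polls


def marginal_rate(firms, side):
    """Pooled rate for one side, ignoring which firm ran the poll."""
    polls = sum(f[side][0] for f in firms.values())
    documented = sum(f[side][1] for f in firms.values())
    return rate(polls, documented)


def within_firm_gaps(firms):
    """Sponsored-minus-independent rate inside each firm."""
    return {
        name: rate(*f["sponsored"]) - rate(*f["independent"])
        for name, f in firms.items()
    }


marginal_gap = marginal_rate(FIRMS, "sponsored") - marginal_rate(FIRMS, "independent")
gaps = within_firm_gaps(FIRMS)
print(f"marginal gap: {100 * marginal_gap:+.1f} pp")
for name, gap in gaps.items():
    print(f"{name}: within-firm gap {100 * gap:+.1f} pp")
```

Here the pooled sponsored rate is lower only because sponsored polls concentrate in the firm with the lower rotation rate, which is the pattern a firm fixed effect absorbs.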

Open questions (the agenda)

Question	Status
Channel A regression decomposition at full magnitude	Blocked on full-universe LLM methodology extraction (~14k mayoral polls; Batch API path implemented, queued). Will give magnitudes per lever and the residual to Channel B.
Question-order / name-order priming (extract from questionario_pesquisa PDFs)	Pilot landed AND walked back (AN-051 → AN-052 → AN-053, 2026-06-02). Marginal contrast sharp (sp 5.4 % vs ind 26.1 % rotation documented; McNemar p ≈ 4 × 10⁻⁸); within-firm FE attenuates to +0.025 pp (p = 0.37); direct candidate-position test refutes priming (sp lists candidate LATER, McNemar p = 0.001). Rotation is a firm-tier signal that maps onto the AN-018 discipline gradient, not a within-firm Channel-A lever. Universe-scale extraction would still be informative but the headline framing has shifted.
Non-response handling × sponsor	Null-by-data-design on registration PDFs (AN-043) — both sides 100 % `not_specified`, vocabulary virtually absent in free-text grep (5+2 / 488). To test, need a separate LLM extraction pipeline over post-fielding relatório PDFs.
Weighting / post-stratification structured extraction	Not yet specified as an LLM task; needs schema design.
Mode × sponsor quantitative contrast	Done — AN-041, mode-substitution refuted (0 % phone on sponsored vs 10 % on independent; χ² p=0.0003 in the opposite direction).
Sophisticated Channel B (preserves digits)	Not testable with the existing forensics; would need an external comparator (e.g., a forensic re-poll on a sample).

Most of these are within reach with the LLM extraction pipeline already in production. The decisive Channel-A magnitude-decomposition regression is queued behind the full-scale extractor run.
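
For a single binary lever, the interaction term such a regression would estimate has a simple reading. In a saturated model with two binary regressors, the OLS interaction coefficient equals a difference of differences in cell means. The sketch below demonstrates that identity on synthetic rows; the effect sizes are planted by construction, and the queued specification presumably carries more levers and controls than this.

```python
from statistics import mean


def make_rows(base=0.0, sponsored_effect=3.0, mixed_effect=1.0, interaction=4.6):
    """SYNTHETIC rows (sponsored, mixed_population, bias_pp) with planted effects."""
    rows = []
    for sp in (0, 1):
        for mixed in (0, 1):
            mu = (base + sponsored_effect * sp + mixed_effect * mixed
                  + interaction * sp * mixed)
            for noise in (-1.0, 0.0, 1.0):  # symmetric, so cell means equal mu
                rows.append((sp, mixed, mu + noise))
    return rows


def cell_mean(rows, sp, mixed):
    """Mean bias among rows with the given sponsored / mixed flags."""
    return mean(b for s, m, b in rows if s == sp and m == mixed)


def interaction_estimate(rows):
    """Sponsor gap under the mixed frame minus sponsor gap under the canonical frame."""
    gap_mixed = cell_mean(rows, 1, 1) - cell_mean(rows, 0, 1)
    gap_canonical = cell_mean(rows, 1, 0) - cell_mean(rows, 0, 0)
    return gap_mixed - gap_canonical


rows = make_rows()
estimate = interaction_estimate(rows)
print(f"recovered sponsored x mixed_population interaction: {estimate:.1f} pp")
```

The interaction measures the extra sponsor gap among polls that use the lever, not the lever's share of the pooled β; converting one into the other also depends on how much more often sponsored polls use the lever.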

Document hygiene

This is a living synthesis. When a new mechanism analysis lands, revise the relevant table here, then update working-hypothesis.md §3 (which should remain a short summary pointing back to this doc). If a candidate mechanism in the probe list above returns a sharp positive signal, move it into the "concrete design-choice differences" table and note the magnitude. If all probes return null and the size-mismatch problem persists, the "opacity is genuinely the answer" reading moves from open to settled — and that itself becomes a substantive finding worth a separate writeup.
