title: Source of the bias — what mechanism carries the +7 pp gap?
status: living document
last_updated: 2026-06-02

Source of the bias

Synthesis of the supply-side mechanism evidence: what about poll production carries the +7–8 pp sponsor effect? This doc is the canonical home for the lever inventory and the running stocktake of the channel-decomposition question. The broader story (voter side, selection into sponsorship, policy implications) lives in working-hypothesis.md; the theoretical framework lives in theory.md § Bayesian persuasion.

TL;DR — honest framing

We have ruled out crude per-row fabrication (AN-013), partisan bairro/stronghold selection (AN-032, preliminary), and the obvious data-quality alternatives (AN-015, AN-016). What remains in the candidate set is Channel A (design-driven Bayesian persuasion through disclosure-compliant methodology choices) alongside sophisticated Channel B variants the digit forensics don't reach and design dimensions registration doesn't surface — but the documented Channel A levers quantitatively underexplain the +7 pp headline.

The residual decomposition in thinking.md § "Residual decomposition of the +7 pp" puts the generous upper bound on documented Channel A levers at 1–5 pp of the +7 pp: scenario rotation 1–2 pp (since walked back to firm-tier composition by AN-052), population-reference mismatch 0–1 pp, ponderação 0–2 pp. That leaves 2–6 pp of the headline unattributed. Wide CIs on the single quantified lever (population frame point estimate +4.6 pp, SE 3.7, p = 0.21, n = 354) leave open scenarios where Channel A still explains most of the +7 pp, but on the point estimates Channel A is unlikely to be the main explanation.
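
A minimal sketch of the arithmetic behind that split, assuming the three lever contributions add linearly (no overlap between levers) and treating the +7 pp headline as fixed. The bounds are the ones quoted above; the dictionary and function names are illustrative only.

```python
# Per-lever bounds (pp) as quoted from the residual decomposition in thinking.md.
LEVER_BOUNDS_PP = {
    "scenario rotation": (1.0, 2.0),
    "population-reference mismatch": (0.0, 1.0),
    "ponderação": (0.0, 2.0),
}
HEADLINE_PP = 7.0


def documented_range(bounds):
    """Sum the lower and the upper bounds across levers (assumes additivity)."""
    low = sum(lo for lo, _ in bounds.values())
    high = sum(hi for _, hi in bounds.values())
    return low, high


def residual_range(headline, documented):
    """Unattributed part of the headline; the bounds swap when subtracting."""
    low, high = documented
    return headline - high, headline - low


doc = documented_range(LEVER_BOUNDS_PP)
res = residual_range(HEADLINE_PP, doc)
print(f"documented {doc[0]:.0f}-{doc[1]:.0f} pp, unattributed {res[0]:.0f}-{res[1]:.0f} pp")
# → documented 1-5 pp, unattributed 2-6 pp
```

If the levers overlap (for example, if the same polls carry both a mixed frame and non-standard ponderação), the documented range shrinks and the unattributed range grows.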

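The within-pair evidence splits into two qualitatively different findings: concrete design-choice differences that are real but small, and opacity differences that are loud and selective but do not by themselves describe a tilt.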

The honest read: documented Channel A levers are directionally consistent but quantitatively underexplain the +7 pp; the unattributed residual could live in Channel A's wide upper bounds, in sophisticated Channel B variants, or in design dimensions the registration regime doesn't surface. Until a specific design lever is shown to carry the slant quantitatively, or the queued probes below close the gap, the substantive answer is provisional and Channel A is one candidate, not the leading one.

The two-channel framing

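The headline β = +7–8 pp can in principle decompose into two mechanisms (per theory.md § Bayesian persuasion): Channel A, design-driven Bayesian persuasion through disclosure-compliant methodology choices that tilt the realized sample, and Channel B, manipulation of the reported numbers after fielding (fraud).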

The policy interpretations differ: Channel A → constrain methodology choices; Channel B → fraud / enforcement regime. Below, the evidence rules out crude Channel B (digit forensics) but does not adjudicate decisively between Channel A and sophisticated/unobservable variants: documented Channel A levers are directionally right but quantitatively small, sophisticated Channel B variants remain untested, and the unattributed 2–6 pp residual leaves the enforcement-regime interpretation in the candidate set at non-trivial weight. Caveats are made explicit at the lever level below.

What we have ruled out

| Mechanism | Status | Evidence |
|---|---|---|
| Crude post-fielding tampering of sponsor's row | Ruled out | AN-013: within-sponsored-poll digit symmetry between sponsor's row and other-candidate rows; A vs B z-test p > 0.6 on every digit metric |
| Sponsors list fewer candidates → renormalization inflates share | Ruled out | AN-014: 98.6 % of the K2 raw-vs-renormalized gap is mechanical multiplicative scaling, not a denominator design choice |
| Differential LLM extraction quality | Ruled out (3 fronts) | AN-015 (proxies), AN-016 (within-firm 46 pp β spread under fixed PDF style) |
| Partisan bairro/setor selection (cherry-pick sponsor's strongholds) | Ruled out (preliminary) | AN-032: within-pair contrast = −0.0029 (t = −5.3) on the 22-pair muni-rich subset, direction opposite the prediction |
| Sophisticated Channel B (preserves digit distributions, proportional rescaling, pre-publication quota reweighting) | NOT ruled out | AN-013's digit forensics catch crude tampering only; the structured Channel A controls (once the full-universe extractor runs) will quantify the residual |
| Mode substitution (sponsored polls swap in-person for phone/online to reach a cheaper, less-representative sample) | Ruled out (preliminary) | AN-041: 0/244 sponsored polls use phone mode vs 10 % independent; the sponsored side over-represents in-person. χ² p = 0.0003 in the opposite direction of the cheap-mode-slant prior |
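
The exact form of the AN-013 z-test is not spelled out here. One plausible form, assuming each digit metric is a proportion (for example, the share of reported values whose last digit is 0 or 5) and reading A as the sponsor's row and B as the other-candidate rows, is a pooled two-proportion z-test. The function name and the counts below are hypothetical.

```python
import math


def two_proportion_z_test(hits_a, n_a, hits_b, n_b):
    """Pooled two-proportion z-test; returns (z, two-sided p)."""
    p_a, p_b = hits_a / n_a, hits_b / n_b
    pooled = (hits_a + hits_b) / (n_a + n_b)
    se = math.sqrt(pooled * (1 - pooled) * (1 / n_a + 1 / n_b))
    z = (p_a - p_b) / se
    # Standard normal CDF via erf: Phi(x) = 0.5 * (1 + erf(x / sqrt(2))).
    p = 2 * (1 - 0.5 * (1 + math.erf(abs(z) / math.sqrt(2))))
    return z, p


# Hypothetical counts: values ending in 0 or 5 among sponsor-row (A)
# vs other-candidate-row (B) entries.
z, p = two_proportion_z_test(hits_a=42, n_a=200, hits_b=168, n_b=800)
print(f"z = {z:.2f}, p = {p:.2f}")  # → z = 0.00, p = 1.00
```

A test of this kind only detects shifts in digit frequencies. Proportional rescaling that leaves the digit distribution intact passes it, which is why sophisticated Channel B stays in the "not ruled out" row.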

What we have as evidence for Channel A

Concrete design-choice differences (real, but small)

These are sponsored-vs-independent contrasts on specific methodology levers the pollster declares at registration. They correspond to mechanically tilting the realized sample.

| Lever | Direction | Magnitude | Source |
|---|---|---|---|
| Population reference frame (mixed vs canonical TSE-eligible) | sp > ind on mixed | 9 of 20 differing pairs sponsored-side mixed; 0 reverse (perfectly one-sided asymmetry) | findings-paired |
| Coverage class (urban-only / specific-neighborhoods) | sp > ind on slant-permissive coverage | +2 pp (12 % sp vs 10 % ind on n=200 methodology subset); 10 + 5 = 15 of 49 differing-coverage biased pairs in the right direction | AN-019, findings-paired |
| Census-setor as cluster frame (used / not) | sp > ind | 39 % of biased pairs disagree on this vs 26 % in the well-behaved baseline (+13 pp gap) | findings-paired |
| Quota variable mix (which variables declared) | mixed | not decisive on the n=200 subset | AN-022 |
| Candidate-name rotation in the questionnaire (demoted to firm-tier signal) | sp < ind marginally, null within firm | AN-051 marginal 5.4 % vs 26.1 % (McNemar p ≈ 4 × 10⁻⁸) is a between-firm composition effect (AN-052 firm-FE LPM coef +0.025 pp, p = 0.37; 19 of 20 both-side firms have identical rotation rates). AN-053's direct candidate-position test refutes the priming-exploitation reading (sp lists candidate later; McNemar p = 0.001 in WRONG direction). | AN-051, AN-052, AN-053 |

These are real findings, and they are in the predicted direction. The combined regression preview (paper.tex § Extension in development) gives a sponsored × mixed_population interaction of +4.6 pp (SE 3.7, p = 0.21): directionally consistent, but statistically underpowered at the current n = 354 extracted protocols. Quantifying how much of the +7 pp each lever carries is blocked on the full-universe LLM methodology extraction (queued in todo.md § Mechanism decomposition).
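
To see how wide the uncertainty on that interaction is, a normal-approximation summary can be computed from the point estimate and SE alone. This is a sketch: the regression's own inference may use a t reference or clustered errors that are not reproduced here.

```python
import math


def normal_summary(point, se, z_crit=1.96):
    """Normal-approximation z, two-sided p and 95 % CI for a single estimate."""
    z = point / se
    p = 2 * (1 - 0.5 * (1 + math.erf(abs(z) / math.sqrt(2))))
    return z, p, (point - z_crit * se, point + z_crit * se)


# sponsored × mixed_population interaction from the regression preview, in pp.
z, p, (lo, hi) = normal_summary(point=4.6, se=3.7)
print(f"z = {z:.2f}, p = {p:.2f}, 95% CI = [{lo:.1f}, {hi:.1f}] pp")
# → z = 1.24, p = 0.21, 95% CI = [-2.7, 11.9] pp
```

The interval runs from slightly below zero to well above the +7 pp headline, so the current n = 354 cannot distinguish a lever that carries nothing from one with a very large interaction.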

Opacity differences (loud, selective — not a mechanism by themselves)

These are measurement-quality signals about how thoroughly the methodology is documented. They do not by themselves describe a tilt. Importantly, the opacity gap is not blanket; it is field-specific ("selective disclosure"). Sponsored polls under-document the dimensions that determine who is in the realized sample, but over-document the dimensions that signal craftwork.

| Signal | Direction (sp vs ind) | Magnitude | Source |
|---|---|---|---|
| Audit % | sp less documented | KS-test rejection | AN-021 |
| Methodology completeness index | sp less documented | t-test rejection on n=200 | AN-022 |
| Coverage deferred at registration | sp less documented (more deferred) | Universe-scale χ² with sharp odds ratio | AN-024 |
| Interviewer training described | sp more documented (+11.9 pp) | McNemar p=0.002 on 244 pairs | AN-042 |
| Supervisor role described | sp more documented (+5.3 pp) | McNemar p=0.08 on 244 pairs | AN-042 |

These tell us that sponsored polls are systematically less documented on the sample-shape dimensions and more documented on the visible-rigor dimensions: a strategic-disclosure pattern, not blanket opacity. The bias contrast in AN-042 does not co-vary with which side describes interviewer training (Mann–Whitney p = 0.58), so interviewer-side documentation is not a slant carrier. The selective-disclosure pattern is consistent with two hypotheses:

  1. Mechanism is hidden inside opacity — the lack of documentation IS where the slant happens. Sponsored polls don't describe their methodology because describing it would surface choices that wouldn't pass scrutiny, OR because the methodology is ad-hoc / non-standard in ways the boilerplate templates can't capture.
  2. Opacity is a quality proxy, mechanism is elsewhere — sponsored polls being less-documented may be a cost-saving or discipline pattern that correlates with bias without carrying it; the real mechanism is in a lever we haven't extracted yet.

Without distinguishing (1) from (2), reporting opacity as a "Channel A finding" would overclaim. The honest read leaves the question open.
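
The paired contrasts in the opacity table (AN-042) are McNemar tests, which use only the discordant pairs. The marginal rates fix the difference between the two discordant counts (84.4 % vs 72.5 % of 244 pairs is roughly 206 vs 177, so b − c ≈ 29) but not the counts themselves, so the reported p-values cannot be recomputed from the table alone. A minimal exact (binomial) variant is sketched below; whether AN-042 used the exact or the chi-square form is not stated here, and the counts are hypothetical.

```python
from math import comb


def mcnemar_exact(b, c):
    """Exact two-sided McNemar test from the discordant pair counts.

    b: pairs where only the sponsored side documents the field
    c: pairs where only the independent side documents the field
    Under H0 each discordant pair is a fair coin flip, so min(b, c) is
    Binomial(b + c, 0.5); concordant pairs carry no information.
    """
    n = b + c
    if n == 0:
        return 1.0
    k = min(b, c)
    tail = sum(comb(n, i) for i in range(k + 1)) / 2 ** n
    return min(1.0, 2 * tail)


# Hypothetical discordant counts, not the AN-042 table.
print(mcnemar_exact(b=10, c=0))  # → 0.001953125
```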

The size-mismatch problem

Note (2026-06-02). The size-mismatch problem and the ruled-out levers below are now documented in paper/paper.tex §5 "Extension in development" in the two paragraphs "Size mismatch and ruled-out levers" and "Selective disclosure". This section is the research-doc canonical version; the paper writeup is the external-audience version of the same content.

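If the population-frame lever carries a +4.6 pp sponsored × mixed_population interaction on the LLM-extracted subset (the only currently quantified concrete design lever), and slant-permissive coverage is only 2 pp more common on the sponsored side (12 % vs 10 %), the documented design levers collectively account for less than half of the +7–8 pp pooled β. An interaction coefficient feeds the pooled gap only in proportion to the share of polls that use the lever, which is why the residual decomposition bounds the documented levers at 1–5 pp. The remainder is unattributed.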

Until that gap is closed, the working answer is "documented Channel A levers are directionally consistent but quantitatively underexplain the +7 pp; the unattributed 2–6 pp residual could live in Channel A's wide upper bounds, in sophisticated Channel B variants the digit forensics don't reach, or in design dimensions the registration regime doesn't surface." In point-estimate terms, Channel A is unlikely to be the main explanation; in upper-bound terms it remains in the candidate set.
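
Why a +4.6 pp interaction does not translate into +4.6 pp of pooled β: an interaction only moves the polls that actually use the lever. A minimal sketch under a single-binary-lever linear model (an assumption; the preview regression may include other controls). Only delta comes from the document; gamma and both shares are hypothetical placeholders.

```python
def lever_contribution(gamma, delta, share_sp, share_ind):
    """Part of the raw sponsored-minus-independent gap carried by one binary lever M.

    Model: bias = a + beta*S + gamma*M + delta*S*M. Then
      E[bias | S=1] - E[bias | S=0] = beta + gamma*(share_sp - share_ind) + delta*share_sp,
    where share_* = Pr(M = 1) on each side. The lever's part is the last two terms.
    """
    return gamma * (share_sp - share_ind) + delta * share_sp


# delta = +4.6 pp is the preview's interaction estimate; gamma and both shares
# are hypothetical placeholders, not estimates from the extracted protocols.
part = lever_contribution(gamma=1.0, delta=4.6, share_sp=0.15, share_ind=0.05)
print(f"{part:.2f} pp of the gap")  # → 0.79 pp of the gap
```

Under these placeholder shares the lever carries well under 1 pp, in line with the 0–1 pp bound quoted for population-reference mismatch; the real figure needs the full-universe prevalence of mixed frames.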

Candidate mechanisms left to probe

In rough order of (a) testability with what we have and (b) plausible mechanistic bite. Each of these would help close the size-mismatch problem if positive evidence shows up; if all return null, the "opacity is genuinely the answer" reading becomes well-earned.

  1. Within-poll scenario selection: refuted by AN-051 on the 241-pair sample. Sponsored polls file fewer vote-intention scenarios on average (3.73 vs 4.49, Wilcoxon p = 0.002), not more, which is the opposite direction of the cherry-pick prior. Only 6 pairs have ≥ 8 scenarios on the sponsored side, and their mean bias is below the overall mean.
  2. Question / name-order priming — ABSORBED into the firm-tier discipline gradient. AN-051 landed a sharp marginal contrast (sp 5.4 % vs ind 26.1 % rotation documented; McNemar p ≈ 4 × 10⁻⁸), but three follow-ups jointly walked it back: AN-052 — contrast vanishes within firm (firm-FE LPM p = 0.37); AN-053 — sponsored polls list the sponsor's candidate later, not earlier (McNemar p = 0.001 in the wrong direction); and AN-054 — firm-level rotation rate is 3.7 % / 15.2 % / 19.4 % across small / medium / large firm-volume tertiles on the broad 124-firm panel, a monotonic 5× gradient matching AN-018's discipline curve. The lever is sponsor choice of low-volume, low-discipline firms (AN-018 mechanism), not within-firm sponsor-side manipulation of questionnaire design.
  3. Non-response / undecided handling: null-by-data-design on the methodology extraction (AN-043). The structured nonresponse_handling field is 100 % not_specified on all 488 pair-sides, and the diagnostic free-text grep confirms the vocabulary is virtually absent (5 + 2 hits out of 488 on the undecided/refusal vocabulary family). The reason is structural: TSE registration PDFs are pre-fielding planning documents, while nonresponse handling is a post-fielding analytical choice. To actually test this lever requires extending the LLM pipeline to ingest the post-fielding relatório PDFs (a separate document class). Escalated to "needs relatório extraction pipeline".
  4. Weighting / post-stratification — extract the actual weight structure from DS_PLANO_AMOSTRAL. Sponsored polls might apply less aggressive demographic weighting (so non-representative samples don't get corrected back to population norms).
  5. Mode / data-collection device: ruled out by AN-041 on the 244-pair curated sample. Sponsored polls over-use in-person mode (95 % vs 89 %) and never use phone (0 vs 24 polls); the χ² rejects mode independence at p = 0.0003 in the opposite direction of the cheap-mode-slant prior. The mode-substitution lever does not carry sponsor bias. (Mode contributes to opacity via a 5 % sponsored-side "not_specified" rate vs 1 % independent, but that is the opacity gap, not a design substitution.)
  6. Interviewer training / supervision detail: ruled out by AN-042 on the 244-pair sample. Sponsored polls describe interviewer training more (84.4 % vs 72.5 %, McNemar p=0.002) and supervisor role slightly more (92.6 % vs 87.3 %, p=0.08). The bias contrast does not co-vary with which side describes (Mann–Whitney p ≈ 0.58 for both fields), so interviewer-side documentation is not a slant carrier. This refutes blanket opacity and forces the selective-disclosure reading folded into the opacity-differences table above.
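
The walk-back in (2) is a composition effect: a large pooled gap that disappears once sponsored and independent polls are compared inside the same firm. A toy illustration with hypothetical counts. The within-firm step here is a firm-size-weighted stratified comparison, a simpler cousin of the firm-FE LPM in AN-052; OLS with firm fixed effects weights firms by within-firm variation in sponsorship rather than by size, so the two agree whenever the within-firm gaps are equal across firms, as here.

```python
from collections import defaultdict

# Hypothetical polls as (firm, sponsored, documents_rotation). Rotation depends
# only on the firm (5 % small, 20 % large), but sponsors mostly hire the small firm.
polls = (
    [("small", True, False)] * 95 + [("small", True, True)] * 5
    + [("small", False, False)] * 19 + [("small", False, True)] * 1
    + [("large", True, False)] * 8 + [("large", True, True)] * 2
    + [("large", False, False)] * 72 + [("large", False, True)] * 18
)


def rate(rows):
    """Share of rows that document rotation."""
    return sum(rot for _, _, rot in rows) / len(rows)


def marginal_contrast(rows):
    """Sponsored-minus-independent rotation rate, pooling across firms."""
    sp = [r for r in rows if r[1]]
    ind = [r for r in rows if not r[1]]
    return rate(sp) - rate(ind)


def within_firm_contrast(rows):
    """Firm-size-weighted average of within-firm sponsored-minus-independent gaps."""
    by_firm = defaultdict(list)
    for r in rows:
        by_firm[r[0]].append(r)
    weighted, total = 0.0, 0
    for firm_rows in by_firm.values():
        sp = [r for r in firm_rows if r[1]]
        ind = [r for r in firm_rows if not r[1]]
        if sp and ind:  # only firms seen on both sides identify a within-firm gap
            weighted += len(firm_rows) * (rate(sp) - rate(ind))
            total += len(firm_rows)
    return weighted / total


print(f"marginal {marginal_contrast(polls):+.1%}, within firm {within_firm_contrast(polls):+.1%}")
# → marginal -10.9%, within firm +0.0%
```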

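Of the six probes, (1), (5) and (6) are closed on the curated pair samples (AN-051, AN-041, AN-042), (2) is absorbed into the firm-tier discipline gradient, and (3) needs the relatório extraction pipeline; (4) is testable with existing data plus an additional LLM extraction pass. Testing whether the bias lives in the undocumented part of poll production itself requires either an interview-protocol experiment we cannot run or a structural argument that it necessarily lives there.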

Open questions (the agenda)

| Question | Status |
|---|---|
| Channel A regression decomposition at full magnitude | Blocked on full-universe LLM methodology extraction (~14k mayoral polls; Batch API path implemented, queued). Will give magnitudes per lever and the residual to Channel B. |
| Question-order / name-order priming (extract from questionario_pesquisa PDFs) | Pilot landed AND walked back (AN-051, AN-052, AN-053, 2026-06-02). Marginal contrast sharp (sp 5.4 % vs ind 26.1 % rotation documented; McNemar p ≈ 4 × 10⁻⁸); within-firm FE attenuates to +0.025 pp (p = 0.37); direct candidate-position test refutes priming (sp lists candidate LATER, McNemar p = 0.001). Rotation is a firm-tier signal that maps onto the AN-018 discipline gradient, not a within-firm Channel-A lever. Universe-scale extraction would still be informative but the headline framing has shifted. |
| Non-response handling × sponsor | Null-by-data-design on registration PDFs (AN-043): both sides 100 % not_specified, vocabulary virtually absent in free-text grep (5+2 / 488). To test, need a separate LLM extraction pipeline over post-fielding relatório PDFs. |
| Weighting / post-stratification structured extraction | Not yet specified as an LLM task; needs schema design. |
| Mode × sponsor quantitative contrast | Done (AN-041): mode-substitution refuted (0 % phone on sponsored vs 10 % on independent; χ² p=0.0003 in the opposite direction). |
| Sophisticated Channel B (preserves digits) | Not testable with the existing forensics; would need an external comparator (e.g., a forensic re-poll on a sample). |

Most of these are within reach with the LLM extraction pipeline already in production. The decisive Channel-A magnitude-decomposition regression is queued behind the full-scale extractor run.

Document hygiene

This is a living synthesis. When a new mechanism analysis lands, revise the relevant table here, then update working-hypothesis.md §3 (which should remain a short summary pointing back to this doc). If a candidate mechanism in the probe list above returns a sharp positive signal, move it into the "concrete design-choice differences" table and note the magnitude. If all probes return null and the size-mismatch problem persists, the "opacity is genuinely the answer" reading moves from open to settled — and that itself becomes a substantive finding worth a separate writeup.