title: Source of the bias — what mechanism carries the +7 pp gap?
status: living document
last_updated: 2026-06-02
Source of the bias
Synthesis of the supply-side mechanism evidence: what about poll
production carries the +7–8 pp sponsor effect? This doc is the
canonical home for the lever inventory and the running stocktake of
the channel-decomposition question. The broader story (voter side,
selection into sponsorship, policy implications) lives in
working-hypothesis.md; the theoretical
framework lives in
theory.md § Bayesian persuasion.
TL;DR — honest framing
We have ruled out crude per-row fabrication (AN-013), partisan bairro/stronghold selection (AN-032, preliminary), and the obvious data-quality alternatives (AN-015, AN-016). What remains in the candidate set is Channel A (design-driven Bayesian persuasion through disclosure-compliant methodology choices) alongside sophisticated Channel B variants the digit forensics don't reach and design dimensions registration doesn't surface — but the documented Channel A levers quantitatively underexplain the +7 pp headline.
The residual decomposition in
thinking.md § "Residual decomposition of the +7 pp"
puts the generous upper bound on documented Channel A levers
at 1–5 pp out of the +7 pp — scenario rotation 1–2 pp (and
walked back to firm-tier composition by AN-052),
population-reference mismatch 0–1 pp, ponderação 0–2 pp.
2–6 pp of the headline is unattributed. Wide CIs on the
single quantified lever (population frame: point estimate +4.6 pp,
SE 3.7, p = 0.21, n = 354) leave open scenarios where Channel A
still explains most of the +7 pp, but the point-estimate reading is
that Channel A is unlikely to be the main explanation.
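The bounds quoted above can be re-derived with simple interval arithmetic. The sketch below is illustrative only: it sums the per-lever ranges from the residual decomposition, subtracts them from a +7 pp headline, and recovers a Wald interval and a normal-approximation p-value for the population-frame estimate. Treating the lever ranges as additive and using a normal reference distribution are assumptions of this sketch, not something the analyses confirm; all variable names are hypothetical.

```python
import math

HEADLINE_PP = 7.0  # headline sponsor effect, lower end of the +7-8 pp range

# Generous per-lever ranges (pp) as quoted in the residual decomposition.
levers = {
    "scenario_rotation": (1.0, 2.0),
    "population_reference_mismatch": (0.0, 1.0),
    "ponderacao": (0.0, 2.0),
}

# Assumption: lever contributions add up without interactions.
channel_a_low = sum(low for low, _ in levers.values())
channel_a_high = sum(high for _, high in levers.values())

# The unattributed residual is the headline minus the Channel A range.
residual_low = HEADLINE_PP - channel_a_high
residual_high = HEADLINE_PP - channel_a_low

# Population-frame estimate: +4.6 pp with SE 3.7.
estimate, se = 4.6, 3.7
z = estimate / se
p_two_sided = math.erfc(abs(z) / math.sqrt(2))  # normal approximation
ci_low, ci_high = estimate - 1.96 * se, estimate + 1.96 * se

print(f"Channel A bound: {channel_a_low:.0f}-{channel_a_high:.0f} pp")
print(f"Unattributed:    {residual_low:.0f}-{residual_high:.0f} pp")
print(f"z = {z:.2f}, p = {p_two_sided:.2f}, "
      f"95% CI = ({ci_low:.1f}, {ci_high:.1f}) pp")
```

Under these assumptions the interval on the single quantified lever runs from below zero to above the headline itself, which is why the point-estimate reading and the upper-bound reading diverge.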
The within-pair evidence splits into two qualitatively different findings:
- Concrete design-choice differences (real methodology levers that mechanically tilt the sample): population reference frame, coverage class, census-setor cluster usage. Directionally right but individually small and not adding up to +7 pp at point estimates.
- Opacity differences (sponsored polls are less documented, not differently designed): audit %, methodology completeness, coverage deferral. The statistically loudest within-pair contrasts — but opacity is the absence of documentation, not a mechanism by itself.
The honest read: documented Channel A levers are directionally consistent but quantitatively underexplain the +7 pp; the unattributed residual could live in Channel A's wide upper bounds, in sophisticated Channel B variants, or in design dimensions the registration regime doesn't surface. Until a specific design lever is shown to carry the slant quantitatively, or the queued probes below close the gap, the substantive answer is provisional and Channel A is one candidate, not the leading one.
The two-channel framing
The headline β = +7–8 pp can in principle decompose into two
mechanisms (per theory.md § Bayesian persuasion):
- Channel A — Bayesian persuasion through declared design. The sponsor chooses a pollster who chooses a disclosure-compliant methodology whose realized demographics mechanically favor the candidate. The slant is in the methodology, fully filed at registration.
- Channel B — residual / fabrication. The realized numbers deviate from what the disclosed methodology should produce. The slant cannot be traced through any registered field.
The policy interpretations differ: Channel A → constrain methodology choices; Channel B → fraud / enforcement regime. Below, the evidence rules out crude Channel B (digit forensics) but does not adjudicate decisively between Channel A and sophisticated/unobservable variants: documented Channel A levers are directionally right but quantitatively small, sophisticated Channel B variants remain untested, and the unattributed 2–6 pp residual leaves the enforcement-regime interpretation in the candidate set at non-trivial weight. Caveats are spelled out at the lever level below.
What we have ruled out
| Mechanism | Status | Evidence |
|---|---|---|
| Crude post-fielding tampering of sponsor's row | Ruled out | AN-013: within-sponsored-poll digit symmetry between sponsor's row and other-candidate rows; A vs B z-test p > 0.6 on every digit metric |
| Sponsors list fewer candidates → renormalization inflates share | Ruled out | AN-014: 98.6 % of the K2 raw-vs-renormalized gap is mechanical multiplicative scaling, not a denominator design choice |
| Differential LLM extraction quality | Ruled out (3 fronts) | AN-015 (proxies), AN-016 (within-firm 46 pp β spread under fixed PDF style) |
| Partisan bairro/setor selection (cherry-pick sponsor's strongholds) | Ruled out (preliminary) | AN-032: within-pair contrast = −0.0029 (t = −5.3) on the 22-pair muni-rich subset, direction opposite the prediction |
| Sophisticated Channel B (preserves digit distributions, proportional rescaling, pre-publication quota reweighting) | NOT ruled out | AN-013's digit forensics catch crude tampering only; the structured Channel A controls (once the full-universe extractor runs) will quantify the residual |
| Mode substitution (sponsored swap in-person for phone/online to reach a cheaper, less-representative sample) | Ruled out (preliminary) | AN-041: 0/244 sponsored polls use phone mode vs 10 % independent — sponsored side over-represents in-person. χ² p = 0.0003 in the opposite direction of the cheap-mode-slant prior. |
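The AN-013 row reports a z-test between the sponsor's row (A) and other-candidate rows (B) on each digit metric. As a hedged illustration of what such a comparison can look like, the sketch below runs a pooled two-proportion z-test on one hypothetical metric: the share of reported percentages whose last digit is 0 or 5. The metric, the function name, and the counts are all assumptions made for illustration; they are not AN-013's actual metrics or data.

```python
import math

def two_proportion_z_test(hits_a, n_a, hits_b, n_b):
    """Pooled two-proportion z-test; returns (z, two-sided p-value)."""
    p_a, p_b = hits_a / n_a, hits_b / n_b
    pooled = (hits_a + hits_b) / (n_a + n_b)
    se = math.sqrt(pooled * (1 - pooled) * (1 / n_a + 1 / n_b))
    if se == 0:
        return 0.0, 1.0  # no variation in either group
    z = (p_a - p_b) / se
    p_value = math.erfc(abs(z) / math.sqrt(2))  # normal approximation
    return z, p_value

# Hypothetical counts: rows whose last digit is 0 or 5.
z, p = two_proportion_z_test(hits_a=52, n_a=244, hits_b=160, n_b=760)
print(f"z = {z:.2f}, p = {p:.2f}")
```

A large p-value on a test like this says only that the two sets of rows look alike on that metric. It is silent on tampering that preserves digit distributions, which is why the sophisticated Channel B row stays open.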
What we have as evidence for Channel A
Concrete design-choice differences (real, but small)
These are sponsored-vs-independent contrasts on specific methodology levers the pollster declares at registration. They correspond to mechanically tilting the realized sample.
| Lever | Direction | Magnitude | Source |
|---|---|---|---|
| Population reference frame (mixed vs canonical TSE-eligible) | sp > ind on mixed | 9 of 20 differing pairs sponsored-side mixed; 0 reverse (perfectly one-sided asymmetry) | findings-paired |
| Coverage class (urban-only / specific-neighborhoods) | sp > ind on slant-permissive coverage | +2 pp (12 % sp vs 10 % ind on n=200 methodology subset); 10 + 5 = 15 of 49 differing-coverage biased pairs in the right direction | AN-019, findings-paired |
| Census-setor as cluster frame (used / not) | sp > ind | 39 % of biased pairs disagree on this vs 26 % in the well-behaved baseline (+13 pp gap) | findings-paired |
| Quota variable mix (which variables declared) | mixed | not decisive on the n=200 subset | AN-022 |
| Question / name-order rotation documented | sp < ind marginally, null within firm | AN-051 marginal 5.4 % vs 26.1 % (McNemar p ≈ 4 × 10⁻⁸) is a between-firm composition effect (AN-052 firm-FE LPM coef +0.025 pp, p = 0.37; 19 of 20 both-side firms have identical rotation rates). AN-053's direct candidate-position test refutes the priming-exploitation reading (sp lists candidate later; McNemar p = 0.001 in WRONG direction). | AN-051, AN-052, AN-053 |
These are real findings. They are in the predicted direction.
The combined regression preview (paper.tex § Extension in development)
gives sponsored × mixed_population interaction $= +4.6$ pp
(SE 3.7, p = 0.21) — directionally consistent, statistically
underpowered at the current $n=354$ extracted protocols. The
size-quantification of "how much of +7 pp does each lever
carry" is blocked on the full-universe LLM methodology
extraction (queued in todo.md § Mechanism
decomposition).
Opacity differences (loud, selective — not a mechanism by themselves)
These are measurement-quality signals about how documented the methodology is. They do not by themselves describe a tilt. Importantly, the opacity gap is not blanket — it is field-specific ("selective disclosure"). Sponsored polls under-document the dimensions that determine who is in the realized sample, but over-document the dimensions that signal craftwork.
| Signal | Direction (sp vs ind) | Magnitude | Source |
|---|---|---|---|
| Audit % | sp less documented | KS-test rejection | AN-021 |
| Methodology completeness index | sp less documented | t-test rejection on n=200 | AN-022 |
| Coverage deferred at registration | sp less documented (more deferred) | Universe-scale χ² with sharp odds ratio | AN-024 |
| Interviewer training described | sp more documented (+11.9 pp) | McNemar p=0.002 on 244 pairs | AN-042 |
| Supervisor role described | sp more documented (+5.3 pp) | McNemar p=0.08 on 244 pairs | AN-042 |
These tell us that sponsored polls are systematically less-documented on the sample-shape dimensions and more documented on the visible-rigor dimensions — a strategic-disclosure pattern, not a blanket opacity. The bias contrast in AN-042 does not co-vary with which side describes interviewer training (MW p = 0.58), so interviewer-side documentation is not a slant carrier. The selective-disclosure pattern is consistent with two hypotheses:
1. Mechanism is hidden inside opacity: the lack of documentation IS where the slant happens. Sponsored polls don't describe their methodology because describing it would surface choices that wouldn't pass scrutiny, OR because the methodology is ad-hoc / non-standard in ways the boilerplate templates can't capture.
2. Opacity is a quality proxy, mechanism is elsewhere: sponsored polls being less-documented may be a cost-saving or discipline pattern that correlates with bias without carrying it; the real mechanism is in a lever we haven't extracted yet.
Without distinguishing (1) from (2), reporting opacity as a "Channel A finding" would overclaim. The honest read leaves the question open.
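Several contrasts in the tables above are paired McNemar tests on the 244 matched pairs. In a McNemar test only the discordant pairs (one side documents the field, the other does not) carry information. The sketch below is a minimal exact (binomial) version, included to make that logic concrete. Whether AN-042 used the exact or the chi-square variant is not stated here, and the counts in the example are hypothetical.

```python
from math import comb

def mcnemar_exact(b, c):
    """Exact two-sided McNemar test on discordant pair counts.

    b: pairs where only the sponsored side documents the field
    c: pairs where only the independent side documents the field
    """
    n = b + c
    if n == 0:
        return 1.0  # no discordant pairs, nothing to test
    k = min(b, c)
    # Under H0 each discordant pair is a fair coin flip: Binomial(n, 0.5).
    tail = sum(comb(n, i) for i in range(k + 1)) / 2 ** n
    return min(1.0, 2 * tail)

# Hypothetical discordant counts, not taken from AN-042.
print(mcnemar_exact(b=10, c=2))
```

The test says whether the two sides differ in how often they document a field. It does not say whether that difference carries bias, which is the distinction between readings (1) and (2).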
The size-mismatch problem
Note (2026-06-02). The size-mismatch problem and the ruled-out levers below are now documented in
paper/paper.tex §5 "Extension in development", in the two paragraphs "Size mismatch and ruled-out levers" and "Selective disclosure". This section is the research-doc canonical version; the paper writeup is the external-audience version of the same content.
If the population-frame lever moves β by +4.6 pp on the LLM-extracted subset (the only currently-quantified concrete design lever), and coverage-class shifts are +2 pp on the subset, the documented design levers collectively account for less than half of the +7–8 pp pooled β. The remainder lives in one of:
- The same levers measured at the full-universe scale (where power is sharper)
- Other design levers we haven't yet contrasted at quantitative scale
- Sophisticated Channel B variants below the digit-forensics resolution
- Genuinely the opacity gap: sponsored polls are slanted because the documentation regime doesn't compel them to describe the slanting steps
Until that gap is closed, the working answer is "documented Channel A levers are directionally consistent but quantitatively underexplain the +7 pp; the unattributed 2–6 pp residual could live in Channel A's wide upper bounds, in sophisticated Channel B variants the digit forensics don't reach, or in design dimensions the registration regime doesn't surface." In point-estimate terms, Channel A is unlikely to be the main explanation; in upper-bound terms it remains in the candidate set.
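One way to read the size mismatch, offered here as an assumption and not as the method used in thinking.md: an interaction coefficient applies only to sponsored polls that actually use the lever, so its contribution to the pooled gap is roughly the coefficient times the sponsored-side share using it. The sketch below inverts that relation to show what share would be needed for the +4.6 pp population-frame point estimate to carry a given slice of the headline. Function and variable names are hypothetical.

```python
def implied_share(target_pp, coef_pp):
    """Share of sponsored polls that must use a lever for it to
    contribute target_pp, assuming contribution = coef_pp * share."""
    if coef_pp <= 0:
        raise ValueError("coefficient must be positive")
    return target_pp / coef_pp

COEF_PP = 4.6  # sponsored x mixed_population point estimate

for target in (1.0, 3.5, 7.0):
    share = implied_share(target, COEF_PP)
    verdict = "feasible" if share <= 1.0 else "impossible (share > 100 %)"
    print(f"{target:.1f} pp needs share = {share:.2f} -> {verdict}")
```

Under this reading the lever cannot carry the full +7 pp at its point estimate even if every sponsored poll used a mixed frame; only the upper end of its confidence interval leaves that scenario open.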
Candidate mechanisms left to probe
In rough order of (a) testability with what we have and (b) plausible mechanistic bite. Each of these would help close the size-mismatch problem if positive evidence shows up; if all return null, the "opacity is genuinely the answer" reading becomes well-earned.
1. Within-poll scenario selection: refuted by AN-051 on the 241-pair sample. Sponsored polls file fewer vote-intention scenarios on average (3.73 vs 4.49, Wilcoxon p = 0.002), not more, which is the opposite direction of the cherry-pick prior. Only 6 pairs have sponsored ≥ 8 scenarios and their mean bias is below the overall mean.
2. Question / name-order priming: ABSORBED into the firm-tier discipline gradient. AN-051 landed a sharp marginal contrast (sp 5.4 % vs ind 26.1 % rotation documented; McNemar p ≈ 4 × 10⁻⁸), but three follow-ups jointly walked it back: AN-052, where the contrast vanishes within firm (firm-FE LPM p = 0.37); AN-053, where sponsored polls list the sponsor's candidate later, not earlier (McNemar p = 0.001 in the wrong direction); and AN-054, where the firm-level rotation rate is 3.7 % / 15.2 % / 19.4 % across small / medium / large firm-volume tertiles on the broad 124-firm panel, a monotonic 5× gradient matching AN-018's discipline curve. The lever is sponsor choice of low-volume, low-discipline firms (AN-018 mechanism), not within-firm sponsor-side manipulation of questionnaire design.
3. Non-response / undecided handling: null-by-data-design on the methodology extraction (AN-043). The structured `nonresponse_handling` field is 100 % `not_specified` on all 488 pair-sides, and the diagnostic free-text grep confirms the vocabulary is virtually absent (5 + 2 hits out of 488 on the undecided/refusal vocabulary family). The reason is structural: TSE registration PDFs are pre-fielding planning documents, while nonresponse handling is a post-fielding analytical choice. To actually test this lever requires extending the LLM pipeline to ingest the post-fielding relatório PDFs (a separate document class). Escalated to "needs relatório extraction pipeline".
4. Weighting / post-stratification: extract the actual weight structure from `DS_PLANO_AMOSTRAL`. Sponsored polls might apply less aggressive demographic weighting (so non-representative samples don't get corrected back to population norms).
5. Mode / data-collection device: ruled out by AN-041 on the 244-pair curated sample. Sponsored polls over-use in-person mode (95 % vs 89 %) and never use phone (0 vs 24); the χ² rejects mode independence at p = 0.0003 in the opposite direction of the cheap-mode-slant prior. The mode-substitution lever does not carry sponsor bias. (Mode contributes to opacity via a 5 % sponsored-side "not_specified" rate vs 1 % independent, but that is the opacity gap, not a design substitution.)
6. Interviewer training / supervision detail: ruled out by AN-042 on the 244-pair sample. Sponsored polls describe interviewer training more (84.4 % vs 72.5 %, McNemar p=0.002) and supervisor role slightly more (92.6 % vs 87.3 %, p=0.08). Bias contrast does not co-vary with which side describes (MW p ≈ 0.58 both fields), so interviewer-side documentation is not a slant carrier. This refutes blanket opacity and forces the selective-disclosure reading folded into the opacity-differences table above.
Each of (1)–(5) is testable with existing data + an additional LLM extraction pass. (6) requires either an interview-protocol experiment we cannot run or a structural argument that the not-documented part of poll production is where the bias necessarily lives.
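The walk-back of the rotation contrast in item (2) rests on a composition argument: a marginal sponsored-vs-independent gap can appear even when every firm treats both sides identically, as long as sponsors cluster in low-discipline firm tiers. The sketch below illustrates that logic. The tier rates are the AN-054 tertile figures quoted above; the tier mixes are hypothetical, and treating firm-level tertile rates as per-poll rates is a simplification of this sketch.

```python
# Firm-level rotation-documentation rates by firm-volume tertile (AN-054).
TIER_RATE = {"small": 0.037, "medium": 0.152, "large": 0.194}

# Hypothetical shares of each side's polls run by each firm tier.
# These are illustrative assumptions, not estimates from the data.
SPONSORED_MIX = {"small": 0.80, "medium": 0.15, "large": 0.05}
INDEPENDENT_MIX = {"small": 0.20, "medium": 0.30, "large": 0.50}

def marginal_rate(mix, tier_rate):
    """Pooled rate when every firm tier has one rate regardless of side."""
    assert abs(sum(mix.values()) - 1.0) < 1e-9, "shares must sum to 1"
    return sum(mix[tier] * tier_rate[tier] for tier in mix)

sp = marginal_rate(SPONSORED_MIX, TIER_RATE)
ind = marginal_rate(INDEPENDENT_MIX, TIER_RATE)
print(f"sponsored {sp:.1%} vs independent {ind:.1%}")
```

Here the within-tier contrast is zero by construction, yet the pooled rates differ. That is the pattern a firm fixed-effects specification such as AN-052 is designed to separate from a within-firm lever.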
Open questions (the agenda)
| Question | Status |
|---|---|
| Channel A regression decomposition at full magnitude | Blocked on full-universe LLM methodology extraction (~14k mayoral polls; Batch API path implemented, queued). Will give magnitudes per lever and the residual to Channel B. |
| Question-order / name-order priming (extract from questionario_pesquisa PDFs) | Pilot landed AND walked back (AN-051 → AN-052 → AN-053, 2026-06-02). Marginal contrast sharp (sp 5.4 % vs ind 26.1 % rotation documented; McNemar p ≈ 4 × 10⁻⁸); within-firm FE attenuates to +0.025 pp (p = 0.37); direct candidate-position test refutes priming (sp lists candidate LATER, McNemar p = 0.001). Rotation is a firm-tier signal that maps onto the AN-018 discipline gradient, not a within-firm Channel-A lever. Universe-scale extraction would still be informative but the headline framing has shifted. |
| Non-response handling × sponsor | Null-by-data-design on registration PDFs (AN-043) — both sides 100 % not_specified, vocabulary virtually absent in free-text grep (5+2 / 488). To test, need a separate LLM extraction pipeline over post-fielding relatório PDFs. |
| Weighting / post-stratification structured extraction | Not yet specified as an LLM task; needs schema design. |
| Mode × sponsor quantitative contrast | Done — AN-041, mode-substitution refuted (0 % phone on sponsored vs 10 % on independent; χ² p=0.0003 in the opposite direction). |
| Sophisticated Channel B (preserves digits) | Not testable with the existing forensics; would need an external comparator (e.g., a forensic re-poll on a sample). |
Most of these are within reach with the LLM extraction pipeline already in production. The decisive Channel-A magnitude-decomposition regression is queued behind the full-scale extractor run.
Document hygiene
This is a living synthesis. When a new mechanism analysis lands,
revise the relevant table here, then update
working-hypothesis.md §3 (which should
remain a short summary pointing back to this doc). If a candidate
mechanism in the probe list above returns a sharp positive signal,
move it into the "concrete design-choice differences" table and
note the magnitude. If all probes return null and the size-mismatch
problem persists, the "opacity is genuinely the answer" reading
moves from open to settled — and that itself becomes a substantive
finding worth a separate writeup.