H3: β shrinks substantially when methodology features are controlled for
The headline +7 pp sponsor bias documented in H1
can in principle arise either through declared methodology choices —
sample frame, coverage class, quota variables, weighting structure
(Channel A: Bayesian persuasion through the disclosed signal structure) —
or through residual movement outside the disclosed methodology (Channel B:
fabrication, opacity, or design slant on dimensions the regime does not
require disclosing). H3 is the decomposition test: total β = β^A + β^B.
Adding structured methodology controls (sample size, field-period
length, ST_PESQUISA_PROPRIA) and LLM-extracted methodology features
(coverage_class, quota variables, population frame) to the H1 spec
should shrink β substantially if Channel A is the dominant mechanism.
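
Written out, the decomposition can be stated as below. The notation is an assumed shorthand rather than the H1 write-up's own: y_i is the poll-level outcome, S_i the sponsor indicator, X_i the structured methodology controls, M_i the LLM-extracted methodology features, and β_k the sponsor coefficient in specification k. The H1 baseline's own controls and fixed effects are suppressed for readability.

```latex
\begin{aligned}
\text{Spec 1 (H1 baseline):}\quad & y_i = \alpha_1 + \beta_1 S_i + \varepsilon_{1i} \\
\text{Spec 2 (+ structured):}\quad & y_i = \alpha_2 + \beta_2 S_i + X_i'\gamma_2 + \varepsilon_{2i} \\
\text{Spec 3 (+ LLM features):}\quad & y_i = \alpha_3 + \beta_3 S_i + X_i'\gamma_3 + M_i'\delta_3 + \varepsilon_{3i} \\
\text{Decomposition:}\quad & \beta_1 = \underbrace{(\beta_1 - \beta_3)}_{\hat\beta^{A}} + \underbrace{\beta_3}_{\hat\beta^{B}}
\end{aligned}
```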
Evidence strength: Mixed; decisive test pending (2026-06-18). Six descriptives on the n=200 methodology LLM subset and one universe-scale test on coverage deferral (AN-024, n=14,876) show Channel A running null or wrong-signed on every measured lever, including methodology completeness in the opposite direction (candidate-touched 0.43 vs independent 0.39). The headline shrinkage regression (β₁ → β₃ with full LLM controls) is blocked on the
poll_methodology universe extraction queued in pipelines/politica/docs/todo.md.
Theory
The framework is Polls as Bayesian persuasion (theory.md §"Polls as
Bayesian persuasion (supply-side / Channel A)"). The sender commits to
a signal structure σ — operationalized as the poll's declared
methodology — and the disclosure regime makes that commitment public.
Quota sampling with multi-stage selection is the latitude-giving
feature: it is universal in Brazilian electoral polls, model-based
rather than design-based, and the chosen quota variables and population
frame are the levers (Penadés et al., 2024); the statute names
five dimensions in LE.33.IV (sex, age, education, income, area) but
mandates none — only their disclosure (institutions.md §"Statutory
mapping"). β^A is the share of the headline gap that lives inside this
declared signal structure; β^B is everything else.
Prediction
A triple specification — β₁ (H1 baseline) vs β₂ (adds structured
methodology controls) vs β₃ (adds LLM-extracted coverage_class, quota
variables list, population frame) — recovers (β₁ − β₃) as the Channel A
share. If Channel A dominates, β₃ should be a small fraction of β₁
(i.e., β shrinks from ≈ +7 pp toward a much smaller residual β^B).
If Channel B dominates, β₃ stays close to β₁ and the disclosed
methodology absorbs little of the gap.
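
A minimal sketch of the three-spec ladder on synthetic data, assuming a plain linear specification fit by ordinary least squares. Everything here is hypothetical: the variable names (`sponsored`, `log_n`, `restricted_coverage`), the helper `sponsor_coef`, and the data-generating numbers, which are chosen only so that Channel A carries most of a roughly 7 pp gap. The actual H1 specification and its fixed effects are not reproduced.

```python
import numpy as np


def sponsor_coef(y, sponsored, controls=()):
    """OLS coefficient on `sponsored`, with an intercept and optional controls."""
    X = np.column_stack([np.ones(len(y)), sponsored, *controls])
    coef, *_ = np.linalg.lstsq(X, y, rcond=None)
    return coef[1]


rng = np.random.default_rng(0)
n = 20_000

sponsored = rng.binomial(1, 0.3, n)
# Structured control (stand-in for log sample size); unrelated to sponsorship here.
log_n = rng.normal(6.0, 0.5, n)
# Hypothetical disclosed lever: sponsored polls pick restricted coverage more often.
restricted_coverage = rng.binomial(1, np.where(sponsored == 1, 0.7, 0.1))

# Outcome in pp: 10 pp through the disclosed lever (Channel A),
# 1 pp directly from sponsorship (Channel B), plus noise.
y = (10.0 * restricted_coverage + 1.0 * sponsored
     + 2.0 * (log_n - 6.0) + rng.normal(0.0, 5.0, n))

b1 = sponsor_coef(y, sponsored)                                # baseline
b2 = sponsor_coef(y, sponsored, [log_n])                       # + structured controls
b3 = sponsor_coef(y, sponsored, [log_n, restricted_coverage])  # + extracted features

share_a = (b1 - b3) / b1  # estimated Channel A share of the headline gap
print(f"b1={b1:.2f}  b2={b2:.2f}  b3={b3:.2f}  Channel A share={share_a:.2f}")
```

By construction the implied gap is 1 + 10 × (0.7 − 0.1) = 7 pp, so β₁ and β₂ should land near 7 and β₃ near 1. Under Channel B dominance the same ladder would leave all three coefficients close together.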
Competing predictions
Channel B dominance. Residual fabrication, opacity, or slant on dimensions outside the disclosed methodology carries the load. Under this reading β stays roughly constant across the three specs, and the decay near the election in H5 is the sharper signature. Current evidence on the n=200 subset (six descriptives, AN-019 through AN-024) and at universe scale on deferral (AN-024) is consistent with this reading more than with Channel A.
Unobserved-design slant. Channel A could still dominate even if observed methodology features absorb little of β — if the operative levers are dimensions the LLM schema does not yet extract (non-response handling, weighting structure detail, mode, question-order priming, interviewer-supervision protocols). The decomposition is identified only up to the observable methodology features; sharpening the extraction dictionary is part of the open agenda (theory.md §"Open testability concern").
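
The identification caveat can be illustrated with a small simulation. This is a sketch on synthetic data, not project code: `hidden_lever` stands in for any declared-design dimension the schema does not extract, `observed_feature` for one it does, and all effect sizes are invented for illustration.

```python
import numpy as np


def sponsor_coef(y, sponsored, controls=()):
    """OLS coefficient on `sponsored`, with an intercept and optional controls."""
    X = np.column_stack([np.ones(len(y)), sponsored, *controls])
    coef, *_ = np.linalg.lstsq(X, y, rcond=None)
    return coef[1]


rng = np.random.default_rng(1)
n = 20_000
sponsored = rng.binomial(1, 0.3, n)

# Extracted feature: same distribution for both groups, so it cannot absorb the gap.
observed_feature = rng.binomial(1, 0.4, n)
# Non-extracted lever: sponsored polls use it far more often.
hidden_lever = rng.binomial(1, np.where(sponsored == 1, 0.7, 0.1))

# Most of the gap runs through the hidden lever (10 pp per use); 1 pp is direct.
y = (10.0 * hidden_lever + 1.0 * sponsored
     + 2.0 * observed_feature + rng.normal(0.0, 5.0, n))

b1 = sponsor_coef(y, sponsored)
b3_observed_only = sponsor_coef(y, sponsored, [observed_feature])
b3_full_schema = sponsor_coef(y, sponsored, [observed_feature, hidden_lever])

print(f"baseline={b1:.2f}  observed-only={b3_observed_only:.2f}  "
      f"full-schema={b3_full_schema:.2f}")
```

With only the extracted feature as a control, the sponsor coefficient barely moves, which would read as Channel B dominance even though the simulated gap is mostly declared-design slant. Only the extended schema recovers the shrinkage.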
Prior research
The closest econometric template is the firm-level audit of Brazilian
sample-design predictors of poll error (Meireles & Russo, 2022),
which documents systematic deviations but does not interact with
sponsor identity. Two anecdotes establish that the LE.33.IV quota
dimensions function as live Channel A levers in litigation:
Russomanno's 2020 SP campaign obtained an injunction censoring a
Datafolha poll on the grounds that the plano amostral (sampling plan)
omitted income and used a coarse two-bin education split
[stories.csv #131; #132]; and the PT-led 2022 Bahia gubernatorial
coalition censored a Datafolha poll (commissioned by Rádio Metrópole)
on identical grounds: missing gender/age/education/income/area
weighting in the plano amostral [stories.csv #078]. The Bahia
injunction was subsequently revoked on procedural grounds, but the
substantive critique targeted the methodology-disclosure dimensions
named in LE.33.IV. No prior work
quantifies the Channel A vs Channel B split registry-wide.
Evidence
| Analysis | Bearing | Key takeaway |
|---|---|---|
| AN-019 | Mixed | n=200 methodology subset: slant-permissive coverage classes (specific_neighborhoods + urban_only) at 12% in candidate-touched polls vs 10% in independent. Direction matches Channel A but n_candidate=25 makes the gap noisy. |
| AN-020 | Mixed (qualitative) | n=200, finer split. Committee polls (n=6) are 83% deferred-to-complement; party-route polls (n=4) are 50% specific_neighborhoods — small-n hint that party sponsors use the active coverage-restriction lever. |
| AN-021 | Mixed | n=200 audit_pct distribution. KS p = 1.00; ~76% of every bucket sits at the 20% legal floor. Candidate-touched polls (n=24) never exceed 30% audit; independent polls reach 100%. Right-tail gap is qualitative not statistical. |
| AN-022 | Against | n=200 completeness index. Candidate-touched mean 0.43 vs independent 0.39 — opposite of Channel A's "candidates hide methodology" prediction. t = +1.25, p = 0.22. |
| AN-024 | Against (significant) | Universe-scale n=14,876. Candidate-touched defer at 35.5% vs independent at 38.3%. Odds ratio 0.89 (95% CI [0.80, 0.98]), chi-square p = 0.021. "Candidates hide coverage via deferral" is statistically refuted at universe scale. |
| AN-033 | Against (null) | Spec B complement of AN-024 on the AN-001 analysis sample (n=27,907). Within-candidate FE: γ on sponsored_by × deferred is +1.08 (SE 2.08, p=0.60); flips to −5.63 under race × week FE. Sign-inconsistent, never significant. |
| AN-040 | Against (null) | 3-way sponsored × deferred × I(final_rank=1) is null across all four specs (+2.03 / +4.09 / +0.46 / −12.51). Closes the deferral lever: no evidence of selection, pooled amplification, or rank-conditional amplification. |
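
As a quick arithmetic check on the AN-024 row, the odds ratio can be recomputed from the two reported deferral rates. This is a sketch using only the percentages in the table; the confidence interval and chi-square p-value depend on the underlying cell counts, which are not given here, so they are not recomputed. The variable names are illustrative.

```python
def odds(p):
    """Convert a probability into odds."""
    return p / (1.0 - p)


# Deferral rates reported for AN-024 (universe scale, n=14,876).
p_candidate_touched = 0.355
p_independent = 0.383

odds_ratio = odds(p_candidate_touched) / odds(p_independent)
gap_pp = (p_candidate_touched - p_independent) * 100

print(f"odds ratio = {odds_ratio:.2f}")  # odds ratio = 0.89
print(f"gap = {gap_pp:.1f} pp")          # gap = -2.8 pp
```

An odds ratio below 1 means candidate-touched polls defer less often than independent ones, which is the wrong sign for the "hide coverage via deferral" reading.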
Open tests
Headline shrinkage regression
The decisive test is the triple-spec β₁ → β₂ → β₃ ladder applied to
the analysis panel, with β₃ adding the full LLM-extracted methodology
bundle (coverage_class, quota variables list, population frame,
census-setor cluster usage). Magnitude of (β₁ − β₃) is the Channel A
share; residual β₃ is the Channel B floor. Blocked on the queued
poll_methodology extractor in pipelines/politica/docs/todo.md — the
single highest-leverage open lever (docs/decisions.md 2026-06-02
§"Promote from idea to project"). Structured registry fields are
already in hand; the four free-text fields (DS_PLANO_AMOSTRAL,
DS_METODOLOGIA_PESQUISA, DS_SISTEMA_CONTROLE, DS_DADO_MUNICIPIO)
need the universe extraction.
Extraction-schema sharpening
The decomposition is identified only up to the observable methodology
features. Even a null β₁ − β₃ leaves open the possibility that Channel A operates
through dimensions the schema does not currently extract — non-response
handling, weighting structure detail, mode, question-order priming,
interviewer-supervision protocols. AN-041, AN-042, AN-043 (mode,
interviewer training, non-response) are the natural next probes once
the universe schema lands. See docs/source-of-bias.md for the lever
inventory and the size-mismatch problem (measured magnitudes of
documented levers do not yet add up to +7 pp).
Interaction with methodology flexibility
If H3 finds meaningful β^A, H4 sharpens it: β^A should be larger where the menu has more room to slant (rural-heavy municipalities, demographically heterogeneous municipalities). Queued behind H3 and behind IBGE Censo 2022 municipality-level demographics.