H3: β shrinks substantially when methodology features are controlled for

The headline +7 pp sponsor bias documented in H1 can in principle arise either through declared methodology choices — sample frame, coverage class, quota variables, weighting structure (Channel A: Bayesian persuasion through the disclosed signal structure) — or through residual movement outside the disclosed methodology (Channel B: fabrication, opacity, or design slant on dimensions the regime does not require disclosing). H3 is the decomposition test: total β = β^A + β^B. Adding structured methodology controls (sample size, field-period length, ST_PESQUISA_PROPRIA) and LLM-extracted methodology features (coverage_class, quota variables, population frame) to the H1 spec should shrink β substantially if Channel A is the dominant mechanism.

Evidence strength: Mixed; decisive test pending (2026-06-18). Six descriptives on the n=200 methodology LLM subset and one universe-scale test on coverage deferral (AN-024, n=14,876) show Channel A running null or wrong-signed on every measured lever, including methodology completeness, which runs in the opposite direction (candidate-touched 0.43 vs independent 0.39). The headline shrinkage regression (β₁ → β₃ with full LLM controls) is blocked on the poll_methodology universe extraction queued in pipelines/politica/docs/todo.md.

Theory

The framework is Polls as Bayesian persuasion (theory.md §"Polls as Bayesian persuasion (supply-side / Channel A)"). The sender commits to a signal structure σ, operationalized as the poll's declared methodology, and the disclosure regime makes that commitment public. Quota sampling with multi-stage selection is the latitude-giving feature: it is universal in Brazilian electoral polls, model-based rather than design-based, and the chosen quota variables and population frame are the levers (Penadés et al. 2024). The statute names five dimensions in LE.33.IV (sex, age, education, income, area) but mandates none of them; it requires only their disclosure (institutions.md §"Statutory mapping"). β^A is the share of the headline gap that lives inside this declared signal structure; β^B is everything else.

Prediction

A triple specification — β₁ (H1 baseline) vs β₂ (adds structured methodology controls) vs β₃ (adds LLM-extracted coverage_class, quota variables list, population frame) — recovers (β₁ − β₃) as the Channel A share. If Channel A dominates, β₃ should be a small fraction of β₁ (i.e., β shrinks from ≈ +7 pp toward a much smaller residual β^B). If Channel B dominates, β₃ stays close to β₁ and the disclosed methodology absorbs little of the gap.
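Written out, the ladder might look like the following. This is a notational sketch only: the H1 outcome and baseline controls are not restated in this section, so y_i (outcome), S_i (sponsor indicator), and X_i (H1 baseline controls) are placeholder symbols, and M_i^struct and M_i^LLM stand for the structured and LLM-extracted methodology bundles named above.

```latex
\begin{align*}
\text{Spec 1 (H1 baseline):}\quad
  y_i &= \alpha_1 + \beta_1 S_i + X_i'\delta_1 + \varepsilon_{1i} \\
\text{Spec 2 (adds structured controls):}\quad
  y_i &= \alpha_2 + \beta_2 S_i + X_i'\delta_2
         + (M_i^{\mathrm{struct}})'\gamma_2 + \varepsilon_{2i} \\
\text{Spec 3 (adds LLM-extracted features):}\quad
  y_i &= \alpha_3 + \beta_3 S_i + X_i'\delta_3
         + (M_i^{\mathrm{struct}})'\gamma_3
         + (M_i^{\mathrm{LLM}})'\theta_3 + \varepsilon_{3i} \\[4pt]
\hat\beta^{A} &= \hat\beta_1 - \hat\beta_3, \qquad
\hat\beta^{B} = \hat\beta_3, \qquad
\hat\beta_1 = \hat\beta^{A} + \hat\beta^{B}
\end{align*}
```

Under this notation, Channel A dominance corresponds to a ratio (β̂₁ − β̂₃)/β̂₁ close to 1, and Channel B dominance to a ratio close to 0.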

Competing predictions

Channel B dominance. Residual fabrication, opacity, or slant on dimensions outside the disclosed methodology carries the load. Under this reading β stays roughly constant across the three specs, and the decay near the election in H5 is the sharper signature. Current evidence on the n=200 subset (six descriptives, AN-019 through AN-024) and at universe scale on deferral (AN-024) is consistent with this reading more than with Channel A.

Unobserved-design slant. Channel A could still dominate even if observed methodology features absorb little of β — if the operative levers are dimensions the LLM schema does not yet extract (non-response handling, weighting structure detail, mode, question-order priming, interviewer-supervision protocols). The decomposition is identified only up to the observable methodology features; sharpening the extraction dictionary is part of the open agenda (theory.md §"Open testability concern").

Prior research

The closest econometric template is the firm-level audit of Brazilian sample-design predictors of poll error by Meireles & Russo (2022), which documents systematic deviations but does not interact them with sponsor identity. Two anecdotes establish that the LE.33.IV quota dimensions function as live Channel A levers in litigation: Russomanno's 2020 SP campaign obtained an injunction censoring a Datafolha poll on the grounds that the plano amostral omitted income and used a coarse two-bin education split [stories.csv #131; #132]; and the PT-led 2022 Bahia gubernatorial coalition censored a Datafolha poll (rádio Metrópole contratante) on identical grounds: missing gender/age/education/income/area weighting in the plano amostral [stories.csv #078]. The Bahia injunction was subsequently revoked on procedural grounds, but the substantive critique concerned the methodology-disclosure dimensions named in LE.33.IV. No prior work quantifies the Channel A vs Channel B split registry-wide.

Evidence

| Analysis | Bearing | Key takeaway |
| --- | --- | --- |
| AN-019 | Mixed | n=200 methodology subset: slant-permissive coverage classes (specific_neighborhoods + urban_only) at 12% in candidate-touched polls vs 10% in independent. Direction matches Channel A but n_candidate=25 makes the gap noisy. |
| AN-020 | Mixed (qualitative) | n=200, finer split. Committee polls (n=6) are 83% deferred-to-complement; party-route polls (n=4) are 50% specific_neighborhoods: a small-n hint that party sponsors use the active coverage-restriction lever. |
| AN-021 | Mixed | n=200 audit_pct distribution. KS p = 1.00; ~76% of every bucket sits at the 20% legal floor. Candidate-touched polls (n=24) never exceed 30% audit; independent polls reach 100%. Right-tail gap is qualitative, not statistical. |
| AN-022 | Against | n=200 completeness index. Candidate-touched mean 0.43 vs independent 0.39, the opposite of Channel A's "candidates hide methodology" prediction. t = +1.25, p = 0.22. |
| AN-024 | Against (significant) | Universe-scale n=14,876. Candidate-touched defer at 35.5% vs independent at 38.3%. Odds ratio 0.89 (95% CI [0.80, 0.98]), chi-square p = 0.021. "Candidates hide coverage via deferral" is statistically refuted at universe scale. |
| AN-033 | Against (null) | Spec B complement of AN-024 on the AN-001 analysis sample (n=27,907). Within-candidate FE: γ on sponsored_by × deferred is +1.08 (SE 2.08, p=0.60); flips to −5.63 under race × week FE. Sign-inconsistent, never significant. |
| AN-040 | Against (null) | 3-way sponsored × deferred × I(final_rank=1) is null across all four specs (+2.03 / +4.09 / +0.46 / −12.51). Closes the deferral lever: neither selection, pooled amplification, nor rank-conditional amplification. |
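Two of the reported figures can be sanity-checked from the numbers in the table alone. The sketch below recomputes the AN-024 odds ratio from the two deferral rates and the AN-033 p-value from the coefficient and its standard error. The function names are illustrative, and the p-value uses a standard normal reference, which is an assumption: the underlying analyses may use a t reference or clustered standard errors. The AN-024 confidence interval and chi-square statistic need cell counts that are not reported here, so they are not reproduced.

```python
import math


def odds_ratio(p_treated: float, p_control: float) -> float:
    """Odds ratio implied by two proportions (treated odds / control odds)."""
    return (p_treated / (1 - p_treated)) / (p_control / (1 - p_control))


def two_sided_normal_p(estimate: float, std_error: float) -> float:
    """Two-sided p-value for estimate / std_error under a standard normal.

    Assumption: normal reference distribution, not the analyses' own inference.
    """
    z = abs(estimate / std_error)
    return math.erfc(z / math.sqrt(2))


# AN-024: candidate-touched defer at 35.5% vs independent at 38.3%.
or_an024 = odds_ratio(0.355, 0.383)

# AN-033: gamma = +1.08 with SE 2.08 under within-candidate FE.
p_an033 = two_sided_normal_p(1.08, 2.08)

print(f"AN-024 odds ratio: {or_an024:.2f}")  # 0.89
print(f"AN-033 p-value:    {p_an033:.2f}")   # 0.60
```

Both values round to the figures in the table (0.89 and 0.60).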

Open tests

Headline shrinkage regression

The decisive test is the triple-spec β₁ → β₂ → β₃ ladder applied to the analysis panel, with β₃ adding the full LLM-extracted methodology bundle (coverage_class, quota variables list, population frame, census-setor cluster usage). Magnitude of (β₁ − β₃) is the Channel A share; residual β₃ is the Channel B floor. Blocked on the queued poll_methodology extractor in pipelines/politica/docs/todo.md — the single highest-leverage open lever (docs/decisions.md 2026-06-02 §"Promote from idea to project"). Structured registry fields are already in hand; the four free-text fields (DS_PLANO_AMOSTRAL, DS_METODOLOGIA_PESQUISA, DS_SISTEMA_CONTROLE, DS_DADO_MUNICIPIO) need the universe extraction.
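The mechanics of the ladder can be sketched on synthetic data. Everything below is illustrative: `ols_coef`, `shrinkage_ladder`, and the simulated columns are hypothetical names, the data are generated so that a methodology feature carries most of the sponsor gap (purely to show that the ladder would detect it, not as a statement about the registry), and the sketch omits the fixed effects and standard-error treatment that the actual H1 spec may use.

```python
import numpy as np


def ols_coef(y: np.ndarray, X: np.ndarray) -> np.ndarray:
    """OLS coefficients of y on X (X must already contain a constant column)."""
    coef, *_ = np.linalg.lstsq(X, y, rcond=None)
    return coef


def shrinkage_ladder(y, sponsored, baseline, structured, llm):
    """Sponsor coefficient under three nested specs, plus the implied split."""
    const = np.ones((len(y), 1))
    s = sponsored.reshape(-1, 1)
    specs = {
        "beta1": np.hstack([const, s, baseline]),                   # H1 baseline
        "beta2": np.hstack([const, s, baseline, structured]),       # + structured
        "beta3": np.hstack([const, s, baseline, structured, llm]),  # + LLM features
    }
    # Column 1 is the sponsor indicator in every design matrix.
    out = {name: ols_coef(y, X)[1] for name, X in specs.items()}
    out["channel_a"] = out["beta1"] - out["beta3"]           # absorbed by methodology
    out["channel_a_share"] = out["channel_a"] / out["beta1"]
    return out


# Synthetic data (NOT project results): direct sponsor effect 2, plus 10 x a
# restricted-coverage flag that is 50 points more common in sponsored polls,
# so the total gap is about 2 + 10 * 0.5 = 7.
rng = np.random.default_rng(0)
n = 20_000
sponsored = rng.binomial(1, 0.3, n).astype(float)
baseline = rng.normal(size=(n, 1))       # stand-in for H1 baseline controls
structured = rng.normal(size=(n, 1))     # stand-in for e.g. log sample size
p_restricted = np.where(sponsored == 1, 0.7, 0.2)
restricted = rng.binomial(1, p_restricted).astype(float).reshape(-1, 1)
y = (2.0 * sponsored + 10.0 * restricted[:, 0]
     + 0.5 * baseline[:, 0] + rng.normal(scale=2.0, size=n))

result = shrinkage_ladder(y, sponsored, baseline, structured, restricted)
for name, value in result.items():
    print(f"{name}: {value:.3f}")
```

In this synthetic setup beta1 lands near 7 and beta3 near 2, so the Channel A share is roughly 5/7. If the methodology feature were unrelated to sponsorship, beta3 would stay close to beta1, which is the Channel B pattern described above.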

Extraction-schema sharpening

The decomposition is identified only up to the observable methodology features. Even a null β₁ − β₃ leaves open that Channel A operates through dimensions the schema does not currently extract — non-response handling, weighting structure detail, mode, question-order priming, interviewer-supervision protocols. AN-041, AN-042, AN-043 (mode, interviewer training, non-response) are the natural next probes once the universe schema lands. See docs/source-of-bias.md for the lever inventory and the size-mismatch problem (measured magnitudes of documented levers do not yet add up to +7 pp).

Interaction with methodology flexibility

If H3 finds meaningful β^A, H4 sharpens it: β^A should be larger where the menu has more room to slant (rural-heavy munis, demographically heterogeneous munis). Queued behind H3 and behind IBGE Censo 2022 muni demographics.