Universe-scale n=14,876. Candidate-touched polls defer at 35.5% vs independent at 38.3% — candidate-touched are LESS likely to defer (odds ratio 0.89, 95% CI [0.80, 0.98], chi-square p = 0.021). The "candidates hide coverage via deferral" hypothesis is statistically refuted at universe scale.
Question
AN-019 found ~55% deferred in independent polls and 56% in candidate-touched polls (n=200 subset; essentially identical rates). AN-020 found 83% deferral among committees (n=6). Deferral could be either:
- Industry-wide boilerplate — pollsters submit a separate complementary methodology document by convention, regardless of sponsor. Then deferral rate is constant across sponsor types.
- Sponsor-specific tell — candidate-touched polls disproportionately defer to hide specific coverage choices behind a less-scrutinized complementary doc.
The universe-scale 2×2 test pins this down with proper power (n_candidate ≈ 800 expected).
Design
Per-protocol classification from the cov_bucket scan + sponsor parquet:
- defer: cov_bucket == "deferred_complement"
- not_defer: cov_bucket ∈ {substantive, very_short, empty}
- candidate_touched: protocol has any sponsor row with route ∈ {cpf, committee, party, party_name} OR id_type == "CPF"
- independent: only media / pollster_self sponsors
- other: residual
Two-by-two chi-square + odds ratio on the candidate-vs-independent cells. Drop "other" for the headline test.
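The classification rules above can be sketched in a few lines of Python. This is a minimal illustration, not the project's actual pipeline: the function names, the dict-per-sponsor-row shape, the `"CNPJ"` value in the usage example, and the choice to send sponsor-less protocols to "other" are all assumptions made for the example.

```python
# Route and bucket vocabularies taken from the Design list above.
CANDIDATE_ROUTES = {"cpf", "committee", "party", "party_name"}
INDEPENDENT_ROUTES = {"media", "pollster_self"}
NOT_DEFER_BUCKETS = {"substantive", "very_short", "empty"}


def defer_label(cov_bucket):
    """Map a cov_bucket value to 'defer' or 'not_defer'."""
    if cov_bucket == "deferred_complement":
        return "defer"
    if cov_bucket in NOT_DEFER_BUCKETS:
        return "not_defer"
    # Assumption: any other value signals a scan problem and should fail loudly.
    raise ValueError(f"unexpected cov_bucket: {cov_bucket!r}")


def sponsor_bucket(sponsor_rows):
    """Classify one protocol from its sponsor rows (hypothetical list of dicts)."""
    # A single candidate-linked row makes the whole protocol candidate_touched.
    if any(
        row.get("route") in CANDIDATE_ROUTES or row.get("id_type") == "CPF"
        for row in sponsor_rows
    ):
        return "candidate_touched"
    # Independent: at least one sponsor, and only media / pollster_self routes.
    if sponsor_rows and all(
        row.get("route") in INDEPENDENT_ROUTES for row in sponsor_rows
    ):
        return "independent"
    # Residual, including protocols with no sponsor rows (assumption).
    return "other"


# Usage with made-up sponsor rows.
example = [{"route": "media", "id_type": "CNPJ"}, {"route": "party", "id_type": "CNPJ"}]
print(sponsor_bucket(example), defer_label("deferred_complement"))
```

Checking the candidate condition first means a protocol with mixed media and party sponsors counts as candidate_touched, which matches the "any sponsor row" wording of the rule.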
Results
Universe-scale 14,876 mayoral protocols:
| sponsor_bucket | n | n_defer | defer_rate |
|---|---|---|---|
| candidate_touched | 1,928 | 684 | 35.5% |
| independent | 9,502 | 3,637 | 38.3% |
| other | 3,446 | 1,163 | 33.7% |
2×2 chi-square (candidate-touched vs independent):
- Chi-square = 5.34, df=1, p = 0.021
- Odds ratio = 0.887, 95% CI [0.801, 0.982]
- Defer-rate difference: −2.8 pp
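
The headline statistics can be re-derived from the table counts alone. The sketch below uses only the standard library and assumes an uncorrected Pearson chi-square (no Yates continuity correction) and a Wald interval on the log odds ratio; those two choices are assumptions about how the reported figures were produced, and they appear to match the reported values at the stated precision.

```python
import math

# Counts from the results table (candidate-touched vs independent).
cand_n, cand_defer = 1928, 684
ind_n, ind_defer = 9502, 3637

# 2x2 cells: rows = sponsor bucket, columns = defer / not_defer.
a, b = cand_defer, cand_n - cand_defer
c, d = ind_defer, ind_n - ind_defer
n = a + b + c + d

# Pearson chi-square for a 2x2 table, without continuity correction.
chi2 = n * (a * d - b * c) ** 2 / ((a + b) * (c + d) * (a + c) * (b + d))
# With df = 1 the chi-square upper tail reduces to erfc(sqrt(x / 2)).
p_value = math.erfc(math.sqrt(chi2 / 2))

# Odds ratio with a Wald 95% interval on the log scale.
odds_ratio = (a * d) / (b * c)
se_log_or = math.sqrt(1 / a + 1 / b + 1 / c + 1 / d)
ci_low = math.exp(math.log(odds_ratio) - 1.96 * se_log_or)
ci_high = math.exp(math.log(odds_ratio) + 1.96 * se_log_or)

# Difference in deferral rates, in percentage points.
diff_pp = 100 * (a / (a + b) - c / (c + d))

print(f"chi2={chi2:.2f} p={p_value:.3f} OR={odds_ratio:.3f} "
      f"CI=[{ci_low:.3f}, {ci_high:.3f}] diff={diff_pp:.1f} pp")
```

Note that a library default with continuity correction enabled (for example `scipy.stats.chi2_contingency` on a 2×2 table) would give a slightly smaller statistic than the 5.34 reported here.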
Interpretation
With proper power at universe scale, candidate-touched polls are LESS likely to defer than independent polls, by 2.8 percentage points (OR = 0.89, p = 0.021). The simple Channel A subprediction "candidate-touched polls hide coverage by deferring to a complementary document" is statistically refuted.
This sharpens the cumulative finding from D1-D6:
- AN-019 (D1): coverage_class similar across sponsor types
- AN-020 (D2): committees defer, party-route picks selective (but cells too thin)
- AN-021 (D3): audit_pct identical (KS p=1.00)
- AN-022 (D4): completeness HIGHER for candidate-touched (wrong-signed)
- AN-023 (D5): pollster fingerprint uncorrelated with customer mix
- AN-024 (D6): deferral LOWER for candidate-touched (significantly)
Across every measured methodology lever, the Channel A "candidates minimize/hide methodology" prediction is either null or wrong-signed. The +7 pp sponsor bias estimated in AN-001 must be operating through something the LLM methodology extraction did not capture:
- Channel B (residual / fabrication) — interviewer-level shading that wouldn't show up in any disclosed methodology field. The day-to-election decay (AN-005) hints at this — slant shrinks toward the verification event.
- Channel A via levers not measured — quota distributions, actual rural sub-district selection, weighting choices in the complement document. These are below the resolution of the current LLM schema.
- Quota distribution slant inside a constant menu — 88% of pollsters use {sex, age, education, income} quotas, but the bin shares within those quotas vary. A poll that quotas to a demographically-favorable distribution (e.g., young voters over-sampled when the candidate skews young) slants without touching coverage class or audit.
This sets expectations for the Spec 3 regression on the universe LLM extract: once the methodology features are added, β will likely shrink only slightly. The more informative follow-up is therefore Channel B diagnostics, not Channel A.
Follow-ups
- Quota-distribution slant test (extension): parse the sampling__quota_distributions JSON in the LLM extract once it is universe-scale, and compare each poll's bin shares to the IBGE Census 2022 reference for the muni. A poll that quotas to a demographically-shifted distribution is doing Channel A via a lever that AN-019 through AN-024 don't capture.
- Day-to-election decay × sponsor type (extension): AN-005 showed β decays toward election day. Is that decay larger for the audit-pct/coverage-deferral-controlled subset? That would tighten the Channel B story.
- Funding-source disclosure × β (blind spot): only 9% of polls mention funding source (AN-022 fields). The DS_ORIGEM_RECURSO flag in the registry is universal. Does the handful of polls that also mention funding voluntarily have smaller β? That would index a self-selection on transparency.
- Update theory.md § Channel A vs B framing (blind spot): the project's docs/theory.md § "Polls as Bayesian persuasion" currently treats Channel A as the leading hypothesis. The universe-scale D1-D6 results justify pre-emptively weakening that framing; Channel B should at least be co-equal.
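
For the quota-distribution slant test listed in the follow-ups, one possible per-poll metric is the total variation distance between a poll's quota bin shares and the census reference shares. The sketch below is only an illustration: the JSON layout, the bin labels, and every number in it are invented for the example, and neither the real sampling__quota_distributions schema nor actual IBGE figures are represented.

```python
import json


def total_variation(poll_shares, reference_shares):
    """Half the summed absolute gap between two share distributions (0 = identical)."""
    bins = set(poll_shares) | set(reference_shares)
    return 0.5 * sum(
        abs(poll_shares.get(b, 0.0) - reference_shares.get(b, 0.0)) for b in bins
    )


# Hypothetical quota JSON for one poll (made-up layout and values).
raw = '{"age": {"16-24": 0.25, "25-44": 0.40, "45+": 0.35}}'
quotas = json.loads(raw)

# Hypothetical reference shares for the same municipality (not real census data).
reference = {"age": {"16-24": 0.15, "25-44": 0.40, "45+": 0.45}}

tv_age = total_variation(quotas["age"], reference["age"])
print(f"age quota deviation: {tv_age:.2f}")
```

A signed variant (for example, the gap in the youngest bin alone) would be needed to test whether the shift favors the sponsoring candidate; total variation only measures how far the quotas sit from the reference.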