AN-024: Is deferral itself a sponsor-specific tell?

Universe-scale n=14,876 (11,430 in the candidate-touched vs independent comparison). Candidate-touched polls defer at 35.5% vs 38.3% for independent polls, so candidate-touched polls are LESS likely to defer (odds ratio 0.89, 95% CI [0.80, 0.98], chi-square p = 0.021). The "candidates hide coverage via deferral" hypothesis is statistically refuted at universe scale.

Hypothesis: H4: Channel A contribution is larger where methodology flexibility is greater
Confidence: green
Type: descriptive

Design

Sample: 14,876 mayoral protocols (universe-scale via cov_bucket classifier)
Specification: 2×2 table of cov_bucket (deferred_complement vs everything else) × sponsor_bucket (candidate_touched vs independent). Chi-square + odds ratio.
Notes: D6 of six. Promotes the deferral question (AN-019/020 saw 55% deferred in independent polls and 83% in committees) from the n=200 LLM subset to the full universe. cov_bucket is computed deterministically for the whole universe and does not need the LLM.

Script: source/analysis/an-024-coverage-deferral-by-sponsor.py
Target: build/table/an-024-coverage-deferral-by-sponsor.csv
Status: interpreted · 2026-06-02
Created: 2026-06-02

Question

AN-019 found ~55% deferred in independent polls and 56% in candidate-touched polls (n=200 subset; essentially identical rates). AN-020 found committees are 83% deferred (n=6). Deferral could be either of the following:

Industry-wide boilerplate — pollsters submit a separate complementary methodology document by convention, regardless of sponsor. Then deferral rate is constant across sponsor types.
Sponsor-specific tell — candidate-touched polls disproportionately defer to hide specific coverage choices behind a less-scrutinized complementary doc.

The universe-scale 2×2 test pins this down with proper power (n_candidate ≈ 800 expected).

Design

Per-protocol classification from the cov_bucket scan + sponsor parquet:

defer: cov_bucket == "deferred_complement"
not_defer: cov_bucket ∈ {substantive, very_short, empty}
candidate_touched: protocol has any sponsor row with route ∈ {cpf, committee, party, party_name} OR id_type == CPF
independent: only media / pollster_self sponsors
other: residual

Two-by-two chi-square + odds ratio on the candidate-vs-independent cells. Drop "other" for the headline test.
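
The classification rules above can be sketched as two small functions. This is a minimal illustration, not the project's script: the function names, the dict-per-sponsor-row input shape, and the route labels "media" and "pollster_self" are assumptions, as is treating a protocol with no sponsor rows as "other".

```python
from typing import Iterable, Mapping

# Route values that mark a protocol as candidate-touched (from the rules above).
CANDIDATE_ROUTES = {"cpf", "committee", "party", "party_name"}
# Assumed labels for the independent routes; the real parquet may spell them differently.
INDEPENDENT_ROUTES = {"media", "pollster_self"}


def is_deferred(cov_bucket: str) -> bool:
    """True when the coverage field defers to a complementary document."""
    return cov_bucket == "deferred_complement"


def sponsor_bucket(sponsor_rows: Iterable[Mapping[str, str]]) -> str:
    """Classify one protocol from its sponsor rows (hypothetical row shape)."""
    rows = list(sponsor_rows)
    # Any single candidate-linked row is enough to mark the whole protocol.
    if any(
        r.get("route") in CANDIDATE_ROUTES or r.get("id_type") == "CPF"
        for r in rows
    ):
        return "candidate_touched"
    # Independent only if there is at least one row and all rows are media / pollster_self.
    if rows and all(r.get("route") in INDEPENDENT_ROUTES for r in rows):
        return "independent"
    # Everything else, including protocols with no sponsor rows (an assumption).
    return "other"
```

The candidate check runs first, so a protocol with one media sponsor and one party sponsor lands in candidate_touched, matching the "any sponsor row" wording above.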

Results

Universe-scale 14,876 mayoral protocols:

sponsor_bucket	n	n_defer	defer_rate
candidate_touched	1,928	684	35.5%
independent	9,502	3,637	38.3%
other	3,446	1,163	33.7%

2×2 chi-square (candidate-touched vs independent):

Chi-square = 5.34, df=1, p = 0.021
Odds ratio = 0.887, 95% CI [0.801, 0.982]
Defer-rate difference: −2.8 pp
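
The statistics above can be re-derived from the cell counts in the table. The sketch below is a stdlib-only check, not the project's an-024 script, and the function name two_by_two is hypothetical. It assumes an uncorrected Pearson chi-square and a Wald interval on the log odds ratio; the note does not say which variants the script uses.

```python
import math


def two_by_two(a: int, b: int, c: int, d: int, z: float = 1.959964) -> dict:
    """Chi-square, odds ratio, and Wald CI for a 2x2 table.

    a, b = defer / not-defer counts in group 1 (candidate-touched)
    c, d = defer / not-defer counts in group 2 (independent)
    """
    n = a + b + c + d
    # Pearson chi-square for a 2x2 table, without continuity correction.
    chi2 = n * (a * d - b * c) ** 2 / ((a + b) * (c + d) * (a + c) * (b + d))
    # Survival function of chi-square with df = 1, via the normal tail.
    p_value = math.erfc(math.sqrt(chi2 / 2))
    odds_ratio = (a * d) / (b * c)
    # Standard error of log(OR) (Woolf / Wald method).
    se_log_or = math.sqrt(1 / a + 1 / b + 1 / c + 1 / d)
    ci_low = math.exp(math.log(odds_ratio) - z * se_log_or)
    ci_high = math.exp(math.log(odds_ratio) + z * se_log_or)
    # Difference in defer rates, group 1 minus group 2, in percentage points.
    diff_pp = 100 * (a / (a + b) - c / (c + d))
    return {
        "chi2": chi2,
        "p_value": p_value,
        "odds_ratio": odds_ratio,
        "ci_low": ci_low,
        "ci_high": ci_high,
        "diff_pp": diff_pp,
    }


# Counts taken from the results table: n_defer and n - n_defer per bucket.
result = two_by_two(684, 1928 - 684, 3637, 9502 - 3637)
for name, value in result.items():
    print(f"{name}: {value:.4f}")
```

Under these assumptions the output should round to the figures reported above. A continuity-corrected (Yates) chi-square would come out slightly lower, so the reported 5.34 appears consistent with the uncorrected statistic.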

Interpretation

With proper power at universe scale, candidate-touched polls are LESS likely to defer than independent polls, by 2.8 percentage points (OR = 0.89, p = 0.021). The simple Channel A subprediction "candidate-touched polls hide coverage by deferring to a complementary document" is statistically refuted.

This sharpens the cumulative finding from D1-D6:

AN-019 (D1): coverage_class similar across sponsor types
AN-020 (D2): committees defer, party-route polls pick selective coverage (but cells are too thin)
AN-021 (D3): audit_pct identical (KS p=1.00)
AN-022 (D4): completeness HIGHER for candidate-touched (wrong-signed)
AN-023 (D5): pollster fingerprint uncorrelated with customer mix
AN-024 (D6): deferral LOWER for candidate-touched (significantly)

Across every measured methodology lever, the Channel A "candidates minimize/hide methodology" prediction is either null or wrong-signed. The +7 pp sponsor bias estimated in AN-001 must be operating through something the LLM methodology extraction did not capture:

Channel B (residual / fabrication) — interviewer-level shading that wouldn't show up in any disclosed methodology field. The day-to-election decay (AN-005) hints at this — slant shrinks toward the verification event.
Channel A via levers not measured — quota distributions, actual rural sub-district selection, weighting choices in the complement document. These are below the resolution of the current LLM schema.
Quota distribution slant inside a constant menu — 88% of pollsters use {sex, age, education, income} quotas, but the bin shares within those quotas vary. A poll that quotas to a demographically-favorable distribution (e.g., young voters over-sampled when the candidate skews young) slants without touching coverage class or audit.

This sets the agenda for the Spec 3 regression on the universe LLM extract: the β shrinkage will likely be small when the methodology features land. The interesting follow-up will be Channel B diagnostics, not Channel A.

Follow-ups

Quota-distribution slant test (extension): once the LLM extract is available at universe scale, parse its sampling__quota_distributions JSON and compare each poll's bin shares to the IBGE Census 2022 reference for the municipality. A poll that quotas to a demographically shifted distribution is doing Channel A via a lever that AN-019 through AN-024 don't capture.
Day-to-election decay × sponsor type (extension): AN-005 showed β decays toward election day. Is that decay larger for the audit-pct/coverage-deferral-controlled subset? That would tighten the Channel B story.
Funding-source disclosure × β (blind spot): only 9% of polls mention a funding source (AN-022 fields), although the DS_ORIGEM_RECURSO flag in the registry is universal. Do the polls that also voluntarily mention funding have smaller β? That would indicate self-selection on transparency.
Update theory.md § Channel A vs B framing (blind spot): the project's docs/theory.md § "Polls as Bayesian persuasion" currently treats Channel A as the leading hypothesis. The universe-scale D1-D6 results justify pre-emptively weakening that framing — Channel B should at least be co-equal.
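
One way the quota-distribution slant test in the first follow-up might be operationalized is a per-dimension total variation distance between a poll's quota shares and the census reference. This is a sketch under stated assumptions: the JSON shape (dimension, then bin, then share), the matching bin labels, and all function names are hypothetical, and the example shares are made-up illustrative numbers, not IBGE values.

```python
import json
from typing import Dict


def normalize(shares: Dict[str, float]) -> Dict[str, float]:
    """Rescale bin shares so they sum to 1 (handles percentages or counts)."""
    total = sum(shares.values())
    if total <= 0:
        raise ValueError("bin shares must sum to a positive number")
    return {bin_name: value / total for bin_name, value in shares.items()}


def total_variation(poll: Dict[str, float], reference: Dict[str, float]) -> float:
    """Total variation distance: 0 = identical shares, 1 = disjoint."""
    p, r = normalize(poll), normalize(reference)
    bins = set(p) | set(r)
    return 0.5 * sum(abs(p.get(b, 0.0) - r.get(b, 0.0)) for b in bins)


def quota_slant(quota_json: str, reference: Dict[str, Dict[str, float]]) -> Dict[str, float]:
    """Distance from the reference for each quota dimension present in both."""
    quotas = json.loads(quota_json)
    return {
        dim: total_variation(quotas[dim], reference[dim])
        for dim in quotas
        if dim in reference
    }


# Made-up illustrative shares (not real census or poll data).
example_json = '{"age": {"16-34": 0.5, "35-59": 0.3, "60+": 0.2}}'
example_ref = {"age": {"16-34": 0.4, "35-59": 0.4, "60+": 0.2}}
slant = quota_slant(example_json, example_ref)
```

Total variation is only one possible metric. It measures how far the quotas sit from the reference but not whether the shift favors the sponsor, so a signed measure along the candidate's demographic skew would still be needed for the Channel A reading.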