AN-055: Coverage class × candidate-base interaction (cheap Tier 2)

Cheap-Tier-2 structural test on coverage × candidate-base finds **no triple-interaction signal**. On the universe-scale within-candidate FE sample (n=20,393 cand-poll rows, 3,524 candidates, 1,665 muni clusters), the headline sponsor effect lands at **+7.70 pp (SE 1.44, p<10⁻⁷)**, but the flat `sponsored × narrow_coverage` interaction is null (β = +1.61, SE 4.93, p=0.74) and the wash-out-breaking triple `sponsored × narrow_coverage × base_lv_size_weighted` is also null (β = −0.54, SE 1.35, p=0.69). The 95 % CI on the triple is [−3.2, +2.1] in units of pp per unit of base_lv_dm — rules out large triples but not modest ones. Read: coverage class is unlikely to be the dominant Channel-A lever for the headline +7 pp; consistent with AN-032's reversed-sign bairro test. Directs attention to weighting / income-quota features (next AN).

Hypothesis: H4: Channel A contribution is larger where methodology flexibility is greater
Confidence: yellow
Type: descriptive

Design

Sample: 20,393 candidate-poll rows from build/assemble/cand_poll.parquet after dropping (a) candidates whose 2020 base profile is unavailable (no own 2020 prefeito candidacy AND no party 2020 prefeito / vereador run in muni — 33 % of the panel), (b) protocols without a poll_coverage extraction (0 % loss; universe-scale 14,876 protocols all have coverage_class), and (c) candidates appearing in only one poll (within-cand FE requires ≥ 2). 3,524 candidates, 1,665 muni clusters.
Specification: Within-candidate FE (PanelOLS, entity_effects=True, clusters=muni_id). Three nested specs: (Spec 1) `error ~ sponsored` — headline replicate on the analysis slice; (Spec 2) `error ~ sponsored + narrow_coverage + sponsored×narrow_coverage` — the flat AN-019-style test; (Spec 3) `error ~ sponsored + narrow_coverage + sponsored×narrow + sponsored×base_lv_dm + narrow×base_lv_dm + sponsored×narrow×base_lv_dm` — the triple interaction. `narrow_coverage = 1[coverage_class ∈ {urban_only, specific_neighborhoods}]`. `base_lv_dm` = candidate's base_lv_size_weighted demeaned across the analysis sample (mean 21.6). base_lv_dm main effect is candidate-level → absorbed by cand FE; the two two-way interactions and the triple are identified from within-cand variation in sponsored and narrow_coverage.
Comparator: independent-media or pollster-self polls of the same candidate (the within-cand FE design defines the comparator implicitly)
Cluster: muni_id
Weights: unweighted

Script: source/analysis/an-055-coverage-by-cand-base.py
Target: build/table/an-055-coverage-by-cand-base.csv
Status: interpreted · 2026-06-14
Created: 2026-06-14

Question

The blinded LLM-judge pilot (blinded Channel-A discovery brief) flagged coverage and weighting as the dominant high-plausibility mechanism domains, with 14/16 high-plausibility hypotheses agreeing with the actual sponsored side (87.5 %, p ≈ 0.004). The flat structural test sponsored × coverage_class (AN-019) was underpowered and direction-ambiguous: rural-base candidates' sponsors might choose rural-friendly coverage and urban-base candidates' sponsors might choose urban-only coverage, so a directional coverage effect can wash out in the aggregate.
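The 14/16 agreement figure can be checked with a few lines of arithmetic. This is a minimal sketch that assumes the quoted p-value comes from an exact two-sided binomial test against a 50 % null; the note does not state which test was used, and `binom_two_sided_p` is an illustrative name, not a function from the project.

```python
from math import comb


def binom_two_sided_p(successes: int, trials: int) -> float:
    """Exact two-sided binomial p-value against a fair-coin null (p = 0.5).

    Under p = 0.5 the distribution is symmetric, so the two-sided p-value
    is twice the tail at least as extreme as the observed count, capped at 1.
    """
    k = max(successes, trials - successes)
    upper_tail = sum(comb(trials, i) for i in range(k, trials + 1)) / 2**trials
    return min(1.0, 2 * upper_tail)


share = 14 / 16                      # agreement share, 0.875
p_value = binom_two_sided_p(14, 16)  # about 0.0042
```

Under that assumption the result matches the reported 87.5 % and p ≈ 0.004.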

This analysis breaks the wash-out by interacting the coverage choice with each candidate's natural base concentration. The cheap proxy is base_lv_size_weighted = vote-weighted average of "seções per local_votacao" across the candidate's 2020 base:

Higher value → urban-leaning base (urban LVs serve many seções)
Lower value → rural-leaning base (rural LVs serve few seções)

If sponsored polls choose narrow (urban-only / specific-neighborhood) coverage when their candidate's base concentrates in dense urban LVs, the triple interaction sponsored × narrow_coverage × base_lv_size_weighted should be positive.
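As an illustration of the proxy, the sketch below computes a vote-weighted average of seções per local de votação for one candidate. It assumes the weight is the candidate's 2020 vote count at each LV and the value is the number of seções that LV serves; the function name and the input layout are hypothetical, and the actual build lives in source/intermediate/cand__base_profile.py.

```python
def base_lv_size_weighted(lv_rows):
    """Vote-weighted mean of seções per local de votação (LV).

    lv_rows: list of (votes_at_lv, secoes_at_lv) pairs for one candidate's
    2020 base. Returns None when the candidate has no votes to weight by.
    """
    total_votes = sum(votes for votes, _ in lv_rows)
    if total_votes == 0:
        return None
    return sum(votes * secoes for votes, secoes in lv_rows) / total_votes


# Hypothetical candidates: same two LVs, opposite vote concentration.
urban_leaning = base_lv_size_weighted([(900, 30), (100, 2)])  # 27.2
rural_leaning = base_lv_size_weighted([(100, 30), (900, 2)])  # 4.8
```

A candidate whose votes concentrate in the 30-seção LV scores high (urban-leaning); one whose votes concentrate in the 2-seção LV scores low (rural-leaning).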

Design

source/analysis/an-055-coverage-by-cand-base.py:

Load build/assemble/cand_poll.parquet (with the base profile columns piped through from build/assemble/cand.parquet).
Load coverage_class per protocol from the cached poll_coverage LLM extractions (14,876 protocols at universe scale — both LLM-extracted and deterministic short-circuits).
Define narrow_coverage = 1[coverage_class ∈ {urban_only, specific_neighborhoods}].
Restrict to candidates with a non-unavailable base_source and to candidates appearing in ≥ 2 polls. Demean base_lv_size_weighted for interaction interpretability.
Fit three nested PanelOLS specs with within-candidate fixed effects and muni-clustered SEs.
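The sample-construction steps above (before model fitting) can be sketched in plain Python. This is not the project script: it assumes one dict per candidate-poll row, and the function name is hypothetical. The column names `politico_id`, `coverage_class`, `base_source` and `base_lv_size_weighted` follow the identifiers used in this note.

```python
from collections import Counter
from statistics import fmean

NARROW_CLASSES = {"urban_only", "specific_neighborhoods"}


def build_analysis_sample(rows):
    """Apply the AN-055 sample restrictions to candidate-poll rows.

    Drops rows with an unavailable base profile or no coverage_class,
    keeps candidates with at least two remaining polls, then adds
    narrow_coverage and the sample-demeaned base_lv_dm.
    """
    kept = [
        dict(r) for r in rows
        if r["base_source"] != "unavailable" and r["coverage_class"] is not None
    ]
    polls_per_cand = Counter(r["politico_id"] for r in kept)
    kept = [r for r in kept if polls_per_cand[r["politico_id"]] >= 2]
    if not kept:
        return []
    mean_base = fmean(r["base_lv_size_weighted"] for r in kept)
    for r in kept:
        r["narrow_coverage"] = int(r["coverage_class"] in NARROW_CLASSES)
        r["base_lv_dm"] = r["base_lv_size_weighted"] - mean_base
    return kept


# Hypothetical rows: A and B survive; C has one poll; D has no base profile.
rows = [
    {"politico_id": "A", "coverage_class": "urban_only",
     "base_source": "own_2020_prefeito", "base_lv_size_weighted": 30.0},
    {"politico_id": "A", "coverage_class": "urban_plus_selected_rural",
     "base_source": "own_2020_prefeito", "base_lv_size_weighted": 30.0},
    {"politico_id": "B", "coverage_class": "urban_plus_selected_rural",
     "base_source": "party_2020_vereador", "base_lv_size_weighted": 10.0},
    {"politico_id": "B", "coverage_class": "specific_neighborhoods",
     "base_source": "party_2020_vereador", "base_lv_size_weighted": 10.0},
    {"politico_id": "C", "coverage_class": "urban_only",
     "base_source": "own_2020_prefeito", "base_lv_size_weighted": 5.0},
    {"politico_id": "D", "coverage_class": "urban_only",
     "base_source": "unavailable", "base_lv_size_weighted": None},
    {"politico_id": "D", "coverage_class": "urban_only",
     "base_source": "unavailable", "base_lv_size_weighted": None},
]
sample = build_analysis_sample(rows)
```

In the toy data the demeaning constant is 20.0 (the row-level mean over the kept rows), so candidate A gets base_lv_dm = +10 and candidate B gets −10.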

The base-profile build is documented in source/intermediate/cand__base_profile.py and follows the fallback ladder: own 2020 prefeito vote → party 2020 prefeito vote in same muni → party 2020 vereador vote → unavailable.
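The fallback ladder amounts to a first-non-empty lookup. The sketch below is a hypothetical restatement, not the code in cand__base_profile.py; the argument names are invented for illustration, and the returned labels follow the base_source values named elsewhere in this note.

```python
def resolve_base_source(own_prefeito, party_prefeito, party_vereador):
    """Return (base_source, votes) from the first rung that has data.

    Each argument is that rung's 2020 vote data for the candidate's muni,
    or None / empty when the rung does not exist.
    """
    ladder = [
        ("own_2020_prefeito", own_prefeito),
        ("party_2020_prefeito", party_prefeito),
        ("party_2020_vereador", party_vereador),
    ]
    for source, votes in ladder:
        if votes:  # skips None and empty containers
            return source, votes
    return "unavailable", None


# Hypothetical candidate with no own 2020 run but a party vereador slate.
source, votes = resolve_base_source(None, None, {"lv_17": 240, "lv_22": 85})
```

Candidates that fall through every rung are the ones dropped in restriction (a) of the sample definition.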

Results

Headline (Spec 1): sponsor effect survives on the analysis slice

Statistic	Value
Sample (cand-poll rows)	20,393
Candidates	3,524
Muni clusters	1,665
β_sponsored	+7.70 pp
SE (muni-clustered)	1.44
t	5.35
p	9.0 × 10⁻⁸

The +7-8 pp sponsor effect from the headline analysis lands here at +7.70 pp on a strict slice (within-cand FE + base profile available + coverage extraction available). No drift.

Spec 2: flat sponsor × narrow coverage interaction is null

Coefficient	β	SE	t	p
sponsored	+7.56	1.44	5.25	1.5 × 10⁻⁷
narrow_coverage	−0.86	0.58	−1.47	0.140
sponsored × narrow_coverage	+1.61	4.93	0.33	0.743

The differential sponsor effect inside narrow-coverage polls is not statistically distinguishable from zero. This flat null is exactly the result the wash-out argument anticipates, so on its own it is direction-ambiguous: it can mean (a) no coverage channel exists, or (b) the coverage channel exists but directional alignment cancels in the aggregate.
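Reading (b) can be made concrete with a toy calculation. All numbers below are invented for illustration and assume equal-sized urban-base and rural-base groups; they are not estimates from the AN-055 sample.

```python
from statistics import fmean

# Hypothetical sponsor-minus-independent error gaps (pp) by coverage and base type.
gap = {
    ("wide", "urban"): 7.0,
    ("wide", "rural"): 7.0,
    ("narrow", "urban"): 11.0,  # narrow coverage helps urban-base sponsors
    ("narrow", "rural"): 3.0,   # and hurts rural-base sponsors by the same amount
}
BASES = ("urban", "rural")


def flat_interaction(gap):
    """Narrow-minus-wide sponsor gap, pooling base types with equal weight."""
    narrow = fmean(gap[("narrow", b)] for b in BASES)
    wide = fmean(gap[("wide", b)] for b in BASES)
    return narrow - wide


def interaction_by_base(gap):
    """Narrow-minus-wide sponsor gap computed separately per base type."""
    return {b: gap[("narrow", b)] - gap[("wide", b)] for b in BASES}


pooled = flat_interaction(gap)       # 0.0
by_base = interaction_by_base(gap)   # {"urban": 4.0, "rural": -4.0}
```

The pooled interaction is zero even though each base type carries a 4-pp effect, which is why a flat null alone cannot separate readings (a) and (b).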

Spec 3 — triple interaction (the wash-out-breaking test)

Coefficient	β	SE	t	p
sponsored	+6.29	1.69	3.72	2.0 × 10⁻⁴
narrow_coverage	−0.91	0.58	−1.57	0.117
sponsored × narrow_coverage	−7.98	22.44	−0.36	0.722
sponsored × base_lv_dm	−0.097	0.11	−0.92	0.357
narrow_coverage × base_lv_dm	−0.0073	0.0080	−0.92	0.359
sponsored × narrow × base_lv_dm	−0.54	1.35	−0.40	0.689

The triple is also null. Point estimate is small and slightly negative; 95 % CI on the triple is approximately [−3.2, +2.1] pp per unit of base_lv_dm.
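The interval can be reproduced from the reported coefficient and standard error. This sketch assumes a normal-approximation (Wald) interval, which is an assumption about how the CI was formed; `wald_ci` is an illustrative helper, not part of the analysis script.

```python
def wald_ci(beta: float, se: float, z: float = 1.959964):
    """Two-sided 95 % normal-approximation interval: beta ± z * se."""
    half_width = z * se
    return beta - half_width, beta + half_width


# Spec 3 triple interaction, values taken from the table above.
triple_lo, triple_hi = wald_ci(-0.54, 1.35)
triple_t = -0.54 / 1.35
```

Rounded to one decimal this gives [−3.2, +2.1] and t = −0.40, in line with the table.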

The candidate-level main effect of base_lv_dm is absorbed by the within-cand FE (it's constant per politico_id by construction). All three interactions involving base_lv_dm (the two two-way terms and the triple) are identified from within-candidate variation in sponsored and narrow_coverage.
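The absorption argument can be seen by applying the within transformation by hand to one candidate. The sketch below uses a single hypothetical candidate with four polls and an invented base_lv_dm of 8.0; it illustrates the mechanics of within-candidate demeaning, not the PanelOLS internals.

```python
from statistics import fmean


def within_demean(values):
    """Subtract the candidate-level mean, as within-candidate FE does."""
    mean = fmean(values)
    return [v - mean for v in values]


# One hypothetical candidate observed in four polls.
sponsored = [1, 0, 1, 0]
narrow = [1, 1, 0, 0]
base_lv_dm = 8.0  # constant within candidate by construction

columns = {
    "base_lv_dm": [base_lv_dm] * 4,
    "sponsored_x_base": [s * base_lv_dm for s in sponsored],
    "narrow_x_base": [n * base_lv_dm for n in narrow],
    "triple": [s * n * base_lv_dm for s, n in zip(sponsored, narrow)],
}
within = {name: within_demean(col) for name, col in columns.items()}
```

After demeaning, the `base_lv_dm` column is all zeros and so carries no information, while the three interaction columns keep non-zero variation because sponsored and narrow_coverage change across the candidate's polls.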

Interpretation

What the triple null does and does not rule out

The 95 % CI on the triple, [−3.2, +2.1] pp per unit of base_lv_dm, has a half-width of about 2.6 pp per unit (1.96 × the SE of 1.35). Scaled over the inter-quartile range of base_lv_dm (roughly 4 to 30 in the analysis sample, so a swing of about 25 units), that gives a confidence band on the implied within-pair coverage-bias effect of approximately ±66 pp around a point estimate of about −13.5 pp; even a one-standard-error band is about ±34 pp. This is wide. We can rule out only very large coverage × base alignments; a modest one (say, a 5-pp differential between urban-base and rural-base candidates in narrow-coverage polls) sits comfortably inside the CI. This is not a precise null in the strict sense.

What it does say: at this proxy granularity (base_lv_size_weighted from 2020 seção votes), there is no evidence the structural Channel A mechanism runs through coverage × candidate-base alignment at universe scale.

Two readings survive

(R1) Coverage isn't the dominant Channel A lever. Consistent with AN-032 (bairro partisan composition reversed sign), AN-019 (small noisy positive), AN-024 (deferral wrong-signed). The +7.7 pp sponsor effect channels through some other mechanism — most likely weighting, income-quota distributions, or scenario rotation (AN-051's already-flagged 26 × under-documentation of name rotation in sponsored polls). The LLM-judge brief flagged cotas de renda / ponderação por renda / cobertura geográfica detalhada at similar rates as cobertura apenas urbana; only the latter category is what AN-055 tests.

(R2) Coverage IS the lever but the cheap proxy is too coarse. base_lv_size_weighted collapses the candidate's geographic base into one scalar (urban-rural-ish). It cannot distinguish "rural-far-from-center" from "low-income peri-urban" or "ethnically distinct neighborhood". Full Tier 2 (IBGE setor socioeconomic crosstabs) could sharpen, at the cost documented in docs/todo.md (3-5 days sandbox quick-and-dirty, 1.5-2 weeks pipeline-grade).

Both readings agree on the next move: test weighting features structurally before sharpening the geographic proxy. If weighting interactions show signal, the mechanism story tightens without spending the IBGE-setor infrastructure week. If weighting is also null, the case for the IBGE-setor sharpening strengthens.

Refined mechanism inventory (post-AN-055)

Lever	Status	Evidence
Bairro partisan composition	Reversed sign	AN-032
Coverage class (flat)	Underpowered; small noisy positive	AN-019
Coverage class × candidate base (cheap Tier 2)	Null (this AN)	AN-055
Coverage deferral	Wrong-signed	AN-024
Audit pct	Heavy overlap, small right-tail gap	AN-021
Methodology completeness	Wrong-signed	AN-022
Interviewer training	Wrong-signed (sponsors describe MORE)	AN-042
Mode (phone / in-person)	Wrong-signed	AN-041
Nonresponse handling	Null-by-data-design	AN-043
Name / scenario rotation	Working hypothesis — sponsored under-document rotation 5×	AN-051
Weighting / income-quota distributions	Not yet tested structurally	— (LLM-judge flagged)

The pattern: nearly every structural lever tested has come back null or wrong-signed against the Channel A "candidates hide methodology" prediction. The two open frontiers are scenario rotation (AN-051's robust positive finding) and weighting/income-quota features (not yet structured).

Follow-ups

Next-up: weighting / income-quota structural extraction (highest paper-value extension). The LLM-judge brief's recurring features include cotas de renda (n=5), ponderação de renda / ponderação por renda (3+2), cotas por nível econômico (n=2), ponderação por nível econômico (n=1). None of the current poll_sampling.py / poll_operations.py schema fields capture quota DISTRIBUTION mismatches between sponsored and independent polls of the same race. Build a structured extractor for the income-quota vector and a "quota deviation from muni baseline" metric, then run a per-protocol panel test. Estimated effort: about one LLM extraction sprint at universe scale, after the queued sampling/operations batch resubmission.
Full Tier 2 only if weighting is also null. With AN-055 null and AN-019/021/022/024 null/wrong, the case for spending the IBGE-setor week strengthens only if weighting also fails. Keep the Tier 2 todo entry as-is.
Re-examine the blinded brief's cobertura geográfica and cobertura urbana e rural themes. These are finer than coverage_class's 6-bucket categorization. The LLM may be picking up substantive coverage differences inside the existing categories (e.g., within urban_plus_selected_rural, which rural districts are included can vary). A follow-up extractor for "list of bairros explicitly excluded" could sharpen.
Sensitivity: base_source quality. Split the sample by base_source ∈ {own_2020_prefeito, party_2020_prefeito, party_2020_vereador} and re-fit Spec 3. If the own-2020-prefeito subset (sharpest base measurement) shows even a noisy positive triple, the proxy-coarseness reading (R2) gains; if it doesn't, the "coverage isn't the lever" reading (R1) gains.
