Cheap-Tier-2 structural test on coverage × candidate-base finds **no triple-interaction signal**. On the universe-scale within-candidate FE sample (n=20,393 cand-poll rows, 3,524 candidates, 1,665 muni clusters), the headline sponsor effect lands at **+7.70 pp (SE 1.44, p<10⁻⁷)**, but the flat `sponsored × narrow_coverage` interaction is null (β = +1.61, SE 4.93, p=0.74) and the wash-out-breaking triple `sponsored × narrow_coverage × base_lv_size_weighted` is also null (β = −0.54, SE 1.35, p=0.69). The 95 % CI on the triple is [−3.2, +2.1] pp per unit of `base_lv_dm` (the demeaned `base_lv_size_weighted`), which rules out large triples but not modest ones. Read: coverage class is unlikely to be the dominant Channel-A lever for the headline +7.7 pp effect, consistent with AN-032's reversed-sign bairro test. This directs attention to weighting / income-quota features (next AN).
Question
The blinded LLM-judge pilot
(blinded Channel-A discovery brief)
flagged coverage and weighting as the dominant high-plausibility
mechanism domains, with 14/16 high-plausibility hypotheses
agreeing with the actual sponsored side (87.5 %, p ≈ 0.004).
The flat structural test sponsored × coverage_class (AN-019)
was underpowered and direction-ambiguous: rural-base candidates'
sponsors might choose rural-friendly coverage and urban-base
candidates' sponsors might choose urban-only coverage, so a
directional coverage effect can wash out in the aggregate.
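The 14/16 agreement figure can be checked with a sign test. The sketch below assumes the reported p ≈ 0.004 comes from an exact two-sided binomial test against a 50 % chance rate; the note does not name the test, and the function name is illustrative.

```python
from math import comb

def binomial_two_sided_p(successes: int, trials: int) -> float:
    """Exact two-sided p-value against a fair-coin null (p = 0.5).

    Assumes successes >= trials / 2, so the upper tail is the observed one.
    """
    # Upper-tail probability P(X >= successes) under Binomial(trials, 0.5).
    upper_tail = sum(comb(trials, k) for k in range(successes, trials + 1)) / 2 ** trials
    # The fair-coin null is symmetric, so the two-sided p-value doubles the tail.
    return min(1.0, 2 * upper_tail)

p_value = binomial_two_sided_p(14, 16)
print(f"agreement = {14 / 16:.1%}, two-sided p = {p_value:.4f}")
# agreement = 87.5%, two-sided p = 0.0042
```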
This analysis breaks the wash-out by interacting the coverage
choice with each candidate's natural base concentration. The cheap
proxy is base_lv_size_weighted = vote-weighted average of "seções
per local_votacao" (polling sections per polling place) across the
candidate's 2020 base:
- Higher value → urban-leaning base (urban LVs serve many seções)
- Lower value → rural-leaning base (rural LVs serve few seções)
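A minimal sketch of the proxy, assuming each local_votacao contributes its count of seções weighted by the candidate's 2020 votes cast there. The input layout and the example numbers are hypothetical; the actual build lives in the base-profile script.

```python
def base_lv_size_weighted(lv_rows):
    """Vote-weighted mean of seções per local_votacao.

    lv_rows: list of (n_secoes, cand_votes) pairs, one per local_votacao
    (hypothetical layout, not the pipeline's schema).
    """
    total_votes = sum(votes for _, votes in lv_rows)
    if total_votes == 0:
        return None  # no usable base; treated as missing in this sketch
    return sum(n_secoes * votes for n_secoes, votes in lv_rows) / total_votes

# Illustrative candidates: votes concentrated in large vs small polling places.
urban_base = [(12, 900), (10, 700), (2, 50)]
rural_base = [(1, 300), (2, 400), (8, 100)]

urban_value = base_lv_size_weighted(urban_base)
rural_value = base_lv_size_weighted(rural_base)
print(round(urban_value, 2), round(rural_value, 2))
```

The urban-leaning example yields the larger value, matching the reading in the two bullets above.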
If sponsored polls choose narrow (urban-only / specific-neighborhood)
coverage when their candidate's base concentrates in dense urban LVs,
the triple interaction
sponsored × narrow_coverage × base_lv_size_weighted should be positive.
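The wash-out logic can be illustrated with a toy calculation. The numbers below are invented for illustration only: the differential sponsor effect inside narrow-coverage polls is assumed to be linear in the demeaned base proxy, with a hypothetical slope `gamma`.

```python
from statistics import mean

gamma = 1.5  # hypothetical pp of extra sponsored x narrow effect per unit of demeaned base
base_dm = [-2.0, -1.0, 0.0, 1.0, 2.0]  # demeaned base proxy, symmetric around zero

# Differential sponsor effect inside narrow-coverage polls, by candidate base:
# negative for rural-leaning bases, positive for urban-leaning bases.
differential = [gamma * b for b in base_dm]

# A flat sponsored x narrow term averages the differential over all bases.
flat_interaction = mean(differential)

# The triple term recovers the slope of the differential on base_dm.
# (No centring needed in the OLS slope because base_dm already has mean zero.)
slope = sum(b * d for b, d in zip(base_dm, differential)) / sum(b * b for b in base_dm)

print(flat_interaction, slope)
```

In this toy setup the flat interaction averages to zero while the slope equals `gamma`, which is the pattern the triple interaction is designed to detect.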
Design
source/analysis/an-055-coverage-by-cand-base.py:
- Load `build/assemble/cand_poll.parquet` (with the base profile columns piped through from `build/assemble/cand.parquet`).
- Load `coverage_class` per protocol from the cached `poll_coverage` LLM extractions (14,876 protocols at universe scale, covering both LLM-extracted and deterministic short-circuits).
- Define `narrow_coverage = 1[coverage_class ∈ {urban_only, specific_neighborhoods}]`.
- Restrict to candidates with a non-unavailable `base_source` and to candidates appearing in ≥ 2 polls. Demean `base_lv_size_weighted` for interaction interpretability.
- Fit three nested PanelOLS specs with within-candidate fixed effects and muni-clustered SEs.
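The sample-preparation steps above can be sketched in plain Python. This is not the analysis script: the in-memory row layout, the function name, and the choice to demean over rows (rather than over candidates) are assumptions made for illustration.

```python
from collections import Counter
from statistics import mean

NARROW_CLASSES = {"urban_only", "specific_neighborhoods"}

def prepare_sample(rows):
    """Apply the sample restrictions and build the interaction inputs.

    rows: list of dicts, one per cand-poll row (hypothetical layout).
    """
    # Drop candidates whose base profile is unavailable.
    kept = [dict(r) for r in rows if r["base_source"] != "unavailable"]
    # Within-candidate FE needs at least two polls per candidate.
    polls_per_cand = Counter(r["politico_id"] for r in kept)
    kept = [r for r in kept if polls_per_cand[r["politico_id"]] >= 2]
    if not kept:
        return kept
    # Demean the base proxy over the analysis rows (assumption: row-level mean).
    centre = mean(r["base_lv_size_weighted"] for r in kept)
    for r in kept:
        r["narrow_coverage"] = int(r["coverage_class"] in NARROW_CLASSES)
        r["base_lv_dm"] = r["base_lv_size_weighted"] - centre
    return kept

rows = [
    {"politico_id": 1, "coverage_class": "urban_only", "base_source": "own_2020_prefeito", "base_lv_size_weighted": 10.0},
    {"politico_id": 1, "coverage_class": "urban_plus_selected_rural", "base_source": "own_2020_prefeito", "base_lv_size_weighted": 10.0},
    {"politico_id": 2, "coverage_class": "specific_neighborhoods", "base_source": "party_2020_prefeito", "base_lv_size_weighted": 4.0},
    {"politico_id": 3, "coverage_class": "urban_only", "base_source": "unavailable", "base_lv_size_weighted": None},
    {"politico_id": 4, "coverage_class": "urban_plus_selected_rural", "base_source": "party_2020_vereador", "base_lv_size_weighted": 2.0},
    {"politico_id": 4, "coverage_class": "specific_neighborhoods", "base_source": "party_2020_vereador", "base_lv_size_weighted": 2.0},
]
sample = prepare_sample(rows)
print([(r["politico_id"], r["narrow_coverage"], r["base_lv_dm"]) for r in sample])
```

Candidate 2 drops out for appearing in a single poll and candidate 3 for lacking a base profile, leaving four rows for the fixed-effects fit.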
The base-profile build is documented in
source/intermediate/cand__base_profile.py and follows the
fallback ladder: own 2020 prefeito vote → party 2020 prefeito vote in
same muni → party 2020 vereador vote → unavailable.
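The fallback ladder amounts to taking the first available rung. A minimal sketch, assuming each rung is passed in as a precomputed profile or `None`; the function signature is hypothetical and the label strings follow the base_source values used elsewhere in this note.

```python
def resolve_base(own_prefeito, party_prefeito, party_vereador):
    """Return (base_source, profile) from the first available rung of the ladder."""
    ladder = [
        ("own_2020_prefeito", own_prefeito),      # candidate's own 2020 prefeito vote
        ("party_2020_prefeito", party_prefeito),  # party's 2020 prefeito vote, same muni
        ("party_2020_vereador", party_vereador),  # party's 2020 vereador vote
    ]
    for source, profile in ladder:
        if profile is not None:
            return source, profile
    # Nothing usable: these candidates are excluded from the analysis sample.
    return "unavailable", None

print(resolve_base(None, {"base_lv_size_weighted": 6.2}, {"base_lv_size_weighted": 5.0}))
```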
Results
Headline (Spec 1) — sponsor effect survives on the analysis slice
| Statistic | Value |
|---|---|
| Sample (cand-poll rows) | 20,393 |
| Candidates | 3,524 |
| Muni clusters | 1,665 |
| β_sponsored | +7.70 pp |
| SE (muni-clustered) | 1.44 |
| t | 5.35 |
| p | 9.0 × 10⁻⁸ |
The +7 to +8 pp sponsor effect from the headline analysis lands here at +7.70 pp on a strict slice (within-cand FE + base profile available + coverage extraction available). No drift.
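The t and p rows of the table can be sanity-checked from β and SE alone. The sketch uses a normal approximation, which is an assumption; with 1,665 clusters it should sit close to whatever reference distribution the fitting library applied.

```python
from math import erfc, sqrt

def z_and_two_sided_p(beta: float, se: float):
    """Wald statistic and two-sided p-value under a standard normal reference."""
    z = beta / se
    # erfc(|z| / sqrt(2)) equals 2 * (1 - Phi(|z|)), the two-sided normal tail.
    p = erfc(abs(z) / sqrt(2))
    return z, p

z, p = z_and_two_sided_p(7.70, 1.44)
print(f"t = {z:.2f}, p = {p:.1e}")
```

This reproduces t ≈ 5.35 and a p-value on the order of 9 × 10⁻⁸; small differences from the table are expected because the inputs here are rounded.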
Spec 2 — flat sponsor × narrow coverage interaction is null
| Coefficient | β | SE | t | p |
|---|---|---|---|---|
| sponsored | +7.56 | 1.44 | 5.25 | 1.5 × 10⁻⁷ |
| narrow_coverage | −0.86 | 0.58 | −1.47 | 0.140 |
| sponsored × narrow_coverage | +1.61 | 4.93 | 0.33 | 0.743 |
The differential sponsor effect inside narrow-coverage polls is not statistically distinguishable from zero. This is the null that the wash-out argument targets. On its own it is direction-ambiguous: it can mean (a) no coverage channel exists, or (b) the coverage channel exists but directional alignment cancels in the aggregate.
Spec 3 — triple interaction (the wash-out-breaking test)
| Coefficient | β | SE | t | p |
|---|---|---|---|---|
| sponsored | +6.29 | 1.69 | 3.72 | 2.0 × 10⁻⁴ |
| narrow_coverage | −0.91 | 0.58 | −1.57 | 0.117 |
| sponsored × narrow_coverage | −7.98 | 22.44 | −0.36 | 0.722 |
| sponsored × base_lv_dm | −0.097 | 0.11 | −0.92 | 0.357 |
| narrow_coverage × base_lv_dm | −0.0073 | 0.0080 | −0.92 | 0.359 |
| sponsored × narrow × base_lv_dm | −0.54 | 1.35 | −0.40 | 0.689 |
The triple is also null. Point estimate is small and slightly negative; 95 % CI on the triple is approximately [−3.2, +2.1] pp per unit of base_lv_dm.
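The interval follows from the usual Wald construction, β ± 1.96 × SE. A short check, with the function name chosen for illustration:

```python
def wald_ci95(beta: float, se: float, z_crit: float = 1.96):
    """95 % Wald confidence interval from a point estimate and its standard error."""
    half_width = z_crit * se
    return beta - half_width, beta + half_width

lower, upper = wald_ci95(-0.54, 1.35)
print(f"[{lower:.1f}, {upper:.1f}]")
# [-3.2, 2.1]
```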
The candidate-level main effect of base_lv_dm is absorbed by the within-cand FE (it's constant per politico_id by construction). All three interactions are identified from within-candidate variation in sponsored and narrow_coverage.
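The absorption point can be seen directly by applying the within-candidate transformation (subtracting each candidate's own mean) to toy data. The values below are invented; only the mechanics matter.

```python
from collections import defaultdict
from statistics import mean

def within_demean(values, groups):
    """Subtract each group's mean: the transformation behind entity fixed effects."""
    by_group = defaultdict(list)
    for value, group in zip(values, groups):
        by_group[group].append(value)
    group_means = {g: mean(vs) for g, vs in by_group.items()}
    return [value - group_means[group] for value, group in zip(values, groups)]

politico_id = ["a", "a", "a", "b", "b"]
base_lv_dm = [4.0, 4.0, 4.0, -6.0, -6.0]  # constant within each candidate
sponsored = [1, 0, 0, 1, 0]               # varies across a candidate's polls

# The main effect has no within-candidate variation left after demeaning.
base_within = within_demean(base_lv_dm, politico_id)
# The interaction varies within candidate because `sponsored` does.
interaction_within = within_demean(
    [s * b for s, b in zip(sponsored, base_lv_dm)], politico_id
)
print(base_within)
print(interaction_within)
```

The first list is all zeros, so its coefficient cannot be estimated; the second is not, which is why the interaction terms remain identified.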
Interpretation
What the triple null does and does not rule out
The standard error on the triple is about 1.4 pp per unit of base_lv_dm, so the 95 % CI half-width is about 2.6 pp per unit (1.96 × 1.35). Over the inter-quartile range of base_lv_dm (roughly 4 to 30 in the analysis sample, so a swing of about 25 units), that gives a ±1-SE band on the implied within-pair coverage-bias effect of approximately ±35 pp, and a 95 % band of roughly ±66 pp. This is wide. We can rule out only very large coverage × base alignments; a modest one (say, a 5-pp differential between urban-base and rural-base candidates in narrow-coverage polls) sits comfortably inside the CI. This is not a precise null in the strict sense.
What it does say: at this proxy granularity (base_lv_size_weighted from 2020 seção votes), there is no evidence the structural Channel A mechanism runs through coverage × candidate-base alignment at universe scale.
Two readings survive
(R1) Coverage isn't the dominant Channel A lever. Consistent with AN-032 (bairro partisan composition reversed sign), AN-019 (small noisy positive), AN-024 (deferral wrong-signed). The +7.7 pp sponsor effect channels through some other mechanism, most likely weighting, income-quota distributions, or scenario rotation (AN-051's already-flagged 26 × under-documentation of name rotation in sponsored polls). The LLM-judge brief flagged cotas de renda (income quotas) / ponderação por renda (income weighting) / cobertura geográfica detalhada (detailed geographic coverage) at similar rates as cobertura apenas urbana (urban-only coverage); only that last category is what AN-055 tests.
(R2) Coverage IS the lever but the cheap proxy is too coarse. base_lv_size_weighted collapses the candidate's geographic base into one scalar (roughly an urban-rural axis). It cannot distinguish "rural-far-from-center" from "low-income peri-urban" or "ethnically distinct neighborhood". Full Tier 2 (IBGE setor socioeconomic crosstabs) could sharpen the proxy, at the cost documented in docs/todo.md (3-5 days sandbox quick-and-dirty, 1.5-2 weeks pipeline-grade).
Both readings agree on the next move: test weighting features structurally before sharpening the geographic proxy. If weighting interactions show signal, the mechanism story tightens without spending the IBGE-setor infrastructure week. If weighting is also null, the case for the IBGE-setor sharpening strengthens.
Refined mechanism inventory (post-AN-055)
| Lever | Status | Evidence |
|---|---|---|
| Bairro partisan composition | Reversed sign | AN-032 |
| Coverage class (flat) | Underpowered, ≈ 0 (small noisy positive) | AN-019 |
| Coverage class × candidate base (cheap Tier 2) | Null (this AN) | AN-055 |
| Coverage deferral | Wrong-signed | AN-024 |
| Audit pct | Heavy overlap, small right-tail gap | AN-021 |
| Methodology completeness | Wrong-signed | AN-022 |
| Interviewer training | Wrong-signed (sponsors describe MORE) | AN-042 |
| Mode (phone / in-person) | Wrong-signed | AN-041 |
| Nonresponse handling | Null-by-data-design | AN-043 |
| Name / scenario rotation | Working hypothesis — sponsored under-document rotation 5× | AN-051 |
| Weighting / income-quota distributions | Not yet tested structurally | — (LLM-judge flagged) |
The pattern: nearly every structural lever tested has come back null or wrong-signed against the Channel A "candidates hide methodology" prediction. The two open frontiers are scenario rotation (AN-051's robust positive finding) and weighting/income-quota features (not yet tested structurally).
Follow-ups
- Next-up: weighting / income-quota structural extraction (highest paper-value extension). The LLM-judge brief's recurring features include `cotas de renda` (n=5), `ponderação de renda` / `ponderação por renda` (3+2), `cotas por nível econômico` (n=2), `ponderação por nível econômico` (n=1). None of the current `poll_sampling.py` / `poll_operations.py` schema fields capture quota DISTRIBUTION mismatches between sponsored and indep polls of the same race. Build a structured extractor for the income-quota vector and a "quota deviation from muni baseline" metric, then a per-protocol panel test. Should be ~1 LLM extraction sprint at universe scale (post the queued sampling/operations batch resubmission).
- Full Tier 2 only if weighting is also null. With AN-055 null and AN-019/021/022/024 null/wrong, the case for spending the IBGE-setor week strengthens only if weighting also fails. Keep the Tier 2 todo entry as-is.
- Re-examine the blinded brief's `cobertura geográfica` and `cobertura urbana e rural` themes. These are finer than `coverage_class`'s 6-bucket categorization. The LLM may be picking up substantive coverage differences inside the existing categories (e.g., within `urban_plus_selected_rural`, which rural districts are included can vary). A follow-up extractor for "list of bairros explicitly excluded" could sharpen.
- Sensitivity: base_source quality. Split the sample by base_source ∈ {own_2020_prefeito, party_2020_prefeito, party_2020_vereador} and re-fit Spec 3. If the own-2020-prefeito subset (sharpest base measurement) shows even a noisy positive triple, the proxy-coarseness reading (R2) gains; if it doesn't, the "coverage isn't the lever" reading (R1) gains.
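One possible shape for the "quota deviation from muni baseline" metric is the total variation distance between a poll's income-quota shares and the municipality's baseline shares. This is only a candidate definition, not one the note commits to; the bracket labels and numbers below are hypothetical.

```python
def quota_deviation(poll_quota: dict, muni_baseline: dict) -> float:
    """Total variation distance between two share vectors keyed by income bracket.

    Returns 0.0 for identical distributions and 1.0 for disjoint ones.
    """
    def normalise(shares: dict) -> dict:
        total = sum(shares.values())
        if total <= 0:
            raise ValueError("shares must sum to a positive number")
        return {bracket: value / total for bracket, value in shares.items()}

    p = normalise(poll_quota)
    q = normalise(muni_baseline)
    # Brackets missing on one side count as a zero share there.
    brackets = set(p) | set(q)
    return 0.5 * sum(abs(p.get(b, 0.0) - q.get(b, 0.0)) for b in brackets)

# Hypothetical brackets: a poll that under-samples the lowest income band.
poll = {"low": 0.30, "mid": 0.50, "high": 0.20}
muni = {"low": 0.50, "mid": 0.40, "high": 0.10}
deviation = quota_deviation(poll, muni)
print(round(deviation, 3))
```

A signed variant (for example, the poll's share minus the baseline share in the lowest bracket) would be needed if the panel test cares about the direction of the mismatch and not just its size.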