AN-010: Does the headline survive the red-team's five highest-leverage attacks?

Headline survives the five red-team substitutions. K1 (media-only comparator) keeps spec-3c β at +7.59 as n falls to 253; K3 (Route B vice-prefeito) is falsified upstream (0/429 vice matches); K4 (drop_absorbed) recovers β = +8.00 under race FE only, so within-candidate-FE selection is not generating β; K5 (drop Route D) raises spec-3c β to +9.30. K2 (raw percent, no renormalization) attenuates β to +5.10: the within-(protocol × scenario_label) renormalization contributes ~3 pp of the headline magnitude, and the residual +5.10 on raw percent is the conservative β to cite alongside the +7.98 renormalized number.

Hypothesis: H1: Self-sponsored polls overstate the sponsoring candidate
Confidence: green
Type: robustness

Design

Sample: estimulado-non-aggregate-match2 (31,186 rows, 641 sponsored, 8,431 candidates)
Specification: spec 2 (cand FE + pollster FE + log_sample_size + days_to_election + days²) and spec 3c (race × week FE on the clean comparator), cluster-robust SE at muni. Each K-check is a sample or variable substitution on the spec ladder.
Comparator: clean-comparator (sponsored_by==1 OR poll_is_independent==1), redefined per K-check
Cluster: muni
Weights: none
Route filter: per-K1/K5 substitutions

Script: source/analysis/robustness_redteam.py
Target: build/table/robustness_redteam.csv
Status: interpreted · 2026-06-02
Created: 2026-06-02

Question

An adverse-referee review of the +7.94 spec-3c sponsor coefficient flagged five "potential killer" substitutions, each of which — according to the referee — could mechanically generate β > 0 without any real sponsor slant. This analysis runs all five on the same panel as a single consolidated robustness battery.

The five checks:

K1 — Media-only comparator. Drop the 7,961 pollster_self polls (26% of all polls) on suspicion they under-report frontrunners, mechanically inflating β.
K2 — Raw percent (no renormalization). Refit with poll_percent_raw — sponsored polls list fewer candidates, so the within-(protocol × scenario_label) denominator is smaller and every share inflates mechanically.
K3 — Route B vice-prefeito audit. Route B is 429/641 = 67% of treated rows. Descriptive count of committee_office == VICE-PREFEITO matches to test whether vice-committees are tagged as sponsoring the prefeito on the same ticket.
K4 — drop_absorbed audit. Count the candidates with no within-variation in sponsored_by; refit with race FE replacing candidate FE to estimate on the full panel.
K5 — Route D party-regex audit. Inspect by hand the sponsor strings behind the 152 Route D (party-name regex) treated rows; refit dropping Route D entirely.
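The K2 substitution hinges on the renormalization step. A minimal sketch, assuming renormalization divides each candidate's raw percent by the sum of raw percents over the candidates listed in the same (protocol × scenario_label) cell, and that rows are already non-aggregate. The dict keys, function names and toy polls are hypothetical. It shows how the same raw 40% becomes a larger share on a shorter roster, and therefore a larger error against the same final_share.

```python
from collections import defaultdict


def renormalize(rows):
    """Add a renormalized 'share' to each row (assumed rule, see lead-in)."""
    # Implicit denominator: sum of raw percent over the candidates listed
    # in each (protocol, scenario_label) cell.
    denom = defaultdict(float)
    for r in rows:
        denom[(r["protocol"], r["scenario_label"])] += r["percent"]
    # Rescale so the listed candidates in each cell sum to 100.
    return [
        {**r, "share": 100.0 * r["percent"] / denom[(r["protocol"], r["scenario_label"])]}
        for r in rows
    ]


def poll_error(share, final_share):
    """Error in pp: poll share (0-100) minus 100 * final vote share (0-1)."""
    return share - 100.0 * final_share


# Hypothetical polls: candidate A polls a raw 40% on a short and a long roster.
short_roster = [("A", 40.0), ("B", 25.0), ("C", 15.0)]
long_roster = short_roster + [("D", 8.0), ("E", 4.0)]
rows = [
    {"protocol": proto, "scenario_label": "s1", "candidate": c, "percent": p}
    for proto, roster in [("SP-1", short_roster), ("IN-1", long_roster)]
    for c, p in roster
]
renormed = renormalize(rows)
share = {(r["protocol"], r["candidate"]): r["share"] for r in renormed}
print(share[("SP-1", "A")], round(share[("IN-1", "A")], 2))  # 50.0 43.48
# Raw error is -2.0 on both rosters; after renormalization it is 8.0 on the short one.
print(poll_error(40.0, 0.42), poll_error(share[("SP-1", "A")], 0.42))  # -2.0 8.0
```

Raw percent gives identical errors on both rosters; only the renormalized share differs, which is the channel K2 switches off.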

Design

K1/K2/K5 each refit specs 2 and 3c on a modified sample (or variable). K1 redefines only the spec-3c comparator, so its spec-2 estimate equals the baseline by construction. K4 refits spec 2 only, since spec 3c already has no candidate FE to replace. K3 is descriptive (no regression). One consolidated long-format table (build/table/robustness_redteam.csv) collects β / SE / p / n across check × spec; K3 and K5 also emit audit-style console output. Specs are identical to source/analysis/robustness.py: spec 2 = candidate FE + pollster FE + log_sample_size + days_to_election + days²; spec 3c = race × week FE on the clean comparator, no candidate FE; cluster-robust SE at muni throughout.
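The specs on the ladder differ mainly in which fixed effects are absorbed. A stripped-down sketch of the estimator, assuming one treatment dummy, a single absorbed FE (via within-group demeaning), no pollster FE or controls, and a sandwich SE clustered by muni with a G/(G−1) small-sample factor (one common choice, not necessarily what robustness.py uses). The function names and toy panel are hypothetical. It also shows why candidates with no within-variation in sponsored_by contribute nothing under candidate FE: their demeaned treatment is identically zero.

```python
import math
from collections import defaultdict


def demean(values, groups):
    """Within transformation: subtract each group's mean (absorbs a one-way FE)."""
    sums, counts = defaultdict(float), defaultdict(int)
    for v, g in zip(values, groups):
        sums[g] += v
        counts[g] += 1
    return [v - sums[g] / counts[g] for v, g in zip(values, groups)]


def fe_beta_cluster_se(y, d, fe, cluster):
    """Slope of y on d with one absorbed FE; cluster-robust (sandwich) SE.

    Simplified sketch: one regressor, one FE dimension, no controls.
    """
    yt, dt = demean(y, fe), demean(d, fe)
    sxx = sum(x * x for x in dt)
    if sxx == 0:
        # Every FE group has constant d: nothing identifies beta.
        raise ValueError("no within-FE variation in the treatment")
    beta = sum(x * v for x, v in zip(dt, yt)) / sxx
    resid = [v - beta * x for x, v in zip(dt, yt)]
    # Meat of the sandwich: squared within-cluster sums of the score d~ * e.
    score = defaultdict(float)
    for x, e, c in zip(dt, resid, cluster):
        score[c] += x * e
    n_clusters = len(score)
    meat = sum(s * s for s in score.values())
    var = n_clusters / (n_clusters - 1) * meat / sxx**2
    return beta, math.sqrt(var)


# Hypothetical toy panel: error = 8 * sponsored + race effect, no noise.
race = ["r1", "r1", "r1", "r2", "r2", "r2"]
muni = ["m1", "m1", "m2", "m3", "m3", "m4"]
d = [1, 0, 0, 1, 0, 0]
race_effect = {"r1": 2.0, "r2": -1.0}
y = [8.0 * di + race_effect[r] for di, r in zip(d, race)]
beta, se = fe_beta_cluster_se(y, d, race, muni)
print(round(beta, 6), round(se, 6))  # 8.0 0.0
```

Swapping the `fe` argument from a candidate identifier to a race identifier is the one-FE analogue of the K4 refit.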

Results

Table: Red-team K-checks vs baseline (β, SE, n across spec 2 and spec 3c)

Check	Spec 2 β (SE)	Spec 3c β (SE)	n (3c)	Verdict
Baseline	+7.98 (1.32)***	+7.94 (2.68)**	448	—
K1 — media-only comparator (drop pollster_self)	+7.98 (1.32)***	+7.59 (4.30)	253	β stable (3c p = 0.082); spec 2 unchanged by construction; SE inflates with smaller n
K2 — raw `percent` (no within-protocol renormalization)	+5.10 (0.92)***	+5.30 (2.08)*	448	β attenuates by about a third; renorm contributes ~3 pp
K3 — Route B vice-prefeito audit	n/a	n/a	n/a	Falsified upstream: 0/429 Route B matches are VICE-PREFEITO
K4 — race FE + pollster FE (no candidate FE)	+8.00 (1.04)***	n/a	n/a	Within-candidate-FE selection is not driving β
K5 — drop Route D treated rows (party-name regex)	+8.82 (1.39)***	+9.30 (3.61)**	315	β rises; Route D names are real party organs

(Cluster-robust SE at muni throughout. *** p<0.001, ** p<0.01, * p<0.05.)

(from build/table/robustness_redteam.csv)
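The stars can be cross-checked against the reported β and SE. A sketch using a normal approximation to the two-sided p-value; this is an assumption, since a t reference with a finite number of muni clusters gives slightly larger p-values. The helper names are hypothetical.

```python
import math


def two_sided_p(beta, se):
    """Two-sided p-value under a normal approximation: erfc(|t| / sqrt(2))."""
    return math.erfc(abs(beta / se) / math.sqrt(2.0))


def stars(p):
    """Conventional thresholds: *** p<0.001, ** p<0.01, * p<0.05."""
    if p < 0.001:
        return "***"
    if p < 0.01:
        return "**"
    if p < 0.05:
        return "*"
    return ""


# (beta, SE) pairs as reported in the red-team table.
reported = {
    "Baseline, spec 3c": (7.94, 2.68),
    "K1, spec 3c": (7.59, 4.30),
    "K2, spec 2": (5.10, 0.92),
    "K2, spec 3c": (5.30, 2.08),
}
for label, (b, s) in reported.items():
    p = two_sided_p(b, s)
    print(f"{label}: t = {b / s:.2f}, p = {p:.4f} {stars(p)}")
```

With these inputs the K2 spec-3c ratio is t ≈ 2.55, which puts p near 0.011, just above the 0.01 cutoff. The K1 spec-3c value comes out near 0.078, a little below the 0.082 quoted in Follow-ups, a gap consistent with the reported figure using a heavier-tailed reference distribution.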

Table: Per-check audit diagnostics

K1. Dropping the 7,961 pollster_self polls leaves spec 2 at +7.98 by construction (spec 2 uses all rows, so the comparator redefinition does not touch it) and moves spec 3c from +7.94 to +7.59. The SE inflates from 2.68 to 4.30 because the spec-3c sample, restricted to (race × week) cells containing both sponsored and media-only independent polls, shrinks from 448 to 253 rows.
K2. Recomputing error = poll_percent_raw − 100·final_share gives β = +5.10 (spec 2) and +5.30 (spec 3c), an attenuation of roughly one-third (36% and 33%). Renormalization therefore adds ~3 pp, consistent with sponsored polls listing fewer non-aggregate candidates and so having a smaller denominator; that mechanism has not yet been tested directly (see Follow-ups).
K3. Zero of the 429 Route B treated rows are tagged committee_office == VICE-PREFEITO; the upstream join in pipelines/politica/source/clean/poll_sponsor.py already restricts Route B to PREFEITO committees. Final-rank distribution of the 429 matches: 215 rank 1; 126 rank 2; 41 rank 3; 47 rank ≥4 — the expected "frontrunners commission their own polls" pattern.
K4. 8,205 of 8,431 candidates (97.3%) have no within-variation in sponsored_by — only 226 candidates contribute on 1,311 of 31,186 rows (4.2%). Refitting with race FE replacing candidate FE (full panel) gives β = +8.00 (SE 1.04), within 0.02 pp of the candidate-FE estimate.
K5. The 152 Route D treated rows trace to 41 unique sponsor strings, all unambiguously party organs (PSD, PL, PSB, PMN, PP, PT, Republicanos, PMDB, etc.) — no "PLANEJAMENTO"/"PROPAGANDA" false positives among the top-20. Dropping Route D (n=634→482 in spec 3c; treated drops 23.7%) raises β to +8.82 (spec 2) and +9.30 (spec 3c).

(from source/analysis/robustness_redteam.py)
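Two of the counts above, the K1 spec-3c sample size and the K4 absorbed share, reduce to simple grouping rules. A sketch on a toy panel, assuming rows carry race, week, candidate, sponsored_by, poll_is_independent and a boolean pollster_self flag, and that pollster_self polls are a subset of the independent comparator. The race/week column names, the function names and the toy rows are hypothetical.

```python
from collections import defaultdict


def cells_with_both(rows, is_comparator):
    """Spec-3c style sample: rows in (race, week) cells holding at least one
    sponsored row and at least one comparator row; other rows are dropped."""
    treated_cells, comparator_cells = set(), set()
    for r in rows:
        cell = (r["race"], r["week"])
        if r["sponsored_by"] == 1:
            treated_cells.add(cell)
        elif is_comparator(r):
            comparator_cells.add(cell)
    keep = treated_cells & comparator_cells
    return [
        r for r in rows
        if (r["race"], r["week"]) in keep and (r["sponsored_by"] == 1 or is_comparator(r))
    ]


def baseline_comparator(r):
    return r["poll_is_independent"] == 1


def media_only_comparator(r):
    # K1: independent polls minus the pollster_self ones.
    return r["poll_is_independent"] == 1 and not r["pollster_self"]


def absorbed_split(rows):
    """K4 count: (absorbed candidates, varying candidates, rows from varying ones)."""
    values = defaultdict(set)
    for r in rows:
        values[r["candidate"]].add(r["sponsored_by"])
    varying = {c for c, v in values.items() if len(v) > 1}
    n_rows = sum(1 for r in rows if r["candidate"] in varying)
    return len(values) - len(varying), len(varying), n_rows


def row(race, cand, sponsored, independent, pollster_self):
    # Hypothetical toy row; every row sits in week 1.
    return dict(race=race, week=1, candidate=cand, sponsored_by=sponsored,
                poll_is_independent=independent, pollster_self=pollster_self)


rows = [
    row("R1", "a", 1, 0, False),  # a's own poll
    row("R1", "a", 0, 1, True),   # pollster_self poll covering a
    row("R1", "b", 0, 1, False),  # media poll
    row("R2", "c", 1, 0, False),
    row("R2", "d", 0, 1, True),   # R2's only comparator is pollster_self
]
print(len(cells_with_both(rows, baseline_comparator)),
      len(cells_with_both(rows, media_only_comparator)))  # 5 2
print(absorbed_split(rows))  # (3, 1, 2)
```

Under K1 the R2 cell loses its only comparator and drops out entirely, which is the same mechanism that shrinks the spec-3c sample from 448 to 253 rows.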

Interpretation

Four of the five attacks are cleared (K1, K3, K4, K5); the fifth (K2) partially attenuates β but does not flip the sign or kill significance.

Identification from a small candidate subset is not driving β. K4's race-FE-only β = +8.00 on the full sample lands within 0.02 pp of the candidate-FE estimate, even though 97.3% of candidates are absorbed under candidate FE.
Route audits are clean. Route B committee CNPJ matches (K3) are all PREFEITO by upstream construction; Route D party-name regex matches (K5) are all real party organs and dropping them strengthens β.
Pollster-self comparator contamination is not the source. Dropping the 7,961 pollster_self polls (K1) moves the spec-3c point estimate only from +7.94 to +7.59; the SE inflation reflects the smaller race-week cell count.
Renormalization is doing real work. About 3 pp of the headline magnitude comes from the within-(protocol × scenario_label) step. The conservative raw-percent specification gives β = +5.10 (spec 2) to +5.30 (spec 3c): still significant in both specs, but materially smaller.
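The ~3 pp figure and the size of the K2 attenuation follow directly from the β pairs in the results table. A minimal arithmetic check using the reported baseline and K2 coefficients; the helper name is hypothetical.

```python
def attenuation(renormalized_beta, raw_beta):
    """Gap in pp and the gap as a share of the renormalized beta."""
    gap = renormalized_beta - raw_beta
    return gap, gap / renormalized_beta


# Baseline (renormalized) and K2 (raw percent) coefficients from the results table.
for spec, renorm, raw in [("spec 2", 7.98, 5.10), ("spec 3c", 7.94, 5.30)]:
    gap, share = attenuation(renorm, raw)
    print(f"{spec}: {gap:.2f} pp, {share:.0%} of the renormalized beta")
```

The gap is 2.88 pp (36%) in spec 2 and 2.64 pp (33%) in spec 3c, i.e. roughly a third of the renormalized β.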

The paper note should report both numbers: the renormalized β as the primary specification (its scale is comparable to final_share) and the raw β as the robustness number that does not depend on renormalization at all, which rules out any artifact-of-renormalization concern.

Confidence rationale (green). Four of five referee attacks are cleared with the point estimate intact or strengthened; the fifth attenuates β by roughly a third, but the residual +5.10 remains highly significant and is identified within candidate. The sign and order of magnitude of the headline survive every substitution, and significance survives every refit except spec 3c under K1 (p = 0.082).

Follow-ups

K2 mechanism: why does renormalization inflate sponsored polls more? (puzzle). The renormalization denominator depends on the set of non-aggregate candidates listed in the (protocol × scenario_label) row. Sponsored polls plausibly list fewer candidates (a sender-side design lever already in [[design_levers]] §rosters). Test: per protocol, count n_non_aggregate_candidates and the sum of percent over them (the implicit denominator). Compare distributions for sponsored vs independent polls. If sponsored polls have systematically smaller Σ percent, then K2's ~3 pp attenuation is mechanistically explained — and the renormalization-inflated β has Channel-A content (sender chose to list few candidates → mechanical share bump on the listed sponsor) on top of the raw +5 pp. Suggested script: denominator_audit.py.
Drop_absorbed disclosure for the paper (extension). The 97.3% absorbed share is a striking but reportable fact: candidate-FE identification rests on the 226 candidates (1,311 rows) with within-candidate variation in sponsored_by. The paper note should disclose this explicitly with a footnote citing AN-010 §K4. The race-FE-only β = +8.00 belongs in the robustness table alongside K1/K2/K5.
K1 cell-count fragility (blind spot, low priority). Spec 3c under K1 retains the point estimate but the p-value crosses 0.05 (p = 0.082). The paper note should not lean on spec-3c K1 as strong evidence — the test is underpowered by the time pollster_self polls are dropped from the strict race-week comparator. Spec 3a (race-month FE) under K1 may recover power; worth a single-line addition.
Media-only baseline on the independent panel (extension). If pollster_self polls themselves carry bias (in either direction), K1's β = +7.59 is only partially diagnostic, so the pure media-only independent baseline should also be computed as the mean error for media-only polls (no candidates self-sponsored). Compare it to the all-Brazil +0.93 figure for "independent" polls; if the media-only mean error is materially closer to 0 than +0.93, then pollster_self polls are biased and K1 understates the true sponsor effect. Suggested script: a 10-line addition to source/sponsor_baseline.py (if it exists) or a one-off cell.
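For the K2-mechanism follow-up, the suggested denominator_audit.py could start from a tabulation like the one below. A sketch under stated assumptions: one row per listed non-aggregate candidate carrying protocol, scenario_label, raw percent, sponsored_by and poll_is_independent, and a (protocol × scenario_label) cell counts as sponsored if any listed candidate is its sponsor. The function names and toy polls are hypothetical.

```python
from collections import defaultdict
from statistics import median


def denominator_audit(rows):
    """Per (protocol, scenario_label) cell: (n listed candidates, sum of raw percent),
    split into sponsored and independent cells."""
    n_listed = defaultdict(int)
    pct_sum = defaultdict(float)
    sponsored = defaultdict(bool)
    independent = defaultdict(bool)
    for r in rows:
        key = (r["protocol"], r["scenario_label"])
        n_listed[key] += 1
        pct_sum[key] += r["percent"]
        # Assumed rule: a cell is sponsored if any listed candidate sponsors it.
        sponsored[key] |= r["sponsored_by"] == 1
        independent[key] |= r["poll_is_independent"] == 1
    out = {"sponsored": [], "independent": []}
    for key in n_listed:
        if sponsored[key]:
            out["sponsored"].append((n_listed[key], pct_sum[key]))
        elif independent[key]:
            out["independent"].append((n_listed[key], pct_sum[key]))
    return out


def listed(protocol, percents, sponsor_first, independent):
    # Hypothetical helper: one row per listed candidate in a single scenario.
    return [
        {"protocol": protocol, "scenario_label": "s1", "percent": p,
         "sponsored_by": int(sponsor_first and i == 0), "poll_is_independent": independent}
        for i, p in enumerate(percents)
    ]


rows = (listed("SP-1", [40.0, 25.0, 15.0], True, 0)
        + listed("IN-1", [38.0, 24.0, 14.0, 10.0], False, 1))
audit = denominator_audit(rows)
for group, cells in audit.items():
    print(group, median(n for n, _ in cells), median(s for _, s in cells))
```

On the full panel the comparison would look at the whole Σ percent distribution by group, not just the medians printed here.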