The headline survives the five red-team substitutions. K1 (media-only comparator) preserves the spec-3c β at +7.59 as n falls from 448 to 253. K3 (Route B vice-prefeito) is falsified upstream (0/429 vice matches). K4 (drop_absorbed) restores β at +8.00 under race-FE-only, so within-candidate-FE selection is not generating β. K5 (drop Route D) raises the spec-3c β to +9.30. K2 (raw percent, no renormalization) attenuates the spec-2 β to +5.10: the within-(protocol × scenario_label) renormalization contributes ~3 pp of the headline magnitude, and the residual +5.10 on raw percent is the conservative β to cite alongside the +7.98 renormalized number.

Confidence: green
Type: robustness

Design
Sample: estimulado-non-aggregate-match2 (31,186 rows, 641 sponsored, 8,431 candidates)
Specification: spec 2 (cand FE + pollster FE + log_sample_size + days_to_election + days²) and spec 3c (race × week FE on the clean comparator), cluster-robust SE at muni. Each K-check is a sample or variable substitution on the spec ladder.
Comparator: clean-comparator (sponsored_by==1 OR poll_is_independent==1), redefined per K-check
Cluster: muni
Weights: none
Route filter: per-K1/K5 substitutions
Script: source/analysis/robustness_redteam.py
Target: build/table/robustness_redteam.csv
Status: interpreted · 2026-06-02
Created: 2026-06-02

Question

An adverse-referee review of the +7.94 spec-3c sponsor coefficient flagged five "potential killer" substitutions, each of which — according to the referee — could mechanically generate β > 0 without any real sponsor slant. This analysis runs all five on the same panel as a single consolidated robustness battery.

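The five checks:

  1. K1: media-only comparator (drop pollster_self).
  2. K2: raw percent (no within-protocol renormalization).
  3. K3: Route B vice-prefeito audit.
  4. K4: race FE + pollster FE (no candidate FE).
  5. K5: drop Route D treated rows (party-name regex).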

Design

K1/K2/K4/K5 each refit specs 2 and 3c on a modified sample (or variable); K3 is descriptive (no regression). One consolidated long-format table (build/table/robustness_redteam.csv) collects β / SE / p / n across check × spec; K3 and K5 also emit audit-style console output. The specs are identical to those in source/analysis/robustness.py: spec 2 = candidate FE + pollster FE + log_sample_size + days_to_election + days²; spec 3c = race × week FE on the clean comparator, no candidate FE; cluster-robust SE at muni throughout.
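As a rough illustration of what "candidate FE with cluster-robust SE at muni" means mechanically, the sketch below implements a one-way within estimator for a single regressor in plain Python. It is not the code in source/analysis/robustness_redteam.py: it omits pollster FE, the controls, and any small-sample correction, and the field names (error, sponsored, candidate_id, muni) and toy rows are assumptions made for the example.

```python
from collections import defaultdict
from math import sqrt


def within_fe_cluster(rows, y, x, fe, cluster):
    """One-way within (fixed-effects) estimate of the slope on x,
    with a cluster-robust SE (CR0: no small-sample correction)."""
    # Accumulate sums of y and x within each fixed-effect cell.
    cell = defaultdict(lambda: [0.0, 0.0, 0])
    for r in rows:
        c = cell[r[fe]]
        c[0] += r[y]
        c[1] += r[x]
        c[2] += 1
    # Demean y and x by their cell means (the "within" transformation).
    y_t = [r[y] - cell[r[fe]][0] / cell[r[fe]][2] for r in rows]
    x_t = [r[x] - cell[r[fe]][1] / cell[r[fe]][2] for r in rows]
    sxx = sum(v * v for v in x_t)
    if sxx == 0:
        raise ValueError("x has no within-cell variation")
    beta = sum(a * b for a, b in zip(x_t, y_t)) / sxx
    # Sum the score (demeaned x times residual) within each cluster.
    score = defaultdict(float)
    for r, a, b in zip(rows, x_t, y_t):
        score[r[cluster]] += a * (b - beta * a)
    se = sqrt(sum(s * s for s in score.values())) / sxx
    return beta, se, len(rows)


# Toy rows (hypothetical): error = candidate effect + 8 * sponsored, no noise.
toy = [
    {"candidate_id": "A", "muni": 1, "sponsored": 0, "error": 10.0},
    {"candidate_id": "A", "muni": 1, "sponsored": 1, "error": 18.0},
    {"candidate_id": "B", "muni": 2, "sponsored": 0, "error": 30.0},
    {"candidate_id": "B", "muni": 2, "sponsored": 1, "error": 38.0},
    {"candidate_id": "B", "muni": 2, "sponsored": 0, "error": 30.0},
]
beta, se, n = within_fe_cluster(toy, "error", "sponsored", "candidate_id", "muni")
print(round(beta, 6), round(se, 6), n)
```

Because the toy data are noise-free, the slope recovers the built-in effect and the residual-based SE collapses to zero; on real data the same score-summing step is what makes the SE robust to correlation within muni.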

Results

Table: Red-team K-checks vs baseline (β, SE, n across spec 2 and spec 3c)

| Check | Spec 2 β (SE) | Spec 3c β (SE) | n (3c) | Verdict |
|---|---|---|---|---|
| Baseline | +7.98 (1.32)*** | +7.94 (2.68)** | 448 | |
| K1: media-only comparator (drop pollster_self) | +7.98 (1.32)*** | +7.59 (4.30) | 253 | β stable; SE inflates with smaller n |
| K2: raw percent (no within-protocol renormalization) | +5.10 (0.92)*** | +5.30 (2.08)** | 448 | β attenuates ~30%; renorm contributes ~3 pp |
| K3: Route B vice-prefeito audit | n/a | n/a | n/a | Falsified upstream: 0/429 Route B matches are VICE-PREFEITO |
| K4: race FE + pollster FE (no candidate FE) | +8.00 (1.04)*** | n/a | n/a | Within-candidate-FE selection is not driving β |
| K5: drop Route D treated rows (party-name regex) | +8.82 (1.39)*** | +9.30 (3.61)** | 315 | β rises; Route D names are real party organs |

(Cluster-robust SE at muni throughout. *** p<0.001, ** p<0.01.)

(from build/table/robustness_redteam.csv)

Table: Per-check audit diagnostics

(from source/analysis/robustness_redteam.py)

Interpretation

Four of the five attacks are cleared (K1, K3, K4, K5); the fifth (K2) partially attenuates β but does not flip the sign or kill significance.
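The size of the K2 attenuation can be checked directly from the baseline and K2 rows of the results table (point estimates only; no new data are involved):

```latex
\begin{align*}
\text{Spec 2:}\quad \Delta\beta &= 7.98 - 5.10 = 2.88\ \text{pp}, &
  \frac{\Delta\beta}{7.98} &\approx 0.36, \\
\text{Spec 3c:}\quad \Delta\beta &= 7.94 - 5.30 = 2.64\ \text{pp}, &
  \frac{\Delta\beta}{7.94} &\approx 0.33.
\end{align*}
```

In both specifications the renormalization accounts for roughly 3 pp, or about a third, of the baseline coefficient.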

The paper note should report both numbers — the renormalized β as the primary specification (comparable scale to final_share) and the raw β as the robustness number that excludes any artifact-of-renormalization concern.
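To make the two scales concrete, here is a minimal sketch of a within-(protocol × scenario_label) renormalization. It assumes the renormalized share is 100 × percent divided by the sum of percent over non-aggregate candidates in the same cell; that formula, the field names (protocol, scenario_label, percent, is_aggregate), and the toy polls are assumptions for illustration, not the project's actual pipeline.

```python
from collections import defaultdict


def renormalize(rows):
    """Rescale percent so the non-aggregate candidates in each
    (protocol, scenario_label) cell sum to 100. Aggregate rows get None."""
    denom = defaultdict(float)
    for r in rows:
        if not r["is_aggregate"]:
            denom[(r["protocol"], r["scenario_label"])] += r["percent"]
    out = []
    for r in rows:
        key = (r["protocol"], r["scenario_label"])
        share = None
        if not r["is_aggregate"] and denom[key] > 0:
            share = 100.0 * r["percent"] / denom[key]
        out.append({**r, "share": share})
    return out


def row(protocol, name, percent, is_aggregate=False):
    # Helper for building hypothetical poll rows in a single scenario.
    return {"protocol": protocol, "scenario_label": "S1", "candidate": name,
            "percent": percent, "is_aggregate": is_aggregate}


# Hypothetical polls: X has the same raw 40% in both, but P1 lists fewer rivals.
polls = [
    row("P1", "X", 40.0), row("P1", "Y", 30.0),
    row("P1", "undecided", 30.0, is_aggregate=True),
    row("P2", "X", 40.0), row("P2", "Y", 30.0), row("P2", "Z", 20.0),
    row("P2", "undecided", 10.0, is_aggregate=True),
]
shares = {(r["protocol"], r["candidate"]): r["share"] for r in renormalize(polls)}
print(shares[("P1", "X")], shares[("P2", "X")])
```

In this toy case the same raw 40% maps to a larger renormalized share in the poll with the smaller denominator, which is the kind of scale effect the raw-percent β is insulated from.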

Confidence rationale (green). Four of five referee attacks are cleared with the point estimate intact or strengthened; the fifth attenuates β by roughly a third (spec 2: +7.98 to +5.10), but the residual +5.10 remains highly significant and within-candidate-identified. The sign and order of magnitude of the headline survive every substitution, and significance survives everywhere except spec 3c under K1 (p = 0.082; see Follow-up 3).

Follow-ups

  1. K2 mechanism: why does renormalization inflate sponsored polls more? (puzzle). The renormalization denominator depends on the set of non-aggregate candidates listed in the (protocol × scenario_label) row. Sponsored polls plausibly list fewer candidates (a sender-side design lever already in [[design_levers]] §rosters). Test: per protocol, count n_non_aggregate_candidates and the sum of percent over them (the implicit denominator). Compare distributions for sponsored vs independent polls. If sponsored polls have systematically smaller Σ percent, then K2's ~3 pp attenuation is mechanistically explained — and the renormalization-inflated β has Channel-A content (sender chose to list few candidates → mechanical share bump on the listed sponsor) on top of the raw +5 pp. Suggested script: denominator_audit.py.
  2. Drop_absorbed disclosure for the paper (extension). The 97.3% absorbed share is a striking but reportable fact: candidate-FE identification rests on 226 candidates / 1,311 rows of the treated-side comparator structure. The paper note should disclose this explicitly with a footnote citing AN-010 §K4. The race-FE-only β = +8.00 belongs in the robustness table alongside K1/K2/K5.
  3. K1 cell-count fragility (blind spot, low priority). Spec 3c under K1 retains the point estimate but the p-value crosses 0.05 (p = 0.082). The paper note should not lean on spec-3c K1 as strong evidence — the test is underpowered by the time pollster_self polls are dropped from the strict race-week comparator. Spec 3a (race-month FE) under K1 may recover power; worth a single-line addition.
  4. Comparable figure on the independent panel: media-only β (extension). If pollster_self polls themselves carry bias (in either direction), K1's β = +7.59 is partially diagnostic, but the pure media-only independent baseline should also be computed as the mean error for media-only polls (no candidates self-sponsored). Compare it to the all-Brazil +0.93 figure for "independent" polls; if the media-only mean error is materially closer to 0 than +0.93, then pollster_self polls are biased and K1 understates the true sponsor effect. Suggested script: a 10-line addition to source/sponsor_baseline.py (if it exists) or a one-off cell.
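The denominator test proposed in Follow-up 1 could be sketched roughly as follows. This is not the suggested denominator_audit.py, which does not exist yet; the row layout, the poll_is_sponsored flag, and the toy data are hypothetical, and a real audit would compare full distributions, not just means.

```python
from collections import defaultdict
from statistics import mean


def denominator_audit(rows):
    """Per (protocol, scenario_label): count non-aggregate candidates and
    sum their percent, then average both by sponsored status."""
    cells = {}
    for r in rows:
        if r["is_aggregate"]:
            continue
        key = (r["protocol"], r["scenario_label"])
        c = cells.setdefault(key, {"sponsored": r["poll_is_sponsored"],
                                   "n_non_aggregate_candidates": 0,
                                   "sum_percent": 0.0})
        c["n_non_aggregate_candidates"] += 1
        c["sum_percent"] += r["percent"]
    by_group = defaultdict(list)
    for c in cells.values():
        by_group[c["sponsored"]].append(c)
    return {
        g: {"polls": len(cs),
            "mean_n_candidates": mean(c["n_non_aggregate_candidates"] for c in cs),
            "mean_sum_percent": mean(c["sum_percent"] for c in cs)}
        for g, cs in by_group.items()
    }


def make_poll(protocol, sponsored, percents):
    # Build hypothetical candidate rows for one poll, single scenario.
    return [{"protocol": protocol, "scenario_label": "S1",
             "poll_is_sponsored": sponsored, "is_aggregate": False,
             "percent": p} for p in percents]


# Toy data (hypothetical): one sponsored poll, two independent polls.
rows = (make_poll("P1", True, [40.0, 30.0])
        + make_poll("P2", False, [40.0, 30.0, 20.0])
        + make_poll("P3", False, [35.0, 30.0, 20.0, 10.0]))
audit = denominator_audit(rows)
print(audit)
```

If the real panel showed a systematically smaller mean_sum_percent for sponsored polls, that would support the mechanical explanation for K2's ~3 pp attenuation described in Follow-up 1.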