AN-068: Sponsor-label permutation null for Spec 2

Spec 2 sponsor-label permutation null: 500 random reassignments of the residualized sponsored_by across the FWL-residualized panel produce a null distribution centered on 0.005 pp (sd 0.62 pp, max |β| = 2.07 pp). The observed β = +6.86 pp is never reached in 500 draws (0 of 500), so permutation p < 1/B = 0.002. The observed β sits about 11 null SDs from the null mean.

Hypothesis: H1: Self-sponsored polls overstate the sponsoring candidate
Confidence: green
Type: robustness

Design

Sample: Spec 2 sample (22,665 rows after the matched_share == 1.0 + estimulado + non-aggregate filters).
Specification: FWL-residualize y (error) and sponsored_by against the other controls + candidate FE + pollster FE. Then row-level random permutation of x_tilde across the panel. B = 500 draws.
Cluster: muni (matches Spec 2 baseline)
Notes: AN-011 already runs an analogous permutation for Spec 3c at the (race × week) cell level. This is the missing analog for Spec 2.

Script: source/analysis/an-068-spec2-permutation.py
Target: build/table/an-068-spec2-permutation.csv
Status: interpreted · 2026-06-14
Created: 2026-06-14

Question

GPT-5-pro's 2026-06-14 pre-submission review asked for a candidate-level sign-flip or sponsor-label permutation for Spec 2. AN-011 has a within-(race × week) permutation for Spec 3c but the analog for Spec 2 was missing. The test asks: under the null that sponsored_by is independent of error conditional on the FE structure, what is the distribution of β?

Design

Two-step FWL residualization (mirrors AN-031's bootstrap scaffold):

1. Residualize error against (opponent_sponsored, log_sample_size, days_to_election, days_to_election_sq) plus candidate FE plus pollster FE → y_tilde.
2. Residualize sponsored_by against the same → x_tilde.
3. Observed β = (x_tilde · y_tilde) / (x_tilde · x_tilde).
4. For each of B = 500 permutations: random shuffle of x_tilde across the 22,665 rows; recompute β_perm.
5. Two-sided permutation p = share of |β_perm| ≥ |β_obs|.
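The steps above can be sketched on synthetic data as follows. This is a minimal illustration, not the contents of an-068-spec2-permutation.py: the helper names (`fwl_residualize`, `permutation_null`), the dense-dummy handling of fixed effects, the synthetic panel, and the planted effect of 5.0 are all assumptions made for the example.

```python
import numpy as np

def fwl_residualize(v, controls, fe_ids):
    """Residuals of v after least-squares projection on controls plus FE dummies."""
    blocks = [np.ones((len(v), 1)), controls]
    for ids in fe_ids:
        _, codes = np.unique(ids, return_inverse=True)
        dummies = np.eye(codes.max() + 1)[codes]
        blocks.append(dummies[:, 1:])  # drop one level; the intercept absorbs it
    Z = np.hstack(blocks)
    coef, *_ = np.linalg.lstsq(Z, v, rcond=None)
    return v - Z @ coef

def permutation_null(x_tilde, y_tilde, B=500, seed=0):
    """Row-level shuffle of x_tilde; returns beta_obs, the B null betas, and the exceedance count."""
    rng = np.random.default_rng(seed)
    denom = x_tilde @ x_tilde  # invariant to shuffling, so computed once
    beta_obs = (x_tilde @ y_tilde) / denom
    beta_perm = np.empty(B)
    for b in range(B):
        beta_perm[b] = (rng.permutation(x_tilde) @ y_tilde) / denom
    n_extreme = int(np.sum(np.abs(beta_perm) >= abs(beta_obs)))
    return beta_obs, beta_perm, n_extreme

# Synthetic stand-in for the panel (hypothetical sizes and effect, not the Spec 2 data).
rng = np.random.default_rng(42)
n = 2000
candidate = rng.integers(0, 40, n)
pollster = rng.integers(0, 15, n)
controls = rng.normal(size=(n, 4))
sponsored_by = (rng.random(n) < 0.2).astype(float)
error = 5.0 * sponsored_by + rng.normal(scale=5.0, size=n)

y_tilde = fwl_residualize(error, controls, [candidate, pollster])
x_tilde = fwl_residualize(sponsored_by, controls, [candidate, pollster])
beta_obs, beta_perm, n_extreme = permutation_null(x_tilde, y_tilde, B=500)
p_two_sided = n_extreme / len(beta_perm)
print(f"beta_obs={beta_obs:.2f}, null sd={beta_perm.std():.2f}, p={p_two_sided:.3f}")
```

When `n_extreme` is 0, the share is 0 and the result would be reported as p < 1/B rather than p = 0.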

Results

| Statistic | Value |
|---|---|
| Observed β (Spec 2, FWL-residualized) | +6.86 pp |
| Null distribution mean | +0.005 pp |
| Null distribution sd | 0.62 pp |
| Null distribution 95th percentile of \|β\| | 1.20 pp |
| Null distribution max \|β\| (B = 500) | 2.07 pp |
| Two-sided permutation p | < 0.002 (0 of 500) |

The observed β is ~11 null standard deviations above the null mean. None of 500 random permutations reaches even a third of the observed magnitude. The null max |β| (2.07 pp) is itself the extreme tail of B = 500 draws.
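The two headline ratios follow directly from the reported figures. A short arithmetic check, using only the values stated in this note (the variable names are illustrative):

```python
beta_obs = 6.86      # observed beta, pp
null_mean = 0.005    # null distribution mean, pp
null_sd = 0.62       # null distribution sd, pp
null_max = 2.07      # null max |beta| over B = 500 draws, pp

z = (beta_obs - null_mean) / null_sd   # distance from the null mean in null SDs
ratio = null_max / beta_obs            # largest null draw relative to the observed beta

print(round(z, 2), round(ratio, 3))    # 11.06 0.302
```

The ratio of about 0.30 is what sits behind the statement that no draw reaches a third of the observed magnitude.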

Interpretation

Spec 2 β = +6.86 pp is not an artifact of how sponsored_by happens to line up with high-leverage rows. Random reassignment of the indicator across the panel never comes close to the observed magnitude (null max |β| = 2.07 pp).
Combined with the AN-011 race-week permutation for Spec 3c, both specs survive permutation-inference scrutiny.
The B = 500 resolution caps the smallest reportable p at 1/500 = 0.002. If a referee insists on tighter resolution, increase B; cost is linear in B and the script is already self-contained.
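How the p-value floor moves with B can be tabulated in a few lines. The `plus_one` option shows the common (r + 1) / (B + 1) convention as an alternative; that convention is not used in this note and is included only as an assumption for comparison. The function name `p_floor` is illustrative.

```python
def p_floor(B, plus_one=False):
    """Smallest reportable permutation p when 0 of B draws exceed the observed statistic."""
    return 1.0 / (B + 1) if plus_one else 1.0 / B

floors = {B: p_floor(B) for B in (500, 1000, 5000)}
for B, floor in floors.items():
    print(f"B={B:>5}: 1/B={floor:.4f}, (0+1)/(B+1)={p_floor(B, plus_one=True):.6f}")
```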

Caveats

This is a row-level permutation, not a candidate-level stratified one. The GPT-5-pro review's original language allowed either; the row-level version is the more demanding test because it asks whether the within-candidate within-pollster correlation could arise at random across the entire panel.
A candidate-stratified version (sign-flipping the sponsored_by vector within each candidate's panel) would test a different null and is straightforward to add as a sensitivity. Not done here.
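One possible reading of the candidate-stratified sensitivity is sketched below: draw a single ±1 per candidate and apply it to every x_tilde row of that candidate. This is an assumption about what the sign-flip would look like, not a description of existing code; `candidate_sign_flip_null` and the synthetic inputs are hypothetical, and x_tilde and y_tilde are assumed to be already residualized.

```python
import numpy as np

def candidate_sign_flip_null(x_tilde, y_tilde, candidate_ids, B=500, seed=0):
    """Null betas from flipping the sign of x_tilde jointly within each candidate."""
    rng = np.random.default_rng(seed)
    _, codes = np.unique(candidate_ids, return_inverse=True)
    n_cand = codes.max() + 1
    denom = x_tilde @ x_tilde  # unchanged by sign flips
    beta_obs = (x_tilde @ y_tilde) / denom
    # Each candidate's contribution to the numerator x_tilde . y_tilde
    contrib = np.bincount(codes, weights=x_tilde * y_tilde, minlength=n_cand)
    signs = rng.choice([-1.0, 1.0], size=(B, n_cand))  # one sign per candidate per draw
    beta_flip = (signs @ contrib) / denom
    p = float(np.mean(np.abs(beta_flip) >= abs(beta_obs)))
    return beta_obs, beta_flip, p

# Synthetic stand-in (hypothetical sizes and planted effect of 5.0).
rng = np.random.default_rng(7)
n = 2000
candidate = rng.integers(0, 200, n)
x_raw = (rng.random(n) < 0.2).astype(float)
x_tilde = x_raw - x_raw.mean()
y_tilde = 5.0 * x_tilde + rng.normal(scale=5.0, size=n)

beta_obs, beta_flip, p = candidate_sign_flip_null(x_tilde, y_tilde, candidate)
print(f"beta_obs={beta_obs:.2f}, flip-null sd={beta_flip.std():.2f}, p={p:.3f}")
```

Because all rows of a candidate share one sign, each candidate's contribution moves as a block, which is what distinguishes this null from the row-level shuffle.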

Follow-ups

Increase B to 5,000 if the field convention requires p < 0.001 (this lowers the floor to 1/5,000 = 0.0002).
Candidate-stratified sign-flip as a sensitivity.