Spec 2 sponsor-label permutation null: 500 random reassignments of sponsored_by across the FWL-residualized panel produce a null distribution centered on 0.005 pp (sd 0.62 pp, max |β| = 2.07 pp). The observed β = +6.86 pp is never reached in 500 draws, so the permutation p sits below the 1/B = 0.002 resolution floor, and the observed β lies about 11 null SDs from the null mean.
Question
GPT-5-pro's 2026-06-14 pre-submission review asked for a candidate-level sign-flip or sponsor-label permutation for Spec 2. AN-011 has a within-(race × week) permutation for Spec 3c, but the analog for Spec 2 was missing. The test asks: under the null that sponsored_by is independent of error conditional on the FE structure, what is the distribution of β?
Design
Two-step FWL residualization (mirrors AN-031's bootstrap scaffold):
- Residualize `error` against (`opponent_sponsored`, `log_sample_size`, `days_to_election`, `days_to_election_sq`) plus candidate FE plus pollster FE → `y_tilde`.
- Residualize `sponsored_by` against the same → `x_tilde`.
- Observed β = (x_tilde · y_tilde) / (x_tilde · x_tilde).
- For each of B = 500 permutations: randomly shuffle `x_tilde` across the 22,665 rows and recompute β_perm.
- Two-sided permutation p = share of the B draws with |β_perm| ≥ |β_obs|.
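The design above can be sketched in Python with NumPy. This is a minimal illustration under stated assumptions, not the note's actual script: the helper names (`residualize`, `fe_design`, `permutation_null`) are hypothetical, the fixed effects enter as explicit dummy columns solved with `np.linalg.lstsq` (the real scaffold may demean instead), and the data are synthetic stand-ins rather than the 22,665-row panel. Because shuffling `x_tilde` leaves `x_tilde · x_tilde` unchanged, the denominator is computed once.

```python
import numpy as np


def residualize(v, Z):
    """Residuals of v after least-squares projection on the columns of Z."""
    coef, *_ = np.linalg.lstsq(Z, v, rcond=None)
    return v - Z @ coef


def fe_design(controls, candidate, pollster):
    """Intercept + continuous controls + candidate and pollster dummies.

    The first level of each factor is dropped as the reference category.
    """
    n = controls.shape[0]
    cand_levels = np.unique(candidate)[1:]
    poll_levels = np.unique(pollster)[1:]
    cand_d = (candidate[:, None] == cand_levels[None, :]).astype(float)
    poll_d = (pollster[:, None] == poll_levels[None, :]).astype(float)
    return np.column_stack([np.ones(n), controls, cand_d, poll_d])


def permutation_null(y, x, Z, B=500, seed=0):
    """FWL-residualize y and x on Z, then shuffle x_tilde B times."""
    y_tilde = residualize(y, Z)
    x_tilde = residualize(x, Z)
    denom = x_tilde @ x_tilde  # invariant under any permutation of x_tilde
    beta_obs = (x_tilde @ y_tilde) / denom
    rng = np.random.default_rng(seed)
    beta_perm = np.array(
        [(rng.permutation(x_tilde) @ y_tilde) / denom for _ in range(B)]
    )
    p_two_sided = np.mean(np.abs(beta_perm) >= np.abs(beta_obs))
    return beta_obs, beta_perm, p_two_sided


# Synthetic stand-in data (hypothetical; not the real panel).
rng = np.random.default_rng(1)
n = 2_000
candidate = rng.integers(0, 50, n)
pollster = rng.integers(0, 20, n)
controls = rng.normal(size=(n, 4))  # stand-ins for the four listed controls
sponsored = (rng.random(n) < 0.3).astype(float)
cand_effect = rng.normal(0, 2, 50)  # candidate-level shifts absorbed by the FE
error = (
    5.0 * sponsored
    + controls @ np.array([1.0, -0.5, 0.2, 0.1])
    + cand_effect[candidate]
    + rng.normal(0, 4, n)
)

Z = fe_design(controls, candidate, pollster)
beta_obs, beta_perm, p = permutation_null(error, sponsored, Z, B=500)
```

By FWL, `beta_obs` here equals the `sponsored` coefficient from a single regression of `error` on the controls, the FE dummies, and `sponsored`, which makes a cheap sanity check on the residualization step.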
Results
| Statistic | Value |
|---|---|
| Observed β (Spec 2, FWL-residualized) | +6.86 pp |
| Null distribution mean | +0.005 pp |
| Null distribution sd | 0.62 pp |
| Null distribution 95th percentile of \|β\| | 1.20 pp |
| Null distribution max \|β\| (B = 500) | 2.07 pp |
| Two-sided permutation p | < 0.002 (0 of 500) |
The observed β is ~11 null standard deviations above the null mean ((6.86 − 0.005) / 0.62 ≈ 11.06). None of the 500 permutations reaches even a third of the observed magnitude: the largest null draw, |β| = 2.07 pp, is about 0.30 of β_obs.
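The headline ratios in this paragraph follow directly from the table, and a short Python check reproduces them. The normal-approximation line is an added assumption used only to cross-check the tabulated 95th percentile; the note itself does not test normality of the null.

```python
# Summary numbers copied from the Results table.
beta_obs = 6.86      # pp
null_mean = 0.005    # pp
null_sd = 0.62       # pp
null_max_abs = 2.07  # pp
null_p95_abs = 1.20  # pp

z_null = (beta_obs - null_mean) / null_sd  # distance from the null mean, in null SDs
max_share = null_max_abs / beta_obs        # largest null draw as a share of beta_obs

# Assumption: if the null were exactly normal with mean ~0, the 95th
# percentile of |beta| would sit near 1.96 * sd.
normal_p95_abs = 1.959964 * null_sd

print(f"z = {z_null:.2f}, max share = {max_share:.2f}, "
      f"normal p95 = {normal_p95_abs:.2f} vs table {null_p95_abs:.2f}")
```

With the table's values this gives about 11.06 null SDs, a max share of about 0.30 (below the one-third threshold), and a normal-approximation 95th percentile of about 1.22 pp, close to the tabulated 1.20 pp.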
Interpretation
- Spec 2 β = +6.86 pp is not an artifact of any particular pairing between sponsored_by and high-leverage rows: randomly reassigning the residualized indicator across the panel did not reproduce that magnitude in any of the 500 draws.
- Combined with the AN-011 race-week permutation for Spec 3c, both specs survive permutation-inference scrutiny.
- The B = 500 resolution caps the smallest reportable p at 1/500 = 0.002. If a referee insists on tighter resolution, increase B — cost is linear in B and the script is already self-contained.
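The resolution point can be made concrete. One common convention for Monte Carlo permutation tests reports (r + 1) / (B + 1), where r is the number of draws with |β_perm| ≥ |β_obs|; unlike the raw share r / B, it never returns p = 0. The sketch below computes both; the function name is hypothetical, and the source does not say which convention the note's script uses.

```python
def permutation_p(n_extreme: int, B: int) -> tuple[float, float]:
    """Return (raw share, add-one estimate) for a Monte Carlo permutation test.

    raw share = n_extreme / B
    add-one   = (n_extreme + 1) / (B + 1), counting the observed statistic
                as one of the draws so the estimate is never exactly zero.
    """
    return n_extreme / B, (n_extreme + 1) / (B + 1)


for B in (500, 5_000):
    raw, add_one = permutation_p(0, B)
    print(f"B = {B}: raw = {raw}, add-one = {add_one:.5f}, 1/B = {1 / B}")
```

With r = 0, B = 500 gives an add-one estimate of 1/501 ≈ 0.002, and B = 5,000 gives 1/5,001 ≈ 0.0002, which clears a p < 0.001 convention.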
Caveats
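- This is a row-level permutation, not a candidate-level stratified one. GPT-5-pro's original request allowed either. The row-level version asks whether the within-candidate, within-pollster correlation could arise at random across the entire panel; because it shuffles rows across candidates, it does not preserve any within-candidate dependence in x_tilde, so its null distribution may be narrower than a stratified version's.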
- A candidate-stratified version (sign-flipping the residualized sponsored_by vector one candidate at a time) would test a different null and is straightforward to add as a sensitivity. Not done here.
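The candidate-stratified sensitivity described above could look roughly like the following sketch. It is an illustration under assumptions, not the note's method: each draw multiplies all of one candidate's `x_tilde` rows by the same random sign, which leaves `x_tilde · x_tilde` unchanged and reduces each draw to a signed sum of per-candidate score contributions. It reuses existing residuals rather than re-residualizing, assumes per-candidate scores are symmetric about zero under the null, and uses a hypothetical function name and synthetic data.

```python
import numpy as np


def cluster_sign_flip_null(x_tilde, y_tilde, cluster, B=500, seed=0):
    """Candidate-level sign-flip null for beta = (x_tilde . y_tilde) / (x_tilde . x_tilde).

    Each draw flips the sign of all rows of a cluster together, so the
    numerator becomes sum_c s_c * score_c with s_c in {-1, +1}.
    """
    rng = np.random.default_rng(seed)
    _, idx = np.unique(cluster, return_inverse=True)
    # Per-cluster score contributions: score_c = sum of x_tilde * y_tilde over rows in c.
    scores = np.bincount(idx, weights=x_tilde * y_tilde)
    denom = x_tilde @ x_tilde  # unchanged by sign flips
    beta_obs = scores.sum() / denom
    signs = rng.choice([-1.0, 1.0], size=(B, scores.size))
    beta_null = (signs @ scores) / denom
    p_two_sided = np.mean(np.abs(beta_null) >= np.abs(beta_obs))
    return beta_obs, beta_null, p_two_sided


# Synthetic stand-ins: 100 hypothetical candidates with 20 polls each.
rng = np.random.default_rng(2)
cluster = np.repeat(np.arange(100), 20)
x_tilde = rng.normal(size=cluster.size)
y_tilde = 0.5 * x_tilde + rng.normal(size=cluster.size)
beta_obs, beta_null, p = cluster_sign_flip_null(x_tilde, y_tilde, cluster)
```

Unlike the row-level shuffle, this keeps every candidate's rows together, so whatever within-candidate dependence exists is carried into the null.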
Follow-ups
- Increase B to 5,000 if the field convention requires p < 0.001.
- Candidate-stratified sign-flip as a sensitivity.