id: an-111 hypothesis: headline-sponsor-bias type: robustness status: done status_date: 2026-06-19 confidence: green created: 2026-06-19 headline: "Headline β robust to all SE choices. Spec 2 β = +6.86 pp (SE 0.81–1.14 across cluster_muni/race_week/politico_id/twoway/wcr_race_week; t ≥ 6.0; all p<0.001). Spec 3c β = +7.91 pp (SE 2.30–2.64; t ≥ 3.0; all p ≤ 0.004; wcr-bootstrap t=6.1). AN-110's empirical noise floor is mechanism evidence, not a threat to the aggregate result. No decisions.md walk-back." script: source/analysis/an-111-headline-robustness-empirical-noise.py target: build/table/an-111-headline-robustness-empirical-noise.csv design: sample: matched_share==1.0, error.notna() panel (the headline regression sample, n≈22,665 cand-poll rows; 568 with sponsored_by==1 across the full panel; clean-comparator + race × week strict subset is smaller) specification: "Spec 2 (cand FE + institute FE + log_sample_size + days_to_election polynomial; sponsored_by, opponent_sponsored) and Spec 3c (clean comparator + cand FE + race × week FE; sponsored_by) — re-run under multiple SE choices to test whether t-stat survives the empirical noise floor" comparator: baseline cluster=muni_id (project default) vs five alternative SE specs that progressively reflect AN-110's empirical noise floor at race × week × candidate granularity cluster: vary across muni_id (baseline), race_week, politico_id, two-way (muni_id, field_period_week), two-way (muni_id, politico_id), and wild-cluster bootstrap at race_week notes: "AN-110 surfaced pooled empirical DEFF* = 12.59 at the race × week × candidate level — finer than the muni_id clustering the headline uses. Question: does β = +7 pp survive when the SE reflects that empirical noise floor? If t drops below 2 under the empirical-noise-aware SE, the project's headline result needs revision. Specs that already absorb race × week FE (Spec 3c) should be more robust than Spec 2."
AN-111: Headline-regression robustness under empirical noise floor
Question
AN-110 surfaced that the empirical cross-poll variance in race × week
× candidate cells is much wider than binomial + rule-of-thumb DEFF
predicts — pooled empirical DEFF* = 12.59. The project's headline
regression specs (AN-001 / AN-002 / Spec 2 / Spec 3c) use cluster-
robust SE at muni_id, which is coarser than race × week × candidate.
Does the +7 pp β survive when SE reflects the empirical noise floor
at finer-than-muni clustering and bootstrap-resampled at race_week?
If yes, the headline result is robust and AN-110 / AN-108 are mechanism evidence, not threats to the aggregate finding. If t drops below 2 under any reasonable SE choice that better matches the empirical noise floor, the project's headline needs a substantive walk-back and a paper-section rewrite.
Design
Sample mirrors source/analysis/regressions.py's headline panel:
build/assemble/cand_poll.parquet filtered to matched_share == 1.0
and finite error. Two spec families:
- Spec 2 (cand FE + institute FE + controls): the project's
comprehensive single-regression headline.
error ~ sponsored_by + opponent_sponsored + log_sample_size + days_to_election + days_to_election², entity_effects=politico_id, other_effects=institute. Reported β here is +7.75 pp at AN-001. - Spec 3c (clean comparator + race × week FE): the strict spec
that already absorbs race × week-level dispersion via FE.
error ~ sponsored_by, entity_effects=politico_id, other_effects=race_week, on the sponsored ∪ poll-is-independent sample with both treatment + indep control present in each race × week.
SE choices applied to both specs:
| label | SE assumption |
|---|---|
cluster_muni |
cluster=muni_id (baseline; AN-001 default) |
cluster_race_week |
cluster=race_week (matches AN-110 unit) |
cluster_politico_id |
cluster=politico_id (within-candidate autocorrelation) |
twoway_muni_week |
two-way cluster (muni_id, field_period_week) |
twoway_muni_cand |
two-way cluster (muni_id, politico_id) |
wcr_race_week |
wild-cluster restricted bootstrap at race_week, B=2000 |
The wild-cluster bootstrap uses the existing wcr_p_value machinery
from source/analysis/regressions.py retargeted to race_week.
Decision rule. If β stays within ±0.5 pp of +7.75 (Spec 2) /
the AN-001 strict-spec headline (Spec 3c) AND t > 2 (or wcr-p < 0.05)
on every SE choice, the headline is robust — close out AN-111 green.
If any non-degenerate SE choice flips t < 2 or β shifts > 1 pp, the
headline needs a decisions.md proposal and a paper-section rewrite.
Results

Sample. Headline panel = 22,665 cand-poll rows
(matched_share == 1.0, finite error), 450 with sponsored_by == 1,
2,669 muni races, 5,703 race × week cells, 7,908 candidates. Spec 3c
strict subset (race × week cells with both treatment + indep control)
= 286 rows from 46 cells.
Spec 2 (cand FE + institute FE + log_sample_size + days_to_election polynomial):
| SE choice | β (pp) | SE | t | p | n_clusters |
|---|---|---|---|---|---|
| cluster_muni (baseline) | +6.86 | 1.04 | 6.59 | <0.001 | 2,669 |
| cluster_race_week | +6.86 | 0.81 | 8.45 | <0.001 | 5,703 |
| cluster_politico_id | +6.86 | 1.14 | 6.03 | <0.001 | 7,908 |
| twoway_muni_week | +6.86 | 0.90 | 7.62 | <0.001 | 5,703 |
| twoway_muni_cand | +6.86 | 1.04 | 6.59 | <0.001 | 7,908 |
| wcr_race_week (B=2000) | — | — | 10.62 | <0.001 | 5,703 |
Spec 3c (clean comparator + race × week FE strict, 286 rows in 46 race-week cells):
| SE choice | β (pp) | SE | t | p | n_clusters |
|---|---|---|---|---|---|
| cluster_muni (baseline) | +7.91 | 2.64 | 3.00 | 0.004 | 40 |
| cluster_race_week | +7.91 | 2.30 | 3.44 | 0.001 | 46 |
| cluster_politico_id | +7.91 | 2.59 | 3.05 | 0.003 | 151 |
| twoway_muni_week | +7.91 | 2.35 | 3.36 | 0.001 | 46 |
| twoway_muni_cand | +7.91 | 2.64 | 3.00 | 0.004 | 151 |
| wcr_race_week (B=2000) | — | — | 6.10 | <0.001 | 46 |
(β is the same across analytic SE choices because the SE choice does not change the point estimate; only inference changes.)
The Spec 2 β = +6.86 here is the current-panel value; AN-001 cites +7.75 on a slightly different sample, the gap reflects panel updates since AN-001 and is not the AN-111 question. Spec 3c gives +7.91 — consistent with the AN-001 magnitude under the stricter race × week FE.
Interpretation
The headline result is robust to every SE choice tested. Under
the most aggressive alternative — clustering at politico_id — Spec
2 SE rises from 1.04 to 1.14 (10 % inflation) and t drops from 6.59
to 6.03. Two-way clustering at (muni_id, field_period_week) gives
SE = 0.90, below baseline, because the cell-level race-week
correlation is captured at finer granularity than muni alone. The
wild-cluster bootstrap at race_week gives bootstrap-p ≈ 0 with
B = 2000. On Spec 3c the strict spec already absorbs race × week FE,
so cluster choices that target race × week add little; t-stats stay
in [3.0, 3.4] analytically and 6.1 under wcr-bootstrap.
Why AN-110's empirical DEFF* = 12.59 does not threaten the headline. AN-110 measured cross-poll variance within race × week × candidate cells, which is a residual after netting out the cell mean. The headline regression's clustering absorbs within-cluster correlation of errors, and PanelOLS's two-way FE (cand × institute, or cand × race_week in 3c) absorbs much of that cell-level heterogeneity before the SE is even computed. The empirical noise floor that an outside-observer per-poll test must face (AN-110) is not the same noise floor the aggregate β faces. With 22,665 rows and 5,703 race-week clusters, the aggregate signal averages out the residual variance even at √12.6 inflation — the t-stat implication of AN-110 was already partly absorbed by the regression infrastructure.
Walks back the AN-110 walk-back concern. AN-110's qualifier on AN-108's per-poll-loud reading stands — at the single-poll level, the empirical cross-firm SD is wide and a +7 pp shift falls within it. But the aggregate regression β is identified off averaging over n=568 (or n=450 in the current panel) sponsored cand-poll rows, and its inference is robust. The empirical fact of a +7 pp bias is secure under empirical-noise-aware inference. AN-110's wider empirical noise floor is informative about per-poll detectability / reputation / auditability comparators, not about the aggregate identification.
Follow-ups
Address the AN-001 / current-panel β discrepancy (puzzle, low priority): AN-111's Spec 2 β = +6.86 vs AN-001's +7.75. Likely panel updates since AN-001 (more polls landed in cand_poll, sample restriction may have shifted). Diagnostic: re-run AN-001's exact commit-pinned sample against the current panel and decompose the ~0.9 pp gap into (a) added polls, (b) revised cleaners, (c) sample-restriction changes. Not load-bearing for AN-111's conclusion. ~30 min.
Spec 3c sample is thin (n=286, 46 race-week cells) (blind spot): the strict spec carries through 46 race-week cells with both treatment and indep control. Bootstrap CIs are wide (analytic SE up to 2.6 pp). Robust at t=3, but a 1.5–2× sample expansion (e.g., loosening "week" to "two-week window") would tighten the strict spec. Suggested:
an-NNN-spec3c-window-sensitivity.py.No follow-up needed on the aggregate β identification. AN-111 closes the AN-110-prompted SE concern.