id: an-111
hypothesis: headline-sponsor-bias
type: robustness
status: done
status_date: 2026-06-19
confidence: green
created: 2026-06-19
headline: "Headline β robust to all SE choices. Spec 2 β = +6.86 pp (SE 0.81–1.14 across cluster_muni/race_week/politico_id/twoway/wcr_race_week; t ≥ 6.0; all p<0.001). Spec 3c β = +7.91 pp (SE 2.30–2.64; t ≥ 3.0; all p ≤ 0.004; wcr-bootstrap t=6.1). AN-110's empirical noise floor is mechanism evidence, not a threat to the aggregate result. No decisions.md walk-back."
script: source/analysis/an-111-headline-robustness-empirical-noise.py
target: build/table/an-111-headline-robustness-empirical-noise.csv
design:
  sample: matched_share==1.0, error.notna() panel (the headline regression sample, n≈22,665 cand-poll rows; 568 with sponsored_by==1 across the full panel; clean-comparator + race × week strict subset is smaller)
  specification: "Spec 2 (cand FE + institute FE + log_sample_size + days_to_election polynomial; sponsored_by, opponent_sponsored) and Spec 3c (clean comparator + cand FE + race × week FE; sponsored_by), re-run under multiple SE choices to test whether t-stat survives the empirical noise floor"
  comparator: baseline cluster=muni_id (project default) vs five alternative SE specs that progressively reflect AN-110's empirical noise floor at race × week × candidate granularity
  cluster: vary across muni_id (baseline), race_week, politico_id, two-way (muni_id, field_period_week), two-way (muni_id, politico_id), and wild-cluster bootstrap at race_week
notes: "AN-110 surfaced pooled empirical DEFF* = 12.59 at the race × week × candidate level, finer than the muni_id clustering the headline uses. Question: does β = +7 pp survive when the SE reflects that empirical noise floor? If t drops below 2 under the empirical-noise-aware SE, the project's headline result needs revision. Specs that already absorb race × week FE (Spec 3c) should be more robust than Spec 2."

AN-111: Headline-regression robustness under empirical noise floor

Question

AN-110 surfaced that the empirical cross-poll variance in race × week × candidate cells is much wider than binomial + rule-of-thumb DEFF predicts: pooled empirical DEFF* = 12.59. The project's headline regression specs (AN-001 / AN-002 / Spec 2 / Spec 3c) use cluster-robust SE at muni_id, which is coarser than race × week × candidate. Does the +7 pp β survive when the SE reflects the empirical noise floor, both under finer-than-muni clustering and under a bootstrap resampled at race_week?

If yes, the headline result is robust and AN-110 / AN-108 are mechanism evidence, not threats to the aggregate finding. If t drops below 2 under any reasonable SE choice that better matches the empirical noise floor, the project's headline needs a substantive walk-back and a paper-section rewrite.

Design

Sample mirrors source/analysis/regressions.py's headline panel: build/assemble/cand_poll.parquet filtered to matched_share == 1.0 and finite error. Two spec families:

Spec 2 (cand FE + institute FE + controls): the project's comprehensive single-regression headline. error ~ sponsored_by + opponent_sponsored + log_sample_size + days_to_election + days_to_election², entity_effects=politico_id, other_effects=institute. AN-001 reports β = +7.75 pp for this spec.
Spec 3c (clean comparator + race × week FE): the strict spec that already absorbs race × week-level dispersion via FE. error ~ sponsored_by, entity_effects=politico_id, other_effects=race_week, on the sponsored ∪ poll-is-independent sample with both treatment + indep control present in each race × week.
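
For reference, the two specs can be written out as equations. This is a sketch under assumed notation that is not in the project's own write-up: i indexes candidates (politico_id), p indexes polls, j(p) is the institute that fielded poll p, n_p is the poll's sample size, d_p is days_to_election, and (r, w) is the race × week cell containing the observation.

```latex
% Spec 2: candidate FE (alpha_i) + institute FE (lambda_{j(p)}) + controls
\begin{align}
\mathrm{error}_{ip} &= \beta\,\mathrm{sponsored\_by}_{ip}
  + \gamma\,\mathrm{opponent\_sponsored}_{ip}
  + \delta\,\log n_p + \theta_1 d_p + \theta_2 d_p^{2}
  + \alpha_i + \lambda_{j(p)} + \varepsilon_{ip} \\
% Spec 3c: candidate FE + race x week FE (mu_{rw}), clean-comparator sample only
\mathrm{error}_{ip} &= \beta\,\mathrm{sponsored\_by}_{ip}
  + \alpha_i + \mu_{r(i)\,w(p)} + \varepsilon_{ip}
\end{align}
```

In both equations β is the coefficient the tables below report; the SE choices differ only in how they model the correlation structure of ε.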

SE choices applied to both specs:

| label | SE assumption |
| --- | --- |
| `cluster_muni` | cluster=muni_id (baseline; AN-001 default) |
| `cluster_race_week` | cluster=race_week (matches AN-110 unit) |
| `cluster_politico_id` | cluster=politico_id (within-candidate autocorrelation) |
| `twoway_muni_week` | two-way cluster (muni_id, field_period_week) |
| `twoway_muni_cand` | two-way cluster (muni_id, politico_id) |
| `wcr_race_week` | wild-cluster restricted bootstrap at race_week, B=2000 |

The wild-cluster bootstrap uses the existing wcr_p_value machinery from source/analysis/regressions.py retargeted to race_week.
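
To make the bootstrap step concrete, here is a minimal standard-library sketch of a wild-cluster restricted (WCR) bootstrap p-value for a single regressor with an intercept. It is not the project's wcr_p_value: the function names, the synthetic data, and the single-regressor simplification (no fixed effects or controls) are all assumptions made for illustration.

```python
import random
from collections import defaultdict

def _demean(values):
    mean = sum(values) / len(values)
    return [v - mean for v in values]

def cluster_t(x, y, groups):
    """Slope t-stat with a CR0 cluster-robust SE (intercept partialled out)."""
    xd, yd = _demean(x), _demean(y)
    sxx = sum(a * a for a in xd)
    beta = sum(a * b for a, b in zip(xd, yd)) / sxx
    score = defaultdict(float)
    for a, b, g in zip(xd, yd, groups):
        score[g] += a * (b - beta * a)  # per-cluster sum of x_i * residual_i
    se = sum(s * s for s in score.values()) ** 0.5 / sxx
    return beta / se

def wcr_p_value_sketch(x, y, groups, B=999, seed=0):
    """Share of bootstrap |t*| at least as large as the observed |t|."""
    rng = random.Random(seed)
    t_obs = cluster_t(x, y, groups)
    u_restricted = _demean(y)  # residuals with the null beta = 0 imposed
    cluster_ids = sorted(set(groups))
    hits = 0
    for _ in range(B):
        # One Rademacher sign per cluster, shared by every row in that cluster.
        sign = {g: rng.choice((-1.0, 1.0)) for g in cluster_ids}
        y_star = [sign[g] * u for u, g in zip(u_restricted, groups)]
        if abs(cluster_t(x, y_star, groups)) >= abs(t_obs):
            hits += 1
    return hits / B

# Synthetic data (assumption): 30 clusters of 6 rows, true slope 2.0,
# plus a shared cluster-level shock that induces within-cluster correlation.
data_rng = random.Random(1)
groups, x, y = [], [], []
for g in range(30):
    shock = data_rng.gauss(0, 1)
    for _ in range(6):
        xi = data_rng.choice((0, 1))
        groups.append(g)
        x.append(xi)
        y.append(2.0 * xi + shock + data_rng.gauss(0, 1))

p = wcr_p_value_sketch(x, y, groups, B=499)
```

The signs are drawn per cluster rather than per row so that each bootstrap sample preserves the within-cluster dependence the SE is meant to reflect; retargeting to race_week amounts to passing race_week as the grouping variable.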

Decision rule. If β stays within ±0.5 pp of +7.75 (Spec 2) / the AN-001 strict-spec headline (Spec 3c) AND t > 2 (or wcr-p < 0.05) on every SE choice, the headline is robust — close out AN-111 green. If any non-degenerate SE choice flips t < 2 or β shifts > 1 pp, the headline needs a decisions.md proposal and a paper-section rewrite.

Results

Figure: AN-111 forest plot of β on sponsored_by across SE choices for Spec 2 (left) and Spec 3c (right), with 95 % CI bars at β ± 1.96·SE and t-stats annotated.

Sample. Headline panel = 22,665 cand-poll rows (matched_share == 1.0, finite error), 450 with sponsored_by == 1, 2,669 muni races, 5,703 race × week cells, 7,908 candidates. Spec 3c strict subset (race × week cells with both treatment + indep control) = 286 rows from 46 cells.
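
The Spec 3c strict-subset rule (keep only race × week cells that contain both a sponsored row and an independent-poll row) can be sketched as below. This assumes rows are plain dicts; `race_week` and `sponsored_by` follow the names used in this note, while `poll_is_independent` is a hypothetical column name standing in for however the panel flags independent polls.

```python
def strict_subset(rows):
    """Keep sponsored and independent rows from cells that contain both kinds."""
    treated_cells = {r["race_week"] for r in rows if r["sponsored_by"] == 1}
    control_cells = {
        r["race_week"]
        for r in rows
        if r["sponsored_by"] == 0 and r["poll_is_independent"]
    }
    keep = treated_cells & control_cells
    return [
        r for r in rows
        if r["race_week"] in keep
        and (r["sponsored_by"] == 1 or r["poll_is_independent"])
    ]

# Toy rows (assumption): only cell "A-w1" has both a sponsored and an
# independent poll, so only its two qualifying rows survive.
rows = [
    {"race_week": "A-w1", "sponsored_by": 1, "poll_is_independent": False},
    {"race_week": "A-w1", "sponsored_by": 0, "poll_is_independent": True},
    {"race_week": "A-w1", "sponsored_by": 0, "poll_is_independent": False},
    {"race_week": "B-w1", "sponsored_by": 1, "poll_is_independent": False},
    {"race_week": "C-w2", "sponsored_by": 0, "poll_is_independent": True},
]
subset = strict_subset(rows)
```

The filter drops whole cells, not just rows, which is why the strict subset shrinks to 286 rows in 46 cells.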

Spec 2 (cand FE + institute FE + log_sample_size + days_to_election polynomial):

| SE choice | β (pp) | SE | t | p | n_clusters |
| --- | --- | --- | --- | --- | --- |
| cluster_muni (baseline) | +6.86 | 1.04 | 6.59 | <0.001 | 2,669 |
| cluster_race_week | +6.86 | 0.81 | 8.45 | <0.001 | 5,703 |
| cluster_politico_id | +6.86 | 1.14 | 6.03 | <0.001 | 7,908 |
| twoway_muni_week | +6.86 | 0.90 | 7.62 | <0.001 | 5,703 |
| twoway_muni_cand | +6.86 | 1.04 | 6.59 | <0.001 | 7,908 |
| wcr_race_week (B=2000) | n/a | n/a | 10.62 | <0.001 | 5,703 |

Spec 3c (clean comparator + race × week FE strict, 286 rows in 46 race-week cells):

| SE choice | β (pp) | SE | t | p | n_clusters |
| --- | --- | --- | --- | --- | --- |
| cluster_muni (baseline) | +7.91 | 2.64 | 3.00 | 0.004 | 40 |
| cluster_race_week | +7.91 | 2.30 | 3.44 | 0.001 | 46 |
| cluster_politico_id | +7.91 | 2.59 | 3.05 | 0.003 | 151 |
| twoway_muni_week | +7.91 | 2.35 | 3.36 | 0.001 | 46 |
| twoway_muni_cand | +7.91 | 2.64 | 3.00 | 0.004 | 151 |
| wcr_race_week (B=2000) | n/a | n/a | 6.10 | <0.001 | 46 |

(β is the same across analytic SE choices because the SE choice does not change the point estimate; only inference changes.)
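
A toy illustration of that point: in the sketch below the slope is computed without reference to the grouping variable, and only the cluster-robust (CR0 sandwich) SE depends on it. The data and the `muni` / `week` labels are invented for illustration, and with this few clusters the SEs themselves are not meaningful for inference.

```python
from collections import defaultdict

def slope_and_cluster_se(x, y, groups):
    """OLS slope (with intercept) and its CR0 cluster-robust standard error."""
    mx, my = sum(x) / len(x), sum(y) / len(y)
    xd = [a - mx for a in x]
    yd = [b - my for b in y]
    sxx = sum(a * a for a in xd)
    beta = sum(a * b for a, b in zip(xd, yd)) / sxx  # never touches `groups`
    score = defaultdict(float)
    for a, b, g in zip(xd, yd, groups):
        score[g] += a * (b - beta * a)  # cluster-level sum of x_i * residual_i
    se = sum(s * s for s in score.values()) ** 0.5 / sxx
    return beta, se

x = [0, 1, 0, 1, 0, 1, 0, 1]
y = [1.0, 3.1, 0.8, 2.7, 1.3, 3.4, 0.9, 2.9]
muni = ["a", "a", "b", "b", "c", "c", "d", "d"]
week = [1, 1, 1, 1, 2, 2, 2, 2]

beta_muni, se_muni = slope_and_cluster_se(x, y, muni)
beta_week, se_week = slope_and_cluster_se(x, y, week)
```

Here `beta_muni` and `beta_week` are the same number while `se_muni` and `se_week` differ, which mirrors why every analytic row in the tables above shares one β.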

The Spec 2 β = +6.86 here is the current-panel value; AN-001 cites +7.75 on a slightly different sample. The 0.89 pp gap is identical under every SE choice, most likely reflects panel updates since AN-001, and is not the AN-111 question. Spec 3c gives +7.91, consistent with the AN-001 magnitude under the stricter race × week FE.

Interpretation

The headline result is robust to every SE choice tested. Under the most conservative analytic alternative, clustering at politico_id, Spec 2 SE rises from 1.04 to 1.14 (about 10 % inflation) and t drops from 6.59 to 6.03. Two-way clustering at (muni_id, field_period_week) gives SE = 0.90, below baseline, because the cell-level race-week correlation is captured at finer granularity than muni alone. The wild-cluster bootstrap at race_week gives bootstrap-p ≈ 0 (reported as < 0.001) with B = 2000. On Spec 3c, the race × week FE already absorb cell-level dispersion, so cluster choices that target race × week add little; t-stats stay in [3.00, 3.44] analytically and reach 6.10 under the wcr-bootstrap.
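
The arithmetic behind these statements can be rechecked from the tabulated β and SE values alone. The sketch below recomputes t = β / SE and the 95 % interval β ± 1.96·SE for each analytic SE choice; because the inputs are rounded to two decimals, the recomputed t-stats can differ from the tabulated ones in the second decimal. The dictionary layout and function name are illustrative, not part of the project code.

```python
# Analytic SEs copied from the results tables (percentage points).
SPEC2_BETA = 6.86
SPEC2_SE = {
    "cluster_muni": 1.04,
    "cluster_race_week": 0.81,
    "cluster_politico_id": 1.14,
    "twoway_muni_week": 0.90,
    "twoway_muni_cand": 1.04,
}
SPEC3C_BETA = 7.91
SPEC3C_SE = {
    "cluster_muni": 2.64,
    "cluster_race_week": 2.30,
    "cluster_politico_id": 2.59,
    "twoway_muni_week": 2.35,
    "twoway_muni_cand": 2.64,
}

def summarize(beta, se_by_choice, z=1.96):
    """Return t-stat and normal-approximation 95 % CI for each SE choice."""
    return {
        label: {"t": beta / se, "ci_low": beta - z * se, "ci_high": beta + z * se}
        for label, se in se_by_choice.items()
    }

spec2 = summarize(SPEC2_BETA, SPEC2_SE)
spec3c = summarize(SPEC3C_BETA, SPEC3C_SE)

# Relative SE inflation from the muni baseline to politico_id clustering (Spec 2).
inflation = SPEC2_SE["cluster_politico_id"] / SPEC2_SE["cluster_muni"] - 1
```

Every interval in both specs excludes zero, including the widest one (Spec 3c under muni_id clustering).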

Why AN-110's empirical DEFF* = 12.59 does not threaten the headline. AN-110 measured cross-poll variance within race × week × candidate cells, which is a residual after netting out the cell mean. The headline regression's clustered SE already allows for within-cluster correlation of errors, and PanelOLS's two sets of fixed effects (candidate and institute in Spec 2, candidate and race_week in Spec 3c) absorb much of that cell-level heterogeneity before the SE is even computed. The empirical noise floor that an outside-observer per-poll test must face (AN-110) is not the same noise floor the aggregate β faces. With 22,665 rows and 5,703 race-week clusters, the aggregate signal averages out the residual variance even at √12.6 inflation; the t-stat implication of AN-110 was already partly absorbed by the regression infrastructure.

Resolves the AN-110 walk-back concern. AN-110's qualifier on AN-108's per-poll-loud reading stands: at the single-poll level, the empirical cross-firm SD is wide and a +7 pp shift falls within it. But the aggregate regression β is identified by averaging over n=568 (or n=450 in the current panel) sponsored cand-poll rows, and its inference is robust. The empirical fact of a +7 pp bias is secure under empirical-noise-aware inference. AN-110's wider empirical noise floor is informative about per-poll detectability / reputation / auditability comparators, not about the aggregate identification.

Follow-ups

Address the AN-001 / current-panel β discrepancy (puzzle, low priority): AN-111's Spec 2 β = +6.86 vs AN-001's +7.75. Likely panel updates since AN-001 (more polls landed in cand_poll, sample restriction may have shifted). Diagnostic: re-run AN-001's exact commit-pinned sample against the current panel and decompose the ~0.9 pp gap into (a) added polls, (b) revised cleaners, (c) sample-restriction changes. Not load-bearing for AN-111's conclusion. ~30 min.
Spec 3c sample is thin (n=286, 46 race-week cells) (blind spot): the strict spec rests on the 46 race-week cells that contain both treatment and indep control. Confidence intervals are wide (analytic SE up to 2.64 pp). The result holds at t ≥ 3.0, but a 1.5–2× sample expansion (e.g., loosening "week" to "two-week window") would tighten the strict spec. Suggested: an-NNN-spec3c-window-sensitivity.py.
No follow-up needed on the aggregate β identification. AN-111 closes the AN-110-prompted SE concern.