---
id: an-099
hypothesis: shell-contratante
headline: "Consensus deviation (poll share vs the median of other polls of the same candidate in the same race within ±14 days) confirms the +7 pp signed slant: +5.82 to +6.59 pp across the race-FE specs (race FE up to race + cand + firm FE, p<0.001), using the unsponsored-only consensus pool. This separates the slant from late-campaign-movement noise (vs-final-results error contains both). The |deviation| magnitude effect under race FE alone is +3.31 pp (vs +1.35 pp using |error|), a 2.5× larger signal at the race-FE level. Under cand FE, however, the magnitude effect still collapses to null: the natural per-candidate consensus-deviation variance is large enough to mask the slant. Net: consensus deviation is a better real-time bias detector than vs-final-results error, but the fundamental noise-floor argument (AN-098) is unchanged."
type: alternative-outcome
question: "Does the bias show up more clearly when accuracy is measured against other polls (consensus) rather than against the final election result?"
tags: ["hyp:shell-contratante", "consensus-deviation", "alternative-outcome", "real-time-detection"]
status: interpreted
status_date: 2026-06-17
confidence: green
created: 2026-06-17
script: source/analysis/an-099-consensus-deviation-as-outcome.py
target: build/table/an-099-consensus-deviation-as-outcome.csv
---

AN-099: Consensus deviation as an alternative bias signal

User suggestion (2026-06-17): "Define error as difference from other polls instead of difference from election result. Maybe the bias would show up in such a measure of accuracy?"

The intuition: comparing each poll to its peers in the same race × window removes the "late campaign movement" noise that contaminates vs-final-results error. The natural decomposition:

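$$
\mathrm{error}_{c,p} = \mathrm{poll\_share}_{c,p} - \mathrm{final\_share}_{c} = (\mathrm{poll\_share}_{c,p} - \mathrm{true\_latent}_{p\_\mathrm{date}}) + (\mathrm{true\_latent}_{p\_\mathrm{date}} - \mathrm{final\_share}_{c}) = \mathrm{sampling} + \mathrm{methodology} + \mathrm{slant} + \mathrm{late\_movement}
$$

The consensus of other polls tracks the latent share at the poll date up to a small error, $\mathrm{consensus}_{c,p} = \mathrm{true\_latent}_{p\_\mathrm{date}} + \mathrm{small\_noise}$, so:

$$
\mathrm{consensus\_dev}_{c,p} = \mathrm{poll\_share}_{c,p} - \mathrm{consensus}_{c,p} = (\mathrm{poll\_share}_{c,p} - \mathrm{true\_latent}_{p\_\mathrm{date}}) - \mathrm{small\_noise} = \mathrm{sampling} + \mathrm{methodology} + \mathrm{slant} - \mathrm{small\_noise}
$$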

The consensus version drops the (true_latent − final_share) component, leaving only sampling + methodology + slant. If slant is the variable of interest, consensus-deviation is the cleaner signal.
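A stylized Monte Carlo of the decomposition, assuming independent Gaussian components with made-up SDs (the `sd_*` values are illustrative, not estimates from the data, and every simulated row is a sponsored poll carrying the slant). Both measures recover the slant in the mean; only the vs-final-results error also carries the late-movement variance.

```python
import random
import statistics


def simulate(n=20_000, slant=7.0, sd_sampling=4.0, sd_late=5.0, sd_consensus=1.0, seed=0):
    """Draw (error, consensus_dev) pairs from the assumed decomposition."""
    rng = random.Random(seed)
    errors, devs = [], []
    for _ in range(n):
        noise = rng.gauss(0.0, sd_sampling)   # sampling + methodology
        late = rng.gauss(0.0, sd_late)        # true_latent(p_date) - final_share
        small = rng.gauss(0.0, sd_consensus)  # consensus - true_latent(p_date)
        errors.append(slant + noise + late)   # vs-final-results error
        devs.append(slant + noise - small)    # consensus deviation
    return errors, devs


errors, devs = simulate()
print(f"error:         mean {statistics.fmean(errors):+.2f}  sd {statistics.stdev(errors):.2f}")
print(f"consensus dev: mean {statistics.fmean(devs):+.2f}  sd {statistics.stdev(devs):.2f}")
```

Under these assumptions the error SD is about sqrt(4² + 5²) and the consensus-deviation SD about sqrt(4² + 1²); the signed means coincide.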

Construction

For each cand-poll row, the consensus is the median of the OTHER polls of the same candidate in the same race, fielded within ±14 days of this poll's field_end. Two pool definitions:

| Pool | Description | n usable rows |
|---|---|---|
| ALL polls | All polls in the window (including other sponsored polls) | 14,412 |
| UNSPONSORED-only | Only `sponsored_by == 0` polls in the window | 14,298 |

The median (rather than the mean) is used for robustness to single-poll outliers.

The unsponsored-only pool is the cleaner reference because it doesn't let other sponsored polls drag the consensus toward the slant.
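A minimal sketch of the leave-one-out consensus, assuming one record per cand-poll row. The `PollRow` record and its `poll_id`, `race`, `cand` and `share` fields are hypothetical (only `field_end` and `sponsored_by` appear in this note), peers are assumed to be matched on their own `field_end`, and the quadratic scan favors clarity over speed. It is not the AN-099 script.

```python
from dataclasses import dataclass
from datetime import date
from statistics import median
from typing import Optional, Sequence


@dataclass
class PollRow:
    poll_id: str       # hypothetical poll identifier
    race: str          # hypothetical race key
    cand: str          # hypothetical candidate key
    field_end: date
    share: float       # candidate's poll share, pp
    sponsored_by: int  # 1 if sponsored, else 0


def consensus(row: PollRow, rows: Sequence[PollRow], window_days: int = 14,
              unsponsored_only: bool = True) -> Optional[float]:
    """Median share of OTHER polls of the same cand in the same race within ±window_days."""
    peers = [
        r.share for r in rows
        if r.race == row.race and r.cand == row.cand
        and r.poll_id != row.poll_id                       # leave this poll out
        and abs((r.field_end - row.field_end).days) <= window_days
        and (r.sponsored_by == 0 or not unsponsored_only)  # pool definition
    ]
    return median(peers) if peers else None  # no peers: row is not usable


def consensus_dev(row: PollRow, rows: Sequence[PollRow], **kw) -> Optional[float]:
    c = consensus(row, rows, **kw)
    return None if c is None else row.share - c


rows = [
    PollRow("p1", "R1", "A", date(2026, 5, 1), 40.0, 0),
    PollRow("p2", "R1", "A", date(2026, 5, 5), 42.0, 0),
    PollRow("p3", "R1", "A", date(2026, 5, 8), 44.0, 0),
    PollRow("p4", "R1", "A", date(2026, 5, 10), 50.0, 1),
    PollRow("p5", "R1", "A", date(2026, 6, 20), 41.0, 0),  # no peer within ±14 days
]
print(consensus_dev(rows[3], rows))                          # 8.0
print(consensus_dev(rows[0], rows))                          # -3.0
print(consensus_dev(rows[0], rows, unsponsored_only=False))  # -4.0
```

The toy rows show the pool choice at work: the sponsored p4 shifts p1's consensus only in the ALL pool.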

Raw means

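| Sample | Pool | n | Mean signed deviation (pp) | Mean \|deviation\| (pp) | SD of signed deviation (pp) |
|---|---|---|---|---|---|
| Unsponsored | ALL | 14,231 | −0.01 | 5.64 | 9.30 |
| Sponsored | ALL | 181 | +6.75 | 9.89 | 10.90 |
| Unsponsored | UNSP-only | 14,132 | +0.08 | 5.60 | 9.27 |
| Sponsored | UNSP-only | 166 | +8.13 | 11.07 | 11.49 |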

The sponsored-row mean signed deviation of +6.75 pp (ALL pool) / +8.13 pp (unsponsored-only pool) is in line with the +7 pp slant from AN-096 (vs-final-results error), while unsponsored rows sit near zero. The cleanest reference (unsponsored-only pool) gives +8.13 pp, slightly larger than the +7 pp from the election-based regression.

Spec ladder (UNSPONSORED-only pool)

Standard errors clustered by race. Treatment: sponsored_by. Controls: opponent_sponsored and log_sample.
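A pure-stdlib sketch of the S1 spec: race FE absorbed by within-race demeaning, then OLS with a race-clustered sandwich variance. It is not the AN-099 script. The G/(G−1) small-sample factor is an assumption (statistical packages typically apply additional degrees-of-freedom corrections), and the synthetic data at the bottom are invented only to check that the estimator recovers known coefficients. The S2 and S3 specs would need iterated demeaning across race, cand and firm, which is omitted here.

```python
import random
from collections import defaultdict


def demean_within(groups, values):
    """Within transformation: subtract each group's mean, which absorbs the group FE."""
    total, count = defaultdict(float), defaultdict(int)
    for g, v in zip(groups, values):
        total[g] += v
        count[g] += 1
    return [v - total[g] / count[g] for g, v in zip(groups, values)]


def inverse(m):
    """Gauss-Jordan inverse with partial pivoting; fine for the 3x3 systems here."""
    n = len(m)
    a = [list(row) + [1.0 if i == j else 0.0 for j in range(n)] for i, row in enumerate(m)]
    for c in range(n):
        p = max(range(c, n), key=lambda r: abs(a[r][c]))
        a[c], a[p] = a[p], a[c]
        pivot = a[c][c]
        a[c] = [x / pivot for x in a[c]]
        for r in range(n):
            if r != c:
                f = a[r][c]
                a[r] = [x - f * y for x, y in zip(a[r], a[c])]
    return [row[n:] for row in a]


def matmul(a, b):
    return [[sum(a[i][t] * b[t][j] for t in range(len(b))) for j in range(len(b[0]))]
            for i in range(len(a))]


def race_fe_ols(y, xcols, race):
    """OLS of y on xcols with race FE; returns (coefficients, race-clustered SEs)."""
    yd = demean_within(race, y)
    rows = list(zip(*(demean_within(race, x) for x in xcols)))
    k = len(xcols)
    xtx = [[sum(r[i] * r[j] for r in rows) for j in range(k)] for i in range(k)]
    xty = [sum(r[i] * v for r, v in zip(rows, yd)) for i in range(k)]
    bread = inverse(xtx)
    beta = [sum(bread[i][j] * xty[j] for j in range(k)) for i in range(k)]
    resid = [v - sum(b * x for b, x in zip(beta, r)) for r, v in zip(rows, yd)]
    # Cluster scores: sum of x_i * u_i within each race.
    score = defaultdict(lambda: [0.0] * k)
    for g, r, u in zip(race, rows, resid):
        for i in range(k):
            score[g][i] += r[i] * u
    meat = [[sum(s[i] * s[j] for s in score.values()) for j in range(k)] for i in range(k)]
    n_clusters = len(score)
    vcov = matmul(matmul(bread, meat), bread)
    correction = n_clusters / (n_clusters - 1)  # assumed small-sample factor G/(G-1)
    return beta, [(correction * vcov[i][i]) ** 0.5 for i in range(k)]


# Synthetic check (made-up data): true effects +6 (sponsored), -4 (opponent), +0.5 (log n).
rng = random.Random(1)
race, y, spons, opp, log_n = [], [], [], [], []
for g in range(200):
    race_shift = rng.gauss(0, 3)
    for _ in range(30):
        s, o, ln = int(rng.random() < 0.1), int(rng.random() < 0.1), rng.uniform(6, 8)
        race.append(g)
        spons.append(s)
        opp.append(o)
        log_n.append(ln)
        y.append(race_shift + 6.0 * s - 4.0 * o + 0.5 * ln + rng.gauss(0, 5))

beta, se = race_fe_ols(y, [spons, opp, log_n], race)
print([round(b, 2) for b in beta], [round(e, 2) for e in se])
```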

Signed deviation (the slant)

| Spec | FE | sponsored_by (SE) | opponent_sponsored (SE) | n |
|---|---|---|---|---|
| S0 | None | +7.98*** (1.05) | −2.28*** (0.40) | 14,298 |
| S1 | Race | +5.82*** (0.97) | −4.17*** (0.73) | 14,297 |
| S2 | Race + Cand | +6.31*** (1.32) | −4.38*** (0.77) | 14,217 |
| S3 | Race + Cand + Firm | +6.59*** (1.35) | −3.99*** (0.78) | 14,211 |

The signed slant is robust at +5.8 to +8.0 pp across every spec (S0 to S3), consistent with the AN-096 vs-final-results result.

|deviation| (magnitude)

| Spec | FE | sponsored_by (SE) | n |
|---|---|---|---|
| S0 | None | +4.43*** (0.83) | 14,298 |
| S1 | Race | +3.31*** (0.91) | 14,297 |
| S2 | Race + Cand | +0.47 (1.64) | 14,217 |
| S3 | Race + Cand + Firm | +0.86 (1.49) | 14,211 |

Race FE alone gives +3.31 pp, 2.5× the +1.35 pp from |error| (AN-096). Adding cand FE, however, collapses the magnitude effect to null, as the noise-floor argument predicts.

Spec ladder (ALL polls pool)

For robustness, the same regressions are run with the consensus computed from ALL polls in the window (including other sponsored polls).

Signed deviation

| Spec | FE | sponsored_by (SE) | n |
|---|---|---|---|
| S0 | None | +6.70*** (0.91) | 14,412 |
| S1 | Race | +7.38*** (1.08) | 14,412 |
| S2 | Race + Cand | +11.40*** (1.55) | 14,412 |
| S3 | Race + Cand + Firm | +11.63*** (1.53) | 14,412 |

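Note the S2 / S3 inflation to +11.4 / +11.6 pp. A likely reading: with cand FE, the coefficient is identified from contrasts between sponsored and unsponsored polls of the same candidate. In the ALL pool, the sponsored polls also enter the consensus of that candidate's unsponsored polls, pulling their median upward and pushing the unsponsored rows' deviations below zero. Each sponsored poll's own consensus excludes the poll itself, so the contamination falls more heavily on the unsponsored rows. The within-candidate gap therefore widens and the coefficient overstates the slant. The unsponsored-only pool avoids this and gives a cleaner +6.59 pp at S3.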
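A deterministic toy, assuming no sampling noise and invented shares, showing one way the ALL pool can inflate a within-candidate contrast. With a single candidate and no controls, the cand-FE OLS coefficient on sponsored_by equals the difference in mean deviations computed by `within_cand_gap` (a hypothetical helper, as is `loo_deviations`).

```python
from statistics import median


def loo_deviations(shares, sponsored, unsponsored_only):
    """Leave-one-out consensus deviation for the polls of ONE candidate in one window."""
    devs = []
    for i, share in enumerate(shares):
        peers = [s for j, (s, sp) in enumerate(zip(shares, sponsored))
                 if j != i and (sp == 0 or not unsponsored_only)]
        devs.append(share - median(peers))
    return devs


def within_cand_gap(shares, sponsored, unsponsored_only):
    """Mean deviation of sponsored rows minus mean deviation of unsponsored rows."""
    devs = loo_deviations(shares, sponsored, unsponsored_only)
    sp = [d for d, s in zip(devs, sponsored) if s]
    un = [d for d, s in zip(devs, sponsored) if not s]
    return sum(sp) / len(sp) - sum(un) / len(un)


# Toy candidate: three unsponsored polls at 40 pp, two sponsored polls with a +7 pp slant.
shares = [40.0, 40.0, 40.0, 47.0, 47.0]
sponsored = [0, 0, 0, 1, 1]
print(within_cand_gap(shares, sponsored, unsponsored_only=True))   # 7.0
print(within_cand_gap(shares, sponsored, unsponsored_only=False))  # 10.5
```

In the ALL pool each unsponsored poll is compared with [40, 40, 47, 47] (median 43.5), so its deviation is −3.5, while each sponsored poll still sees a median of 40. The gap grows from the true 7 pp to 10.5 pp.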

Comparison to error-based regression (AN-096)

| Spec | FE | vs final results (\|error\|) | vs unsp-pool (\|deviation\|) | Δ |
|---|---|---|---|---|
| S0 | None | +2.88 | +4.43 | +1.55 |
| S1 | Race | +1.35 | +3.31 | +1.96 |
| S2 | Race + Cand | −0.44 | +0.47 | +0.91 |
| S3 | Race + Cand + Firm | −0.44 | +0.86 | +1.30 |

Consensus deviation gives a 2.5× larger magnitude coefficient under race FE (S1), consistent with the gain expected from removing the late-campaign-movement noise component. Under cand FE, however, the collapse pattern is similar: natural per-candidate variance absorbs the slant.
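A stylized version of the noise-floor argument, assuming the measure is a constant slant plus Gaussian noise with SD `sigma` (a simplification: real per-candidate noise need be neither Gaussian nor homoskedastic). The signed effect equals the slant for any `sigma`; the effect on E|measure| uses the folded-normal mean and shrinks as `sigma` grows. Removing a noise component therefore raises the |measure| coefficient, and enough residual variance pushes it toward zero.

```python
import math


def expected_abs(mu, sigma):
    """E|X| for X ~ N(mu, sigma^2), i.e. the mean of a folded normal."""
    if sigma == 0:
        return abs(mu)
    z = mu / (sigma * math.sqrt(2))
    return sigma * math.sqrt(2 / math.pi) * math.exp(-z * z) + mu * math.erf(z)


def magnitude_effect(slant, sigma):
    """Increase in E|measure| when a constant slant is added to N(0, sigma^2) noise."""
    return expected_abs(slant, sigma) - expected_abs(0.0, sigma)


SLANT = 7.0  # pp; the signed effect is SLANT whatever the noise level
for sigma in (0.5, 3.0, 6.0, 9.0, 12.0):
    print(f"noise SD {sigma:>4} pp -> |measure| effect {magnitude_effect(SLANT, sigma):+.2f} pp")
```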

Interpretation

The consensus-deviation approach is a partial win. It:

  1. Confirms the +7 pp signed slant is robust across both measurement frames (vs final results AND vs consensus). The bias is not an artifact of "late campaign movement."

  2. Gives a larger magnitude signal at the race-FE level. This is useful for paper-appendix robustness, real-time bias detection (no need to wait for the election), and cleaner slant identification when election outcomes are noisy.

  3. Does not break the noise-floor wall under cand FE. The collapse to null under candidate FE happens with both measures: the natural per-candidate variance of the |measure| outcome swamps what a +7 pp slant adds to it. The market-discipline argument (AN-098) stands.

For the paper:

Caveats

Follow-ups

  1. Reproduce AN-097 (β-tercile) using consensus-deviation outcome. Does the high-β tercile still show large |deviation| effects? Tests whether the AN-097 heterogeneity is robust to the measurement frame.
  2. Time window robustness: ±7 days, ±21 days, ±30 days.
  3. Real-time bias detection prototype: flag a sponsor whose poll deviates from the concurrent consensus by more than kσ. Useful for a §Policy proposal on continuous monitoring in the paper.
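A minimal sketch of the kσ rule in follow-up 3. Everything here is hypothetical: the `flag_polls` name, estimating σ from unsponsored-poll consensus deviations, the default k = 2, and the optional one-sided mode (which flags only deviations in the direction of the slant). Rolling flagged polls up to their sponsor would be a separate step.

```python
from statistics import stdev


def flag_polls(deviations, reference_deviations, k=2.0, two_sided=True):
    """Hypothetical monitoring rule: flag polls whose consensus deviation exceeds k * sigma.

    deviations: {poll_id: consensus deviation in pp} for the polls being screened.
    reference_deviations: deviations of unsponsored polls, used to estimate sigma.
    two_sided=False flags only positive deviations (the direction of the slant).
    """
    sigma = stdev(reference_deviations)
    threshold = k * sigma
    return {
        pid: dev for pid, dev in deviations.items()
        if (abs(dev) if two_sided else dev) > threshold
    }


reference = [-2.0, 1.0, 0.5, -1.5, 2.0, 0.0]   # made-up unsponsored deviations (pp)
screened = {"s1": 8.1, "s2": 1.2, "s3": -4.0}  # made-up polls to screen
print(flag_polls(screened, reference))                   # {'s1': 8.1, 's3': -4.0}
print(flag_polls(screened, reference, two_sided=False))  # {'s1': 8.1}
```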

Artifacts