---
id: an-099
hypothesis: shell-contratante
headline: 'Consensus deviation (poll share vs median of other polls of the same cand in the same race within ±14 days) confirms the +7 pp signed slant: +5.82 to +6.59 pp across the race-FE specs (race FE up to race + cand + firm FE, p<0.001) using the unsponsored-only consensus pool. This separates the slant from "late campaign movement" noise (vs-final-results error contains both). The |deviation| magnitude effect under race FE alone is +3.31 pp (vs +1.35 pp using |error|), a 2.5× larger signal at the race-FE level. But under cand FE the magnitude effect still collapses to null: the natural per-candidate consensus-deviation variance is large enough to mask the slant. Net: consensus deviation is a better real-time bias detector than vs-final-results error, but the fundamental noise-floor argument (AN-098) is unchanged.'
type: alternative-outcome
question: "Does the bias show up more clearly when accuracy is measured against other polls (consensus) rather than against the final election result?"
tags: ["hyp:shell-contratante", "consensus-deviation", "alternative-outcome", "real-time-detection"]
status: interpreted
status_date: 2026-06-17
confidence: green
created: 2026-06-17
script: source/analysis/an-099-consensus-deviation-as-outcome.py
target: build/table/an-099-consensus-deviation-as-outcome.csv
---
AN-099: Consensus deviation as an alternative bias signal
User suggestion (2026-06-17): "Define error as difference from other polls instead of difference from election result. Maybe the bias would show up in such a measure of accuracy?"
The intuition: comparing each poll to its peers in the same race × window removes the "late campaign movement" noise that contaminates vs-final-results error. The natural decomposition:
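$$
\text{error}_{c,p} = \big(\text{poll\_share}_{c,p} - \text{true\_latent}_{c}(t_p)\big) + \big(\text{true\_latent}_{c}(t_p) - \text{final\_share}_{c}\big) = \text{sampling} + \text{methodology} + \text{slant} + \text{late\_movement}
$$
$$
\text{consensus\_dev}_{c,p} = \big(\text{poll\_share}_{c,p} - \text{true\_latent}_{c}(t_p)\big) - \text{small\_noise} = \text{sampling} + \text{methodology} + \text{slant} - \text{small\_noise}
$$
where $t_p$ is the date of poll $p$.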
The consensus version drops the (true_latent − final_share) component, leaving only sampling + methodology + slant. If slant is the variable of interest, consensus-deviation is the cleaner signal.
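To make the decomposition concrete, here is a toy simulation. It is a sketch under stated assumptions and not the project's data-generating process. The assumptions are: sampling and methodology noise are lumped into one Gaussian term, the consensus tracks the latent share up to a small Gaussian error, and every parameter value (`sd_sampling`, `sd_late`, `sd_consensus`, the +7 pp slant) is illustrative. Under these assumptions, the late-movement term inflates the spread of the vs-final error but not of the consensus deviation. The sponsored-minus-unsponsored gap in consensus deviation recovers the slant.

```python
import random
import statistics


def simulate(n_unsp=5000, n_sp=500, slant=7.0, sd_sampling=3.0,
             sd_late=5.0, sd_consensus=0.5, seed=99):
    """Toy DGP for the error / consensus_dev decomposition (illustrative values only)."""
    rng = random.Random(seed)
    rows = []
    for i in range(n_unsp + n_sp):
        sponsored = i >= n_unsp
        latent = rng.uniform(10.0, 50.0)                  # true_latent at the poll date
        final = latent - rng.gauss(0.0, sd_late)          # late_movement = latent - final
        poll = latent + rng.gauss(0.0, sd_sampling) + (slant if sponsored else 0.0)
        consensus = latent + rng.gauss(0.0, sd_consensus)  # peers track the latent share
        rows.append((sponsored, poll - final, poll - consensus))
    return rows


rows = simulate()
unsp = [(err, dev) for sp, err, dev in rows if not sp]
sd_error = statistics.pstdev([err for err, _ in unsp])
sd_dev = statistics.pstdev([dev for _, dev in unsp])
slant_hat = (statistics.mean([dev for sp, _, dev in rows if sp])
             - statistics.mean([dev for _, dev in unsp]))
print(f"SD error {sd_error:.2f} | SD consensus_dev {sd_dev:.2f} | slant {slant_hat:.2f}")
```

With these settings the unsponsored SD of error should sit near sqrt(3² + 5²) ≈ 5.8 pp. The SD of consensus deviation should sit near 3 pp.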
Construction
For each cand-poll row, the consensus is the median of the OTHER polls of the same candidate in the same race fielded within ±14 days of this poll's field_end. Two pool definitions:
| Pool | Description | n usable rows |
|---|---|---|
| ALL polls | All polls in window (including other sponsored polls) | 14,412 |
| UNSPONSORED-only | Only sponsored_by==0 polls in window | 14,298 |
The median (rather than the mean) is used for robustness to single-poll outliers.
The unsponsored-only pool is the cleaner reference because it doesn't let other sponsored polls drag the consensus toward the slant.
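The construction can be sketched as a leave-one-out median over same-race, same-candidate peers. This is a minimal stdlib sketch and not the project script. The `PollRow` record, its field names other than `sponsored_by` and `field_end`, and the choice to measure peer distance by peer `field_end` are assumptions made for illustration. The loop is quadratic. A production version would group by race × cand first.

```python
from dataclasses import dataclass
from datetime import date
from statistics import median


@dataclass
class PollRow:                 # hypothetical record layout
    race: str
    cand: str
    field_end: date
    share: float               # poll share of this candidate, pp
    sponsored_by: int          # 1 if sponsored by this candidate, else 0


def consensus_deviation(rows, window_days=14, unsponsored_only=True):
    """share minus median of OTHER same-race, same-cand polls within the window.

    Returns a list aligned with `rows`; None where no peer poll qualifies
    (those rows are dropped from the regressions).
    """
    out = []
    for i, r in enumerate(rows):
        peers = [
            p.share
            for j, p in enumerate(rows)
            if j != i                                             # leave this poll out
            and p.race == r.race and p.cand == r.cand
            and abs((p.field_end - r.field_end).days) <= window_days
            and (not unsponsored_only or p.sponsored_by == 0)     # pool definition
        ]
        out.append(r.share - median(peers) if peers else None)
    return out


rows = [
    PollRow("race-X", "A", date(2022, 9, 1), 40.0, 0),
    PollRow("race-X", "A", date(2022, 9, 5), 42.0, 0),
    PollRow("race-X", "A", date(2022, 9, 8), 49.0, 1),
    PollRow("race-X", "A", date(2022, 11, 1), 45.0, 0),   # no peer within ±14 days
]
print(consensus_deviation(rows))                          # [-2.0, 2.0, 8.0, None]
print(consensus_deviation(rows, unsponsored_only=False))  # [-5.5, -2.5, 8.0, None]
```

The toy rows show the pool difference. With the ALL pool, the sponsored 49.0 enters the unsponsored rows' consensus and pushes their deviations down.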
Raw means
| Sample | Pool | n | Mean signed deviation (pp) | Mean \|deviation\| (pp) | SD signed (pp) |
|---|---|---|---|---|---|
| Unsponsored | ALL | 14,231 | −0.01 | 5.64 | 9.30 |
| Sponsored | ALL | 181 | +6.75 | 9.89 | 10.90 |
| Unsponsored | UNSP-only | 14,132 | +0.08 | 5.60 | 9.27 |
| Sponsored | UNSP-only | 166 | +8.13 | 11.07 | 11.49 |
The sponsored-row signed deviation (+6.75 with the ALL pool, +8.13 with the UNSP-only pool) closely tracks the +7 pp slant from AN-096 (vs-final-results error). The cleanest reference (unsponsored-only pool) gives +8.13 pp, slightly larger than the +7 pp from the election-based regression.
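The raw-means table is a per-group summary of the deviation column. A minimal sketch follows. It assumes deviations arrive as a list aligned with a 0/1 `sponsored_by` list, with None marking rows that had no peer. It also assumes the SD is the sample (n−1) SD, which the table does not state.

```python
from statistics import mean, stdev


def raw_means(devs, sponsored):
    """n, mean signed, mean |deviation| and SD of signed deviation by sponsorship.

    Rows with dev None (no peer poll in the window) are dropped.
    """
    out = {}
    for flag, label in ((0, "Unsponsored"), (1, "Sponsored")):
        d = [x for x, s in zip(devs, sponsored) if s == flag and x is not None]
        out[label] = {
            "n": len(d),
            "mean_signed": mean(d),
            "mean_abs": mean([abs(x) for x in d]),
            "sd_signed": stdev(d),          # sample SD (n-1): an assumption
        }
    return out


summary = raw_means([-2.0, 2.0, None, 8.0, 6.0], [0, 0, 0, 1, 1])
print(summary["Sponsored"]["mean_signed"])   # 7.0
```

A signed mean near zero can hide a large mean |deviation|. This is why the table reports both columns.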
Spec ladder (UNSPONSORED-only pool)
Standard errors (in parentheses) clustered at the race level. Treatment = sponsored_by. Controls = opponent_sponsored + log_sample.
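The S1 rung (race FE) can be sketched with the within transformation: demean every variable by race, run OLS, then form a cluster-robust sandwich with race clusters. This stdlib-only sketch is an illustration, not the project's estimator. It handles one-way FE only. The multi-way rungs (S2, S3) need iterative demeaning or a dedicated FE package. It also uses the CR0 sandwich without a finite-sample correction, so its SEs will not match a package's default exactly.

```python
from collections import defaultdict


def demean_by(groups, col):
    """Within transformation: subtract each group's mean."""
    sums, counts = defaultdict(float), defaultdict(int)
    for g, v in zip(groups, col):
        sums[g] += v
        counts[g] += 1
    return [v - sums[g] / counts[g] for g, v in zip(groups, col)]


def invert(a):
    """Gauss-Jordan inverse (partial pivoting) of a small square matrix."""
    n = len(a)
    m = [list(map(float, row)) + [float(i == j) for j in range(n)] for i, row in enumerate(a)]
    for c in range(n):
        p = max(range(c, n), key=lambda r: abs(m[r][c]))
        m[c], m[p] = m[p], m[c]
        piv = m[c][c]
        m[c] = [v / piv for v in m[c]]
        for r in range(n):
            if r != c:
                f = m[r][c]
                m[r] = [vr - f * vc for vr, vc in zip(m[r], m[c])]
    return [row[n:] for row in m]


def race_fe_ols(y, xs, race):
    """OLS of y on regressors xs with race FE; CR0 SEs clustered by race."""
    yd = demean_by(race, y)
    xd = [demean_by(race, x) for x in xs]
    k, n = len(xd), len(yd)
    xtx = [[sum(xd[a][i] * xd[b][i] for i in range(n)) for b in range(k)] for a in range(k)]
    xty = [sum(xd[a][i] * yd[i] for i in range(n)) for a in range(k)]
    bread = invert(xtx)
    beta = [sum(bread[a][b] * xty[b] for b in range(k)) for a in range(k)]
    resid = [yd[i] - sum(beta[a] * xd[a][i] for a in range(k)) for i in range(n)]
    score = defaultdict(lambda: [0.0] * k)          # per-cluster X_g' u_g
    for i, g in enumerate(race):
        for a in range(k):
            score[g][a] += xd[a][i] * resid[i]
    meat = [[sum(s[a] * s[b] for s in score.values()) for b in range(k)] for a in range(k)]
    vcov = [[sum(bread[a][c] * meat[c][d] * bread[d][b] for c in range(k) for d in range(k))
             for b in range(k)] for a in range(k)]
    return beta, [vcov[a][a] ** 0.5 for a in range(k)]


# Noise-free toy data: y = race effect + 6*sponsored_by - 2*log_sample
race = ["a", "a", "a", "b", "b", "b"]
sponsored_by = [1, 0, 0, 1, 0, 1]
log_sample = [0.5, 1.0, 2.0, 1.5, 0.2, 0.9]
race_effect = {"a": 0.0, "b": 10.0}
y = [race_effect[r] + 6.0 * s - 2.0 * l for r, s, l in zip(race, sponsored_by, log_sample)]
beta, se = race_fe_ols(y, [sponsored_by, log_sample], race)
print(beta)
```

Because the toy outcome is exact, the race effects are absorbed and the sketch recovers the coefficients 6 and −2. With only two clusters, the SEs themselves are meaningless here.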
Signed deviation (the slant)
| Spec | sponsored_by | opponent_sponsored | n |
|---|---|---|---|
| S0 No FE | +7.98*** (1.05) | −2.28*** (0.40) | 14,298 |
| S1 Race FE | +5.82*** (0.97) | −4.17*** (0.73) | 14,297 |
| S2 Race + Cand FE | +6.31*** (1.32) | −4.38*** (0.77) | 14,217 |
| S3 Race + Cand + Firm FE | +6.59*** (1.35) | −3.99*** (0.78) | 14,211 |
The signed slant is robust at +5.8 to +8.0 pp across every FE spec, consistent with the AN-096 vs-final-results result.
|deviation| (magnitude)
| Spec | sponsored_by | n |
|---|---|---|
| S0 No FE | +4.43*** (0.83) | 14,298 |
| S1 Race FE | +3.31*** (0.91) | 14,297 |
| S2 Race + Cand FE | +0.47 (1.64) | 14,217 |
| S3 Race + Cand + Firm FE | +0.86 (1.49) | 14,211 |
Race FE alone gives +3.31 pp, 2.5× the +1.35 pp from |error| (AN-096). But cand FE collapses the magnitude effect to null, as the AN-098 noise-floor argument predicts.
Spec ladder (ALL polls pool)
For robustness: same regression with the consensus computed from ALL polls in the window (including other sponsored polls).
Signed deviation
| Spec | sponsored_by | n |
|---|---|---|
| S0 No FE | +6.70*** (0.91) | 14,412 |
| S1 Race FE | +7.38*** (1.08) | 14,412 |
| S2 Race + Cand FE | +11.40*** (1.55) | 14,412 |
| S3 Race + Cand + Firm FE | +11.63*** (1.53) | 14,412 |
Note the S2 / S3 inflation to +11.4 / +11.6 pp. Likely reading: in the ALL pool, sponsored polls enter the consensus of the unsponsored polls of the same candidate and drag that median upward. The unsponsored comparison polls therefore appear to understate relative to an inflated consensus. Once cand FE restrict the comparison to within-candidate variation, the sponsored-vs-unsponsored gap picks up this mechanical shift on top of the slant, so the coefficient overstates the slant. The unsponsored-only pool avoids this and gives a cleaner +6.59 pp at S3.
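A hand-sized example shows how the ALL pool can inflate a within-candidate contrast. The setup is hypothetical: one candidate, all polls in one window, three unsponsored polls at 40 pp and two sponsored polls at 47 pp (a +7 pp slant). The sponsored-minus-unsponsored mean gap stands in for the cand-FE contrast.

```python
from statistics import mean, median

# Hypothetical polls for one candidate in one window: (share in pp, sponsored_by)
polls = [(40.0, 0), (40.0, 0), (40.0, 0), (47.0, 1), (47.0, 1)]


def loo_devs(polls, unsponsored_only):
    """Deviation of each poll from the median of the OTHER polls in its pool."""
    devs = []
    for i, (share, sp) in enumerate(polls):
        pool = [s for j, (s, p) in enumerate(polls)
                if j != i and (not unsponsored_only or p == 0)]
        devs.append((share - median(pool), sp))
    return devs


def within_cand_gap(devs):
    """Mean sponsored deviation minus mean unsponsored deviation."""
    return (mean([d for d, sp in devs if sp == 1])
            - mean([d for d, sp in devs if sp == 0]))


gap_all = within_cand_gap(loo_devs(polls, unsponsored_only=False))
gap_unsp = within_cand_gap(loo_devs(polls, unsponsored_only=True))
print(gap_all, gap_unsp)  # 10.5 7.0
```

In the ALL pool, each unsponsored poll's consensus is median(40, 40, 47, 47) = 43.5, giving a deviation of −3.5. Each sponsored poll still sits +7 above its own consensus. The gap therefore reads 10.5 pp rather than the true 7 pp.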
Comparison to error-based regression (AN-096)
| Spec | vs final results (\|error\|) | vs unsp-pool (\|deviation\|) | Δ |
|---|---|---|---|
| S0 No FE | +2.88 | +4.43 | +1.55 |
| S1 Race FE | +1.35 | +3.31 | +1.96 |
| S2 Race + Cand FE | −0.44 | +0.47 | +0.91 |
| S3 Race + Cand + Firm FE | −0.44 | +0.86 | +1.30 |
Consensus deviation gives a 2.5× larger magnitude coefficient under race FE (S1), consistent with removing the "late campaign movement" noise component. But under cand FE the collapse pattern is similar: natural per-candidate variance absorbs the slant.
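One stylized way to see why a |measure| contrast is sensitive to noise, while the signed contrast is not, uses the folded-normal mean. The key assumption is normal noise with a common σ for slanted and unslanted polls. This is an illustration and not the project's model. The signed gap stays at 7 pp for every σ. The expected |measure| gap, E|7 + σZ| − E|σZ|, falls toward zero as σ grows.

```python
from math import erf, exp, pi, sqrt


def norm_cdf(x):
    """Standard normal CDF via the error function."""
    return 0.5 * (1.0 + erf(x / sqrt(2.0)))


def mean_abs_normal(mu, sigma):
    """E|X| for X ~ N(mu, sigma^2), i.e. the folded-normal mean."""
    return (sigma * sqrt(2.0 / pi) * exp(-mu ** 2 / (2.0 * sigma ** 2))
            + mu * (1.0 - 2.0 * norm_cdf(-mu / sigma)))


def abs_gap(slant, sigma):
    """Expected |measure| gap between slanted and unslanted polls with equal noise."""
    return mean_abs_normal(slant, sigma) - mean_abs_normal(0.0, sigma)


for sigma in (2.0, 5.0, 10.0, 20.0):
    print(f"sigma={sigma:4.1f}  |gap|={abs_gap(7.0, sigma):.2f} pp")
```

Under these assumptions, the magnitude gap shrinks from roughly 5.4 pp at σ = 2 to under 1 pp at σ = 20. Removing a noise component (late movement) moves the contrast up this curve. Large residual per-candidate variance moves it down.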
Interpretation
The consensus-deviation approach is a partial win. It:
- Confirms the +7 pp signed slant is robust across both measurement frames (vs final results AND vs consensus). The bias is not an artifact of "late campaign movement."
- Gives a larger magnitude signal at the race-FE level. This is useful for paper appendix robustness, real-time bias detection (no need to wait for the election), and cleaner slant identification when election outcomes are noisy.
- Does not break the noise-floor wall under cand FE. The collapse to null under candidate FE happens with both measures: natural per-cand variance in |measure| terms swamps the +7 pp slant. The market-discipline argument (AN-098) stands.
For the paper:
- The consensus-deviation framing is useful for a §Discussion paragraph on bias detection without final outcomes. Cite AN-099 race-FE result (+3.31 pp on |deviation|, p<0.001) as evidence that the slant IS detectable in real time IF sponsor identity is known.
- The cand-FE null reinforces the AN-098 noise-floor argument: even the cleanest signal (consensus deviation) cannot discipline accuracy at the individual-poll level once per-cand variance is accounted for. Pre-election bias detection requires either (a) cross-cand pooling within race, or (b) sponsor identity disclosure.
Caveats
- The median consensus requires ≥1 peer poll in the window. This drops 8,253 of 22,665 rows (≈36%) in the ALL pool. The surviving rows skew toward heavily polled cands in well-covered races, so the estimates are selected toward big-cand, big-race comparisons.
- The ±14-day window is arbitrary. A ±7-day window drops more rows; a ±30-day window dilutes the consensus with stale polls. The 14-day window matches the project's existing race × week spec convention.
- The All-polls-pool S2 / S3 inflation (+11 pp) is a mechanical artifact of sponsored polls entering the consensus pool of same-candidate peer polls. Use the unsponsored-only pool for the cleanest interpretation.
- The opponent_sponsored coefficient is significantly negative (−2 to −4 pp). This is the "symmetric mirror" of the slant: opponent-sponsored cands are systematically understated relative to consensus. It confirms the AN-096 finding on a cleaner reference.
- The consensus-deviation regression doesn't overturn the firm market-discipline argument (AN-097, AN-098). Even with the cleanest available real-time signal, the slant is invisible under per-cand FE for typical polls. The market still can't discipline.
Follow-ups
- Reproduce AN-097 (β-tercile) using consensus-deviation outcome. Does the high-β tercile still show large |deviation| effects? Tests whether the AN-097 heterogeneity is robust to the measurement frame.
- Time window robustness: ±7 days, ±21 days, ±30 days.
- Real-time bias detection prototype: flag a sponsored poll if it deviates from the concurrent consensus by more than kσ. Useful for a paper §Policy proposal on continuous monitoring.
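A first cut of the kσ flag could look like the following. This is a sketch, not a proposal from the analysis. Several choices are assumptions: σ is estimated as 1.4826 × the median absolute deviation (MAD) of the peer shares (a normal-consistent robust scale), σ is floored at a hypothetical `min_sigma`, the test is one-sided (only overstatement is flagged), and k = 2 is an arbitrary default.

```python
from statistics import median


def flag_poll(share, peer_shares, k=2.0, min_sigma=0.5):
    """One-sided kσ flag against the concurrent consensus.

    consensus = median of peer shares; sigma = 1.4826 * MAD of peers,
    floored at min_sigma pp. k and min_sigma are hypothetical parameters.
    Returns (flagged, deviation, sigma).
    """
    if not peer_shares:
        return False, None, None                 # no concurrent peers: cannot assess
    center = median(peer_shares)
    mad = median(abs(s - center) for s in peer_shares)
    sigma = max(1.4826 * mad, min_sigma)
    deviation = share - center
    return deviation > k * sigma, deviation, sigma


peers = [39.0, 40.0, 40.0, 41.0, 42.0]
print(flag_poll(48.0, peers))   # (True, 8.0, 1.4826)
print(flag_poll(41.0, peers))   # (False, 1.0, 1.4826)
```

The one-sided form matches the positive slant on sponsored cands. A mirror test on opponent-sponsored rows would use `deviation < -k * sigma`.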
Artifacts
- Script: source/analysis/an-099-consensus-deviation-as-outcome.py
- Coefficient table: build/table/an-099-consensus-deviation-as-outcome.csv
- JSON: build/table/an-099-consensus-deviation-as-outcome.json