Per-poll detectability and blind-audit auditability of the +7 pp bias

🟢 The +7 pp bias is visible per-poll to an observer with sponsor information and an unbiased benchmark (AN-108: z ≈ 2.8 on the median sponsored poll against binomial SE) but buried for ~96 % of sponsored polls against the realized empirical cross-firm noise floor (AN-110: pooled DEFF* = 12.59 implies z ≈ 0.79). A calibrated blind audit nonetheless catches sponsored polls at 12.3 % vs 4.8 % on independent controls — a 2.6× lift — at protocol-level Bonferroni (AN-109 v2).

The detectability picture has three load-bearing comparators, each relevant to a different observer:

Reputation implication. A sponsor's own pollster running internal sanity checks against a tight benchmark (an unbiased prior, the true share at fielding, careful QA) would see the bias as statistically extreme. A pollster benchmarking against published competitor polls (the realized cross-firm distribution) would not. Reputation pressure is therefore conditional on firm-level QA practice and on whether firms hold an internal unbiased estimate of the true share that the published number is allowed to drift from.

Policy implication. A TSE / journalist blind audit using (a) empirical-DEFF-calibrated SE from the registered-polls universe, (b) Bonferroni / Holm correction per protocol's k, and (c) a protocol-level "any share extreme" rollup yields a 2.6× lift on sponsored vs independent polls. This is operationally deployable from existing public data. The instrument would be calibrated (≈ 5 % FP), target the tail of biased polls (≈ 12 % catch rate on sponsored), and support an aggregate excess-detection-rate audit by sponsorship class rather than only per-poll enforcement.

Scope: this finding is silent on the channel decomposition. Whether the +7 pp comes from lawful design tilt (Channel A) or from fabrication (Channel B) is not addressed here — a design tilt and a fabrication of the same magnitude have identical detectability properties. The channel question is tracked separately in channel-decomposition-open.

Caveats.

Sources.

Cited analyses

AN-108 green descriptive

Binomial SE on the sponsor candidate's share is ≤ 3.5 pp on 99.3 % of sponsored polls (median 2.49 pp at median n=360); +7 pp is z ≈ 2.8 against binomial SE on the median sponsored poll. Per-poll bias is statistically visible against a tight (binomial / unbiased) benchmark — informs the reputation / auditability story, not the channel decomposition. AN-110 walks this back against the empirical noise floor.

AN-110 green descriptive

Pooled empirical DEFF* = 12.59 across 673 race × week × candidate cells with ≥3 independent polls (median cell DEFF* = 3.93, P75 = 9.59); realized cross-poll SD on a candidate's share is ≈ 5.0 pp, well above the binomial + DEFF≤3 model. Explains AN-109's miscalibration and qualifies AN-108's per-poll-loud reading: against the empirical cross-firm noise floor (not the binomial benchmark), a +7 pp single-poll shift is ≈ 0.8 SDs — within the noise floor. Aggregate regression β survives because N=568 averages the noise out. Informative about reputation / auditability comparators, not about channel decomposition.

AN-109 green descriptive

**v2 (2026-06-19; supersedes):** calibrated at AN-110 pooled empirical DEFF\\* = 12.59, the blind audit reaches nominal FP (independent protocols 4.8 % at Bonferroni vs nominal 5 %); sponsored excess detection persists at **+7.5 pp Bonferroni / +11.9 pp single-hypothesis** — a calibrated regulator-facing blind audit distinguishes sponsored from independent at a 2.6× lift. The bias has tail concentration: only 4.3 % of sponsor's-candidate rows fail at the empirical DEFF (per-poll bias is buried for 95.7 %), but 12.3 % of sponsored protocols catch via 'at least one share extreme'. **v1 (binomial + DEFF ≤ 3):** the test was uncalibrated (FP 30–60 %); sponsored excess was +13 to +19 pp but the gap inflated by miscalibration.