Magnitude measures (mean |error|, RMSE, max boost, spread) come back NULL across all four sponsor buckets within race × week. The signal is rank-disagreement, not noise. other_firm polls understate the eventual race winner by −1.83 pp vs media polls (p = 0.010) and call the winner

Confidence
yellow
Type
causal
Design
Sample
matched-share-1.0 cand-poll → poll-level aggregate (7,032 polls)
Specification
Poll-level OLS, 7 outcomes:
  1. mean_abs_error: mean |error| across cands in the poll
  2. rmse: sqrt(mean(error²)), L2 inaccuracy
  3. max_signed_error: max(error), the biggest boost the poll handed out
  4. spread_error: max(error) − min(error), the boost+understatement range
  5. poll_leader_error: signed error of the cand the POLL ranked #1
  6. winner_error: signed error of the cand the ELECTION ranked #1
  7. poll_calls_winner_first: 1 if poll's #1 == election winner, 0 else
Treatment: 4-bucket poll_class dummies (media reference).
Spec ladder A: race FE; B: race × week FE; C: B + log(sample_size).
Within-demean + cluster SE at the FE level.
Comparator
media-sponsored polls in the same race × week
Cluster
race / race × week
Script
source/analysis/an-082-accuracy-by-sponsor-bucket.py
Target
build/table/an-082-accuracy-by-sponsor-bucket.csv
Status
interpreted · 2026-06-17
Created
2026-06-17

Motivated by the Goiás IPOP cross-cycle pattern (paper.tex §Setting, lines 265–281). 2020: 357 IPOP polls were self-contracted (the firm registered itself as contratante) — that's pollster_self in our taxonomy, and the 2020 polls were the fraudulent ones prosecuted in Operação Leão de Neméia. 2024: IPOP (68 polls) and Alcateia Outsourcing (41 polls) routed every Goiás 2024 mayoral poll through FacUnicamps, a private faculdade in Goiânia — that's other_firm in our taxonomy. The shell channel migrated across cycles in response to enforcement. A 4-bucket accuracy split therefore has to ask not just "does other_firm look slanted in 2024" but "does pollster_self look slanted too" and "what kind of slant — magnitude or direction."

H13 (shell-contratante) named the prediction and was queued as "small-N behind the LLM extractor." Protocol-level counts say no: other_firm is 1,553 polls (22 %) and pollster_self is 1,464 polls (21 %) — both larger than the candidate-linked bucket (1,026, 15 %). The four-bucket split is well-powered without the LLM pass.
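The bucket shares quoted above follow directly from the protocol-level counts. A minimal check, taking the media count (2,948) from the raw cell means table and using the 7,032-poll total as the denominator; the variable names are illustrative only:

```python
# Bucket counts as reported in this note; shares are taken over the 7,032-poll sample.
TOTAL_POLLS = 7032
bucket_counts = {
    "media": 2948,
    "other_firm": 1553,
    "pollster_self": 1464,
    "candidate": 1026,
}

# Share of each bucket, rounded to whole percent as quoted in the text.
shares = {bucket: round(100 * n / TOTAL_POLLS) for bucket, n in bucket_counts.items()}

for bucket, pct in shares.items():
    print(f"{bucket}: {bucket_counts[bucket]:,} polls ({pct} %)")
```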

Results

Raw cell means (no FE, all 7,032 polls)

Bucket         n      mean |err|  RMSE   max+   spread  poll-leader err  winner err  calls winner #1
media          2,948  8.15        9.03   10.63  18.80   +7.25            −0.16       73.2 %
other_firm     1,553  8.17        8.93   10.35  19.11   +7.39            −0.84       69.3 %
pollster_self  1,464  8.38        9.39   11.44  19.74   +7.58            −0.51       70.8 %
candidate      1,026  9.22        9.97   11.53  18.83   +9.12            +1.43       72.5 %

Two observations from the raw cells matter for picking the right measure:

  1. The boost handed to the poll's #1 cand is similar across buckets (poll_leader_error ≈ 7.3–7.6 pp for everything except candidate-linked at 9.1). Every poll inflates whoever it puts on top by roughly the same amount. Magnitude-of-boost is not where the sponsor signal lives.

  2. What differs is who that #1 cand is. winner_error ranges from −0.84 to +1.43 across buckets, and poll_calls_winner_first from 69 % to 73 %. The slant signature is "which candidate the poll points to," not "how big a boost the slant delivers."

This reframes the AN-082 search: the right measure is rank-based, not magnitude-based.
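To make the magnitude/rank distinction concrete, here is a minimal sketch of the seven poll-level outcomes from the Specification, applied to two made-up polls of the same race. It assumes error = poll share minus election share in percentage points (so a positive error is a boost); the function name, input layout, and toy numbers are illustrative and not taken from the AN-082 script.

```python
from math import sqrt


def poll_outcomes(rows):
    """rows: list of (candidate, poll_share_pp, vote_share_pp) for one poll.

    Assumption: error = poll share - election share, in percentage points.
    """
    errors = {cand: poll - vote for cand, poll, vote in rows}
    poll_leader = max(rows, key=lambda r: r[1])[0]      # cand the poll ranked #1
    election_winner = max(rows, key=lambda r: r[2])[0]  # cand the election ranked #1
    errs = list(errors.values())
    return {
        "mean_abs_error": sum(abs(e) for e in errs) / len(errs),
        "rmse": sqrt(sum(e * e for e in errs) / len(errs)),
        "max_signed_error": max(errs),
        "spread_error": max(errs) - min(errs),
        "poll_leader_error": errors[poll_leader],
        "winner_error": errors[election_winner],
        "poll_calls_winner_first": int(poll_leader == election_winner),
    }


# Toy race: election result A 43, B 40, C 17. Both polls boost their own #1 by 7 pp.
poll_1 = poll_outcomes([("A", 50.0, 43.0), ("B", 35.0, 40.0), ("C", 15.0, 17.0)])
poll_2 = poll_outcomes([("A", 39.0, 43.0), ("B", 47.0, 40.0), ("C", 14.0, 17.0)])

for name, out in (("poll_1", poll_1), ("poll_2", poll_2)):
    print(name, out["mean_abs_error"], out["poll_leader_error"],
          out["winner_error"], out["poll_calls_winner_first"])
```

In this toy case the two polls have identical mean |error| and identical poll_leader_error, yet winner_error and poll_calls_winner_first differ because they point at different candidates. That is the pattern the raw cells suggest.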

Race × Week FE (Spec B — preferred)

Coefficients on the bucket dummies vs media reference. The four magnitude outcomes are uniformly null:

Outcome           is_candidate     is_pollster_self  is_other_firm
mean_abs_error    +0.13 (p=0.84)   −0.09 (p=0.73)    +0.02 (p=0.95)
rmse              +0.25 (p=0.72)   −0.08 (p=0.80)    +0.06 (p=0.88)
max_signed_error  +0.50 (p=0.57)   +0.00 (p=1.00)    +0.23 (p=0.68)
spread_error      +0.92 (p=0.58)   −0.27 (p=0.71)    +0.41 (p=0.70)

The rank-based outcomes are where the signal is:

Outcome                  is_candidate      is_pollster_self  is_other_firm
poll_leader_error        −1.65 (p=0.114)   −0.43 (p=0.40)    +0.22 (p=0.77)
winner_error             −2.26 (p=0.089)   +0.56 (p=0.30)    −1.83 (p=0.010)
poll_calls_winner_first  +0.013 (p=0.81)   +0.074 (p=0.009)  −0.048 (p=0.15)

Spec C, which adds log(sample_size) to Spec B, leaves these results essentially unchanged.

n=2,190 polls in 885 race × week cells for Spec B (cells with only one poll are dropped — no within-variation).
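The within-cell estimator behind these numbers can be sketched as follows. This is a simplified illustration, not the AN-082 script: it handles a single treatment dummy rather than the three bucket dummies estimated jointly, uses a plain cluster-robust sandwich with no small-sample correction, and the function name and input layout are assumptions.

```python
from collections import defaultdict
from math import sqrt


def within_estimate(polls):
    """polls: list of (cell, treated, outcome), where cell is e.g. (race, week).

    Returns (beta, cluster_se, n_used) for one treatment dummy.
    """
    cells = defaultdict(list)
    for cell, treated, outcome in polls:
        cells[cell].append((float(treated), float(outcome)))

    # Cells with a single poll have no within-cell variation, so they are dropped.
    cells = {c: obs for c, obs in cells.items() if len(obs) > 1}

    # Demean treatment and outcome within each cell (absorbs the cell fixed effect).
    demeaned = {}
    sxx = sxy = 0.0
    for c, obs in cells.items():
        mean_x = sum(x for x, _ in obs) / len(obs)
        mean_y = sum(y for _, y in obs) / len(obs)
        demeaned[c] = [(x - mean_x, y - mean_y) for x, y in obs]
        sxx += sum(dx * dx for dx, _ in demeaned[c])
        sxy += sum(dx * dy for dx, dy in demeaned[c])
    if sxx == 0.0:
        raise ValueError("no within-cell variation in the treatment dummy")

    beta = sxy / sxx
    # Cluster-robust variance, clustering on the same cell used for the FE.
    meat = sum(
        sum(dx * (dy - beta * dx) for dx, dy in obs) ** 2
        for obs in demeaned.values()
    )
    n_used = sum(len(obs) for obs in cells.values())
    return beta, sqrt(meat) / sxx, n_used


toy = [
    (("race_1", 1), 0, 1.0),
    (("race_1", 1), 1, -1.0),
    (("race_2", 3), 0, 4.0),
    (("race_2", 3), 1, 1.0),
    (("race_3", 2), 1, 9.0),  # singleton cell: dropped before estimation
]
beta, se, n_used = within_estimate(toy)
print(beta, se, n_used)
```

Because the comparison is made only inside a race × week cell, a poll contributes nothing unless another poll shares its cell, which is why the Spec B sample is so much smaller than the full 7,032.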

Interpretation

Magnitude doesn't move; direction does. None of the magnitude measures separates the four buckets within race × week. Shell-suspect polls are not noisier, don't have larger boosts, don't have wider spreads, don't have higher RMSE. Whatever "shell" means here, it doesn't show up as a degradation of poll quality.

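The signal is rank-disagreement. Within race × week, relative to media polls (Spec B):

  1. other_firm polls understate the eventual winner by 1.83 pp (winner_error −1.83, p = 0.010).

  2. candidate-linked polls show the same sign on winner_error (−2.26), though only marginally (p = 0.089).

  3. pollster_self polls are 7.4 percentage points more likely to rank the eventual winner first (poll_calls_winner_first +0.074, p = 0.009).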

The other_firm result is the H13 signature. A hidden sponsor with sender-identity concealment buys the same product as a self-sponsoring candidate: point the poll at a non-leader. The average boost magnitude doesn't change — every poll inflates its #1 by ~7 pp — but the choice of #1 is different. Within race × week, that shows up as the eventual winner being understated. other_firm and candidate-linked share the signature; media and pollster_self don't.

How this changes the H13 reading

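Earlier headlines from the v1 draft of AN-082 framed the signal as "shell polls are −1.8 pp more inaccurate." That overclaims. The corrected reading: the −1.83 pp is a signed understatement of the eventual winner (winner_error) by other_firm polls relative to media polls in the same race × week, while the magnitude measures (mean |error|, RMSE, max boost, spread) are null. It is a difference in direction, not a loss of accuracy.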

Caveats

Follow-ups

  1. LLM classification of the other_firm bucket (H13 data requirement). With the regex residual already showing −1.83 pp / p = 0.010 on winner_error, the LLM pass should let us split other_firm into (shell-suspect / civic association / unknown) and see whether the slant concentrates in the shell sub-bucket.

  2. Cross-reference to the named 2024 Goiás case. Filter to IPOP + Alcateia CNPJs, confirm: (a) their 2024 polls are coded other_firm (FacUnicamps as contratante), (b) their poll-level winner_error magnitudes match the universe-wide −1.83.

  3. Cross-cycle re-run on 2020 polls. Replicate AN-082 on 2020 data: the prediction is pollster_self carries the anti-winner signature there (where IPOP self-contracted), and other_firm is null. A confirming cross-cycle pattern would be the cleanest evidence that the bucket meaning is shifting with enforcement, not the underlying mechanism.

  4. Race × month FE intermediate spec. Race × week drops 69 % of the sample. Race × month keeps more polls per cell and would clarify whether the rank-disagreement signal is robust to less aggressive timing absorption.

  5. Wire to the paper. Currently §Setting frames the Goiás case as motivating the shell-sponsor caveat. AN-082 turns it into a quantified rank-disagreement signal (other_firm −1.83 pp anti-winner, p = 0.010). Worth one paragraph in §Results alongside the headline +7 pp, with the cross-cycle channel-migration framing.
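Follow-up 2 above amounts to a filter plus two checks. A minimal sketch, assuming poll-level records carry the pollster's CNPJ, the assigned poll_class, and winner_error; the field names, the function name, and the placeholder firm identifiers are hypothetical, and the toy records are invented for illustration.

```python
def check_named_firms(polls, firm_cnpjs):
    """polls: list of dicts with hypothetical keys
    'pollster_cnpj', 'poll_class', 'winner_error'."""
    subset = [p for p in polls if p["pollster_cnpj"] in firm_cnpjs]
    # Check (a): every poll from the named firms should be coded other_firm.
    not_other_firm = [p for p in subset if p["poll_class"] != "other_firm"]
    # Check (b): raw mean winner_error for the named firms, in pp.
    mean_winner_error = (
        sum(p["winner_error"] for p in subset) / len(subset) if subset else None
    )
    return {
        "n_polls": len(subset),
        "n_not_other_firm": len(not_other_firm),
        "mean_winner_error": mean_winner_error,
    }


# Invented records; FIRM_A / FIRM_B stand in for the real CNPJs.
toy_polls = [
    {"pollster_cnpj": "FIRM_A", "poll_class": "other_firm", "winner_error": -2.0},
    {"pollster_cnpj": "FIRM_A", "poll_class": "other_firm", "winner_error": -1.0},
    {"pollster_cnpj": "FIRM_B", "poll_class": "pollster_self", "winner_error": 0.5},
    {"pollster_cnpj": "FIRM_C", "poll_class": "media", "winner_error": 0.0},
]
report = check_named_firms(toy_polls, {"FIRM_A", "FIRM_B"})
print(report)
```

The mean returned here is a raw average, so comparing it with the −1.83 coefficient would still require the same race × week adjustment used in Spec B.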

Artifacts