AN-082: Poll accuracy by four-bucket sponsor class

Magnitude measures (mean |error|, RMSE, max boost, spread) come back null across all four sponsor buckets within race × week. The signal is rank-disagreement, not noise. other_firm polls understate the eventual race winner by 1.83 pp vs media polls (p = 0.010) and are directionally 4.8 pp less likely to call the winner #1 (p = 0.15).

Hypothesis: H13: Shell-contratante polls show larger residual β
Confidence: yellow
Type: causal

Design

Sample: matched-share-1.0 cand-poll → poll-level aggregate (7,032 polls)
Specification: Poll-level OLS, 7 outcomes:
- mean_abs_error: mean |error| across cands in the poll
- rmse: sqrt(mean(error²)), the L2 inaccuracy
- max_signed_error: max(error), the biggest boost the poll handed out
- spread_error: max(error) − min(error), the boost + understatement range
- poll_leader_error: signed error of the cand the POLL ranked #1
- winner_error: signed error of the cand the ELECTION ranked #1
- poll_calls_winner_first: 1 if poll's #1 == election winner, 0 else
Treatment: 4-bucket poll_class dummies (media reference).
Spec ladder: A = race FE; B = race × week FE; C = B + log(sample_size). Within-demean + cluster SE at the FE level.
Comparator: media-sponsored polls in the same race × week
Cluster: race / race × week
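
The seven outcomes in the Specification can be sketched for a single poll as follows. This is a minimal illustration, not the code in the AN-082 script: the function name `poll_outcomes`, the row layout, and the sign convention (error = poll share minus election share, so a positive value is a boost) are assumptions, and the three-candidate poll is invented.

```python
from math import sqrt

def poll_outcomes(rows):
    """rows: list of (candidate, poll_share, election_share), shares in pp."""
    # Assumption: signed error = poll share minus election share, so a
    # positive value is a boost and a negative value an understatement.
    errors = {cand: poll - result for cand, poll, result in rows}
    poll_leader = max(rows, key=lambda r: r[1])[0]  # cand the poll ranks #1
    winner = max(rows, key=lambda r: r[2])[0]       # cand the election ranks #1
    vals = list(errors.values())
    return {
        "mean_abs_error": sum(abs(e) for e in vals) / len(vals),
        "rmse": sqrt(sum(e * e for e in vals) / len(vals)),
        "max_signed_error": max(vals),
        "spread_error": max(vals) - min(vals),
        "poll_leader_error": errors[poll_leader],
        "winner_error": errors[winner],
        "poll_calls_winner_first": int(poll_leader == winner),
    }

# Hypothetical three-candidate poll (numbers invented for illustration).
example = poll_outcomes([("A", 45.0, 38.0), ("B", 35.0, 42.0), ("C", 20.0, 20.0)])
```

In this invented poll the leader A is boosted by 7 pp while the eventual winner B is understated by 7 pp, so poll_leader_error and winner_error have opposite signs and poll_calls_winner_first is 0.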

Script: source/analysis/an-082-accuracy-by-sponsor-bucket.py
Target: build/table/an-082-accuracy-by-sponsor-bucket.csv
Status: interpreted · 2026-06-17
Created: 2026-06-17

Motivated by the Goiás IPOP cross-cycle pattern (paper.tex §Setting, lines 265–281). 2020: 357 IPOP polls were self-contracted (the firm registered itself as contratante) — that's pollster_self in our taxonomy, and the 2020 polls were the fraudulent ones prosecuted in Operação Leão de Neméia. 2024: IPOP (68 polls) and Alcateia Outsourcing (41 polls) routed every Goiás 2024 mayoral poll through FacUnicamps, a private faculdade in Goiânia — that's other_firm in our taxonomy. The shell channel migrated across cycles in response to enforcement. A 4-bucket accuracy split therefore has to ask not just "does other_firm look slanted in 2024" but "does pollster_self look slanted too" and "what kind of slant — magnitude or direction."

H13 (shell-contratante) named the prediction and was queued as "small-N behind the LLM extractor." Protocol-level counts say no: other_firm is 1,553 polls (22 %) and pollster_self is 1,464 polls (21 %) — both larger than the candidate-linked bucket (1,026, 15 %). The four-bucket split is well-powered without the LLM pass.
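
The bucket shares quoted above can be checked against the n column of the raw cell means table. A quick arithmetic sketch (variable names are illustrative; the counts are the ones reported in this note):

```python
TOTAL_POLLS = 7032  # poll-level aggregate from the Sample line

# Bucket counts as reported in the raw cell means table.
bucket_n = {"media": 2948, "other_firm": 1553, "pollster_self": 1464, "candidate": 1026}

# Share of all polls, rounded to whole percent.
shares = {b: round(100 * n / TOTAL_POLLS) for b, n in bucket_n.items()}

# Polls that fall in one of the four buckets.
classified = sum(bucket_n.values())
```

The shares come out at 22 %, 21 % and 15 % for other_firm, pollster_self and candidate, as stated. The four buckets sum to 6,991, which matches the four-bucket sample size cited in the Caveats; the remaining 41 of the 7,032 polls presumably fall outside the four-bucket classification.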

Results

Raw cell means (no FE; the four buckets cover 6,991 of the 7,032 polls)

Bucket	n	mean |err|	RMSE	max+	spread	poll-leader err	winner err	calls winner #1
media	2,948	8.15	9.03	10.63	18.80	+7.25	−0.16	73.2 %
other_firm	1,553	8.17	8.93	10.35	19.11	+7.39	−0.84	69.3 %
pollster_self	1,464	8.38	9.39	11.44	19.74	+7.58	−0.51	70.8 %
candidate	1,026	9.22	9.97	11.53	18.83	+9.12	+1.43	72.5 %

Two observations from the raw cells matter for picking the right measure:

The boost handed to the poll's #1 cand is similar across buckets (poll_leader_error ≈ 7.3–7.6 pp for everything except candidate-linked at 9.1). Every poll inflates whoever it puts on top by roughly the same amount. Magnitude-of-boost is not where the sponsor signal lives.
What differs is who that #1 cand is. winner_error ranges from −0.84 to +1.43 across buckets, and poll_calls_winner_first from 69 % to 73 %. The slant signature is "which candidate the poll points to," not "how big a boost the slant delivers."

This reframes the AN-082 search: the right measure is rank-based, not magnitude-based.

Race × Week FE (Spec B — preferred)

Coefficients on the bucket dummies vs media reference. The four magnitude outcomes are uniformly null:

Outcome	is_candidate	is_pollster_self	is_other_firm
mean_abs_error	+0.13 (p=0.84)	−0.09 (p=0.73)	+0.02 (p=0.95)
rmse	+0.25 (p=0.72)	−0.08 (p=0.80)	+0.06 (p=0.88)
max_signed_error	+0.50 (p=0.57)	+0.00 (p=1.00)	+0.23 (p=0.68)
spread_error	+0.92 (p=0.58)	−0.27 (p=0.71)	+0.41 (p=0.70)

The rank-based outcomes are where the signal is:

Outcome	is_candidate	is_pollster_self	is_other_firm
poll_leader_error	−1.65 (p=0.114)	−0.43 (p=0.40)	+0.22 (p=0.77)
winner_error	−2.26 (p=0.089)	+0.56 (p=0.30)	−1.83 (p=0.010)
poll_calls_winner_first	+0.013 (p=0.81)	+0.074 (p=0.009)	−0.048 (p=0.15)
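
The tables report coefficients and p-values but not standard errors. As a rough sketch, the implied SEs can be backed out under two assumptions that are not confirmed by the script: the p-values are two-sided, and a normal reference distribution is close enough (the clustered estimator may use a t reference instead). The helper name `implied_se` is hypothetical.

```python
from statistics import NormalDist

def implied_se(beta, p_two_sided):
    # Normal approximation: |beta| / SE equals the z value whose
    # two-sided tail probability is p_two_sided.
    z = NormalDist().inv_cdf(1 - p_two_sided / 2)
    return abs(beta) / z

# winner_error coefficients from the Spec B table above.
se_other_firm = implied_se(-1.83, 0.010)
se_candidate = implied_se(-2.26, 0.089)
ratio = se_candidate / se_other_firm
```

Under these assumptions the candidate-linked SE is a little under twice the other_firm SE, in line with the "~2× wider SE" remark in the Interpretation.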

Spec C adding log(sample_size) leaves these results essentially unchanged.

n=2,190 polls in 885 race × week cells for Spec B (cells with only one poll are dropped — no within-variation).
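
Why singleton cells drop out can be seen in a minimal within-demeaning sketch. This is an illustration under assumed names (`within_demean`, the `cell` key, the toy rows), not the script's implementation: a cell with one poll has a demeaned outcome of exactly zero, so it carries no within-cell variation and is removed before estimation.

```python
from collections import defaultdict

def within_demean(rows, cell_key, value_key):
    """Drop singleton cells, then subtract each cell's mean from value_key."""
    cells = defaultdict(list)
    for row in rows:
        cells[row[cell_key]].append(row)
    out = []
    for members in cells.values():
        if len(members) < 2:
            continue  # singleton: no within-cell variation to identify from
        mean = sum(m[value_key] for m in members) / len(members)
        for m in members:
            out.append({**m, value_key + "_dm": m[value_key] - mean})
    return out

# Invented rows: one two-poll cell and one singleton cell.
polls = [
    {"cell": "raceX|wk40", "bucket": "media", "winner_error": 1.0},
    {"cell": "raceX|wk40", "bucket": "other_firm", "winner_error": -1.0},
    {"cell": "raceY|wk41", "bucket": "media", "winner_error": 3.0},
]
kept = within_demean(polls, "cell", "winner_error")
```

Only the two raceX polls survive, and the comparison between them is the within-cell contrast that Spec B identifies from.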

Interpretation

Magnitude doesn't move; direction does. None of the magnitude measures separates the four buckets within race × week. Shell-suspect polls are not noisier, don't have larger boosts, don't have wider spreads, don't have higher RMSE. Whatever "shell" means here, it doesn't show up as a degradation of poll quality.

The signal is rank-disagreement. Within race × week:

other_firm polls understate the eventual race winner by 1.83 pp vs media polls (p = 0.010) and are directionally 4.8 pp less likely to call the winner #1 (p = 0.15).
candidate-linked polls show the same anti-winner signature at similar point estimate (−2.26 pp on winner_error), with ~2× wider SE (1,026 polls, even thinner after race × week absorption).
pollster_self polls are 7.4 pp MORE likely to call the winner #1 than media polls within race × week (p = 0.009). That looks paradoxical given the 2020 Goiás IPOP precedent, but it's a 2024-sample-only result. Two readings:
- Selection — they get the easy races. Pollster_self firms self-finance their flagship polls in races where they're confident; within-race × week comparison rewards that. Raw-means call-the-winner rate (70.8 %) is below media (73.2 %) precisely because pollster_self polls cluster in tougher races. The FE flips the sign.
- The Goiás fraud channel is no longer pollster_self. After the 2020 prosecution, IPOP and Alcateia rerouted through FacUnicamps; the residual 2024 pollster_self pool is dominated by reputational firms doing showcase polls.
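
The sign flip between the raw call-the-winner rates and the within race × week estimate is a composition effect, and a toy example shows how it can arise. All numbers below are invented and the two-cell setup is an assumption for illustration only: pollster_self polls are concentrated in the cell where every sponsor type calls the winner less often.

```python
# (cell, bucket, n_polls, n_calling_winner_first); all numbers invented.
cells = [
    ("easy", "media", 8, 7), ("easy", "pollster_self", 2, 2),
    ("tough", "media", 2, 1), ("tough", "pollster_self", 8, 5),
]

def rate(bucket, cell=None):
    """Share of polls calling the winner #1, pooled or within one cell."""
    rows = [r for r in cells if r[1] == bucket and (cell is None or r[0] == cell)]
    return sum(r[3] for r in rows) / sum(r[2] for r in rows)

# Pooled comparison (analogue of the raw cell means).
raw_gap = rate("pollster_self") - rate("media")

# Within-cell comparison (analogue of the race × week FE contrast).
within_gaps = {c: rate("pollster_self", c) - rate("media", c) for c in ("easy", "tough")}
```

Here pollster_self trails media in the pooled rates but leads within each cell, which is the same pattern as 70.8 % vs 73.2 % in the raw means against +0.074 under Spec B. The sketch does not discriminate between the two readings above; it only shows that the flip requires no change in within-cell behavior.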

The other_firm result is the H13 signature. A hidden sponsor with sender-identity concealment buys the same product as a self-sponsoring candidate: point the poll at a non-leader. The average boost magnitude doesn't change — every poll inflates its #1 by ~7 pp — but the choice of #1 is different. Within race × week, that shows up as the eventual winner being understated. other_firm and candidate-linked share the signature; media and pollster_self don't.

How this changes the H13 reading

Earlier headlines from the v1 draft of AN-082 framed the signal as "shell polls are −1.8 pp more inaccurate." That overclaims. The corrected reading:

Inaccuracy is not the metric. Magnitude is null.
Direction is the metric. Shell-suspect polls misrank the eventual winner — they put a different cand on top, who turns out not to win.
The mass-on-which-candidate is the slant. The poll's rank-1 choice is the slant declaration; its identity differs across buckets even though its inflation magnitude doesn't.

Caveats

Race × week absorption is sharp. 6,991 polls in the four-bucket sample → 2,190 rows in Spec B because most cells have only one poll. Surviving cells are the well-polled, attention-competitive races. Race FE (Spec A) on the full 5,750 gives is_other_firm β=−0.63 on winner_error (p=0.24): directionally identical, weaker.
other_firm is a heuristic regex residual (source/assemble/poll.py:_norm). False positives include civic associations and unions. The LLM pass (H13 to-do) would split (suspected shell / civic association / unknown) and should sharpen the signal, not weaken it — pure noise would attenuate the within-FE −1.83.
Cross-cycle channel migration is the load-bearing interpretive caveat. The 2024 result codes the shell channel as other_firm; the 2020 IPOP fraud was pollster_self. The prediction (untested here) is that a 2020 re-run of AN-082 would show pollster_self carrying the anti-winner signature, and other_firm not. The bucket meaning is cycle-specific.
pollster_self +7.4 pp on poll_calls_winner_first is novel and not predicted. Two competing readings (selection-easy-races vs post-prosecution pool-purification) cannot be separated in this analysis. Worth a separate check via the per-firm β cross-cut from AN-016/017.
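
The scale of the race × week absorption in the first caveat can be made concrete with the counts reported in this note. A short arithmetic sketch (variable names are illustrative):

```python
four_bucket_polls = 6991  # four-bucket sample
spec_b_polls = 2190       # polls surviving race × week FE (Spec B)
spec_b_cells = 885        # race × week cells with at least two polls

# Share of the four-bucket sample lost to singleton cells.
dropped_share = 1 - spec_b_polls / four_bucket_polls

# Average number of polls per surviving cell.
polls_per_cell = spec_b_polls / spec_b_cells
```

Roughly 69 % of the four-bucket sample is dropped, and the surviving cells hold about 2.5 polls each on average, so most within-cell contrasts rest on two or three polls.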

Follow-ups

LLM classification of the other_firm bucket (H13 data requirement). With the regex residual already showing −1.83 pp / p = 0.010 on winner_error, the LLM pass should let us split other_firm into (shell-suspect / civic association / unknown) and see whether the slant concentrates in the shell sub-bucket.
Cross-reference to the named 2024 Goiás case. Filter to IPOP + Alcateia CNPJs, confirm: (a) their 2024 polls are coded other_firm (FacUnicamps as contratante), (b) their poll-level winner_error magnitudes match the universe-wide −1.83.
Cross-cycle re-run on 2020 polls. Replicate AN-082 on 2020 data: the prediction is pollster_self carries the anti-winner signature there (where IPOP self-contracted), and other_firm is null. A confirming cross-cycle pattern would be the cleanest evidence that the bucket meaning is shifting with enforcement, not the underlying mechanism.
Race × month FE intermediate spec. Race × week drops 69 % of the sample. Race × month keeps more polls per cell and would clarify whether the rank-disagreement signal is robust to less aggressive timing absorption.
Wire to the paper. Currently §Setting frames the Goiás case as motivating the shell-sponsor caveat. AN-082 turns it into a quantified rank-disagreement signal (other_firm −1.83 pp anti-winner, p = 0.010). Worth one paragraph in §Results alongside the headline +7 pp, with the cross-cycle channel-migration framing.

Artifacts

Script: source/analysis/an-082-accuracy-by-sponsor-bucket.py
Spec-level coefficients: build/table/an-082-accuracy-by-sponsor-bucket.csv
Raw cell means: build/table/an-082-accuracy-by-sponsor-bucket__cells.csv
Headline JSON: build/table/an-082-accuracy-by-sponsor-bucket.json

H13 shell-contratante hypothesis
AN-006 sponsor-route split
AN-017 customer-mix refresh
Paper §Setting: Goiás IPOP cross-cycle pattern (2020 pollster_self / 2024 FacUnicamps shell), paper/paper.tex lines 265–281, citing jornalopcao2024lista.