AN-083: IPOP makes no observable predictions

IPOP and Alcateia (the two firms named in paper.tex §Setting as 2024 Goiás shell operators) systematically register polls without publishing usable relatórios. Alcateia uploaded 0 of 41 relatório PDFs to TSE; IPOP uploaded 17 of 68 (25 %), and all 17 contain methodology and demographic tables but no vote-intention data. The mainstream comparator INSTITUTO GAZETA uploaded 121 of 145 (83 %), and the 8 we LLM-extracted have 100 % vote-intention coverage (on average 3.4 scenarios × 21.6 candidates each). The accuracy comparison the user asked for cannot be computed: IPOP's polls produce no observable predictions in the disclosure regime.

Hypothesis: H13: Shell-contratante polls show larger residual β
Confidence: green
Type: descriptive

Design

Sample: 68 IPOP + 41 Alcateia + 145 INSTITUTO GAZETA polls registered in GO 2024
Specification: Three-stage funnel: (1) registry-based count of polls per firm from pipelines/politica/build/clean/poll_2024.parquet; (2) PDF availability via streaming inspection of GO.tar.zst on bi-dropbox-ro:data/TSE/2024/pesquisa_eleitoral/relatorios/; (3) vote-intention disclosure via LLM extraction on the 17 IPOP + 8 INSTITUTO GAZETA PDFs available.
Comparator: GO baseline + INSTITUTO GAZETA (in-state mainstream firm)

Script: source/analysis/an-083-ipop-relatorio-content.py
Target: build/table/an-083-ipop-relatorio-content.csv
Status: interpreted · 2026-06-17
Created: 2026-06-17

User question (2026-06-17): "Can we check if IPOP made worse predictions compared to other pollsters or compared to polls sponsored by media?"

Answer: The standard accuracy comparison can't be computed for IPOP because IPOP's polls produce no observable predictions in the TSE disclosure regime. This is itself the finding.

Results

Disclosure funnel — three named firms vs GO universe

Firm	Registered	PDFs on bi-dropbox	PDFs with vote intentions
GO universe	1,305	1,031 (79 %)	—
INSTITUTO GAZETA (mainstream control)	145	121 (83 %)	8 of 8 sampled (100 %)
IPOP CIDADES E NEGOCIOS	68	17 (25 %)	0 of 17 (0 %)
ALCATEIA OUTSOURCING	41	0 (0 %)	— (none to extract)
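
The percentages in the funnel table follow directly from the counts. A minimal sketch that recomputes them, with the counts hard-coded from the table above; the `pct` and `funnel_row` helpers and the dictionary layout are illustrative assumptions, not the structure used in an-083-ipop-relatorio-content.py:

```python
# Counts copied from the funnel table. "inspected" is the number of PDFs
# checked for vote intentions; None means the stage was not run for that row.
FUNNEL = {
    "GO universe": {"registered": 1305, "pdfs": 1031, "inspected": None, "with_votes": None},
    "INSTITUTO GAZETA": {"registered": 145, "pdfs": 121, "inspected": 8, "with_votes": 8},
    "IPOP CIDADES E NEGOCIOS": {"registered": 68, "pdfs": 17, "inspected": 17, "with_votes": 0},
    "ALCATEIA OUTSOURCING": {"registered": 41, "pdfs": 0, "inspected": 0, "with_votes": 0},
}


def pct(part, whole):
    """Return part/whole as a rounded percentage, or None if there is no denominator."""
    if not whole:
        return None
    return round(100 * part / whole)


def funnel_row(counts):
    """Compute the two funnel rates for one firm."""
    return {
        "pdf_rate": pct(counts["pdfs"], counts["registered"]),
        "vote_rate": pct(counts["with_votes"], counts["inspected"]),
    }


ROWS = {firm: funnel_row(counts) for firm, counts in FUNNEL.items()}

for firm, row in ROWS.items():
    print(firm, row)
```

A `vote_rate` of `None` marks rows where nothing was inspected (GO universe, Alcateia), which keeps "0 % of inspected PDFs" distinct from "no PDFs to inspect".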

What's actually in IPOP's 17 available PDFs

Hand inspection of GO016032024 (São Luís de Montes Belos): 1,515 lines of methodology + sample stratification tables (sex, age, income, education repeated across geographic strata) and sponsor data (FacUnicamps, R$ 2,500). Zero vote intention tables. Zero candidate names. The "Prefeito" field is the only string matching voto|intenç[aã]o|candidato|prefeit|estimulado|espont[âa]neo in the document, and it's the cargo-field label on the registration card at top.

Quantitative confirmation across all 17 IPOP PDFs: exactly 1 vote-related string match per PDF (the cargo label). Mainstream control (INSTITUTO GAZETA) on 8 sampled PDFs: 12–53 matches per PDF, mean 26.
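
The string-match check described above can be sketched as follows. The pattern is the one quoted in the hand-inspection paragraph; case-insensitive matching is an assumption (needed for the capitalised "Prefeito" label to count), and the two sample texts are invented toy strings, not excerpts from any relatório:

```python
import re

# Pattern quoted in the note. IGNORECASE is an assumption, made so that the
# capitalised cargo label "Prefeito" registers as a match.
VOTE_PATTERN = re.compile(
    r"voto|intenç[aã]o|candidato|prefeit|estimulado|espont[âa]neo",
    re.IGNORECASE,
)


def count_vote_matches(text):
    """Count non-overlapping vote-related string matches in extracted PDF text."""
    return len(VOTE_PATTERN.findall(text))


# Toy text shaped like a methodology-only relatório (hypothetical content).
methodology_only = "Cargo: Prefeito\nSexo: masculino 48%, feminino 52%\nRenda: até 2 SM 40%"

# Toy text shaped like a relatório with results (hypothetical content).
with_results = "Cargo: Prefeito\nIntenção de voto estimulada\nCandidato A 40%\nCandidato B 35%"

print(count_vote_matches(methodology_only))
print(count_vote_matches(with_results))
```

This counts string matches rather than matching lines; a line-based count would differ whenever one line contains several hits.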

LLM extraction confirmation

GPT-4o-mini run on all 17 IPOP PDFs returns scenarios: [] for every single one, with extraction notes saying: "The PDF does not contain any vote intention results for candidates. Only methodological details and sample demographics are provided."

Same extractor on the 8 INSTITUTO GAZETA PDFs returns mean 3.4 scenarios × mean 21.6 candidates per scenario — uniformly populated.
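
For illustration, the per-firm summary of extractor output could be computed as below. Only the top-level `scenarios` key is taken from the note; the nested `candidates` key, the `summarize` helper, and the toy inputs are hypothetical:

```python
from statistics import mean


def summarize(extractions):
    """Summarize a list of extractor outputs, one dict per PDF.

    Assumed shape (hypothetical beyond the "scenarios" key):
    {"scenarios": [{"candidates": [...]}, ...]}
    """
    scenarios_per_pdf = [len(e.get("scenarios", [])) for e in extractions]
    candidates_per_scenario = [
        len(s.get("candidates", []))
        for e in extractions
        for s in e.get("scenarios", [])
    ]
    return {
        "pdfs": len(extractions),
        "pdfs_with_scenarios": sum(n > 0 for n in scenarios_per_pdf),
        "mean_scenarios": mean(scenarios_per_pdf) if scenarios_per_pdf else None,
        "mean_candidates_per_scenario": (
            mean(candidates_per_scenario) if candidates_per_scenario else None
        ),
    }


# 17 empty extractions, mirroring the reported IPOP result.
empty_firm = summarize([{"scenarios": []} for _ in range(17)])

# Two toy populated extractions (invented, not INSTITUTO GAZETA data).
populated_firm = summarize([
    {"scenarios": [{"candidates": ["A", "B", "C"]}, {"candidates": ["A", "B"]}]},
    {"scenarios": [{"candidates": ["A", "B", "C"]}]},
])

print(empty_firm)
print(populated_firm)
```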

Interpretation

Three nested ways the firm evades observability:

Sponsor channel. FacUnicamps (a private faculdade) is the contratante and pagante for every IPOP 2024 poll, decoupling the registration footprint from any candidate / party CNPJ that the project's Routes A–D can trace.
Relatório upload. Only 25 % of IPOP's registered polls have a PDF uploaded to bi-dropbox at all (vs 79 % universe baseline, 83 % for the mainstream GO comparator). Three-quarters never enter the relatório system.
Relatório content. The 17 PDFs IPOP did upload are methodology + demographic tables only. None contain vote intention results. The disclosure regime has no record of what IPOP claimed about candidates in those races.

The three layers compound. Any one of these defeats the project's accuracy infrastructure independently:

The FacUnicamps sponsor channel means AN-082's other_firm classification is the only place these polls can show up (no candidate CPF / committee CNPJ).
The PDF non-upload means the vote-intention extraction sees none of the 51 IPOP polls (68 registered − 17 uploaded) that have no PDF.
The vote-intention-omission in the 17 PDFs that do exist means the extraction sees 0 of those 17 either.

After all three filters, the project's cand_poll.parquet contains 0 / 68 IPOP 2024 polls. We cannot compute IPOP's mean |error|, leader-rank error, or anything that requires knowing what the poll claimed.

Alcateia is the same case, only stricter. Of its 41 registered polls, 0 PDFs are uploaded. No content audit needed to conclude the firm is unobservable.

What this implies for the headline of AN-082

AN-082 found other_firm polls understate the eventual winner by −1.83 pp within race × week (p = 0.010). That regression is on the 1,553 other_firm polls in the project's sample — and IPOP + Alcateia together contribute zero of those 1,553 protocols. The AN-082 signal is measuring the visible 21 % of the shell pattern; the IPOP-Alcateia core of the shell pattern is literally invisible to the regression. Whether the full shell universe has β at the AN-082 magnitude, double it, or zero, the sample we observed cannot tell us.

Implications for the iceberg framing

The paper's iceberg framing (memory: "Both evade Routes A–D. Load-bearing §2 example for the iceberg framing.") is reinforced by this finding but extended: the iceberg has not one but three nested layers of evasion, of which Routes A–D's tracing failure is only the first. The deeper two (relatório non-upload and relatório content-emptiness) are not addressed anywhere in the existing literature on Brazilian poll disclosure to my knowledge. The disclosure regime nominally requires the relatório; in practice the enforcement gap is content-level, not just registration-level.

Caveats

n = 17 IPOP PDFs is small for the within-firm vote-intention disclosure rate, but these 17 are every IPOP PDF available on bi-dropbox, and the rate is 0 / 17 with zero variance. Within the available set there is no sampling uncertainty, and nothing suggests that a further IPOP PDF would be anything other than content-empty. The pattern is structural.
The LLM might be wrong about extraction-empty PDFs. Hand inspection of GO016032024 confirms the LLM is correct for that document: there really are no vote tables in it, and the string-match count (exactly 1 per PDF, the cargo label) points the same way for the other 16. Same hand inspection on the 8 INSTITUTO GAZETA PDFs confirms vote tables are present (12–53 vote-related matches each).
IPOP may have published results elsewhere (news media, the firm's own website, sponsor circulation). The finding is about TSE-registered disclosure specifically. From the regulatory-design perspective the gap is in TSE enforcement, not in IPOP's ability to communicate to clients — the contracted-out result presumably reaches the sponsor in some form, just not the TSE.
The 8-PDF INSTITUTO GAZETA sample is opportunistic (first 8 of 145 by protocol number). A representative sample might show some PDFs without vote intentions for legitimate reasons (small samples not stratifiable, pre-decision wave designs). But the 100 % rate on the 8 sampled vs the 0 % rate on 17 IPOP PDFs is the relevant contrast; the share of GAZETA PDFs lacking vote intentions can plausibly be 0–20 % without changing the comparison.
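
As a rough gauge of how much the small counts in these caveats can bear, the exact one-sided binomial (Clopper-Pearson) bounds for all-or-nothing outcomes have a closed form. The sketch below assumes simple random sampling, which the GAZETA caveat above says does not hold (the 8 PDFs are the first by protocol number), so the numbers are indicative only; the function names and the 5 % level are illustrative choices:

```python
def lower_bound_all_hits(n, alpha=0.05):
    """Exact one-sided lower bound on the hit rate p when n of n trials are hits.

    Solves p**n = alpha for p.
    """
    return alpha ** (1.0 / n)


def upper_bound_no_hits(n, alpha=0.05):
    """Exact one-sided upper bound on the hit rate p when 0 of n trials are hits.

    Solves (1 - p)**n = alpha for p.
    """
    return 1.0 - alpha ** (1.0 / n)


# 8 of 8 sampled INSTITUTO GAZETA PDFs contain vote intentions.
gazeta_lower = lower_bound_all_hits(8)

# 0 of 17 IPOP PDFs contain vote intentions.
ipop_upper = upper_bound_no_hits(17)

print(f"GAZETA disclosure rate, 95% lower bound: {gazeta_lower:.3f}")
print(f"IPOP disclosure rate, 95% upper bound:  {ipop_upper:.3f}")
```

Under these assumptions the two bounds do not overlap, which is the sense in which the contrast survives a GAZETA non-disclosure share of up to about 20 %.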

Follow-ups

Cross-check Alcateia by pulling its relatórios directly from the TSE divulgação portal. bi-dropbox's 0 / 41 might be a scrape gap; the TSE portal itself might have the PDFs. If the TSE portal also has 0, the finding is that Alcateia evades upload entirely. If TSE has them and bi-dropbox missed them, we need to re-scrape and re-audit.
Audit the 274 GO polls (≈21 %) with no PDF on bi-dropbox by firm. Is the rest of the missingness concentrated in a handful of firms with the same shell pattern, or is it evenly distributed across the GO pollster universe? If concentrated, expand the AN-083 frame to a list of "non-disclosing firms."
Replicate on 2020 IPOP. The 2020 Operação Leão de Neméia case had 357 polls. Do the 2020 IPOP polls show the same relatório-content-emptiness pattern, or did the firm upload real (fabricated) numbers in 2020 and switch to content-emptiness after the prosecution? The 2020 vs 2024 comparison would tell us whether content-emptiness is a defensive adaptation specifically in response to enforcement.
Generalize beyond Goiás. Are there firms in other UFs with similar disclosure-failure patterns? An automated audit running the same funnel (registered → PDF on bi-dropbox → vote-intention extraction rate per firm) on the full universe would surface comparable cases without manual curation.
Wire to the paper. Currently §Setting names the IPOP 2024 case. AN-083 turns "FacUnicamps as cover sponsor" into a three-layer disclosure-evasion finding with hard numbers: 25 % PDF upload, 0 % vote-intention content. Worth a half-page in §Setting with the three-layer table.
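
The by-firm missingness audit proposed in the follow-ups could start from something like the sketch below. It is seeded with the counts from the funnel table and assumes the GO universe row includes the three named firms, so that the remainder can be lumped into a single "other GO firms" bucket; that bucket, the `expand` helper, and the record layout are illustrative, and a real audit would use per-poll records for every firm:

```python
from collections import Counter


def expand(firm, registered, with_pdf):
    """Turn aggregate counts into per-poll (firm, has_pdf) records."""
    return [(firm, True)] * with_pdf + [(firm, False)] * (registered - with_pdf)


def missing_pdf_by_firm(polls):
    """Rank firms by their share of all polls that have no PDF."""
    registered = Counter()
    missing = Counter()
    for firm, has_pdf in polls:
        registered[firm] += 1
        if not has_pdf:
            missing[firm] += 1
    total_missing = sum(missing.values())
    rows = [
        {
            "firm": firm,
            "registered": n,
            "missing": missing[firm],
            "missing_rate": missing[firm] / n,
            "share_of_all_missing": missing[firm] / total_missing if total_missing else 0.0,
        }
        for firm, n in registered.items()
    ]
    return sorted(rows, key=lambda r: r["share_of_all_missing"], reverse=True)


# Named firms from the funnel table; the remainder is derived by subtraction
# under the assumption that the GO universe counts include them.
named = [("IPOP", 68, 17), ("ALCATEIA", 41, 0), ("INSTITUTO GAZETA", 145, 121)]
other_registered = 1305 - sum(r for _, r, _ in named)
other_with_pdf = 1031 - sum(p for _, _, p in named)

polls = []
for firm, reg, pdfs in named + [("other GO firms", other_registered, other_with_pdf)]:
    polls.extend(expand(firm, reg, pdfs))

AUDIT = missing_pdf_by_firm(polls)
for row in AUDIT:
    print(row)
```

Replacing the aggregate bucket with real per-firm records would show whether the remaining missingness is concentrated in a few firms or spread across the GO pollster universe.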

Artifacts

Script: source/analysis/an-083-ipop-relatorio-content.py
Per-firm funnel: build/table/an-083-ipop-relatorio-content.csv
Headline JSON: build/table/an-083-ipop-relatorio-content.json
Intermediate PDF stash: build/intermediate/ipop_check/GO.tar.zst (987 MB; can be re-pulled from bi-dropbox-ro any time, kept locally for the follow-ups above)

Related

H13 shell-contratante hypothesis
AN-082 four-bucket accuracy split (the AN-082 other_firm signal does NOT include IPOP / Alcateia — they're filtered out before the regression sample is built)
Paper §Setting: Goiás IPOP cross-cycle pattern, paper/paper.tex lines 265–281, citing jornalopcao2024lista.
Memory: project_poll_sponsor_bias_ipop_cross_cycle — cross-cycle channel pattern (2020 pollster_self, 2024 FacUnicamps shell). AN-083 adds: 2024 channel also evades the relatório-content layer.