14 of the top-25 other_firm sponsors (668 of 914 polls, 73%) are probable shells with no journalism brand and no plausible non-poll business reason to commission 17–254 mayoral polls; 9 are real media misclassified by the regex (211 polls); 2 are unclear (35 polls). The strongest single shell signal is VS Publicidade (254 polls, MS+SP, R$0 capital, no web footprint).
Question
The project's sponsor-type classifier (source/assemble/poll.py) bins
every sponsor row into one of {media, pollster_self, committee_, party,
party_name, individual, pollster_other, other_firm}. The first six are
either independent media-side polls (media, pollster_self) or candidate-
linked polls captured by Routes A–D (committee_, party, party_name,
individual). The last bin, other_firm, is the residual: a sponsor
CNPJ whose name doesn't hit the MEDIA, PARTY, or POLLSTER regex and
whose CNPJ doesn't match any candidate-classification route. This
covers 6,735 sponsor rows = 1,210 distinct CNPJs = 3,353 mayoral
protocols (22.5% of all 14,887 mayoral 2024 polls).
If shell contratantes are common — as the paper's §2 Goiás example documents for IPOP/FacUnicamps and Alcateia — they should land heavily in this tier. Who are these 1,210 entities, and how many are actually candidate-pass-through vehicles?
Design
Concentration in the residual tier is heavy: the top 25 CNPJs cover 27% of other_firm protocols (905 of 3,353) and the top 100 cover 47%. Auditing the top 25 is a high-leverage first pass.
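The concentration figures above amount to a top-k coverage calculation over sponsor rows. A minimal sketch, assuming one (CNPJ, protocol) pair per sponsor row; the function name `top_k_share` and the toy data are illustrative and are not taken from the project's code or tables.

```python
from collections import Counter

def top_k_share(rows, k):
    """Share of distinct protocols that have at least one sponsor
    among the k CNPJs appearing on the most protocols.

    rows: iterable of (cnpj, protocol_id) pairs, one per sponsor row.
    """
    pairs = set(rows)  # drop duplicate sponsor rows
    per_cnpj = Counter(cnpj for cnpj, _ in pairs)
    top = {cnpj for cnpj, _ in per_cnpj.most_common(k)}
    all_protocols = {protocol for _, protocol in pairs}
    covered = {protocol for cnpj, protocol in pairs if cnpj in top}
    return len(covered) / len(all_protocols)

# Toy data (hypothetical): A sponsors 3 protocols, B 2, C 1; p3 has two sponsors.
toy = [("A", "p1"), ("A", "p2"), ("A", "p3"),
       ("B", "p3"), ("B", "p4"), ("C", "p5")]
share_top1 = top_k_share(toy, 1)
share_top2 = top_k_share(toy, 2)
```

Counting distinct protocols rather than summing per-sponsor counts avoids double-counting protocols that list more than one sponsor.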
For each, the audit combines:
| Signal | Source |
|---|---|
| Razão social, capital social, porte | pipelines/cnpj/build/clean/cnpj_202505.parquet (May-2025 snapshot) |
| Primary + secondary CNAEs | pipelines/cnpj/build/clean/cnpj_cnae_202505.parquet joined to CNAE descriptions table |
| # protocols, UFs, institutes hired, mean amount paid | pipelines/politica/build/clean/poll_sponsor_2024.parquet (the project's primary sponsor table) |
| Web-presence verification | Google + econodata + casadosdados spot-checks per entity (~5 min/entity) |
The classification rubric:
- PROBABLE_SHELL: no live news brand under the name, CNAE incongruent with editorial activity (or "Edição de jornais" / "Portais" declared but never observed in news searches), R$0 or single-digit-k capital, individual-MEI ownership, or polls concentrated in a state-cluster too broad to match any local outlet's editorial footprint.
- REAL_MEDIA: live news site / broadcaster with dated journalism + social presence + plausible local editorial reach, caught as other_firm only because the project's MEDIA regex didn't include the brand-name token. Classifier miss, not a shell.
- LEGIT_NON_MEDIA: a real non-media business (e.g., a real-estate firm with a defensible reason to commission a single market poll); none surfaced in the top 25.
- UNCLEAR: brand exists but quantity or ownership-context complicates a clean call.
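The rubric was applied by hand, but its logic can be written down as a small decision function. The sketch below is one hedged reading of it: the field names, the R$10k reading of "single-digit-k", and the rule that one shell signal is enough are all assumptions, and the state-cluster breadth signal is left out for brevity.

```python
from dataclasses import dataclass

LOW_CAPITAL_BRL = 10_000  # assumption: "R$0 or single-digit-k" read as below R$10k

@dataclass
class SponsorEvidence:
    has_live_news_brand: bool           # dated journalism found under the name
    cnae_is_editorial: bool             # CNAE congruent with editorial activity
    capital_brl: float                  # capital social, in R$
    is_individual_mei: bool             # individual-MEI ownership
    has_non_poll_business_reason: bool  # defensible non-media reason for a poll
    ownership_flag: bool = False        # context that complicates a clean call

def classify(e):
    """Return one of the four rubric labels for a sponsor (illustrative only)."""
    if e.has_live_news_brand:
        return "UNCLEAR" if e.ownership_flag else "REAL_MEDIA"
    if e.has_non_poll_business_reason:
        return "LEGIT_NON_MEDIA"
    shell_signals = sum([
        not e.cnae_is_editorial,
        e.capital_brl < LOW_CAPITAL_BRL,
        e.is_individual_mei,
    ])
    return "PROBABLE_SHELL" if shell_signals >= 1 else "UNCLEAR"

# Hypothetical inputs, loosely modelled on the kinds of cases the rubric describes.
shell_like = classify(SponsorEvidence(False, False, 0, False, False))
media_like = classify(SponsorEvidence(True, True, 785_000, False, False))
contested = classify(SponsorEvidence(True, True, 20_000, False, False, ownership_flag=True))
```

Writing the rubric this way mainly makes the ordering explicit: a verified live news brand is checked first, so low capital alone never turns a real outlet into a shell.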
Results
Distribution across the top 25:
| Classification | Entities | Polls covered | % of top-25 polls |
|---|---|---|---|
| PROBABLE_SHELL | 14 | 668 | 73 % |
| REAL_MEDIA (classifier miss) | 9 | 211 | 23 % |
| UNCLEAR | 2 | 35 | 4 % |
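The 668 PROBABLE_SHELL polls in the top 25 alone are 4.5% of all 14,887 mayoral 2024 polls. The top 25 cover only 27% of the residual tier, so the full footprint depends on the shell rate in the tail. Holding the top-25 rate constant across the whole tier would put it near 16% (0.73 × 22.5%), which is best read as an upper bound because shell density may drop in the tail. On that basis the full shell footprint in 2024 could be 8–12% of all mayoral polls.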
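The extrapolation can be made explicit as a function of the unknown tail shell rate. This is a back-of-envelope sketch using only figures quoted in this note; it treats poll counts and protocol counts as interchangeable, which is an assumption, and the function names are illustrative.

```python
TOTAL_POLLS = 14_887       # all mayoral 2024 polls
TIER_PROTOCOLS = 3_353     # other_firm protocols
TOP25_PROTOCOLS = 905      # other_firm protocols covered by the top 25
TOP25_SHELL_POLLS = 668    # PROBABLE_SHELL polls found in the top 25
TAIL = TIER_PROTOCOLS - TOP25_PROTOCOLS  # 2,448 unaudited protocols

def footprint(tail_shell_rate):
    """Shell share of all mayoral polls for a given shell rate in the tail."""
    return (TOP25_SHELL_POLLS + tail_shell_rate * TAIL) / TOTAL_POLLS

def implied_tail_rate(target_share):
    """Tail shell rate needed to reach a target overall shell share."""
    return (target_share * TOTAL_POLLS - TOP25_SHELL_POLLS) / TAIL

floor = footprint(0.0)            # top-25 shells only, about 4.5%
same_rate = footprint(0.73)       # tail as shell-dense as the top 25, about 16.5%
low = implied_tail_rate(0.08)     # about 0.21
high = implied_tail_rate(0.12)    # about 0.46
```

Under these inputs, an 8–12% overall footprint corresponds to a tail shell rate of roughly 21–46%, well below the 73% observed in the top 25.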
Top-25 entities (ranked by polls)
| # | Entity | Polls | UFs | Cap. (R$) | Class | Read |
|---|---|---|---|---|---|---|
| 1 | VS PUBLICIDADE LTDA | 254 | MS,SP | 0 | SHELL | R$0 capital + no web presence + 254 polls cross-state. Strongest shell signal. |
| 2 | DINAMICA / FACUNICAMPS | 80 | GO | 100k | SHELL | Documented in paper §2 (IPOP cover). |
| 3 | ESTAÇÃO I ESTÚDIO CRIATIVO | 52 | MA,PI | 20k | SHELL | CNAE explicitly includes "filmes de campanha política". Founded 18 mo before cycle. |
| 4 | G S NEGREIROS | 51 | RN | 1k | SHELL | Bare individual-name CNPJ, ISP CNAE, solo-MEI buying 51 polls. |
| 5 | IMPRENSA 24H (Nelio Miguel) | 45 | 11 UFs | 40k | MEDIA | Live Aracaju news portal, daily journalism. Classifier miss. |
| 6 | ABC PUBLICIDADE | 29 | AM,BA,CE,GO,RO | 30k | SHELL | Zero web/news/directory hits for this CNPJ. 5 unrelated states. |
| 7 | EMPRESA PACOTILHA / O IMPARCIAL | 29 | MA | 785k | MEDIA | Real Maranhão newspaper (S.A.). Classifier miss. |
| 8 | TRES MARIAS EMPREENDIMENTOS | 27 | BA,SE | 130k | SHELL | Jewelry-wholesale CNAE + web design + agência de notícias mix. No editorial product. |
| 9 | DON7 MEDIA / CN7 | 27 | CE | 20k | MEDIA | Owns CN7 Ceará News + Plus FM. Real CE media group. |
| 10 | 45.366.955 GLEDSON LOPES SANTIAGO | 27 | RO | 5.5k | SHELL | MEI individual, 27 polls in RO. Data-publishing CNAE, not journalism. |
| 11 | NIVALDO GALINDO / N.R. ESTÚDIO | 26 | PE | 0 | SHELL | R$0 capital. No editorial product. |
| 12 | 29.135.406 RAMON MARGIOLLE | 25 | BA | 1k | SHELL | MEI individual, festas/eventos CNAE. |
| 13 | SX EMPREENDIMENTOS PE | 24 | BA,PE | 50k | SHELL | Motorcycle retail + cosméticos CNAE. No newspaper. |
| 14 | PROGRAMA DO RUBÃO | 22 | CE | 5k | MEDIA | Real Ceará politics outlet since 2014, YouTube + podcast. Classifier miss. |
| 15 | HYAGO CAVALCANTE / LOADING MARKETING | 20 | PB,PE | 5k | SHELL | MEI individual, no editorial brand. |
| 16 | PROGRAMADORA CANAL TCM | 19 | RN | 20k | MEDIA | Real Mossoró broadcaster (Ch. 10 since 2003). Classifier miss. |
| 17 | J. CÂMARA & IRMÃOS S/A | 18 | GO | 5M | MEDIA | Publisher of O Popular (Goiás). Real major regional. Classifier miss. |
| 18 | 35.883.673 JOSE VANDERLUCIO | 18 | PE,RN | 13k | SHELL | MEI individual, festas-eventos CNAE. |
| 19 | PE NEWS (Robson Ouro Preto) | 18 | PE | 20k | UNCLEAR | Site is live, but owner is partisan politician with "120-portal network" pattern. |
| 20 | ASSOC. MARKETING MG | 18 | MG | 0 | SHELL | R$0 capital, "Administração pública em geral" CNAE (same as FacUnicamps). |
| 21 | INNTEGRA MARKETING | 17 | SE | 50k | UNCLEAR | Real agency brand; 17 polls is high but not necessarily shell. |
| 22 | AC 24 HORAS | 17 | AC | 100k | MEDIA | Largest digital paper in Acre, 19 yr, 300k IG. Classifier miss. |
| 23 | CENTRO DE TREINAMENTO HUMANO | 17 | PI | 172k | MEDIA | Corporate vehicle of 180graus Piauí portal (same admin: Helder Eugênio). |
| 24 | FONTE 83 | 17 | PB | 20k | MEDIA | Paraibano political blog, journalist + UNINASSAU partnership. |
| 25 | DDD91 LTDA | 17 | PA | 300k | SHELL | Founded Sep 2023, no consumer-facing brand, cinematography CNAE. |
Full table with CNPJ, all CNAEs, mean and total amounts, and
classification evidence is in build/intermediate/other_firm_top25.csv.
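As a consistency check, the distribution table can be recomputed from the Polls and Class columns of the top-25 table. The sketch below transcribes those two columns by hand; the `tally` helper is illustrative and does not read the CSV.

```python
from collections import defaultdict

# (rank, polls, class) transcribed from the top-25 table.
TOP25 = [
    (1, 254, "SHELL"), (2, 80, "SHELL"), (3, 52, "SHELL"), (4, 51, "SHELL"),
    (5, 45, "MEDIA"), (6, 29, "SHELL"), (7, 29, "MEDIA"), (8, 27, "SHELL"),
    (9, 27, "MEDIA"), (10, 27, "SHELL"), (11, 26, "SHELL"), (12, 25, "SHELL"),
    (13, 24, "SHELL"), (14, 22, "MEDIA"), (15, 20, "SHELL"), (16, 19, "MEDIA"),
    (17, 18, "MEDIA"), (18, 18, "SHELL"), (19, 18, "UNCLEAR"), (20, 18, "SHELL"),
    (21, 17, "UNCLEAR"), (22, 17, "MEDIA"), (23, 17, "MEDIA"), (24, 17, "MEDIA"),
    (25, 17, "SHELL"),
]

def tally(rows):
    """Count entities and sum polls per classification."""
    out = defaultdict(lambda: {"entities": 0, "polls": 0})
    for _rank, polls, cls in rows:
        out[cls]["entities"] += 1
        out[cls]["polls"] += polls
    return dict(out)

summary = tally(TOP25)
total_polls = sum(v["polls"] for v in summary.values())
shares = {cls: v["polls"] / total_polls for cls, v in summary.items()}
```

The tallies match the distribution table: 14 entities and 668 polls for SHELL, 9 and 211 for MEDIA, 2 and 35 for UNCLEAR, out of 914 polls in total.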
Interpretation
Four structural findings.
The shell pattern is not just Goiás. The §2 paper case (IPOP → FacUnicamps) is one instance of a pattern that the top-25 audit finds in at least 15 states: MS, SP, MA, PI, RN, RO, PE, BA, MG, PA, SE, AM, GO, CE, and PB. Of the states named in the top-25 table, only AC has no probable-shell sponsor. Brazil's poll-sponsor cover-up channel is national.
The MEI individual format is a structural sub-pattern. Six of the 14 SHELLs are individual-name CNPJs (45.366.955 GLEDSON…, 29.135.406 RAMON…, etc.; the leading 8 digits are the CNPJ-base, styled to read as a person registering as a microbusiness). These one-person businesses commissioned 18–51 mayoral polls each. No legitimate solo entrepreneur in a non-journalism CNAE has a plausible reason to commission that volume of mayoral polls.
The MEDIA regex misclassified 9 of the top 25 sponsors (36%) as non-media. Nine real media outlets (Imprensa 24h, O Imparcial, CN7, Programa do Rubão, Canal TCM, O Popular, ac24horas, 180graus-via-CTH, Fonte 83) were missed because the project's regex doesn't include tokens for: Pacotilha, Imparcial, CN7, Don7, Programa, TCM, Programadora, Câmara, 24 Horas, Fonte, Centro de Treinamento (the 180graus operator). Adding these (or better, replacing regex-on-name with CNPJ-side classification via CNAE codes) would materially improve the project's poll_is_independent flag.
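The stopgap of adding the missing tokens could look like the sketch below. The project's actual MEDIA regex is not shown in this note, so `STOPGAP_RE` is a separate, hypothetical pattern; accent-stripping and upper-casing before matching are also assumptions about how sponsor names are stored.

```python
import re
import unicodedata

# Tokens listed in this note as missing from the MEDIA regex (accents stripped).
STOPGAP_TOKENS = [
    "PACOTILHA", "IMPARCIAL", "CN7", "DON7", "PROGRAMA", "TCM",
    "PROGRAMADORA", "CAMARA", "24 HORAS", "FONTE", "CENTRO DE TREINAMENTO",
]

STOPGAP_RE = re.compile(
    r"\b(?:" + "|".join(re.escape(t) for t in STOPGAP_TOKENS) + r")\b"
)

def normalize(name):
    """Strip accents, collapse whitespace, and upper-case a sponsor name."""
    decomposed = unicodedata.normalize("NFKD", name)
    stripped = "".join(ch for ch in decomposed if not unicodedata.combining(ch))
    return re.sub(r"\s+", " ", stripped).upper().strip()

def hits_stopgap(name):
    """True if the normalized name contains one of the stopgap tokens."""
    return bool(STOPGAP_RE.search(normalize(name)))

examples = {
    "J. CÂMARA & IRMÃOS S/A": hits_stopgap("J. CÂMARA & IRMÃOS S/A"),
    "PROGRAMADORA CANAL TCM": hits_stopgap("PROGRAMADORA CANAL TCM"),
    "VS PUBLICIDADE LTDA": hits_stopgap("VS PUBLICIDADE LTDA"),
}
```

Generic tokens such as CAMARA, PROGRAMA, or FONTE would also match unrelated names (a "Câmara Municipal", for instance), which is one more reason to treat name tokens as a stopgap and CNAE codes as the primary signal.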
The largest single shell (VS Publicidade, 254 polls) is bigger than the FacUnicamps + Alcateia case combined (109 polls). It registered as the contratante for half of all polls in MS in the residual tier. Its R$0 social capital and zero web footprint are the cleanest "this is not a real business" signal among the 25.
Follow-ups
Fix the classifier first (extension): rebuild source/assemble/poll.py's sponsor-type classification to use the CNPJ-side CNAE codes (from pipelines/cnpj) as the primary signal, not regex-on-name. Add tokens for the 9 known media outlets to the legacy regex as a stopgap. Re-run downstream; the poll_is_independent flag affects the regression sample's comparator pool. Suggested script: source/analysis/an-094a-classifier-rebuild.py.
Tail audit at the next 75 (extension): covers another 20% of other_firm protocols (top 100 = 47%). The pattern density may drop in the tail; a lighter-touch audit (CNPJ + 1 web hit per CNPJ) is sufficient. Suggested script: source/analysis/an-094b-tail-audit.py.
Singleton CNPJs are 779 sponsors covering 779 protocols (blind spot): one-poll CNPJs are the long tail of the residual tier. Auditing them per-entity is infeasible (~1,300 minutes), but a quantitative pattern audit (e.g., distribution of capital social, days-since-registration, MEI share) could bound how much shell density is in the tail without individual entity verification.
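The quantitative pattern audit for singletons could start from a profile like the one sketched here. The record fields, the `tail_profile` name, the toy records, and the choice of the 6 October 2024 first-round date as the reference point are all assumptions for illustration, not project conventions.

```python
from datetime import date
from statistics import median

def tail_profile(sponsors, reference_date):
    """Summarize registry signals for a group of sponsors.

    sponsors: list of dicts with keys capital_brl, is_mei, registered (a date).
    """
    n = len(sponsors)
    days = [(reference_date - s["registered"]).days for s in sponsors]
    return {
        "n": n,
        "share_zero_capital": sum(s["capital_brl"] == 0 for s in sponsors) / n,
        "share_mei": sum(bool(s["is_mei"]) for s in sponsors) / n,
        "median_days_since_registration": median(days),
    }

# Toy records (hypothetical, not real CNPJs).
toy = [
    {"capital_brl": 0, "is_mei": True, "registered": date(2023, 9, 1)},
    {"capital_brl": 1_000, "is_mei": True, "registered": date(2020, 1, 1)},
    {"capital_brl": 0, "is_mei": False, "registered": date(2024, 1, 1)},
    {"capital_brl": 50_000, "is_mei": False, "registered": date(2010, 6, 15)},
]
profile = tail_profile(toy, reference_date=date(2024, 10, 6))
```

Comparing such a profile for the singletons against the same profile for the 14 audited PROBABLE_SHELL sponsors would give a rough bound on tail shell density without per-entity web checks.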
VS Publicidade deep dive (puzzle): who is the beneficial owner, and which candidates' polls did their 254 MS+SP polls actually report on? A small-N case study on VS could anchor the paper's §2 alongside FacUnicamps.
Substantive paper update (extension): §2 currently grounds the iceberg in Goiás only. With AN-094 evidence, the framing could be widened to "this pattern is national", but the §2 prose has to stay tight. One sentence after the Alcateia paragraph could suffice: "Of the 25 most-frequent non-media, non-pollster-self sponsors of 2024 mayoral polls, 14 show the same shell-contratante signature as FacUnicamps, across 15 states and 668 polls."
Caveats
- The top-25 audit is light-touch (CNPJ data + Google + econodata). Each PROBABLE_SHELL call rests on absence-of-web-presence + CNAE incongruity, not on demonstrating candidate funding directly. The inferential chain is "this entity has no other plausible reason to commission this volume of mayoral polls, so the polls are most likely candidate-pass-through". A definitive call would need banking records, contracts, or testimony — same evidence bar as Operação Leão de Neméia, not achievable from registry data.
- The two UNCLEAR cases (PE News, Inntegra) could go either way and shouldn't be counted toward the shell footprint in either direction.
- The "9 MEDIA classifier misses" finding is robust — these entities have live news brands with daily/weekly journalism, verified by working URLs.