14 of the top 25 other_firm sponsors (668 of 914 polls, 73%) are probable shells: no journalism brand and no plausible non-poll business reason to commission 17–254 mayoral polls. Nine are real media outlets misclassified by the regex (211 polls), and 2 are unclear (35 polls). The strongest single shell signal is VS Publicidade (254 polls across MS and SP, R$0 capital, no web footprint).

Confidence
yellow
Type
descriptive
Design
Sample
top 25 distinct CNPJ sponsors in the other_firm tier (by number of distinct protocols)
Specification
1. Re-run the 4-way classifier (Routes A–D / media / pollster_self / other_firm) from source/assemble/poll.py on every sponsor row in poll_sponsor_2024.parquet to recover the residual tier.
2. Rank other_firm sponsors by number of distinct protocols; take the top 25.
3. Enrich each with the May-2025 CNPJ snapshot from pipelines/cnpj/build/clean/: razão social, capital social, porte, primary and secondary CNAE codes, and CNAE descriptions.
4. Classify each manually into {PROBABLE_SHELL, REAL_MEDIA (classifier miss), LEGIT_NON_MEDIA, UNCLEAR} using: (a) CNAE alignment with editorial/journalism activity, (b) capital social vs. poll volume, (c) cross-state poll spread, (d) a web-presence audit (live news brand, recent dated journalism, social-media activity), (e) ownership structure (MEI/individual vs. LTDA/S.A.), and (f) per-poll price.
Comparator
project's existing 4-way classifier (regex-based on sponsor_name)
Script
source/analysis/an-094-other-firm-top25-shell-audit.py
Target
build/intermediate/other_firm_top25.csv
Status
interpreted · 2026-06-17
Created
2026-06-17

Question

The project's sponsor-type classifier (source/assemble/poll.py) bins every sponsor row into one of {media, pollster_self, committee_, party, party_name, individual, pollster_other, other_firm}. The first six are either independent media-side polls (media, pollster_self) or candidate-linked polls captured by Routes A–D (committee_, party, party_name, individual). The last, other_firm, is the residual: a sponsor CNPJ whose name doesn't hit the MEDIA, PARTY, or POLLSTER regex and whose CNPJ doesn't match any candidate-classification route. This tier covers 6,735 sponsor rows, 1,210 distinct CNPJs, and 3,353 mayoral protocols (22.5% of all 14,887 mayoral 2024 polls).

If shell contratantes are common, as the paper's §2 Goiás example documents for IPOP/FacUnicamps and Alcateia, they should land heavily in this tier. Who are these 1,210 entities, and how many are actually candidate pass-through vehicles?

Design

Concentration in the residual tier is heavy: the top 25 CNPJs cover 27% of other_firm protocols (905 of 3,353) and the top 100 cover 47%. Auditing the top 25 is a high-leverage first pass.
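The ranking step (spec step 2) and the concentration figures above can be sketched in plain Python. This is a minimal sketch, not the an-094 script. Sponsor rows are assumed to be dicts with hypothetical keys `cnpj`, `protocol`, and `sponsor_type`, standing in for the parquet table. It counts distinct protocols per CNPJ and measures coverage as the union of the top-k protocol sets. Summing per-sponsor counts gives a larger numerator whenever one poll lists more than one top-k sponsor.

```python
from collections import defaultdict


def protocols_by_cnpj(sponsor_rows):
    """Map each other_firm sponsor CNPJ to its set of distinct protocol IDs.

    sponsor_rows: iterable of dicts with hypothetical keys
    'cnpj', 'protocol', 'sponsor_type' (one dict per sponsor row).
    Using a set means repeated rows for the same poll count once.
    """
    out = defaultdict(set)
    for row in sponsor_rows:
        if row["sponsor_type"] == "other_firm":
            out[row["cnpj"]].add(row["protocol"])
    return dict(out)


def rank_sponsors(by_cnpj, k):
    """Top-k CNPJs by distinct-protocol count (CNPJ string breaks ties)."""
    return sorted(by_cnpj, key=lambda c: (-len(by_cnpj[c]), c))[:k]


def top_k_share(by_cnpj, k):
    """Share of residual protocols covered by the top-k sponsors.

    The numerator is the union of the top-k protocol sets, so a poll
    listing two top-k sponsors is counted once.
    """
    all_protocols = set().union(*by_cnpj.values())
    if not all_protocols:
        return 0.0
    covered = set().union(*(by_cnpj[c] for c in rank_sponsors(by_cnpj, k)))
    return len(covered) / len(all_protocols)
```

Consider a toy case: sponsor A covers {p1, p2, p3} and sponsor B covers {p3, p4}, out of five residual protocols. The top-2 union covers 4 of 5 (0.8), while summing the two sponsors' counts would give 5 of 5.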

For each, the audit combines:

Signal | Source
Razão social, capital social, porte | pipelines/cnpj/build/clean/cnpj_202505.parquet (May-2025 snapshot)
Primary + secondary CNAEs | pipelines/cnpj/build/clean/cnpj_cnae_202505.parquet, joined to the CNAE descriptions table
# protocols, UFs, institutes hired, mean amount paid | pipelines/politica/build/clean/poll_sponsor_2024.parquet (the project's primary sponsor table)
Web-presence verification | Google + econodata + casadosdados spot-checks per entity (~5 min/entity)
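The registry and sponsor-table signals listed above can be folded into one per-sponsor profile before the manual web check. This is a minimal sketch under stated assumptions, not the project's code:

- Sponsor rows are dicts with hypothetical keys `cnpj`, `protocol`, `uf`, `institute`, and `amount`.
- The CNPJ snapshot is a dict keyed by digits-only CNPJ.
- CNPJs are normalized to digits before joining, so formatted and unformatted strings match.

```python
import re
from collections import defaultdict
from statistics import mean


def normalize_cnpj(raw):
    """Digits only, so '12.345.678/0001-90' and '12345678000190' join."""
    return re.sub(r"\D", "", raw)


def sponsor_profile(sponsor_rows, snapshot):
    """Per-CNPJ audit profile: protocols, UFs, institutes, mean amount, registry fields.

    sponsor_rows: dicts with hypothetical keys cnpj, protocol, uf, institute, amount.
    snapshot: dict of digits-only CNPJ -> registry fields (hypothetical shape).
    CNPJs absent from the snapshot keep only the poll-side fields (a left join).
    The mean is taken over sponsor rows; deduplicate first if one poll can
    appear on several rows for the same sponsor.
    """
    acc = defaultdict(lambda: {"protocols": set(), "ufs": set(),
                               "institutes": set(), "amounts": []})
    for r in sponsor_rows:
        a = acc[normalize_cnpj(r["cnpj"])]
        a["protocols"].add(r["protocol"])
        a["ufs"].add(r["uf"])
        a["institutes"].add(r["institute"])
        if r.get("amount") is not None:
            a["amounts"].append(r["amount"])
    profiles = {}
    for cnpj, a in acc.items():
        profiles[cnpj] = {
            "n_protocols": len(a["protocols"]),
            "ufs": sorted(a["ufs"]),
            "n_institutes": len(a["institutes"]),
            "mean_amount": mean(a["amounts"]) if a["amounts"] else None,
            **snapshot.get(cnpj, {}),
        }
    return profiles
```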

The classification rubric:

Results

Distribution across the top 25:

Classification | Entities | Polls covered | % of top-25 polls
PROBABLE_SHELL | 14 | 668 | 73 %
REAL_MEDIA (classifier miss) | 9 | 211 | 23 %
UNCLEAR | 2 | 35 | 4 %

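The 668 PROBABLE_SHELL polls in the top 25 alone are 4.5 % of all 14,887 mayoral 2024 polls. Extrapolating to the full other_firm tier (the top 25 cover 27 % of the residual) depends on the shell rate assumed for the remaining ~2,440 residual protocols. Applying the top-25 rate (73 % of polls) to the whole tail would put the shell footprint near 16–17 % of all mayoral polls. The working range of 8–12 % corresponds to a tail shell rate of roughly 20–45 %, i.e., it assumes shell density thins out below the top 25.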
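The extrapolation hinges on one assumed quantity: the shell rate among residual protocols outside the top 25. The minimal sketch below makes that assumption explicit, using the counts reported in this note. `TOP25_POLLS` is the sum of the per-entity counts in the top-25 table (914). The Design section's distinct-protocol figure (905) would shift every projected share by less than 0.1 percentage points.

```python
# Counts reported in this note.
TOTAL_MAYORAL = 14_887  # all 2024 mayoral polls
RESIDUAL = 3_353        # other_firm protocols
TOP25_POLLS = 914       # sum of per-entity counts in the top-25 table (905 distinct)
TOP25_SHELL = 668       # PROBABLE_SHELL polls among the top 25
TAIL = RESIDUAL - TOP25_POLLS  # residual protocols outside the top 25


def shell_share(tail_rate):
    """Projected shell polls as a share of all mayoral polls.

    tail_rate is an ASSUMPTION: the fraction of TAIL protocols that are
    shell-sponsored. Nothing in this audit measures it directly.
    """
    return (TOP25_SHELL + tail_rate * TAIL) / TOTAL_MAYORAL


def tail_rate_for(target_share):
    """Invert shell_share: the tail rate that yields a target share."""
    return (target_share * TOTAL_MAYORAL - TOP25_SHELL) / TAIL


if __name__ == "__main__":
    top25_rate = TOP25_SHELL / TOP25_POLLS  # about 0.73
    for rate in (0.0, 0.25, 0.5, top25_rate):
        print(f"tail rate {rate:.2f} -> {shell_share(rate):.1%} of mayoral polls")
    for target in (0.08, 0.12):
        print(f"{target:.0%} needs a tail rate of {tail_rate_for(target):.2f}")
```

With these counts, carrying the top-25 rate through the whole tail gives about 16.5 %. The 8 % and 12 % bounds correspond to tail rates of about 0.21 and 0.46.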

Top-25 entities (ranked by polls)

# | Entity | Polls | UFs | Cap. (R$) | Class | Read
1 | VS PUBLICIDADE LTDA | 254 | MS, SP | 0 | SHELL | R$0 capital, no web presence, 254 polls across two states. Strongest shell signal.
2 | DINAMICA / FACUNICAMPS | 80 | GO | 100k | SHELL | Documented in paper §2 (IPOP cover).
3 | ESTAÇÃO I ESTÚDIO CRIATIVO | 52 | MA, PI | 20k | SHELL | CNAE explicitly includes "filmes de campanha política". Founded 18 months before the cycle.
4 | G S NEGREIROS | 51 | RN | 1k | SHELL | Bare individual-name CNPJ, ISP CNAE; a solo MEI buying 51 polls.
5 | IMPRENSA 24H (Nelio Miguel) | 45 | 11 UFs | 40k | MEDIA | Live Aracaju news portal, daily journalism. Classifier miss.
6 | ABC PUBLICIDADE | 29 | AM, BA, CE, GO, RO | 30k | SHELL | Zero web, news, or directory hits for this CNPJ. 5 unrelated states.
7 | EMPRESA PACOTILHA / O IMPARCIAL | 29 | MA | 785k | MEDIA | Real Maranhão newspaper (S.A.). Classifier miss.
8 | TRES MARIAS EMPREENDIMENTOS | 27 | BA, SE | 130k | SHELL | CNAE mix of jewelry wholesale, web design, and agência de notícias. No editorial product.
9 | DON7 MEDIA / CN7 | 27 | CE | 20k | MEDIA | Owns CN7 Ceará News + Plus FM. Real CE media group.
10 | 45.366.955 GLEDSON LOPES SANTIAGO | 27 | RO | 5.5k | SHELL | MEI individual, 27 polls in RO. Data-publishing CNAE, not journalism.
11 | NIVALDO GALINDO / N.R. ESTÚDIO | 26 | PE | 0 | SHELL | R$0 capital. No editorial product.
12 | 29.135.406 RAMON MARGIOLLE | 25 | BA | 1k | SHELL | MEI individual, festas/eventos CNAE.
13 | SX EMPREENDIMENTOS PE | 24 | BA, PE | 50k | SHELL | Motorcycle retail + cosméticos CNAE. No newspaper.
14 | PROGRAMA DO RUBÃO | 22 | CE | 5k | MEDIA | Real Ceará politics outlet since 2014, YouTube + podcast. Classifier miss.
15 | HYAGO CAVALCANTE / LOADING MARKETING | 20 | PB, PE | 5k | SHELL | MEI individual, no editorial brand.
16 | PROGRAMADORA CANAL TCM | 19 | RN | 20k | MEDIA | Real Mossoró broadcaster (Ch. 10 since 2003). Classifier miss.
17 | J. CÂMARA & IRMÃOS S/A | 18 | GO | 5M | MEDIA | Publisher of O Popular (Goiás), a real major regional paper. Classifier miss.
18 | 35.883.673 JOSE VANDERLUCIO | 18 | PE, RN | 13k | SHELL | MEI individual, festas/eventos CNAE.
19 | PE NEWS (Robson Ouro Preto) | 18 | PE | 20k | UNCLEAR | Site is live, but the owner is a partisan politician with a "120-portal network" pattern.
20 | ASSOC. MARKETING MG | 18 | MG | 0 | SHELL | R$0 capital, "Administração pública em geral" CNAE (same as FacUnicamps).
21 | INNTEGRA MARKETING | 17 | SE | 50k | UNCLEAR | Real agency brand; 17 polls is high but not necessarily a shell.
22 | AC 24 HORAS | 17 | AC | 100k | MEDIA | Largest digital paper in Acre, 19 years old, 300k IG. Classifier miss.
23 | CENTRO DE TREINAMENTO HUMANO | 17 | PI | 172k | MEDIA | Corporate vehicle of the 180graus Piauí portal (same administrator: Helder Eugênio).
24 | FONTE 83 | 17 | PB | 20k | MEDIA | Paraíba political blog; journalist + UNINASSAU partnership.
25 | DDD91 LTDA | 17 | PA | 300k | SHELL | Founded Sep 2023, no consumer-facing brand, cinematography CNAE.

Full table with CNPJ, all CNAEs, mean and total amounts, and classification evidence is in build/intermediate/other_firm_top25.csv.
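The Results distribution can be recomputed from the per-entity table as an internal consistency check. This is a minimal sketch. `load_rows` assumes the CSV has columns named `class` and `polls`, which this note does not document. The transcribed `TOP25` list below stands in for the file.

```python
import csv
from collections import Counter


def load_rows(path):
    """Read (class, polls) pairs from the audit CSV (column names assumed)."""
    with open(path, newline="", encoding="utf-8") as f:
        return [(row["class"], int(row["polls"])) for row in csv.DictReader(f)]


def distribution(rows):
    """Entities, polls, and rounded % of polls per class."""
    entities, polls = Counter(), Counter()
    for cls, n in rows:
        entities[cls] += 1
        polls[cls] += n
    total = sum(polls.values())
    return {cls: (entities[cls], polls[cls], round(100 * polls[cls] / total))
            for cls in polls}


# (class, polls) pairs transcribed from the top-25 table, in rank order.
TOP25 = [
    ("SHELL", 254), ("SHELL", 80), ("SHELL", 52), ("SHELL", 51), ("MEDIA", 45),
    ("SHELL", 29), ("MEDIA", 29), ("SHELL", 27), ("MEDIA", 27), ("SHELL", 27),
    ("SHELL", 26), ("SHELL", 25), ("SHELL", 24), ("MEDIA", 22), ("SHELL", 20),
    ("MEDIA", 19), ("MEDIA", 18), ("SHELL", 18), ("UNCLEAR", 18), ("SHELL", 18),
    ("UNCLEAR", 17), ("MEDIA", 17), ("MEDIA", 17), ("MEDIA", 17), ("SHELL", 17),
]

if __name__ == "__main__":
    print(distribution(TOP25))
```

The transcribed rows sum to 914 polls and reproduce the 14 / 9 / 2 entity split and the 73 / 23 / 4 % poll shares.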

Interpretation

Three structural findings.

  1. The shell pattern is not just Goiás. The §2 paper case (IPOP → FacUnicamps) is one instance of a pattern present across at least 15 states. The top-25 probable-shell sponsors commissioned polls in MS, SP, MA, PI, RN, RO, PE, BA, MG, PA, SE, AM, GO, CE, and PB. Brazil's poll-sponsor cover-up channel is national.

  2. The MEI individual format is a structural sub-pattern. Six of the 14 SHELLs are individual-name CNPJs (e.g., 45.366.955 GLEDSON… and 29.135.406 RAMON…). In those names, the leading 8 digits are the CNPJ base, prefixed to the owner's name in the style of a person registered as a microempreendedor individual (MEI). These one-person businesses commissioned 18–51 mayoral polls each. No legitimate solo entrepreneur in a non-journalism CNAE has a plausible reason to commission that volume of mayoral polls.

  3. The MEDIA regex misses a large share of real media. Nine of the top 25 residual sponsors (36 %) are real outlets: Imprensa 24h, O Imparcial, CN7, Programa do Rubão, Canal TCM, O Popular, ac24horas, 180graus via CTH, and Fonte 83. The project's regex missed them because it has no tokens for Pacotilha, Imparcial, CN7, Don7, Programa, TCM, Programadora, Câmara, 24 Horas, Fonte, or Centro de Treinamento (the 180graus operator). This 36 % is the media share of the residual top 25, not a recall estimate for the regex as a whole. Adding these tokens, or better, replacing regex-on-name with CNPJ-side classification via CNAE codes, would materially improve the project's poll_is_independent flag.
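The CNPJ-prefixed naming in finding 2 can be flagged mechanically from the registry snapshot. This is a minimal sketch under an assumed heuristic: a name that starts with its own CNPJ's 8-digit base, written NN.NNN.NNN, is treated as MEI-style. The branch and check digits in the example CNPJ are illustrative, not the real registration. The check only catches this naming format. Bare individual names such as G S NEGREIROS would need a separate registry field, such as natureza jurídica. A match flags the naming pattern, not shell status.

```python
import re

# Assumed naming heuristic: MEI-style names start with the 8-digit CNPJ
# base written NN.NNN.NNN, followed by the owner's name.
MEI_PREFIX = re.compile(r"^(\d{2}\.\d{3}\.\d{3})\s+\S")


def cnpj_digits(cnpj):
    """Strip punctuation: '45.366.955/0001-00' -> '45366955000100'."""
    return re.sub(r"\D", "", cnpj)


def looks_like_mei_name(razao_social, cnpj):
    """True if the razão social starts with this CNPJ's own 8-digit base."""
    digits = cnpj_digits(cnpj)
    match = MEI_PREFIX.match(razao_social.strip())
    if len(digits) != 14 or match is None:
        return False
    return cnpj_digits(match.group(1)) == digits[:8]
```

Requiring the prefix to equal the sponsor's own CNPJ base avoids flagging names that merely begin with an unrelated number.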

The largest single shell (VS Publicidade, 254 polls) is bigger than the FacUnicamps + Alcateia case combined (109 polls). It registered as the contratante for half of all polls in MS in the residual tier. Its R$0 social capital and zero web footprint are the cleanest "this is not a real business" signal among the 25.

Follow-ups

  1. Fix the classifier first (extension): rebuild the sponsor-type classification in source/assemble/poll.py to use the CNPJ-side CNAE codes (from pipelines/cnpj) as the primary signal instead of regex-on-name. As a stopgap, add tokens for the 9 known media outlets to the legacy regex. Then re-run downstream steps, since the poll_is_independent flag affects the regression sample's comparator pool. Suggested script: source/analysis/an-094a-classifier-rebuild.py.

  2. Tail audit of the next 75 (extension): these cover another 20 % of other_firm protocols (top 100 = 47 %). Pattern density may drop in the tail, so a lighter-touch audit (CNPJ plus one web hit per entity) is sufficient. Suggested script: source/analysis/an-094b-tail-audit.py.

  3. Singleton CNPJs (blind spot): 779 one-poll sponsors covering 779 protocols form the long tail of the residual tier. Auditing them entity by entity is infeasible (~1,300 minutes). A quantitative pattern audit (e.g., the distribution of capital social, days since registration, and MEI share) could bound the tail's shell density without individual entity verification.

  4. VS Publicidade deep dive (puzzle): who is the beneficial owner, and which candidates did its 254 MS and SP polls actually cover? A small-N case study of VS could anchor the paper's §2 alongside FacUnicamps.

  5. Substantive paper update (extension): §2 currently grounds the iceberg in Goiás only. With AN-094 evidence, the framing could widen to "this pattern is national", but the §2 prose has to stay tight. One sentence after the Alcateia paragraph could suffice: "Of the 25 most-frequent non-media, non-pollster-self sponsors of 2024 mayoral polls, 14 show the same shell-contratante signature as FacUnicamps, across 15 states and 668 polls."
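Follow-up 1's CNAE-side classifier could start from a rule like the one below. This is a minimal sketch, not the suggested an-094a rebuild. The editorial prefix set is a hypothetical choice:

- Division 58 (publishing) and division 60 (radio and television) are included, along with class 63.91 (news agencies).
- Division 59 (film and video production) is left out, because two audited shells (ESTAÇÃO I and DDD91) carry film or cinematography CNAEs.
- By default only the primary code counts. The audit found an agência de notícias code inside TRES MARIAS's CNAE mix without any editorial product behind it.

```python
import re

# Hypothetical editorial set (CNAE 2.x, matched on digits-only prefixes):
#   "58"   division 58, publishing (edição), which includes newspapers
#   "60"   division 60, radio and television
#   "6391" class 63.91, news agencies (agências de notícias)
# Division 59 (film/video production) is deliberately left out.
EDITORIAL_PREFIXES = ("58", "60", "6391")


def cnae_digits(code):
    """'5812-3/01' -> '5812301'."""
    return re.sub(r"\D", "", code)


def media_by_cnae(primary, secondary=(), use_secondary=False):
    """Flag a sponsor as media from its CNAE codes.

    Defaults to the primary code only: a secondary list can include an
    editorial code without any editorial product behind it.
    """
    codes = [primary, *secondary] if use_secondary else [primary]
    return any(cnae_digits(c).startswith(EDITORIAL_PREFIXES) for c in codes)
```

Whether to add class 63.19 (internet portals and content providers) is a further judgment call. The nine audited media rows, whose CNAEs are in other_firm_top25.csv, could help settle it.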

Caveats