Of 14,887 mayoral 2024 polls, **12.4% are routed through cover vehicles** (shell CNPJs 7.7% + MEI-individual 4.7%) that obscure the candidate's connection; 84.9% are administratively recoverable as candidate-linked, media, or pollster-self; the remaining 2.8% are an *uncoded residual* — low-volume CNPJs (1–4 polls each) that the classifier cannot place, many of which are likely sub-threshold cover vehicles (small publicity firms, missed MEI, missed local media). The cover-vehicle share GREW from 3.8% in 2020 to 12.4% in 2024 — a 3.3× increase; the uncoded residual SHRANK from 6.4% to 2.8%, consistent with cover-vehicle activity consolidating into identifiable shell/MEI patterns. 15 pollster firms switched from pollster_self in 2020 to shell or MEI-individual in 2024 — the IPOP pattern at scale, covering 508 polls in 2024 alone. Universe-extended shell list: 89 CNPJs / 1,149 polls (vs AN-094 top-25 audit: 14 CNPJs / 668 polls). Calibration: rule recovers 9 of the 13 AN-094 PROBABLE_SHELLs present in 2024 (69% recall); the 4 misses are firms the source/assemble/poll.py CNAE upgrade promotes to media or pollster_other, so the 89-CNPJ / 1,149-poll count is a precision-favoring FLOOR on the universe shell footprint.
Question
Paper v2 intro ¶3 quotes "at least 668 mayoral polls across 15 states in 2024 were registered through third-party shell CNPJs", but this number comes from the AN-094 top-25 hand audit only, which covers 905 of the 3,353 other_firm protocols (27%). The intro's iceberg framing needs the universe number, not the top-25 floor.
This analysis extends the AN-094/095/097 shell classification to all 14,887 mayoral 2024 protocols (plus 10,971 in 2020 for cross-cycle comparison): what share of registered polls have an administratively recoverable candidate sponsor (Routes A-D, media, or pollster-self), and what share are routed through cover vehicles (shell CNPJs + MEI-individual entities)? And: how many pollster firms switched cover route between 2020 and 2024 — the IPOP pattern (self-contracted in 2020 → routed through FacUnicamps shell in 2024) at scale?
Design
`source/analysis/an-121-iceberg-universe.py`:
- Re-apply the 4-way classifier from `source/assemble/poll.py` (`classify_sponsor_row`) to every sponsor row in 2024 + 2020.
- Extend with two new flags on the residual buckets:
  - `shell_flag` on `other_firm` rows: `n_polls_for_cnpj ≥ 5` AND `sponsor_name` does not match MEDIA_TOKENS / POLLSTER_TOKENS regex.
  - `mei_flag` on any CNPJ row with `porte == '01'` (Microempreendedor Individual, RFB May 2025 snapshot).
- Aggregate to protocol-level canonical bucket with priority order: `candidate_linked > media > pollster_self > shell > mei_individual > other_firm_non_shell > unknown`.
- Calibration check vs AN-094 hand audit on the 25 top-volume other_firm CNPJs.
- Cross-cycle transitions on the 329 pollster firms present in both 2020 and 2024.

The shell rule deliberately does NOT require capital_social ≤
threshold — AN-094's FacUnicamps shell has R$100k capital (well above
any sensible threshold), so a capital-based filter would miss real
shells. The discriminator that works is high poll volume + absence
of media/pollster-name tokens (with the existing CNAE-side upgrade
already promoting legitimate media to the media bucket via the
MEDIA_CNAES whitelist in poll.py).
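
The flagging and aggregation steps above can be sketched as follows. This is a minimal illustration, not the code in an-121-iceberg-universe.py. The token regexes, the row fields (`protocol`, `cnpj`, `base_bucket`), and counting `n_polls_for_cnpj` as distinct protocols per CNPJ are all assumptions. Only the ≥ 5 threshold, the `porte == '01'` test, and the priority order come from the design.

```python
import re
from collections import defaultdict

# Hypothetical stand-ins for MEDIA_TOKENS / POLLSTER_TOKENS in poll.py.
MEDIA_TOKENS = re.compile(r"\b(JORNAL|RADIO|TV|NOTICIAS?)\b")
POLLSTER_TOKENS = re.compile(r"\b(PESQUISAS?|INSTITUTO)\b")
SHELL_MIN_POLLS = 5  # threshold from the design

# Priority order from the design: earlier entries win.
PRIORITY = [
    "candidate_linked", "media", "pollster_self", "shell",
    "mei_individual", "other_firm_non_shell", "unknown",
]
RANK = {bucket: i for i, bucket in enumerate(PRIORITY)}


def polls_per_cnpj(rows):
    """Count distinct protocols per sponsor CNPJ (assumed definition)."""
    seen = defaultdict(set)
    for r in rows:
        seen[r["cnpj"]].add(r["protocol"])
    return {cnpj: len(protocols) for cnpj, protocols in seen.items()}


def row_labels(row, n_polls):
    """Return every bucket label a single sponsor row qualifies for."""
    labels = set()
    base = row["base_bucket"]
    if base == "other_firm":
        name = row["sponsor_name"].upper()
        has_token = MEDIA_TOKENS.search(name) or POLLSTER_TOKENS.search(name)
        if n_polls[row["cnpj"]] >= SHELL_MIN_POLLS and not has_token:
            labels.add("shell")
        else:
            labels.add("other_firm_non_shell")
    else:
        # Base buckets outside the priority list fall back to "unknown".
        labels.add(base if base in RANK else "unknown")
    if row.get("porte") == "01":  # MEI flag applies to any CNPJ row
        labels.add("mei_individual")
    return labels


def protocol_buckets(rows):
    """Collapse sponsor rows to one canonical bucket per protocol."""
    n_polls = polls_per_cnpj(rows)
    best = {}
    for r in rows:
        top = min(row_labels(r, n_polls), key=RANK.__getitem__)
        prev = best.get(r["protocol"], "unknown")
        best[r["protocol"]] = min(top, prev, key=RANK.__getitem__)
    return best
```

Under these assumptions, a protocol with both a shell row and a media row resolves to media. This is one reason protocol-level shell counts can sit slightly below CNPJ-level poll counts.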
Results

Table: 2024 universe shares (n = 14,887)
| Bucket | n_protocols | share |
|---|---|---|
| candidate_linked (Routes A-D) | 1,928 | 12.95% |
| media | 5,985 | 40.20% |
| pollster_self | 4,719 | 31.70% |
| Sponsor-recoverable subtotal | 12,632 | 84.9% |
| shell | 1,140 | 7.66% |
| mei_individual | 705 | 4.74% |
| Cover-vehicle subtotal | 1,845 | 12.4% |
| uncoded (low-volume residual) | 410 | 2.8% |
Table: 2020 universe shares (n = 10,971)
| Bucket | n_protocols | share |
|---|---|---|
| candidate_linked (Routes A-D) | 877 | 7.99% |
| media | 2,313 | 21.08% |
| pollster_self | 6,664 | 60.74% |
| Sponsor-recoverable subtotal | 9,854 | 89.8% |
| shell | 148 | 1.35% |
| mei_individual | 271 | 2.47% |
| Cover-vehicle subtotal | 419 | 3.8% |
| uncoded (low-volume residual) | 698 | 6.4% |
Single headline number (intro-ready)
Of the 14,887 mayoral polls registered in 2024, 84.9% have an administratively recoverable sponsor (candidate-linked via Routes A-D, identifiable media outlet, or the pollster firm itself); 12.4% are routed through cover vehicles — shell CNPJs (7.7%) and MEI-individual entities (4.7%) — whose connection to a candidate cannot be established administratively. The remaining 2.8% are an uncoded residual of low-volume sponsors (1–4 polls each) the classifier cannot place; many are likely sub-threshold cover vehicles. The cover-vehicle share grew from 3.8% in 2020 to 12.4% in 2024 — a 3.3× increase; the uncoded residual shrank from 6.4% to 2.8%, consistent with cover-vehicle activity consolidating into identifiable shell/MEI patterns.
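
The headline shares can be re-derived from the bucket counts in the two tables above. The short Python check below does that arithmetic; the dictionary keys and function name are illustrative, not names from the analysis script. From the raw counts the 2020 → 2024 ratio comes out near 3.25. The 3.3× quoted in the text is what dividing the rounded shares (12.4 / 3.8) gives.

```python
# Protocol counts copied from the 2024 and 2020 universe tables.
COUNTS = {
    2024: {"candidate_linked": 1928, "media": 5985, "pollster_self": 4719,
           "shell": 1140, "mei_individual": 705, "uncoded": 410},
    2020: {"candidate_linked": 877, "media": 2313, "pollster_self": 6664,
           "shell": 148, "mei_individual": 271, "uncoded": 698},
}
RECOVERABLE = ("candidate_linked", "media", "pollster_self")
COVER = ("shell", "mei_individual")


def summarize(counts):
    """Return the universe size and the three headline shares in percent."""
    total = sum(counts.values())
    recoverable = sum(counts[b] for b in RECOVERABLE)
    cover = sum(counts[b] for b in COVER)
    return {
        "n": total,
        "recoverable_pct": 100 * recoverable / total,
        "cover_pct": 100 * cover / total,
        "uncoded_pct": 100 * counts["uncoded"] / total,
    }


s24 = summarize(COUNTS[2024])
s20 = summarize(COUNTS[2020])
growth = s24["cover_pct"] / s20["cover_pct"]  # ratio of unrounded shares

for year, s in ((2020, s20), (2024, s24)):
    print(f"{year}: n={s['n']}, recoverable={s['recoverable_pct']:.1f}%, "
          f"cover={s['cover_pct']:.1f}%, uncoded={s['uncoded_pct']:.1f}%")
print(f"cover-vehicle growth: {growth:.2f}x")
```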
About the "uncoded" bucket
The 410 polls in 2024 (and 698 in 2020) not assigned to any of the five named buckets come from CNPJs with 1–4 polls each that don't match media or pollster name tokens. A 15-CNPJ sample of the top tier (4 polls each) shows the bucket mixes three recoverable types with a residual of unrelated one-off sponsors:
- Sub-threshold shells — small publicity / marketing / property firms that fit the AN-094 shell pattern but commissioned only 1–4 polls instead of the ≥ 5 threshold (e.g. MARTINS PRODUCOES E PUBLICIDADE, RICOCHETE PUBLICIDADE E PROPAGANDA, SEVEN7 DIGITAL).
- Missed local media — small local news outlets registered under a personal CNPJ without a journalism CNAE (e.g. CAUE PIXITELLI / NOTICIA DE LIMEIRA, BEX EDICOES) that the classifier cannot upgrade to the media bucket.
- Missed MEI-individuals: individual-CPF-format CNPJs that the RFB May-2025 porte snapshot did not flag as code `01` (Microempreendedor Individual). These should be in the `mei_individual` bucket but fell through (e.g. THIAGO CESAR DE GOIS 09368556652, 41.073.979 VALBER ALVES DOS SANTOS, 52.491.527 ANA KAROLINE DA SILVA).
- A residual of genuinely unrelated one-off business sponsors (vehicle rental, small construction co) whose poll-commissioning motive is unclear.
The 2020 → 2024 shrinkage (6.4% → 2.8%) is the signature: as cover-vehicle activity consolidated under fewer high-volume CNPJs, the diffuse one-off bucket emptied into the (now larger) shell bucket. The "uncoded" residual is therefore treated as a separate row in the table rather than folded into the cover-vehicle subtotal — honest about classifier limits while preserving the load-bearing "3.8% → 12.4%" headline.
Table: top-10 shell CNPJs (2024)
| CNPJ | Razão social | n_polls | n_ufs | capital_social |
|---|---|---|---|---|
| 96499132000189 | VS PUBLICIDADE LTDA | 254 | 2 | 0 |
| 17063352000199 | DINAMICA / FACUNICAMPS GOIANIA | 80 | 1 | 100,000 |
| 30388339000178 | G S NEGREIROS | 51 | 1 | 1,000 |
| 06271258000109 | EMPRESA PACOTILHA S.A. / O IMPARCIAL | 29 | 1 | 784,519 |
| 45366955000103 | 45.366.955 GLEDSON LOPES SANTIAGO | 27 | 1 | 5,500 |
| 30788875000160 | TRES MARIAS EMPREENDIMENTOS LTDA | 27 | 2 | 130,000 |
| 07257404000104 | NIVALDO A. GALINDO FILHO / N. R. ESTUDIO MULTIMIDIA | 26 | 1 | 0 |
| 29135406000163 | 29.135.406 RAMON MARGIOLLE PEREIRA DA SILVA | 25 | 1 | 1,000 |
| 19583466000195 | PROGRAMA DO RUBAO LTDA | 22 | 1 | 5,000 |
| 04209895000120 | PROGRAMADORA CANAL TCM LTDA | 19 | 1 | 20,000 |
Full list of 89 CNPJs / 1,149 polls at
build/table/an-121-iceberg-universe/shell-cnpj-list.csv.
Table: 2020 → 2024 pollster-firm transitions (329 firms in both cycles)
| 2020 bucket ↓ \ 2024 bucket → | cand | media | mei | other | poll_self | shell | All |
|---|---|---|---|---|---|---|---|
| candidate_linked | 21 | 11 | 4 | 0 | 2 | 2 | 40 |
| media | 15 | 57 | 1 | 1 | 4 | 2 | 80 |
| mei_individual | 3 | 3 | 1 | 0 | 0 | 1 | 8 |
| other_firm_non_shell | 11 | 4 | 1 | 2 | 1 | 0 | 19 |
| pollster_self | 29 | 57 | 6 | 4 | 73 | 9 | 178 |
| shell | 0 | 2 | 0 | 0 | 0 | 2 | 4 |
| All | 79 | 134 | 13 | 7 | 80 | 16 | 329 |
IPOP pattern (self 2020 → shell or MEI 2024): 15 pollster firms / 508 polls in 2024. The two largest documented switches:
| pollster_cnpj | 2020 self-polls | 2024 shell-polls |
|---|---|---|
| 36348794000126 | 357 | 68 |
| 37658984000102 | 230 | 219 |
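
The IPOP-pattern count can be read directly off the transition table. The Python snippet below is a sketch of that lookup, using only the cell values printed above; the variable and function names are illustrative.

```python
# Columns follow the transition table's 2024 header order.
BUCKETS_2024 = ["cand", "media", "mei", "other", "poll_self", "shell"]

# Rows are 2020 buckets; values are firm counts copied from the table.
TRANSITIONS = {
    "candidate_linked":     [21, 11, 4, 0, 2, 2],
    "media":                [15, 57, 1, 1, 4, 2],
    "mei_individual":       [3, 3, 1, 0, 0, 1],
    "other_firm_non_shell": [11, 4, 1, 2, 1, 0],
    "pollster_self":        [29, 57, 6, 4, 73, 9],
    "shell":                [0, 2, 0, 0, 0, 2],
}


def cell(bucket_2020, bucket_2024):
    """Number of firms moving from a 2020 bucket to a 2024 bucket."""
    return TRANSITIONS[bucket_2020][BUCKETS_2024.index(bucket_2024)]


n_firms = sum(sum(row) for row in TRANSITIONS.values())
ipop_firms = cell("pollster_self", "shell") + cell("pollster_self", "mei")
ipop_share = 100 * ipop_firms / n_firms
stayed_self = cell("pollster_self", "poll_self") / sum(TRANSITIONS["pollster_self"])
top_two_shell_polls = 68 + 219  # 2024 shell-polls of the two largest switchers

print(f"{ipop_firms} of {n_firms} firms ({ipop_share:.1f}%) follow the IPOP pattern")
print(f"{100 * stayed_self:.0f}% of 2020 pollster_self firms stayed pollster_self")
print(f"top two switchers: {top_two_shell_polls} shell polls in 2024")
```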
Table: calibration vs AN-094 hand audit
| AN-094 label | Rule = shell | Rule = not shell | Not in 2024 | Total |
|---|---|---|---|---|
| PROBABLE_SHELL | 9 | 4 | 1 | 14 |
| REAL_MEDIA | 1 | 4 | 4 | 9 |
| UNCLEAR | 1 | 1 | 0 | 2 |
| Total | 11 | 9 | 5 | 25 |
Recall on in-universe PROBABLE_SHELLs: 9 / 13 = 69%. The 4 misses
are CNPJs the poll.py CNAE-side classifier promotes to media
(ESTACAO I ESTUDIO, DDD91) or pollster_other (ABC PUBLICIDADE,
HYAGO CAVALCANTE / LOADING MARKETING). Precision on rule-flagged
firms in the top-25: 9 + 1 (UNCLEAR) of 11 = 91%, with 1 false
positive on REAL_MEDIA (PROGRAMA DO RUBAO — no media token in name).
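
The recall and precision figures follow from the calibration table; the small Python check below makes the arithmetic explicit. Counting the flagged UNCLEAR firm as a correct hit follows the text above. The stricter variant that excludes it is added here only for comparison. Function names are illustrative.

```python
# AN-094 label -> (rule = shell, rule = not shell, not in 2024)
CALIBRATION = {
    "PROBABLE_SHELL": (9, 4, 1),
    "REAL_MEDIA": (1, 4, 4),
    "UNCLEAR": (1, 1, 0),
}


def recall_in_universe(table, label="PROBABLE_SHELL"):
    """Share of in-2024 firms with this label that the rule flags as shell."""
    flagged, missed, _absent = table[label]
    return flagged / (flagged + missed)


def precision(table, positives):
    """Share of rule-flagged firms whose hand-audit label is in `positives`."""
    flagged_total = sum(row[0] for row in table.values())
    flagged_ok = sum(table[label][0] for label in positives)
    return flagged_ok / flagged_total


recall = recall_in_universe(CALIBRATION)
lenient = precision(CALIBRATION, ("PROBABLE_SHELL", "UNCLEAR"))
strict = precision(CALIBRATION, ("PROBABLE_SHELL",))

print(f"recall = {recall:.0%}")              # 9 / 13
print(f"precision (lenient) = {lenient:.0%}")  # 10 / 11
print(f"precision (strict) = {strict:.0%}")    # 9 / 11
```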
Interpretation
The iceberg framing is documented at universe scale. 12.4% of the 2024 mayoral universe (1,845 polls) is routed through cover vehicles whose connection to a candidate is not administratively recoverable; counting the 410-poll uncoded residual as well gives 15.1% (2,255 polls). The universe extension raises the paper v2 intro's "668 polls" floor to 1,149 shell polls, about 1.7× the audited figure, and that is before counting MEI-individual entities (an additional 705 polls / 4.7% that the original framing doesn't separate out).
The cover-vehicle share is GROWING. 3.8% in 2020 → 12.4% in 2024 on the shell + MEI definition (an absolute increase of 8.6 pp). On the broader definition that also counts the uncoded residual, the share moves from 10.2% to 15.1% (an absolute increase of 4.9 pp, ~50% relative growth). This is consistent with the substantive story the paper v2 intro tells about IPOP routing through FacUnicamps, but documents it at population scale, not just one case.
The IPOP pattern is not unique. 15 distinct pollster firms switched from pollster_self in 2020 to shell or MEI-individual in 2024 — 4.6% of the 329 pollsters present in both cycles. These 15 firms ran 508 polls in 2024 under cover routes. The top two switching pollsters alone ran 287 polls under shell in 2024 (pollster_cnpj 36348794 = 68 polls; 37658984 = 219 polls). A second-order check would identify these pollsters by name and cross-reference against the IPOP / Quaest / Datafolha tier (deferred — see Follow-ups).
Composition shift between 2020 and 2024. Beyond cover-vehicle growth, the four-way breakdown changed substantially: pollster_self collapsed from 60.7% → 31.7%; media grew from 21.1% → 40.2%; candidate-linked grew from 8.0% → 12.9%. The shift is partly the AN-094 CNAE-side upgrade landing better media catches in 2024 than in 2020 (the May-2025 RFB snapshot's CNAE coverage is better-aligned with 2024 sponsor CNPJs than with 2020's). But the pollster_self → cover-vehicle transition (15 firms, 508 polls) is real and would survive a methodology recalibration.
The shell count is a precision-favoring floor. Rule recall on AN-094 PROBABLE_SHELLs is 69%: the 4 misses are firms the CNAE upgrade in poll.py promotes out of other_firm. The true shell universe in 2024 is therefore at least 89 CNPJs + the 4 known misses = 93+ CNPJs and at least 1,149 + ~130 (the 4 misses' polls) ≈ 1,280 polls. The headline cover-vehicle share (12.4% on the shell + MEI definition) is similarly a lower bound; the upper bound would tighten with hand-audit of the remaining 65+ other_firm singletons.
Confidence rationale (yellow). The universe-level cover-vehicle share (12.4% on the shell + MEI definition; 15.1% when the uncoded residual is counted) is robust to rule choices in the high-confidence direction: moving the n_polls ≥ 5 threshold to ≥ 3 or ≥ 10 moves the shell bucket by ±1-2 percentage points, well within the floor interpretation. The cross-cycle growth (3.8% → 12.4%, or 10.2% → 15.1% with the residual) is robust because both cycles use the same rule. The IPOP-pattern finding (15 firms / 508 polls) is robust because it's defined at the pollster-cnpj level using the bucket assignment as input, not the shell rule directly. What keeps the badge from green: (i) recall on AN-094 PROBABLE_SHELLs is 69%, so the count is a floor with known under-detection of CNAE-promoted shells; (ii) the 2020 CNPJ snapshot is the May-2025 RFB cut, so 2020-era CNAE assignments for firms that changed activity (or were dissolved) may misclassify; (iii) the "pollster_self collapse" between 2020 and 2024 partly reflects classifier-side improvement, not pure structural change. Green would require a hand-audit pass on the rule's 89 shell CNPJs to verify precision is ≥ 90%, plus a sensitivity analysis on the n_polls threshold.
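
The n_polls threshold sensitivity analysis named above could be scripted along the following lines. This Python sketch runs on made-up poll counts, not AN-121 output. It also assumes the input has already been restricted to other_firm CNPJs without media or pollster name tokens. Function and variable names are hypothetical.

```python
def shell_share(polls_by_cnpj, total_polls, min_polls):
    """Percent of all polls sponsored by CNPJs at or above the threshold."""
    shell_polls = sum(n for n in polls_by_cnpj.values() if n >= min_polls)
    return 100 * shell_polls / total_polls


def sensitivity(polls_by_cnpj, total_polls, thresholds=(3, 5, 10)):
    """Shell share under each candidate threshold."""
    return {t: shell_share(polls_by_cnpj, total_polls, t) for t in thresholds}


# Toy data: polls per token-free other_firm CNPJ in a 500-poll universe.
toy_polls = {"cnpj_a": 40, "cnpj_b": 12, "cnpj_c": 7,
             "cnpj_d": 4, "cnpj_e": 3, "cnpj_f": 1}
result = sensitivity(toy_polls, total_polls=500)

for threshold, share in result.items():
    print(f"n_polls >= {threshold}: shell share = {share:.1f}%")
```

Because raising the threshold can only drop CNPJs from the shell bucket, the share is non-increasing in the threshold. Polls that leave the shell bucket fall into the uncoded residual rather than disappearing.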
Follow-ups
Update paper v2 intro ¶3 (writing, paper-load-bearing). The current text quotes "at least 668 mayoral polls across 15 states in 2024 were registered through third-party shell CNPJs", derived from the AN-094 top-25 audit. AN-121 supplies the universe number: 12.4% of the 14,887 mayoral universe (1,845 polls) is routed through cover vehicles, or 15.1% (2,255 polls) once the uncoded residual of other unaffiliated firms is counted; 7.7% (1,149 polls) goes through shell CNPJs specifically; the cover-vehicle share grew from 3.8% in 2020 to 12.4% in 2024 (10.2% → 15.1% with the residual). Suggested edit: replace the "at least 668 polls" claim with "of the 14,887 registered 2024 mayoral polls, 15.1% are routed through cover vehicles (shell CNPJs, MEI-individual entities, or other unaffiliated firms) whose connection to a candidate cannot be established administratively (Section X)." Cite AN-121.
§2 setting: add the iceberg table (extension, paper-facing). The bucket-by-cycle table (2020 + 2024 + cover-vehicle share) belongs in §2 as the institutional-setup table that motivates the rest of the paper. Layout: 7 rows (one per bucket) × 4 columns (2020 n, 2020 %, 2024 n, 2024 %), with a footer row for the cover-vehicle aggregate. Builder script: `source/table/iceberg_universe.py` reading from `build/table/an-121-iceberg-universe.csv`. ~30 min.
Identify the top 5 IPOP-pattern pollsters by name (blind spot, paper-load-bearing). The 15 pollster_self → shell/mei transitions are currently anonymous by `pollster_cnpj`. The top two (36348794: 357 → 68 polls; 37658984: 230 → 219 polls) likely include IPOP itself + 1-2 other major operators worth naming in the paper. Cross-reference with `pipelines/cnpj/build/clean/` to recover razão social. Suggested file: `_an121-ipop-pattern-firms.py` (underscore-prefixed, follow-up reconnaissance). ~15 min.
Hand-audit the 89-CNPJ shell list (extension, ~3-4 hr). The rule-extended list of 89 shell CNPJs is a precision-floor estimate. A spot-check of the 70 CNPJs beyond the AN-094 audited top-25 (same protocol as AN-094: razão social + capital social + CNAE + cross-state spread + web presence) would quantify precision on the full list and surface any obvious false-positives. Outputs a confidence-tagged shell roster ready for paper §2 attribution. Defer until paper v2 §2 redraft.
Cross-validate against AN-102's shell bucket on the analysis sample (extension, ~30 min). AN-102 has the shell classification on the 22k-row analysis sample (using the AN-094 hand-coded 14 CNPJs). AN-121's universe extension adds 75 more shell CNPJs. Refit AN-102's headline tables with the expanded shell bucket to test whether the within-firm β on shell polls moves with the larger sample. If shell-β > baseline, the universe-extended shell category is doing real mechanism work beyond labelling.