AN-072: Candidate-level fraud-suit rate by sponsor exposure (2024)

Within race, 2024 mayoral candidates with ≥1 self-sponsored poll are 3.15 pp more likely to appear as a litigant in a fraud-flavored PESQUISA-eleitoral case (race-FE OLS, p=0.011, base rate 7.9 %). The direction matches the perceived-bias prediction; the magnitude is small in absolute terms but ~40 % relative to the base rate. The naive comparison (no race FE) runs the other way because self-sponsoring candidates cluster in races with lower overall sue rates.

Hypothesis: perceived-bias-validation
Confidence: yellow
Type: descriptive

Design

Sample: 6,768 PREFEITO-2024 candidates with ≥1 poll in poll_2024 registry and a CPF in politico table; 2,871 unique races (muni_id).
Specification: candidate-level OLS of P(any fraud-flavored PESQUISA case involvement) on any_self ∈ {0,1}. Race FE via within-muni demeaning. Cluster-robust SE on muni_id. Controls in {[],[log(n_polls)],[log(n_polls), log1p(n_indep)]}. Continuous SponsoredBy_c = n_self / n_polls as alt treatment.
Notes: Fraud bucket via assunto_desc ∈ {DIVULGACAO DE PESQUISA ELEITORAL FRAUDULENTA, PESQUISA FRAUDULENTA, DIVULGACAO DE PESQUISA DE FRAUDULENTA, IRREGULARIDADES DOS DADOS PUBLICADOS EM PESQUISAS ELEITORAIS}. No LLM in v1 — the 2024 DataJud taxonomy is granular enough to do the cut from assunto alone. cand_proc_2024 matches CPF to TREdiarios parte; only ~12 % of PESQUISA cases match any candidate (most defendants are pollster CNPJs), so this is a candidate-side, not poll-side, test.

Script: source/analysis/an-072-fraud-suit-by-sponsor.py
Target: build/table/an-072-fraud-suit-by-sponsor.csv
Status: interpreted · 2026-06-16
Created: 2026-06-16

Question

Use 1 of the EJ poll-lawsuits agenda (docs/todo.md § Complementary data). Do candidates with self-sponsored polls draw legal challenges framed as methodological fraud? The 50-case 2020 pilot showed that the sued universe consists overwhelmingly of formal-compliance cases, so the revised test restricts to fraud-flavored assuntos. If the perceived-bias prediction holds, candidates whose polls include self-sponsored ones should be over-represented as litigants in fraud cases.

Design

Universe: 6,768 PREFEITO-2024 candidates appearing in build/assemble/cand_poll.parquet. Treatment is any_self = 1{n_self ≥ 1} (5.5 % of the sample) or continuous SponsoredBy_c = n_self / n_polls. Outcomes are candidate-level indicators built by joining cand_proc_2024.csv (CPF → case role) to proc_2024.parquet (assunto_desc → fraud / compliance bucket):

fraud_any — appears as autor OR reu in any fraud-flavored case (7.9 %)
fraud_autor / fraud_reu — split by role
pesq_any — any PESQUISA case regardless of flavor (14.9 %)
compl_any — compliance-flavored cases (registration violations)
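
The outcome construction above can be sketched as follows. This is a minimal illustration, not the code in the analysis script: the input layouts (a list of CPF/case/role links and a dict from case id to assunto_desc) and the function name build_outcomes are assumptions made for the example. The fraud bucket strings are the ones listed in the Notes field.

```python
# Fraud-flavored assunto_desc values, copied from the Notes field of this write-up.
FRAUD_ASSUNTOS = {
    "DIVULGACAO DE PESQUISA ELEITORAL FRAUDULENTA",
    "PESQUISA FRAUDULENTA",
    "DIVULGACAO DE PESQUISA DE FRAUDULENTA",
    "IRREGULARIDADES DOS DADOS PUBLICADOS EM PESQUISAS ELEITORAIS",
}


def build_outcomes(cand_cpfs, links, cases):
    """Build candidate-level 0/1 outcome indicators.

    cand_cpfs: iterable of candidate CPF strings.
    links:     list of (cpf, case_id, role), role in {"autor", "reu"}
               (hypothetical layout standing in for cand_proc_2024.csv).
    cases:     dict case_id -> assunto_desc, PESQUISA cases only
               (hypothetical layout standing in for proc_2024.parquet).
    """
    out = {
        cpf: {"fraud_any": 0, "fraud_autor": 0, "fraud_reu": 0, "pesq_any": 0}
        for cpf in cand_cpfs
    }
    for cpf, case_id, role in links:
        if cpf not in out or case_id not in cases:
            continue  # link to a non-sample candidate or unknown case
        row = out[cpf]
        row["pesq_any"] = 1  # any PESQUISA case, regardless of flavor
        if cases[case_id] in FRAUD_ASSUNTOS:
            row["fraud_any"] = 1
            if role in ("autor", "reu"):
                row["fraud_" + role] = 1
    return out


# Made-up toy inputs; the second assunto string is a placeholder, not a real label.
demo_cases = {"c1": "PESQUISA FRAUDULENTA", "c2": "PLACEHOLDER NON-FRAUD ASSUNTO"}
demo_links = [("111", "c1", "reu"), ("222", "c2", "autor")]
demo = build_outcomes(["111", "222", "333"], demo_links, demo_cases)
```

In the toy run, candidate 111 is flagged on fraud_any and fraud_reu, candidate 222 only on pesq_any, and candidate 333 on nothing.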

Race FE absorbed by within-muni_id demeaning. Cluster-robust SE on muni_id. Spec ladder A–D varies controls and treatment scale.
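
For reference, the estimator described above reduces to a few lines in the no-controls case. The sketch below is an assumption-laden illustration, not the project script: it handles a single regressor, omits the control sets, and applies no small-sample correction to the clustered variance, so its standard errors need not match the table exactly. The function name within_ols_cluster is hypothetical.

```python
from collections import defaultdict
from math import sqrt


def within_ols_cluster(y, x, group):
    """Single-regressor OLS with group fixed effects and group-clustered SE.

    Fixed effects are absorbed by subtracting group means from y and x.
    The variance is the cluster sandwich with no finite-sample adjustment.
    """
    # Accumulate per-group sums of y, x and the group size.
    sums = defaultdict(lambda: [0.0, 0.0, 0])
    for yi, xi, g in zip(y, x, group):
        s = sums[g]
        s[0] += yi
        s[1] += xi
        s[2] += 1

    # Within transformation: deviations from group means.
    y_t = [yi - sums[g][0] / sums[g][2] for yi, g in zip(y, group)]
    x_t = [xi - sums[g][1] / sums[g][2] for xi, g in zip(x, group)]

    sxx = sum(v * v for v in x_t)
    if sxx == 0:
        raise ValueError("no within-group variation in x")
    beta = sum(a * b for a, b in zip(x_t, y_t)) / sxx

    # Sum the score x_t * residual within each cluster, then square and add.
    score = defaultdict(float)
    for xt, yt, g in zip(x_t, y_t, group):
        score[g] += xt * (yt - beta * xt)
    se = sqrt(sum(s * s for s in score.values())) / sxx
    return beta, se


# Made-up toy data: two races with two candidates each.
beta_demo, se_demo = within_ols_cluster(
    y=[0, 1, 1, 1], x=[0, 1, 0, 1], group=["A", "A", "B", "B"]
)
```

Races in which every candidate has the same any_self value contribute nothing to the coefficient, since their demeaned treatment is zero; only races with both treated and untreated candidates identify the within-race effect.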

Findings

Naive (no FE) comparison: treated rate 5.6 % vs. control 8.0 %, −2.3 pp, which taken at face value suggests that self-sponsoring candidates are sued less. This is selection on race: self-sponsoring candidates are concentrated in disputed races (muni_id) where fraud suits are actually less common (e.g., mid-tier cities where pollsters register more polls and challenges are diluted).
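
The selection mechanism can be illustrated with a toy example. The numbers below are made up for the illustration and are not the AN-072 data: treated candidates sit mostly in a low-suit race "L", so the pooled difference is negative even though the treated rate is higher inside each race.

```python
def rate(values):
    """Share of ones in a list of 0/1 values."""
    return sum(values) / len(values)


def diff(rows):
    """Treated minus control rate of fraud_any for (race, any_self, fraud_any) rows."""
    treated = [y for _, d, y in rows if d == 1]
    control = [y for _, d, y in rows if d == 0]
    return rate(treated) - rate(control)


# Made-up rows: (race, any_self, fraud_any).
toy = (
    [("L", 1, 1)] + [("L", 1, 0)] * 8 + [("L", 0, 0)] * 5      # low-suit race
    + [("H", 1, 1)] + [("H", 0, 1)] * 12 + [("H", 0, 0)] * 8   # high-suit race
)

naive = diff(toy)  # pooled comparison, ignores race
within = {r: diff([t for t in toy if t[0] == r]) for r in ("L", "H")}
```

Here naive is negative while both entries of within are positive, which is the same sign pattern as the pooled versus race-FE comparison in this note.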

Within race, the sign flips and matches the prediction:

Spec	y	coef (any_self)	SE	t	p
A	fraud_any	0.031	0.012	2.54	0.011
B	fraud_any	0.030	0.012	2.43	0.015
C	fraud_any	0.031	0.012	2.49	0.013
A	fraud_autor	0.016	0.012	1.33	0.183
A	fraud_reu	0.025	0.013	1.91	0.056
A	pesq_any	0.025	0.014	1.76	0.079
A	compl_any	0.020	0.013	1.59	0.112

Continuous SponsoredBy_c is small and not significant (coef 0.009, p = 0.40) — the action is at the extensive margin (any vs none), not the intensive margin (share).

The role split has reu (defendant) slightly stronger than autor (plaintiff): being sued for fraud is what sponsorship predicts, more than suing about fraud.

Interpretation

The headline direction matches the Use 1 prediction: within race, self-sponsored exposure predicts fraud-suit involvement. Magnitude is 3.1 pp on a 7.9 % base, ≈ 40 % relative. The compliance bucket shows a similar but smaller and non-significant pattern, so the fraud-specific cut does the work. This is consistent with the framing that the methodologically suspicious sub-universe is what self-sponsorship draws.
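
The relative magnitude is plain arithmetic on the rounded figures reported in this note (Spec A coefficient and the fraud_any base rate):

```python
coef = 0.031       # Spec A coefficient on any_self, outcome fraud_any
base_rate = 0.079  # share of candidates with fraud_any = 1

relative = coef / base_rate  # effect as a fraction of the base rate
print(f"{relative:.1%}")
```

Using the rounded inputs gives a ratio just under 0.40, which is where the "≈ 40 % relative" figure comes from.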

Two caveats keep this at yellow confidence:

Candidate-side, not poll-side, test. Only ~12 % of PESQUISA cases match any candidate via cand_proc_2024 (most defendants are pollster CNPJs). The remaining 88 % are not necessarily cases about candidates whose CPF failed to fuzzy-match; they are largely cases about polls in which both the challenger and the challenged party are firms.
Treatment is candidate exposure, not poll-level perception. A poll-level test would put SponsoredBy_c on the poll and ask whether that poll is named in a case. That is doable only on the ~36 % of 2024 PESQUISA cases that have mov text in TREdiarios, and it needs an LLM/regex pass to extract poll-protocol references from the decision text. Done: AN-072v2 (2026-06-16). v2 finds the opposite sign at the poll unit: within race × week, candidate-sponsored polls are 3.9 pp less likely to be sued for fraud (p=0.010). The two findings are reconciled by selection: candidates with self-sponsored polls operate in lawsuit-heavy races, but the lawsuits target other polls, not theirs. See docs/analyses/an-072v2-poll-level-fraud-suit.md.

Files

script: source/analysis/an-072-fraud-suit-by-sponsor.py
table: build/table/an-072-fraud-suit-by-sponsor.csv
headline JSON: build/table/an-072-fraud-suit-by-sponsor.json