id: an-072v2
hypothesis: perceived-bias-validation
headline: >-
  Poll-level test inverts the candidate-level direction. Within race × week,
  polls sponsored by a candidate are 3.9 pp less likely to be the subject of a
  fraud-flavored PESQUISA-eleitoral case (p=0.010, base rate 4.3 %). The
  perceived-bias prediction (sponsored polls draw more legal challenge) fails
  on its own unit of analysis.
type: descriptive
status: interpreted
status_date: 2026-06-16
confidence: yellow
created: 2026-06-16
script: source/analysis/an-072v2-poll-level-fraud-suit.py
target: build/table/an-072v2-poll-level-fraud-suit.csv
cited_in: []
design:
  sample: 8,943 unique 2024 protocols in build/assemble/cand_poll.parquet; 7,274 unique race-week cells.
  specification: >-
    Poll-level OLS of sued_fraud ∈ {0,1} on poll_has_candidate_sponsor.
    Spec ladder: A race FE (muni_id), B race × week FE
    (muni_id × field_period_week), C UF FE only, D race × week +
    log(sample_size). Cluster-robust SE on the FE group. sued_fraud(p) = 1 if
    protocol p is cited by ≥1 2024 PESQUISA case in the fraud assunto bucket.
  notes: >-
    Case→protocol linkage from source/intermediate/case_protocols_2024.py:
    regex over TREdiarios mov.text (PROTO_DISPLAY UF-NNNNN/YYYY +
    PROTO_COMPACT UFNNNNNYYYY), intersected with the cleaned poll registry.
    Coverage ≈ 9.1 % of fraud cases yield ≥1 registry hit. Unmatched cases are
    coded sued=0, which biases the estimate toward zero, so the significant
    negative estimate is conservative.

AN-072v2: Poll-level fraud-suit rate by candidate sponsorship

Question

Use 1, v2: the unit shift. AN-072v1 found a positive candidate-level effect (+3.15 pp on fraud-suit involvement within race FE), which matches the perceived-bias prediction on the candidate side. But the hypothesis itself is about polls: does a poll sponsored by a candidate attract more legal challenge than a peer poll in the same race × week without a candidate sponsor?

Design

Per protocol p (n = 8,943 unique 2024 polls):

Outcome: sued_fraud(p) = 1 if any 2024 PESQUISA case with assunto FRAUDULENTA or IRREGULARIDADES PUBLICADOS cites p via regex extraction of mov.text in TREdiarios.
Treatment: poll_has_candidate_sponsor(p), a fixed property of the protocol.
Spec ladder:
- A. muni_id FE
- B. muni_id × field_period_week FE
- C. UF FE only
- D. race × week FE + log(sample_size)

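Cluster-robust SE on the FE group. Coverage caveat: the case→protocol regex links only ~9 % of fraud cases to a registry protocol (mov text is available for ~26 % of cases, and within-text regex recall is ~35 %; 0.26 × 0.35 ≈ 0.091). Unmatched cases are coded sued=0, which biases the estimate toward zero as long as the miss rate does not differ by sponsorship.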
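The case→protocol linkage behind sued_fraud can be sketched as follows. This is a minimal illustration, not the contents of case_protocols_2024.py: the two regular expressions are assumptions written from the stated shapes UF-NNNNN/YYYY and UFNNNNNYYYY, and the function names, sample texts and registry entries are hypothetical.

```python
import re

# Assumed patterns: the note names PROTO_DISPLAY (UF-NNNNN/YYYY) and
# PROTO_COMPACT (UFNNNNNYYYY) but does not show the actual expressions.
PROTO_DISPLAY = re.compile(r"\b([A-Z]{2})-(\d{5})/(\d{4})\b")
PROTO_COMPACT = re.compile(r"\b([A-Z]{2})(\d{5})(\d{4})\b")


def extract_protocols(text, year="2024"):
    """Return protocol ids found in text, normalized to UF-NNNNN/YYYY."""
    found = set()
    for pattern in (PROTO_DISPLAY, PROTO_COMPACT):
        for uf, number, yyyy in pattern.findall(text):
            if yyyy == year:
                found.add(f"{uf}-{number}/{yyyy}")
    return found


def sued_protocols(case_texts, registry):
    """Protocols cited by at least one case AND present in the registry."""
    cited = set()
    for text in case_texts:
        cited |= extract_protocols(text)
    # Intersection drops typos and unregistered protocols.
    return cited & registry


# Hypothetical inputs for illustration only.
registry = {"SP-01234/2024", "RJ-00042/2024"}
case_texts = [
    "pesquisa registrada sob o n. SP-01234/2024, divulgada em rede social",
    "protocolos MG099992024 e RJ000422024",
    "decisao sem numero de protocolo",
]
sued = sued_protocols(case_texts, registry)
```

Any poll in the registry that is absent from `sued` would be coded sued_fraud = 0, which is why cases whose text never mentions a recoverable protocol number show up as false negatives.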

Findings

Spec	y	coef	SE	t	p	n	clusters
A	sued_fraud	−0.019	0.011	−1.80	0.072	8,943	2,948
B	sued_fraud	−0.039	0.015	−2.58	0.010	8,943	7,274
C	sued_fraud	−0.007	0.007	−0.98	0.329	8,943	26
D	sued_fraud	−0.039	0.015	−2.58	0.010	8,943	7,274
B	sued_any	−0.007	0.023	−0.32	0.746	8,943	7,274

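The tighter the FE, the larger the negative coefficient: −0.007 with UF FE (C), −0.019 with race FE (A), −0.039 with race × week FE (B). The race × week FE is doing real work: within the same race and the same fielding week, the implied sued-fraud rate for candidate-sponsored polls is ~0.4 % (4.3 % − 3.9 pp), against a within-cell baseline of ~4.3 %. Adding log(sample_size) in D leaves the coefficient unchanged to three decimals. Marginal rates (no FE): sponsored 3.4 % vs unsponsored 4.5 %.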
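The within-cell comparison that spec B relies on can be sketched in plain Python. This is a hedged illustration of a one-regressor fixed-effects estimator with standard errors clustered on the cell, not the code in an-072v2-poll-level-fraud-suit.py; the function name `within_ols`, the cell labels and the toy rows are hypothetical, and no small-sample correction is applied.

```python
from collections import defaultdict
from math import sqrt


def within_ols(rows):
    """rows: iterable of (cell, x, y). Returns (coef, cluster_se).

    OLS of y on x with cell fixed effects, computed by demeaning x and y
    within each cell. The SE is clustered on the cell (sandwich form,
    no small-sample correction).
    """
    by_cell = defaultdict(list)
    for cell, x, y in rows:
        by_cell[cell].append((x, y))

    demeaned = []  # (cell, x minus cell mean, y minus cell mean)
    for cell, obs in by_cell.items():
        mx = sum(x for x, _ in obs) / len(obs)
        my = sum(y for _, y in obs) / len(obs)
        demeaned += [(cell, x - mx, y - my) for x, y in obs]

    sxx = sum(dx * dx for _, dx, _ in demeaned)
    if sxx == 0:
        raise ValueError("no within-cell variation in the treatment")
    coef = sum(dx * dy for _, dx, dy in demeaned) / sxx

    # Sum of x-tilde times residual within each cluster, then sandwich.
    score = defaultdict(float)
    for cell, dx, dy in demeaned:
        score[cell] += dx * (dy - coef * dx)
    se = sqrt(sum(s * s for s in score.values())) / sxx
    return coef, se


# Toy data: (race-week cell, sponsored 0/1, sued_fraud 0/1). Hypothetical.
rows = [
    ("m1-w1", 1, 0), ("m1-w1", 0, 1),
    ("m2-w3", 1, 0), ("m2-w3", 0, 0),
    ("m3-w2", 0, 1),  # singleton cell: demeans to zero, adds nothing
    ("m4-w1", 1, 1), ("m4-w1", 0, 0), ("m4-w1", 0, 1),
]
coef, se = within_ols(rows)
```

Only cells that contain both a sponsored and an unsponsored poll contribute to the coefficient; singleton cells and cells with no variation in sponsorship demean to zero. With 8,943 polls spread over 7,274 cells, that is worth keeping in mind when reading spec B.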

The sued_any outcome (all PESQUISA buckets) shows a small, non-significant coefficient (−0.007, p=0.746), so the effect is fraud-specific rather than driven by registration compliance.

Interpretation

The perceived-bias prediction fails at the unit it was intended for. Sponsored polls are not over-represented as litigation targets; they are under-represented, conditional on race × week.

Combined with AN-072v1 (positive candidate-level coefficient), the two findings are coherent: candidates with self-sponsored polls operate in lawsuit-heavy races (selection), but the lawsuits in those races target the other polls in the field — the independent or media-aligned ones — not the candidate's own. Plausible mechanisms:

- Candidate-paid polls undergo more legal vetting (registration, defensible methodology) precisely because they are visible.
- News-outlet polls aligned with one side draw fraud filings from the opposing campaign more often than direct candidate sponsorship does.
- The fraud assunto includes "DIVULGACAO FRAUDULENTA", a publication-side claim that fits media-published polls better than candidate-commissioned ones.

This inverts the framing of the use-1 prediction in docs/summary.md. The sued-rate test no longer supports "sponsored polls are perceived as biased". It points the other way: within the same race × week, sponsored polls draw fewer fraud challenges than peer polls without a candidate sponsor.

Caveats

Coverage. About 91 % of fraud cases yield no registry protocol via regex (mov text exists for only ~26 % of cases; within that text, regex recall is ~35 %). False negatives in the sued indicator push the estimate toward zero, so the true magnitude may be larger than reported.
Registry-only universe. Cases against unregistered polls fall in the COMPLIANCE bucket, not FRAUD. They are correctly excluded here, but the fraud bucket is itself filtered to polls with valid registry entries.
Regex precision. Spot check on SP showed ~59 % of extracted protocols match registry; non-matches are likely typos or unregistered protocols. The intersection-with-registry filter removes those from the sued set.
Bucket assignment is per case, not per protocol. A protocol cited by both fraud and compliance cases gets sued_fraud=1 and sued_compl=1 separately.
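The attenuation argument in the coverage caveat can be made concrete with a small calculation. This is a sketch under two stated assumptions: there are no false positives in the sued indicator, and a true suit is linked with a fixed probability (`recall`). The rates below are hypothetical and are not estimates from this analysis.

```python
def observed_gap(rate_sponsored, rate_unsponsored,
                 recall_sponsored, recall_unsponsored=None):
    """Measured sued-rate gap when only a share of true suits is linked.

    If recall_unsponsored is omitted, both groups share the same recall.
    Assumes no false positives in the sued indicator.
    """
    if recall_unsponsored is None:
        recall_unsponsored = recall_sponsored
    return (recall_sponsored * rate_sponsored
            - recall_unsponsored * rate_unsponsored)


# Hypothetical true rates: 2 % for sponsored, 10 % for unsponsored polls.
true_gap = observed_gap(0.02, 0.10, 1.0)

# Equal recall in both groups shrinks the gap but keeps its sign.
for recall in (1.0, 0.5, 0.091):
    print(recall, round(observed_gap(0.02, 0.10, recall), 4))

# Unequal recall can create a gap where none exists.
spurious = observed_gap(0.05, 0.05, 0.05, 0.10)
```

With equal recall the measured gap is the true gap multiplied by recall, which is the sense in which the reported coefficient is conservative. The last line shows why the assumption matters: if suits against sponsored polls were harder to link than suits against other polls, a negative gap could appear without any true difference.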

Files

intermediate (case→protocol map): source/intermediate/case_protocols_2024.py → build/intermediate/case_protocols_2024.tsv
script: source/analysis/an-072v2-poll-level-fraud-suit.py
table: build/table/an-072v2-poll-level-fraud-suit.csv
headline JSON: build/table/an-072v2-poll-level-fraud-suit.json