id: an-072v2
hypothesis: perceived-bias-validation
headline: >
  Poll-level test inverts the candidate-level direction. Within race × week, polls sponsored by a candidate are 3.9 pp less likely to be the subject of a fraud-flavored PESQUISA-eleitoral case (p=0.010, base rate 4.3 %). The perceived-bias prediction (sponsored polls draw more legal challenge) fails on its own unit of analysis.
type: descriptive
status: interpreted
status_date: 2026-06-16
confidence: yellow
created: 2026-06-16
script: source/analysis/an-072v2-poll-level-fraud-suit.py
target: build/table/an-072v2-poll-level-fraud-suit.csv
cited_in: []
design:
  sample: 8,943 unique 2024 protocols in build/assemble/cand_poll.parquet; 7,274 unique race-week cells.
  specification: >
    poll-level OLS of sued_fraud ∈ {0,1} on poll_has_candidate_sponsor. Spec ladder: A race FE (muni_id), B race × week FE (muni_id × field_period_week), C UF FE only, D race × week + log(sample_size). Cluster-robust SE on the FE group. sued_fraud(p) = 1 if protocol p is cited by ≥1 2024 PESQUISA case in the fraud assunto bucket.
  notes: >
    Case→protocol linkage from source/intermediate/case_protocols_2024.py: regex over TREdiarios mov.text (PROTO_DISPLAY UF-NNNNN/YYYY + PROTO_COMPACT UFNNNNNYYYY), intersected with the cleaned poll registry. Coverage ≈ 9.1 % of fraud cases yield ≥1 registry hit. Unmatched cases are coded sued=0, which biases the estimate toward zero, so the significant negative coefficient is conservative.

AN-072v2: Poll-level fraud-suit rate by candidate sponsorship

Question

Use 1, v2: the unit shift. AN-072v1 found a positive candidate-level effect (+3.15 pp on fraud-suit involvement within race FE), which matches the perceived-bias prediction on the candidate side. But the hypothesis itself is about polls: does a poll sponsored by a candidate attract more legal challenge than a peer poll in the same race × week without a candidate sponsor?

Design

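Per protocol p (n = 8,943 unique 2024 polls; 7,274 unique race × week cells), poll-level OLS of sued_fraud ∈ {0,1} on poll_has_candidate_sponsor, where sued_fraud(p) = 1 if protocol p is cited by ≥1 2024 PESQUISA case in the fraud assunto bucket. Spec ladder:

- A: race FE (muni_id)
- B: race × week FE (muni_id × field_period_week)
- C: UF FE only
- D: race × week FE + log(sample_size)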

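Standard errors are cluster-robust, clustered on the FE group. Coverage caveat: the case→protocol regex links only ~9 % of fraud cases to a registry protocol, the product of ~26 % mov coverage and ~35 % within-text regex recall (0.26 × 0.35 ≈ 0.091). Unmatched cases are coded sued=0, which biases the estimate toward zero as long as missed links are unrelated to sponsorship.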
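The case→protocol linkage can be illustrated with a minimal sketch. The names PROTO_DISPLAY and PROTO_COMPACT come from the note, but the patterns below are assumptions inferred from the two formats it lists (UF-NNNNN/YYYY and UFNNNNNYYYY); the actual expressions live in source/intermediate/case_protocols_2024.py and may differ. The function name `extract_protocols`, the sample text, and the sample registry are hypothetical.

```python
import re

# Assumed patterns: two uppercase letters (UF), five digits, four-digit year.
PROTO_DISPLAY = re.compile(r"\b([A-Z]{2})-(\d{5})/(\d{4})\b")  # e.g. SP-01234/2024
PROTO_COMPACT = re.compile(r"\b([A-Z]{2})(\d{5})(\d{4})\b")    # e.g. SP012342024


def extract_protocols(text, registry, year="2024"):
    """Return registry protocols cited in text, normalized to UF-NNNNN/YYYY."""
    found = set()
    for pattern in (PROTO_DISPLAY, PROTO_COMPACT):
        for uf, number, yr in pattern.findall(text):
            if yr == year:  # keep only protocols from the target election year
                found.add(f"{uf}-{number}/{yr}")
    # Intersect with the cleaned poll registry to drop spurious matches.
    return found & registry


# Hypothetical movement text and registry, for illustration only.
sample_text = "Pesquisa registrada sob o n. SP-01234/2024 e RJ056782024; ver MG-99999/2024."
sample_registry = {"SP-01234/2024", "RJ-05678/2024"}
print(sorted(extract_protocols(sample_text, sample_registry)))
```

Intersecting with the registry is what keeps the compact pattern usable: a bare run of two letters and nine digits can match unrelated identifiers, and only those that correspond to a registered poll survive.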

Findings

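| Spec | y | coef | SE | t | p | n | clusters |
|------|------------|--------|-------|-------|-------|-------|----------|
| A | sued_fraud | −0.019 | 0.011 | −1.80 | 0.072 | 8,943 | 2,948 |
| B | sued_fraud | −0.039 | 0.015 | −2.58 | 0.010 | 8,943 | 7,274 |
| C | sued_fraud | −0.007 | 0.007 | −0.98 | 0.329 | 8,943 | 26 |
| D | sued_fraud | −0.039 | 0.015 | −2.58 | 0.010 | 8,943 | 7,274 |
| B | sued_any | −0.007 | 0.023 | −0.32 | 0.746 | 8,943 | 7,274 |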

The tighter the FE, the larger the negative coefficient: C (UF) −0.007, A (race) −0.019, B (race × week) −0.039. The race × week FE is doing real work: within the same race and the same fielding week, the implied sued-fraud rate for candidate-sponsored polls is ~0.4 % (4.3 − 3.9) against a within-cell baseline of ~4.3 %. Marginal rates (no FE): sponsored 3.4 % vs unsponsored 4.5 %.
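The within-cell comparison behind spec B can be sketched in plain Python. This is an illustration under stated assumptions, not the code in source/analysis/an-072v2-poll-level-fraud-suit.py: the function name `within_ols` is hypothetical, the toy rows are made up, and the cluster-robust variance omits any small-sample correction that a regression library might apply.

```python
from collections import defaultdict
from math import sqrt


def within_ols(y, x, cell):
    """One-regressor OLS with cell fixed effects; SE clustered on the cell."""
    # Group row indices by fixed-effect cell (e.g. muni_id x field_period_week).
    groups = defaultdict(list)
    for i, c in enumerate(cell):
        groups[c].append(i)

    # Within transformation: subtract the cell mean from y and x.
    y_dm = [0.0] * len(y)
    x_dm = [0.0] * len(x)
    for idx in groups.values():
        y_bar = sum(y[i] for i in idx) / len(idx)
        x_bar = sum(x[i] for i in idx) / len(idx)
        for i in idx:
            y_dm[i] = y[i] - y_bar
            x_dm[i] = x[i] - x_bar

    sxx = sum(v * v for v in x_dm)
    if sxx == 0:
        raise ValueError("no within-cell variation in x")
    beta = sum(a * b for a, b in zip(x_dm, y_dm)) / sxx

    # Sandwich variance: sum over clusters of squared within-cluster scores.
    meat = 0.0
    for idx in groups.values():
        score = sum(x_dm[i] * (y_dm[i] - beta * x_dm[i]) for i in idx)
        meat += score * score
    return beta, sqrt(meat) / sxx


# Toy data (hypothetical): sued_fraud, poll_has_candidate_sponsor, race-week cell.
y = [0, 1, 1, 0, 0, 0, 0, 0, 1]
x = [1, 0, 0, 1, 0, 0, 0, 1, 0]
cell = ["A-w1", "A-w1", "A-w1", "A-w2", "A-w2", "B-w1", "B-w1", "C-w1", "C-w1"]
beta, se = within_ols(y, x, cell)
print(beta, se)
```

On this toy data the within-cell coefficient is -0.7. Cells with no variation in sponsorship, such as B-w1, contribute nothing to the estimate, which is why the race × week specification is identified only from cells that contain both sponsored and unsponsored polls.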

The sued_any coefficient (all PESQUISA assunto buckets) is small and not significant (−0.007, p = 0.746), so the effect is fraud-specific rather than driven by registration compliance.

Interpretation

The perceived-bias prediction fails at the unit it was intended for. Sponsored polls are not over-represented as litigation targets; they are under-represented, conditional on race × week.

Combined with AN-072v1 (positive candidate-level coefficient), the two findings are coherent: candidates with self-sponsored polls operate in lawsuit-heavy races (selection), but the lawsuits in those races target the other polls in the field — the independent or media-aligned ones — not the candidate's own. Plausible mechanisms:

This inverts the framing of the use-1 prediction in docs/summary.md. The sued-rate test no longer supports "sponsored polls are perceived-as-biased". It supports the opposite: within race × week, sponsored polls draw fewer fraud challenges than peer polls without a candidate sponsor.

Caveats

Files