id: an-072v2
hypothesis: perceived-bias-validation
headline: Poll-level test inverts the candidate-level direction. Within race × week, polls sponsored by a candidate are 3.9 pp less likely to be the subject of a fraud-flavored PESQUISA-eleitoral case (p=0.010, base rate 4.3 %). The perceived-bias prediction (sponsored polls draw more legal challenge) fails on its own unit of analysis.
type: descriptive
status: interpreted
status_date: 2026-06-16
confidence: yellow
created: 2026-06-16
script: source/analysis/an-072v2-poll-level-fraud-suit.py
target: build/table/an-072v2-poll-level-fraud-suit.csv
cited_in: []
design:
  sample: 8,943 unique 2024 protocols in build/assemble/cand_poll.parquet; 7,274 unique race-week cells.
  specification: poll-level OLS of sued_fraud ∈ {0,1} on poll_has_candidate_sponsor. Spec ladder: A race FE (muni_id), B race × week FE (muni_id × field_period_week), C UF FE only, D race × week + log(sample_size). Cluster-robust SE on the FE group. sued_fraud(p) = 1 if protocol p is cited by ≥1 2024 PESQUISA case in the fraud assunto bucket.
  notes: Case→protocol linkage from source/intermediate/case_protocols_2024.py (regex over TREdiarios mov.text, PROTO_DISPLAY UF-NNNNN/YYYY + PROTO_COMPACT UFNNNNNYYYY, intersected with the cleaned poll registry). Coverage ≈ 9.1 % of fraud cases yield ≥1 registry hit. Unmatched cases are coded sued=0, which biases the estimate toward zero, so the significant negative is conservative.
AN-072v2: Poll-level fraud-suit rate by candidate sponsorship
Question
Use 1 v2 — the unit shift. AN-072v1 found a positive candidate-level effect (+3.15 pp on fraud-suit involvement within race FE), which is the perceived-bias prediction at the candidate side. But the actual hypothesis is about polls: does a poll sponsored by a candidate attract more legal challenge than a peer poll of the same race × week without a candidate sponsor?
Design
Per protocol p (n = 8,943 unique 2024 polls):
- Outcome: sued_fraud(p) = 1 if any 2024 PESQUISA case with assunto FRAUDULENTA or IRREGULARIDADES PUBLICADOS cites p via regex extraction of mov.text in TREdiarios.
- Treatment: poll_has_candidate_sponsor(p), a fixed property of the protocol.
- Spec ladder:
  - A. muni_id FE
  - B. muni_id × field_period_week FE
  - C. UF FE only
  - D. race × week FE + log(sample_size)
Cluster-robust SE on the FE group. Coverage caveat: case→protocol regex hit rate is ~9 % of fraud cases (bounded by mov coverage of ~26 % and within-text regex recall of ~35 %). Unmatched cases coded sued=0, biasing toward zero.
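The fixed-effects specifications above can be illustrated with a minimal sketch. It is not the code in source/analysis/an-072v2-poll-level-fraud-suit.py: the function `within_ols`, the toy rows, and the uncorrected (CR0) cluster variance are assumptions made for illustration, and the single-regressor form covers specs A to C only (spec D adds log(sample_size) as a covariate).

```python
from collections import defaultdict
from math import sqrt


def within_ols(y, x, groups):
    """Slope of y on x with group fixed effects, plus a cluster-robust SE.

    Fixed effects are absorbed by demeaning y and x within each group.
    The SE clusters on the same groups (CR0 sandwich, no small-sample
    correction), so it can differ slightly from packaged estimators.
    """
    # Accumulate per-group sums of y and x, and the group size.
    totals = defaultdict(lambda: [0.0, 0.0, 0])
    for yi, xi, g in zip(y, x, groups):
        t = totals[g]
        t[0] += yi
        t[1] += xi
        t[2] += 1

    # Within transformation: subtract the group mean from each observation.
    y_dm = [yi - totals[g][0] / totals[g][2] for yi, g in zip(y, groups)]
    x_dm = [xi - totals[g][1] / totals[g][2] for xi, g in zip(x, groups)]

    sxx = sum(v * v for v in x_dm)
    if sxx == 0:
        raise ValueError("treatment does not vary within any group")
    coef = sum(a * b for a, b in zip(x_dm, y_dm)) / sxx

    # Sum the score (demeaned x times residual) within each cluster.
    scores = defaultdict(float)
    for a, b, g in zip(x_dm, y_dm, groups):
        scores[g] += a * (b - coef * a)
    se = sqrt(sum(s * s for s in scores.values())) / sxx
    return coef, se


# Toy rows (hypothetical data):
# (muni_id, field_period_week, poll_has_candidate_sponsor, sued_fraud)
polls = [
    ("A", 1, 1, 0), ("A", 1, 0, 1), ("A", 1, 0, 0),
    ("A", 2, 1, 0), ("A", 2, 0, 0),
    ("B", 1, 0, 1),
    ("B", 2, 1, 1), ("B", 2, 0, 1),
]
sued = [p[3] for p in polls]
sponsored = [p[2] for p in polls]

# Spec A groups on the race; spec B groups on race x week.
coef_a, se_a = within_ols(sued, sponsored, [p[0] for p in polls])
coef_b, se_b = within_ols(sued, sponsored, [(p[0], p[1]) for p in polls])
print(f"A: coef={coef_a:.3f} se={se_a:.3f}")
print(f"B: coef={coef_b:.3f} se={se_b:.3f}")
```

In the toy data the single-poll cell ("B", 1) demeans to zero and contributes nothing to the spec B estimate. The same holds in the real sample: with 7,274 cells for 8,943 protocols, most cells contain one poll, so identification comes from the cells that hold both a sponsored and an unsponsored poll.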
Findings
| Spec | y | coef | SE | t | p | n | clusters |
|---|---|---|---|---|---|---|---|
| A | sued_fraud | −0.019 | 0.011 | −1.80 | 0.072 | 8,943 | 2,948 |
| B | sued_fraud | −0.039 | 0.015 | −2.58 | 0.010 | 8,943 | 7,274 |
| C | sued_fraud | −0.007 | 0.007 | −0.98 | 0.329 | 8,943 | 26 |
| D | sued_fraud | −0.039 | 0.015 | −2.58 | 0.010 | 8,943 | 7,274 |
| B | sued_any | −0.007 | 0.023 | −0.32 | 0.746 | 8,943 | 7,274 |
Tighter FE → larger negative coefficient. The race × week FE is doing real work: within the same race and the same fielding week, candidate-sponsored polls have a sued-fraud rate of ~0.4 % vs the within-cell baseline of ~4.3 % (a 3.9 pp gap). Marginal rates (no FE): sponsored 3.4 % vs unsponsored 4.5 %.
The coefficient on sued_any (all PESQUISA buckets) is small and not significant, so the effect is fraud-specific, not driven by registration compliance.
Interpretation
The perceived-bias prediction fails at the unit it was intended for. Sponsored polls are not over-represented as litigation targets; they are under-represented, conditional on race × week.
Combined with AN-072v1 (positive candidate-level coefficient), the two findings are coherent: candidates with self-sponsored polls operate in lawsuit-heavy races (selection), but the lawsuits in those races target the other polls in the field — the independent or media-aligned ones — not the candidate's own. Plausible mechanisms:
- Candidate-paid polls undergo more legal vetting (registration, defensible methodology) precisely because they are visible.
- News-outlet polls aligned with one side draw fraud filings from the opposing campaign more often than direct candidate sponsorship does.
- The fraud assunto includes "DIVULGACAO FRAUDULENTA" — a publication-side claim that fits media-published polls better than candidate-commissioned ones.
This inverts the framing of the use-1 prediction in docs/summary.md. The sued-rate test no longer supports "sponsored polls are perceived-as-biased". It supports the opposite: conditional on race × week, sponsored polls draw fewer fraud challenges than peer polls without a candidate sponsor.
Caveats
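- Coverage. 91 % of fraud cases yield no protocol via regex: only ~26 % of cases have mov text, and regex recall within those is ~35 % (0.26 × 0.35 ≈ 0.091). False negatives in the sued indicator push the estimate toward zero as long as linkage failures are unrelated to sponsorship, so the true magnitude may be larger than reported.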
- Registry-only universe. Cases against unregistered polls fall in the COMPLIANCE bucket, not FRAUD. They are correctly excluded here, but the fraud bucket is itself filtered to polls with valid registry entries.
- Regex precision. Spot check on SP showed ~59 % of extracted protocols match registry; non-matches are likely typos or unregistered protocols. The intersection-with-registry filter removes those from the sued set.
- Bucket assignment is per case, not per protocol. A protocol cited by both fraud and compliance cases gets sued_fraud=1 and sued_compl=1 separately.
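
The linkage steps behind these caveats (regex extraction, intersection with the registry, per-case bucket assignment) can be sketched as follows. This is an illustration, not the contents of source/intermediate/case_protocols_2024.py: the names PROTO_DISPLAY and PROTO_COMPACT come from the note, but the patterns themselves, the helper `extract_protocols`, and the toy cases and registry are assumptions.

```python
import re

# Assumed patterns for the two layouts named in the note.
PROTO_DISPLAY = re.compile(r"\b([A-Z]{2})-(\d{5})/(\d{4})\b")  # UF-NNNNN/YYYY
PROTO_COMPACT = re.compile(r"\b([A-Z]{2})(\d{5})(\d{4})\b")    # UFNNNNNYYYY


def extract_protocols(text):
    """Return protocols found in text, normalized to UF-NNNNN/YYYY."""
    found = set()
    for pattern in (PROTO_DISPLAY, PROTO_COMPACT):
        for uf, number, year in pattern.findall(text):
            found.add(f"{uf}-{number}/{year}")
    return found


# Hypothetical registry and cases: (case_id, bucket, mov text).
registry = {"SP-01234/2024", "RJ-09876/2024"}
cases = [
    ("case-1", "fraud", "Poll SP-01234/2024 was published; see also MG-55555/2024."),
    ("case-2", "compliance", "Protocols RJ098762024 and SP-01234/2024 lack documents."),
    ("case-3", "fraud", "No protocol number appears in this movement text."),
]

# The bucket belongs to the case; a protocol inherits it from every case citing it.
sued = {"fraud": set(), "compliance": set()}
linked_cases = 0
for case_id, bucket, text in cases:
    # Keep only extracted protocols that exist in the registry.
    hits = extract_protocols(text) & registry
    sued[bucket] |= hits
    linked_cases += int(bool(hits))

print(sorted(sued["fraud"]))
print(sorted(sued["compliance"]))
print(f"{linked_cases}/{len(cases)} cases linked")
```

In this toy example MG-55555/2024 is extracted but dropped by the registry filter, case-3 contributes no protocol and so leaves every poll coded 0, and SP-01234/2024 ends up flagged in both buckets because it is cited by one fraud case and one compliance case.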
Files
- intermediate (case→protocol map): source/intermediate/case_protocols_2024.py → build/intermediate/case_protocols_2024.tsv
- script: source/analysis/an-072v2-poll-level-fraud-suit.py
- table: build/table/an-072v2-poll-level-fraud-suit.csv
- headline JSON: build/table/an-072v2-poll-level-fraud-suit.json