AN-081: Audit-request outcomes + follow-on linkage

2024 LE.34 §1 audit-request universe = 1,175 cases. Of audits that judges *granted* (n=64 with plaintiff data), 51.6 % had the same plaintiff file a fraud-flavored PESQUISA case in the same UF within 90 days, vs 25 % for denied audits (Δ 26.6 pp, z ≈ 2.3). Audits are not coming up empty — when access is granted, follow-on litigation more than doubles. Tilts the AN-080 mechanism-(1)-vs-(2) question toward (1) operational deviation from declared plano amostral.

Hypothesis: bias-induced-at-unsupervised-margin
Confidence: yellow
Type: descriptive

Design

Sample: 1,175 audit cases in proc_2024.parquet (assunto containing ACESSO SISTEMA INTERNO or NAOACESSO or OBTENCAO DE ACESSO). 272 (23.1 %) have mov.text in TREdiarios 2024/2025. 250 (21.3 %) have plaintiff parte data. Same DJEN/eproc gap pattern as the fraud-suit work — most 1st-instance zona decisions absent from TRE diários.
Specification: outcome classifier via regex over decision keywords (DEFIRO / INDEFIRO / EXTINTO / HOMOLOGO DESISTÊNCIA / PERDA DE OBJETO). Follow-on linkage by matching plaintiff name (normalized: upper-cased, whitespace-collapsed) across the audit case and any subsequent PESQUISA case in the same UF filed within (D, D + 90 days], where D is the audit date.
Notes: Coverage censoring is severe (~23 % of cases have decision text, ~21 % have plaintiff data). Read the counts and shares as descriptions of the observed subset, not as population estimates; what is identified is the conditional pattern (granted vs denied → follow-on), not the marginal rates.

Script: source/analysis/an-081-audit-outcome-distribution.py
Target: build/table/an-081-audit-outcomes.csv, build/table/an-081-audit-followon.csv
Status: interpreted · 2026-06-16
Created: 2026-06-16

Question

AN-080's null on within-firm × statistician slant variation admits two observationally equivalent mechanisms:

(1) Bias induced at an operational margin outside the declared plano amostral (substrata over-quotaed, post-strat weights tilted, etc.) — the statistician signs the declared registration but does not supervise the fielding.
(2) Channel B fabrication after data collection — the statistician's sign-off is genuine; the published numbers are edited post hoc.

These differ in what a LE.34 §1 audit would find. Under (1), the planilhas individuais reveal the fielding deviated from the declared design; the requestor then has grounds for a follow-on §34.§3 / §33.§4 case. Under (2), the planilhas back the published numbers and the audit comes up empty.

This script uses the existing DataJud + TREdiarios pipeline to measure what we can observe at the audit-disposition level: (a) how often audits were filed, (b) what outcome the judge wrote in the publicly available decision, and (c) whether the requestor came back with a follow-on PESQUISA case in the same UF within 90 days.

Universe

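The 2024 DataJud assunto taxonomy isolates the audit universe directly. A case can carry more than one assunto, so the rows sum to 1,181 against 1,175 unique case numbers: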

Assunto	n
REQUERIMENTO DE ACESSO AO SISTEMA INTERNO DE CONTROLE E DADOS DE PESQUISAS ELEITORAIS	1,129
NAOACESSO DOS PARTIDOS AOS DADOS RELATIVOS AS PESQUISAS ELEITORAIS	42
OBTENCAO DE ACESSO A SISTEMA DE DADOS ELEITORAIS	10
Total unique numbers	1,175

Top UFs: RN 165, PR 122, SP 118, MT 112, AM 84, MG 79, PI 64, SC 52, PB 48, GO 47. 98 % first-instance Petição Cível.

Outcomes (272 cases with mov text)

Outcome	n	Share
extinct (procedural close / perda de objeto / desistência)	98	36.0 %
ambiguous (decision text present, no clear keyword)	82	30.1 %
granted (DEFIRO / AUTORIZO / CONCEDO ACESSO)	70	25.7 %
denied (INDEFIRO / NEGO)	22	8.1 %
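
A minimal sketch of how a keyword classifier of this kind could be written. It is not the code in an-081-audit-outcome-distribution.py. The keyword lists come from the table above; the rule priority (extinct, then denied, then granted), the accent folding, the optional "A" in HOMOLOGO A DESISTENCIA, and the names RULES, _fold, and classify are assumptions for illustration. The word boundaries matter because DEFIRO is a substring of INDEFIRO.

```python
import re
import unicodedata

# Assumed priority: the first matching rule wins. The note does not state an order.
RULES = [
    ("extinct", re.compile(r"\b(?:EXTINTO|HOMOLOGO (?:A )?DESISTENCIA|PERDA DE OBJETO)\b")),
    ("denied", re.compile(r"\b(?:INDEFIRO|NEGO)\b")),
    ("granted", re.compile(r"\b(?:DEFIRO|AUTORIZO|CONCEDO ACESSO)\b")),
]


def _fold(text: str) -> str:
    """Upper-case, strip accents, and collapse whitespace (assumed preprocessing)."""
    decomposed = unicodedata.normalize("NFKD", text.upper())
    no_accents = "".join(ch for ch in decomposed if not unicodedata.combining(ch))
    return " ".join(no_accents.split())


def classify(decision_text: str) -> str:
    """Return the first matching outcome label, or 'ambiguous' if no keyword hits."""
    folded = _fold(decision_text)
    for label, pattern in RULES:
        if pattern.search(folded):
            return label
    return "ambiguous"
```

Under this ordering, a text containing both INDEFIRO and a later DEFIRO (a partial grant, say) is labelled denied; a different priority would shift the granted/denied split, which is one reason the counts above are coarse.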

Of the 272 cases with mov text, 29 % (79) had a decision text containing "irregularidade" / "divergência" / "discrepância" / "inconsistência" outside boilerplate. Read carefully — many such mentions are framing the legal standard rather than reporting a finding. The keyword incidence is a ceiling on the rate at which courts explicitly characterized fielding deviation.

Follow-on linkage (250 audit cases with plaintiff data)

For each audit case where TREdiarios has plaintiff parte information, count any PESQUISA case in the same UF where the same plaintiff name (normalized) appears as autor, filed in (audit_date, audit_date + 90 days].
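
The linkage rule can be sketched as follows, assuming each case is a dict. The field names (uf, plaintiff, autor, audit_date, filed) and the function names are hypothetical, and the example record is invented; only the normalization (upper-case, whitespace-collapsed) and the half-open 90-day window come from this note.

```python
from datetime import date, timedelta


def normalize_name(name: str) -> str:
    """Upper-case and collapse runs of whitespace, as described in the Caveats."""
    return " ".join(name.upper().split())


def has_follow_on(audit: dict, pesquisa_cases: list, window_days: int = 90) -> bool:
    """True if the same plaintiff filed a PESQUISA case in the same UF
    within (audit_date, audit_date + window_days]."""
    start = audit["audit_date"]
    end = start + timedelta(days=window_days)
    plaintiff = normalize_name(audit["plaintiff"])
    return any(
        case["uf"] == audit["uf"]
        and normalize_name(case["autor"]) == plaintiff
        and start < case["filed"] <= end  # open at the audit date, closed at day 90
        for case in pesquisa_cases
    )


# Invented example records, for illustration only.
audit = {"uf": "RN", "plaintiff": "Partido  Exemplo", "audit_date": date(2024, 8, 1)}
cases = [{"uf": "RN", "autor": "PARTIDO EXEMPLO", "filed": date(2024, 10, 30)}]
print(has_follow_on(audit, cases))  # True: 2024-10-30 is exactly day 90
```

Because the match is an exact string comparison after normalization, variant spellings of the same party or candidate fail to link, so the follow-on counts are if anything understated.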

	n	any fraud follow-on	any compl follow-on
All linked	250	105 (42.0 %)	87 (34.8 %)
Outcome = granted	64	33 (51.6 %)	21 (32.8 %)
Outcome = denied	20	5 (25.0 %)	8 (40.0 %)
Outcome = extinct	94	35 (37.2 %)	28 (29.8 %)
Outcome = ambiguous	72	32 (44.4 %)	30 (41.7 %)

The headline contrast. Audits that judges granted are followed by a fraud-flavored case from the same plaintiff in the same UF within 90 days 51.6 % of the time (33 of 64), versus 25.0 % (5 of 20) for audits that were denied. Difference 26.6 pp; unpooled SE on the difference ≈ 11.5 pp; z ≈ 2.3. The denied cell is small (n = 20), so the normal approximation is rough. The contrast survives the coarse specification.
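
The arithmetic behind the contrast can be checked with a short two-proportion z calculation. This is a sketch using the counts from the table above (33 of 64 granted, 5 of 20 denied); the function name two_proportion_z and the pooled option are illustrative additions, not part of the original script.

```python
from math import sqrt


def two_proportion_z(x1: int, n1: int, x2: int, n2: int, pooled: bool = False):
    """Return (difference, standard error, z) for two independent proportions."""
    p1, p2 = x1 / n1, x2 / n2
    diff = p1 - p2
    if pooled:
        # Standard error under the null of a common proportion.
        p = (x1 + x2) / (n1 + n2)
        se = sqrt(p * (1 - p) * (1 / n1 + 1 / n2))
    else:
        # Standard error using each group's own proportion.
        se = sqrt(p1 * (1 - p1) / n1 + p2 * (1 - p2) / n2)
    return diff, se, diff / se


diff, se, z = two_proportion_z(33, 64, 5, 20)
print(f"unpooled: diff={diff * 100:.1f} pp, se={se * 100:.1f} pp, z={z:.1f}")

_, _, z_pooled = two_proportion_z(33, 64, 5, 20, pooled=True)
print(f"pooled:   z={z_pooled:.1f}")
```

The unpooled standard error reproduces the 11.5 pp and z ≈ 2.3 quoted above; the pooled version gives a somewhat smaller z (about 2.1). With only 20 denied cases, either figure is a rough guide.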

Audits that were denied still show a follow-on rate of 25 % for fraud and 40 % for compliance; the requestor often files the suit anyway, presumably with evidence from other channels. Read at face value, the granted → fraud excess is the marginal effect of gaining audit access: the audit produces something actionable beyond what the plaintiff already had. That reading is conditional on the selection caveat under Interpretation.

Interpretation

The audit pattern is more consistent with mechanism (1) — that the LE.34 §1 audit, when conducted, reveals operational fielding deviation actionable under §34.§3 or §33.§4. Under mechanism (2), audits would generically come up empty (planilhas back the published numbers), and the granted-vs-denied gap in subsequent litigation should be small. We observe a 27 pp gap. The inference is suggestive rather than dispositive:

The granted ≫ denied gap could partly reflect selection on audit petition — judges grant when the plaintiff's threshold showing is stronger, and that threshold showing is itself a predictor of subsequent litigation regardless of audit findings. Without observing the audit content (Layer 2), we cannot separate "audit found something" from "plaintiff already had something".
Coverage censoring is heavy (~23 % of cases have decision text, ~21 % have plaintiff data). The reading rests on what we can observe; the unseen 77 to 79 % could re-weight either way.
The 90-day window catches the immediate post-audit period of the 2024 mayoral cycle but misses follow-on litigation over longer horizons (e.g., 2025 §4 criminal cases).

What this doesn't settle: the actual content of audit findings. For that we'd need the planilhas individuais, the audit-report PDFs, and the §3 retificação records — Layer 2 (per-TRE PJe document scrape). See docs/todo.md for the queued item.

Caveats

23 % mov-coverage / 21 % parte-coverage. Same DJEN/eproc gap as the fraud-suit work. The unseen majority could behave differently from the observed minority.
Plaintiff matching by normalized name. No CPF/CNPJ for the parte in TREdiarios; we used uppercased + whitespace-collapsed string match. Variant spellings will under-link.
"Same UF, same plaintiff" excludes federal-level follow-on. Some §4 criminal cases get federated; we'd miss those.
Outcome classifier is regex-based, not LLM. A 30 % share of "ambiguous" reflects this. An LLM pass on the ambiguous subset could tighten the granted/denied counts but isn't blocking the headline contrast (granted ≫ denied is stable).

Files

script: source/analysis/an-081-audit-outcome-distribution.py
tables:
- build/table/an-081-audit-outcomes.csv (per-case outcome)
- build/table/an-081-audit-followon.csv (per-case follow-on counts)
- build/table/an-081-summary.json (headline aggregates)
thinking: docs/thinking/conre-statistician-lever.md § "What the null implies about where the bias is induced" — this analysis is the first empirical handle on mechanism (1) vs (2).