AN-080: Slant unit — firm vs. statistician decomposition

Firm is the slant-control unit, not the statistician. Within multi-statistician firms, adding statistician FE on top of firm FE raises R² by only 0.016 (0.174→0.190), and the statistician × sponsored interaction is null: F(16, 1249)=1.19, p=0.27. Within-firm sponsor-routing is also null (permutation p=0.73 on 19 multi-stat firms). This is consistent with bias being either uniformly accepted by all signers of a firm or induced at a margin the statistician does not supervise; the test cannot discriminate.

Hypothesis: statistician-as-slant-unit
Confidence: yellow
Type: descriptive

Design

Sample: cand_poll.parquet matched_share=1 universe joined to the cleaned statistician map (build/intermediate/statistician_map_2024.tsv). 21,425 (protocol × candidate) rows, 282 statisticians, 421 firms; 448 self-sponsored rows. FE-ladder subframe restricts to firms with ≥3 sponsored, ≥3 unsponsored, ≥2 statisticians — 14 firms, 24 statisticians, n=1,295.
Specification: spec ladder A (no FE) / B (firm FE) / C (statistician FE) / D (firm + statistician FE) / E (D + stat × sponsored interaction). Cluster-robust SE not applied to keep the F-test apples-to-apples with the nested OLS. Per-statistician β = mean(bias | sponsored) − mean(bias | unsponsored) on the headline universe, restricted to signers with ≥5 sp & ≥5 un.
Notes: Bias = poll_percent_raw − 100·final_share. Per-statistician β is unconditional and confounds with firm-mix. The FE ladder is the identified test.
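
A minimal sketch of the per-statistician β defined above, using only the standard library. The column names poll_percent_raw and final_share come from the bias definition in this note; the keys conre and sponsored, and the function name, are hypothetical stand-ins for whatever the script actually uses.

```python
from collections import defaultdict
from statistics import mean


def per_statistician_beta(rows, min_n=5):
    """Unconditional beta per signer: mean(bias | sponsored) - mean(bias | unsponsored).

    rows: iterable of dicts with keys 'conre', 'sponsored' (0/1),
    'poll_percent_raw' and 'final_share'. The keys 'conre' and 'sponsored'
    are assumed names, not taken from the real parquet schema.
    """
    # conre -> {0: [bias of unsponsored rows], 1: [bias of sponsored rows]}
    groups = defaultdict(lambda: {0: [], 1: []})
    for r in rows:
        bias = r["poll_percent_raw"] - 100.0 * r["final_share"]
        groups[r["conre"]][int(r["sponsored"])].append(bias)

    betas = {}
    for conre, g in groups.items():
        # Keep only signers with at least min_n sponsored and min_n unsponsored rows.
        if len(g[1]) >= min_n and len(g[0]) >= min_n:
            betas[conre] = mean(g[1]) - mean(g[0])
    return betas
```

Nothing in this computation conditions on the firm, which is why the note treats the resulting β as descriptive only.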

Script: source/analysis/an-080-slant-unit-firm-vs-statistician.py
Target: build/table/an-080-slant-unit-firm-vs-statistician.csv
Status: interpreted · 2026-06-16
Created: 2026-06-16

Question

The CONRE statistician thread in docs/thinking/conre-statistician-lever.md proposed that personally-on-record signatories (NM_ESTATISTICO_RESP + CD_CONRE) could be a sleeping legal lever for policing bias. The empirical version of that question: is the statistician a separate slant-control unit from the firm? Two complementary tests.

Q1. Within firms, do sponsored polls cluster on specific signers?

Permutation test on within-firm sponsor-rate spread across statisticians, on 19 multi-stat firms with ≥2 sponsored, ≥2 unsponsored polls. Per-firm chi-squared on (statistician × sponsored) crosstabs as a secondary descriptive.
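
The permutation test can be sketched as follows. This is an illustration under stated assumptions, not the script's implementation: the spread within a firm is taken to be the max minus min sponsor rate across its statisticians, sponsored labels are shuffled within each firm, and the p-value is the upper-tail share of permutations whose mean spread is at least the observed one. The function names and the tuple layout are hypothetical.

```python
import random
from collections import defaultdict


def mean_spread(polls):
    """Mean over firms of (max - min) sponsor rate across that firm's statisticians.

    polls: list of (firm, statistician, sponsored) tuples, sponsored in {0, 1}.
    The max-minus-min definition of spread is an assumption of this sketch.
    """
    counts = defaultdict(lambda: defaultdict(lambda: [0, 0]))  # firm -> stat -> [n_sponsored, n_total]
    for firm, stat, sp in polls:
        cell = counts[firm][stat]
        cell[0] += sp
        cell[1] += 1
    spreads = []
    for stats in counts.values():
        rates = [n_sp / n for n_sp, n in stats.values()]
        if len(rates) >= 2:  # only multi-statistician firms contribute
            spreads.append(max(rates) - min(rates))
    if not spreads:
        raise ValueError("no multi-statistician firms in input")
    return sum(spreads) / len(spreads)


def permutation_p(polls, n_perm=500, seed=0):
    """Shuffle sponsored labels within each firm; return (observed, upper-tail p)."""
    rng = random.Random(seed)
    observed = mean_spread(polls)
    by_firm = defaultdict(list)
    for i, (firm, _, _) in enumerate(polls):
        by_firm[firm].append(i)
    hits = 0
    for _ in range(n_perm):
        shuffled = list(polls)
        for idx in by_firm.values():
            labels = [polls[i][2] for i in idx]
            rng.shuffle(labels)  # keeps each firm's sponsor count fixed
            for i, lab in zip(idx, labels):
                shuffled[i] = (polls[i][0], polls[i][1], lab)
        if mean_spread(shuffled) >= observed:
            hits += 1
    return observed, hits / n_perm
```

Shuffling within firms holds each firm's overall sponsor rate and each statistician's workload fixed, so the null distribution reflects only the assignment of sponsored polls to signers.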

Q2. Within firms, does β vary by statistician?

Four-spec FE ladder (specs A to D) + F-test of statistician × sponsored interaction (spec E) over the firm + stat FE base. Subframe restricted to firms with ≥3 sp, ≥3 un, ≥2 stats (14 firms, 24 statisticians, n=1,295).
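
To illustrate why the ladder matters, here is a small sketch contrasting the pooled sponsored-minus-unsponsored difference (the no-FE spec) with the firm-FE slope obtained by within-firm demeaning, which gives the same coefficient as OLS with firm dummies. The function names and the (firm, sponsored, bias) tuple layout are hypothetical; the real script may estimate the ladder differently.

```python
from collections import defaultdict


def pooled_beta(rows):
    """Spec A analogue: mean bias of sponsored rows minus mean bias of unsponsored rows."""
    sp = [b for _, s, b in rows if s == 1]
    un = [b for _, s, b in rows if s == 0]
    return sum(sp) / len(sp) - sum(un) / len(un)


def firm_fe_beta(rows):
    """Spec B analogue: slope on `sponsored` after demeaning both variables within firm."""
    by_firm = defaultdict(list)
    for firm, s, b in rows:
        by_firm[firm].append((s, b))
    sxy = sxx = 0.0
    for obs in by_firm.values():
        mean_s = sum(s for s, _ in obs) / len(obs)
        mean_b = sum(b for _, b in obs) / len(obs)
        for s, b in obs:
            sxy += (s - mean_s) * (b - mean_b)
            sxx += (s - mean_s) ** 2
    return sxy / sxx


# Toy data: the sponsor effect is 5 in both firms, but the firm with the
# higher baseline bias also has the larger share of sponsored polls.
toy = [
    ("f1", 0, 0.0), ("f1", 0, 0.0), ("f1", 1, 5.0), ("f1", 1, 5.0),
    ("f2", 0, 10.0), ("f2", 1, 15.0), ("f2", 1, 15.0), ("f2", 1, 15.0),
]
print(round(pooled_beta(toy), 2), round(firm_fe_beta(toy), 2))
```

In the toy data the pooled difference overstates the within-firm effect because firm mix is correlated with sponsorship, which is the same confound the note flags for the unconditional per-statistician β.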

Findings

Per-statistician β (unconditional, 23 signers)

Statistic	Value
Mean	5.37
Std	5.06
Range	−4.43 to +13.46
Share β > 0	82.6 %
Share β > 5	60.9 %
Share β > 10	17.4 %

Top by β (with n_sp ≥ 5 & n_un ≥ 5):

CONRE	Name	n_sp	n_un	n_firms	β
11248	ANGELA MARIA DA SILVA	5	74	1	+13.46
9019	UBIRAJARA ALVES TRINDADE SAMPAIO	5	10	1	+12.89
8151	JULIANE SILVEIRA FREIRE DA SILVA	33	203	3	+12.39
9443	MARCELO HIDEMI UEMURA	8	376	4	+11.34
9356	LAÉRCIO DE SOUSA ARAÚJO	54	619	8	+6.04
9063	LINIANE GAZOLA	56	1,722	30	+3.29

Cross-section heterogeneity is real. But it is unconditional — each statistician sits in their own firm portfolio, and β is being attributed to the signer rather than to the firms they sign for. The FE ladder tests whether this attribution holds up.

FE ladder + interaction (n=1,295; 14 firms; 24 statisticians)

Spec	β_sponsored	SE	p	R²
A: no FE	6.31	1.29	<0.001	0.018
B: firm FE	4.90	1.24	<0.001	0.174
C: statistician FE	4.59	1.22	<0.001	0.187
D: firm + stat FE	4.99	1.24	<0.001	0.190
E: D + (stat × sp)	—	—	(F-test below)	0.202

Statistician FE alone (C) absorbs essentially the same R² as firm FE alone (B), because most statisticians sign mostly for one firm — the two FE sets are correlated.
Adding statistician FE on top of firm FE (D vs B) raises R² by only 0.016.
F-test of statistician × sponsored interaction over D: F(16, 1249) = 1.19, p = 0.27. Not significant. Within firm and statistician FE, there is no identifiable statistician-specific component to sponsor bias.
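
As a sanity check on the interaction test, the nested-model F statistic can be recomputed from the rounded R² values in the table. This is a sketch using the standard nested-OLS formula; the function name is hypothetical, and because the inputs are rounded to three decimals the result only approximates the reported 1.19.

```python
def nested_f(r2_restricted, r2_full, q, df_resid):
    """F statistic for q added regressors, computed from the R² of nested OLS fits.

    df_resid is the residual degrees of freedom of the full model.
    """
    return ((r2_full - r2_restricted) / q) / ((1.0 - r2_full) / df_resid)


# Spec D (R² = 0.190) vs spec E (R² = 0.202), 16 interaction terms, 1249 residual df.
f_stat = nested_f(0.190, 0.202, q=16, df_resid=1249)
print(round(f_stat, 2))  # about 1.17 from the rounded inputs
```

The small gap to the reported F is within what three-decimal rounding of the two R² values can produce.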

Within-firm sponsor routing (permutation test, 19 multi-stat firms)

Statistic	Observed	Null (perm, n=500)
Mean within-firm sponsor-rate spread	0.140	0.157 (sd 0.026)
Permutation p	0.726
Share of firms with chi² p < 0.05	5.3 %	(≈ chance)

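Observed spread (0.140) is slightly below the null mean (0.157) and within one null sd of it, so there is no sign of routing; if anything, sponsored polls are spread a little more evenly across signers than chance would give. Sponsored polls do not appear to be directed to specific statisticians within firms, which fits the signer being whoever is available.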

Interpretation

The firm is the slant-control unit. In these data the statistician behaves as a passive signatory: the unconditional β reflects which firms they happened to sign for, not anything identifiable about what they did at any one firm.

The user's reading of the null (Henrik, 2026-06-16, while discussing this run): if statisticians knew about and disagreed with the bias, we would expect some to refuse to sign biased polls. On that reading, the within-firm homogeneity is evidence that bias is induced at a point the statistician does not see. That is consistent with two distinct mechanisms, which this test cannot tell apart:

Bias at a margin outside the statistician's purview. The plano amostral the statistician signs is the declared methodology. The actual fielding — which substrata get over-quotaed, which interviewers ask what, where the door-knocks land, how the post-stratification weights resolve borderline cases — happens at the firm's operational level, often without statistician supervision. This is especially likely for the rent-a-signature signers (LINIANE GAZOLA signing for 39 firms across 19 UFs cannot be supervising any of them). Channel A as executed may diverge from Channel A as declared without the statistician's awareness or consent.
Channel B fabrication after data collection. The statistician signs the registration with the methodology declaration; the published numbers are edited at the commercial / management layer after the data come in. This bypasses the statistician entirely.

Both predict (i) within-firm statistician homogeneity (the FE test), (ii) null within-firm sorting (the permutation test), and (iii) the absence of large declared-design differences between sponsored and unsponsored polls (the AN-024 / AN-033 / AN-041 / AN-042 / AN-043 rule-out series). The §sec:policy "size-mismatch problem" — that documented design levers do not add up to the +7 pp headline — finds a possible explanation here: the slant is induced at margins outside what gets disclosed.

A test that would discriminate

The two mechanisms differ in what they predict about the audit trail (LE.34 §1). Under the first (bias at a margin outside the statistician's purview), the planilhas-individuais audit recovers the actual fielding pattern (if collected) and the slant should be visible against the declared plano amostral. Under the second (Channel B fabrication), the planilhas back the published numbers (because the published numbers were never the data) and the audit fails to detect anything.

The LE.34 §1 audit right is barely exercised in practice. If our sample of audit cases (Use 2 of the EJ agenda, future work) ever materializes, the conditional distribution of outcomes (audit finds operational deviation vs audit comes up empty) is the test.

Limits

Subframe is small: 14 firms, n=1,295. The interaction F-test has reasonable power against medium-sized statistician β differences but is underpowered against small ones.
Unconditional per-statistician β table mixes statistician with firm-portfolio. The headline ranking is descriptively useful (and supports the rent-a-signature framing in the CONRE thread) but should not be read as "this statistician personally produces biased polls".
The two interpretations (margin outside purview vs Channel B fabrication) are observationally equivalent here; the test is not yet identifying which.
The CD_CONRE normalization (digit-extraction regex) likely still collapses a few distinct statisticians who share leading digits across regions. Spot-checks suggest impact <5 %.

Files

intermediate: source/intermediate/statistician_map_2024.py → build/intermediate/statistician_map_2024.tsv
script: source/analysis/an-080-slant-unit-firm-vs-statistician.py
tables: build/table/an-080-slant-unit-firm-vs-statistician.csv, build/table/an-080-statistician-beta.csv, build/table/an-080-summary.json
thinking: docs/thinking/conre-statistician-lever.md (this analysis grounds the "Is the statistician a slant-control unit in the data?" section).