The firm, not the statistician, is the slant-control unit. Within multi-statistician firms, adding statistician FE on top of firm FE raises R² by only 0.016 (0.174→0.190), and the statistician × sponsored interaction is not significant (F(16, 1249) = 1.19, p = 0.27). Within-firm sponsor routing is also null (permutation p = 0.73 on 19 multi-statistician firms). This is consistent with bias being either uniformly accepted by all signers of a firm or induced at a margin the statistician does not supervise; the test cannot discriminate between the two.
Question
The CONRE statistician thread in
docs/thinking/conre-statistician-lever.md proposed that
personally-on-record signatories (NM_ESTATISTICO_RESP +
CD_CONRE) could be a sleeping legal lever for policing bias. The
empirical version of that question: is the statistician a
separate slant-control unit from the firm? Two complementary
tests.
Q1. Within firms, do sponsored polls cluster on specific signers?
Permutation test on within-firm sponsor-rate spread across statisticians, on 19 multi-stat firms with ≥2 sponsored, ≥2 unsponsored polls. Per-firm chi-squared on (statistician × sponsored) crosstabs as a secondary descriptive.
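A minimal sketch of how such a permutation test could be set up, using only the standard library. Several choices here are assumptions rather than the script's actual definitions: "spread" is taken as the max-minus-min sponsor rate across a firm's statisticians, the sponsored flags are shuffled within each firm, the p-value is one-sided (share of null draws at least as large as the observed spread), and the toy data and all function names are invented for illustration.

```python
import random
from collections import defaultdict

def firm_spread(polls):
    """Max minus min sponsor rate across one firm's statisticians (assumed definition)."""
    counts = defaultdict(lambda: [0, 0])  # statistician -> [n_sponsored, n_total]
    for stat, sponsored in polls:
        counts[stat][0] += sponsored
        counts[stat][1] += 1
    rates = [n_sp / n_all for n_sp, n_all in counts.values()]
    return max(rates) - min(rates)

def mean_spread(firms):
    """Average the per-firm spread over all firms."""
    return sum(firm_spread(polls) for polls in firms.values()) / len(firms)

def permutation_p(firms, n_perm=500, seed=0):
    """Shuffle sponsored flags within each firm and compare spreads."""
    rng = random.Random(seed)
    observed = mean_spread(firms)
    null = []
    for _ in range(n_perm):
        shuffled = {}
        for firm, polls in firms.items():
            flags = [sp for _, sp in polls]
            rng.shuffle(flags)  # breaks any signer-sponsor link, keeps firm totals
            shuffled[firm] = [(stat, sp) for (stat, _), sp in zip(polls, flags)]
        null.append(mean_spread(shuffled))
    # One-sided: routing would show up as an unusually LARGE observed spread.
    p = sum(s >= observed for s in null) / n_perm
    return observed, null, p

# Toy data (hypothetical): firm -> list of (statistician, sponsored flag).
toy_firms = {
    "firm_a": [("s1", 1), ("s1", 1), ("s1", 0), ("s2", 0), ("s2", 0), ("s2", 1)],
    "firm_b": [("s3", 1), ("s3", 0), ("s4", 1), ("s4", 0)],
}
observed, null, p = permutation_p(toy_firms)
```

Shuffling within firms holds each firm's sponsored share and each statistician's poll count fixed, so the null distribution reflects only chance assignment of sponsored polls to signers.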
Q2. Within firms, does β vary by statistician?
Four-spec FE ladder, plus an F-test of the statistician × sponsored interaction over the firm + statistician FE base. Subframe restricted to firms with ≥3 sponsored polls, ≥3 unsponsored polls, and ≥2 statisticians (14 firms, 24 statisticians, n=1,295).
Findings
Per-statistician β (unconditional, 23 signers)
| Statistic | Value |
|---|---|
| Mean | 5.37 |
| Std | 5.06 |
| Range | −4.43 to +13.46 |
| Share β > 0 | 82.6 % |
| Share β > 5 | 60.9 % |
| Share β > 10 | 17.4 % |
Top by β (with n_sp ≥ 5 & n_un ≥ 5):
| CONRE | Name | n_sp | n_un | n_firms | β |
|---|---|---|---|---|---|
| 11248 | ANGELA MARIA DA SILVA | 5 | 74 | 1 | +13.46 |
| 9019 | UBIRAJARA ALVES TRINDADE SAMPAIO | 5 | 10 | 1 | +12.89 |
| 8151 | JULIANE SILVEIRA FREIRE DA SILVA | 33 | 203 | 3 | +12.39 |
| 9443 | MARCELO HIDEMI UEMURA | 8 | 376 | 4 | +11.34 |
| 9356 | LAÉRCIO DE SOUSA ARAÚJO | 54 | 619 | 8 | +6.04 |
| 9063 | LINIANE GAZOLA | 56 | 1,722 | 30 | +3.29 |
Cross-section heterogeneity is real. But it is unconditional — each statistician sits in their own firm portfolio, and β is being attributed to the signer rather than to the firms they sign for. The FE ladder tests whether this attribution holds up.
FE ladder + interaction (n=1,295; 14 firms; 24 statisticians)
| Spec | β_sponsored | SE | p | R² |
|---|---|---|---|---|
| A: no FE | 6.31 | 1.29 | <0.001 | 0.018 |
| B: firm FE | 4.90 | 1.24 | <0.001 | 0.174 |
| C: statistician FE | 4.59 | 1.22 | <0.001 | 0.187 |
| D: firm + stat FE | 4.99 | 1.24 | <0.001 | 0.190 |
| E: D + (stat × sp) | — | — | (F-test below) | 0.202 |
- Statistician FE alone (C) yields an R² close to that of firm FE alone (B): 0.187 vs 0.174. Most statisticians sign mostly for one firm, so the two FE sets are highly correlated.
- Adding statistician FE on top of firm FE (D vs B) raises R² by only 0.016.
- F-test of the statistician × sponsored interaction over D: F(16, 1249) = 1.19, p = 0.27. Not significant. Conditional on firm and statistician FE, there is no detectable statistician-specific component to sponsor bias.
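As a rough consistency check, the interaction F statistic can be approximated from the rounded R² values of specs D and E in the table. This sketch assumes a classical (homoskedastic) nested-model F-test, which may not be what the analysis script uses; `nested_f` is a name introduced here for illustration.

```python
def nested_f(r2_restricted, r2_full, n_restrictions, df_resid_full):
    """Classical F statistic for nested OLS models, computed from their R² values."""
    numerator = (r2_full - r2_restricted) / n_restrictions
    denominator = (1.0 - r2_full) / df_resid_full
    return numerator / denominator

# Spec D (firm + stat FE) vs spec E (D + stat × sponsored), values from the table.
f_stat = nested_f(r2_restricted=0.190, r2_full=0.202,
                  n_restrictions=16, df_resid_full=1249)
print(f"{f_stat:.2f}")  # 1.17
```

The result is close to the reported 1.19; the gap is plausibly due to the R² values being rounded to three decimals.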
Within-firm sponsor routing (19 firms)
| | Observed | Null (perm, n=500) |
|---|---|---|
| Mean within-firm sponsor-rate spread | 0.140 | 0.157 (sd 0.026) |
| Permutation p | 0.726 | |
| Share of firms with chi² p < 0.05 | 5.3 % | (≈ chance) |
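Observed spread is slightly below the null mean (0.140 vs 0.157, about 0.65 null sd), which is the opposite direction from routing and well within chance variation. There is no evidence that sponsored polls are directed to specific statisticians within firms; the pattern is consistent with the signer being whoever is available.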
Interpretation
The firm is the slant-control unit. The statistician is a passive signatory whose unconditional β reflects which firms they happened to sign for, not what they did at any one firm.
The user's reading of the null (Henrik, 2026-06-16, while discussing this run): if statisticians knew about and disagreed with the bias, we would expect some to refuse to sign biased polls. The within-firm homogeneity is then evidence that bias is induced at a point the statistician does not see. That is consistent with two distinct mechanisms — neither of which this test can discriminate between:
- Bias at a margin outside the statistician's purview. The plano amostral the statistician signs is the declared methodology. The actual fielding — which substrata get over-quotaed, which interviewers ask what, where the door-knocks land, how the post-stratification weights resolve borderline cases — happens at the firm's operational level, often without statistician supervision. This is especially likely for the rent-a-signature signers (LINIANE GAZOLA signing for 39 firms across 19 UFs cannot be supervising any of them). Channel A as executed may diverge from Channel A as declared without the statistician's awareness or consent.
- Channel B fabrication after data collection. The statistician signs the registration with the methodology declaration; the published numbers are edited at the commercial / management layer after the data come in. This bypasses the statistician entirely.
Both predict (i) within-firm statistician homogeneity (the FE test), (ii) null within-firm sorting (the permutation test), and (iii) the absence of large declared-design differences between sponsored and unsponsored polls (the AN-024 / AN-033 / AN-041 / AN-042 / AN-043 rule-out series). The §sec:policy "size-mismatch problem" — that documented design levers do not add up to the +7 pp headline — finds a possible explanation here: the slant is induced at margins outside what gets disclosed.
A test that would discriminate
The two mechanisms differ in what they predict about the audit trail (LE.34 §1). Under the first (bias at a margin outside the statistician's purview), the planilhas-individuais audit recovers the actual fielding pattern (if collected), and the slant should be visible against the declared plano amostral. Under the second (Channel B fabrication), the planilhas back the published numbers (because the published numbers were never the data) and the audit fails to detect anything.
The LE.34 §1 audit right is barely exercised in practice. If our sample of audit cases (Use 2 of the EJ agenda, future work) ever materializes, the conditional distribution of outcomes (audit finds operational deviation vs audit comes up empty) is the test.
Limits
- Subframe is small: 14 firms, n=1,295. The interaction F-test has reasonable power against medium-sized statistician β differences but is underpowered against small ones.
- Unconditional per-statistician β table mixes statistician with firm-portfolio. The headline ranking is descriptively useful (and supports the rent-a-signature framing in the CONRE thread) but should not be read as "this statistician personally produces biased polls".
- The two interpretations (margin outside purview vs Channel B fabrication) are observationally equivalent here; the test is not yet identifying which.
- The CD_CONRE normalization (digit-extraction regex) likely still collapses a few distinct statisticians who share leading digits across regions. Spot-checks suggest impact <5 %.
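The collapse risk in the last bullet can be illustrated with a small sketch. The regex rule ("keep the first run of digits") and the raw registration strings below are assumptions made for illustration; the actual CD_CONRE formats and the normalization in the mapping script may differ.

```python
import re

def normalize_conre(raw):
    """Keep only the first run of digits (assumed normalization rule)."""
    match = re.search(r"\d+", raw)
    return match.group(0) if match else None

# Hypothetical raw values: same digits, different regional suffixes.
raw_a = "9063-SP"
raw_b = "9063/RJ"

# Both collapse to the same key, so two signers would be merged.
collapsed = normalize_conre(raw_a) == normalize_conre(raw_b)

def normalize_with_region(raw):
    """Alternative key that keeps trailing letters as a region tag (hypothetical)."""
    digits = normalize_conre(raw)
    region = re.sub(r"[^A-Za-z]", "", raw).upper() or None
    return (digits, region)

distinct = normalize_with_region(raw_a) != normalize_with_region(raw_b)
```

Keeping a region component in the key is one possible way to avoid the merge, at the cost of splitting a single statistician whose registration is written inconsistently across filings.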
Files
- intermediate: source/intermediate/statistician_map_2024.py → build/intermediate/statistician_map_2024.tsv
- script: source/analysis/an-080-slant-unit-firm-vs-statistician.py
- tables: build/table/an-080-slant-unit-firm-vs-statistician.csv, build/table/an-080-statistician-beta.csv, build/table/an-080-summary.json
- thinking: docs/thinking/conre-statistician-lever.md (this analysis grounds the "Is the statistician a slant-control unit in the data?" section).