AN-087: Trusted-firm robustness across four definitions

Trusted-firm advantage survives 3 of 4 alternative definitions for the two simplest outcomes (calls_winner_first +10 pp, mean |error| −0.9 to −1.7 pp) — robust to hand-picked, top-10-UF-spread, or low-|β| definitions. Volume-based (top-10 by

Hypothesis: H13: Shell-contratante polls show larger residual β
Confidence: green
Type: robustness

Script: source/analysis/an-087-trusted-firm-robustness.py
Target: build/table/an-087-trusted-firm-robustness.csv
Status: interpreted · 2026-06-17
Created: 2026-06-17

User concern (2026-06-17): the AN-085 "trusted firm" list was a hand-picked name-recognition heuristic. This script tests robustness under four alternative definitions, plus checks whether the bucket-dummy findings (other_firm, candidate) survive each.

Four definitions

Definition	Selection rule	n firms	n polls
D1 hand-picked	DATAFOLHA, QUAEST, REAL TIME MIDIA, PARANÁ PESQUISAS, VERITA (AN-085 / AN-086 cut)	5	558
D2 top-10 volume	10 firms with most polls in matched sample	10	1,442
D3 top-10 UF spread	10 firms in most distinct UFs	10	958
D4 bottom-10 \|β\|	10 firms in AN-016 with smallest \|within-firm β\|, n≥30	10	656
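
The volume and UF-spread rules (D2, D3) amount to two rankings over the matched poll sample. The sketch below is illustrative only: it assumes each poll is a record with hypothetical `firm` and `uf` fields, which may not match the column names in the AN-087 script, and it runs on toy data.

```python
from collections import Counter, defaultdict

def top_n_by_volume(polls, n=10):
    """Return the n firms with the most polls (a D2-style rule)."""
    counts = Counter(p["firm"] for p in polls)
    # Sort by count descending, then by name, so ties break deterministically.
    ranked = sorted(counts.items(), key=lambda kv: (-kv[1], kv[0]))
    return {firm for firm, _ in ranked[:n]}

def top_n_by_uf_spread(polls, n=10):
    """Return the n firms polling in the most distinct UFs (a D3-style rule)."""
    ufs = defaultdict(set)
    for p in polls:
        ufs[p["firm"]].add(p["uf"])
    ranked = sorted(ufs.items(), key=lambda kv: (-len(kv[1]), kv[0]))
    return {firm for firm, _ in ranked[:n]}

# Toy data, purely illustrative: firm A is high-volume in one state,
# firm B has fewer polls spread over three states.
toy_polls = (
    [{"firm": "A", "uf": "SP"}] * 5
    + [{"firm": "B", "uf": uf} for uf in ("SP", "RJ", "BA")]
    + [{"firm": "C", "uf": "MG"}] * 2
)

d2_like = top_n_by_volume(toy_polls, n=1)
d3_like = top_n_by_uf_spread(toy_polls, n=1)
print(d2_like, d3_like)
```

On the toy data the two rules pick different firms, which is the same mechanism that makes the D2 and D3 sets diverge in the real sample.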

Overlap

D1 ⊂ D3: all five hand-picked firms appear in the top-10 by geographic spread. D1 ∩ D2 = 3 (Paraná, Veritá, Real Time); DATAFOLHA and QUAEST miss the volume top-10 because they focus on large cities and register fewer protocols. D1 ∩ D4 = 2 (Paraná, Veritá): DATAFOLHA, QUAEST, and REAL TIME do not have enough self-sponsored polls to identify a within-firm β at all, so they cannot be ranked by |β|. D4 is dominated by smaller regional firms with mixed candidate/media books and low differential slant (AGILI, AR7, IIP, INSTITUTO GERAIS, NEXXUS MAIS, PROMIDIA, W J MENDES, ROBERTO LORENZZON).
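
The overlap counts are plain set operations. As a small check of the D1 ∩ D4 figure, the sketch below uses the firm names as written in this note. The D4 membership is an inference from the paragraph above (the eight listed regional firms plus the two firms shared with D1), not a copy of the `__sets.csv` artifact.

```python
# Firm names as they appear in this note.
D1 = {"DATAFOLHA", "QUAEST", "REAL TIME MIDIA", "PARANÁ PESQUISAS", "VERITA"}

# Assumption: D4 = the 8 regional firms listed above + the 2 firms shared with D1.
D4 = {
    "PARANÁ PESQUISAS", "VERITA",
    "AGILI", "AR7", "IIP", "INSTITUTO GERAIS", "NEXXUS MAIS",
    "PROMIDIA", "W J MENDES", "ROBERTO LORENZZON",
}

def overlap_report(a, b):
    """Return the shared firms, their count, and whether a is a subset of b."""
    shared = a & b
    return {
        "shared": sorted(shared),
        "n_shared": len(shared),
        "a_subset_of_b": a <= b,
    }

report = overlap_report(D1, D4)
print(report)
```

The same helper applied to the D2 and D3 sets from the firm-set artifact would reproduce the other two overlap statements.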

Trusted-firm coefficient — robustness table

Universe spec, race FE, dependent variable = each accuracy measure. Coefficient on is_trusted controlling for is_candidate, is_pollster_self, is_other_firm.

Definition	n_trusted	calls_winner_1st	margin_error	mean \|err\|
D1 hand-picked	558	+0.100 (p<0.001)	−2.02 (p=0.001)	−1.15 (p<0.001)
D2 top-10 volume	1,442	+0.046 (p=0.059)	+1.09 (p=0.08)	−0.18 (p=0.49)
D3 top-10 UF spread	950	+0.103 (p=0.001)	−1.15 (p=0.08)	−0.87 (p=0.006)
D4 low \|β\|	656	+0.101 (p<0.001)	−0.98 (p=0.20)	−1.74 (p<0.001)
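
For readers unfamiliar with the race-FE setup, the following is a minimal sketch of how a coefficient like the ones above can be obtained: demean the outcome and the dummies within each race, then run OLS on the demeaned data. It is a simplification under stated assumptions, not the AN-087 script. It uses only two of the four dummies, toy data built without noise, and it reports no standard errors or p-values. A real run would more likely use a regression library.

```python
from collections import defaultdict

def demean_within(values, groups):
    """Subtract each group's mean from its members (the within transformation)."""
    sums, counts = defaultdict(float), defaultdict(int)
    for v, g in zip(values, groups):
        sums[g] += v
        counts[g] += 1
    return [v - sums[g] / counts[g] for v, g in zip(values, groups)]

def solve(a, b):
    """Solve a @ x = b by Gaussian elimination with partial pivoting.

    No singularity check: collinear regressors will raise ZeroDivisionError.
    """
    n = len(b)
    m = [row[:] + [rhs] for row, rhs in zip(a, b)]
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(m[r][col]))
        m[col], m[pivot] = m[pivot], m[col]
        for r in range(col + 1, n):
            f = m[r][col] / m[col][col]
            for c in range(col, n + 1):
                m[r][c] -= f * m[col][c]
    x = [0.0] * n
    for i in reversed(range(n)):
        x[i] = (m[i][n] - sum(m[i][j] * x[j] for j in range(i + 1, n))) / m[i][i]
    return x

def fe_ols(y, regressors, race):
    """OLS of y on the regressors, with race fixed effects absorbed by demeaning."""
    names = list(regressors)
    yd = demean_within(y, race)
    xd = [demean_within(regressors[k], race) for k in names]
    xtx = [[sum(p * q for p, q in zip(xi, xj)) for xj in xd] for xi in xd]
    xty = [sum(p * q for p, q in zip(xi, yd)) for xi in xd]
    return dict(zip(names, solve(xtx, xty)))

# Toy data: y = race effect - 2 * is_trusted - 1 * is_candidate, no noise.
race = ["r1"] * 4 + ["r2"] * 4
X = {
    "is_trusted":   [1, 0, 1, 0, 1, 1, 0, 0],
    "is_candidate": [0, 0, 1, 1, 0, 1, 0, 1],
}
y = [8, 10, 7, 9, 4, 3, 6, 5]

coefs = fe_ols(y, X, race)
print(coefs)
```

Because the toy outcome is constructed without noise, the sketch recovers the planted coefficients, which makes it easy to confirm that the race effects (10 and 6 here) drop out of the estimate.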

Reading:

The +10 pp advantage on poll_calls_winner_first is robust to 3 of 4 definitions. D1, D3, and D4 give +0.100 to +0.103 (all p ≤ 0.001). D2 gives only +0.046 (p = 0.059). The mean |error| advantage is also robust to 3 of 4: D1, D3, and D4 give −0.87 to −1.74 (all p ≤ 0.006), while D2 is null (−0.18, p = 0.49).

The margin_error advantage is fragile. Only D1 shows −2.02 (p = 0.001). D2 actually REVERSES the sign (+1.09, p = 0.08); D3 and D4 are borderline null (p = 0.08, p = 0.20). The AN-085 headline "trusted firms reduce margin error by 2 pp" is sensitive to the firm list — DATAFOLHA / QUAEST are doing the work, not Veritá / Paraná. Without DATAFOLHA + QUAEST in the set (D2, D4), the margin_error advantage shrinks or disappears.

D2 (top-10 by volume) is the OUTLIER and a bad trust marker. It returns the WORST trusted-firm coefficients on every outcome — calls_winner_first marginal, margin_error positive, mean |error| null. The reason: high-volume firms in the matched sample are state-level specialist firms (INSTITUTO DATATRENDS n=203, VOX BRASIL n=167, 100% CIDADES n=158, RANKING BRASIL n=128, MOREIRA & NOLETO n=119) doing lots of candidate work, not the national-tier firms. Volume conflates "produces many polls" with "produces good polls."

Bucket dummies under each definition

Critical robustness check: does the other_firm finding (AN-082, AN-084, AN-085) depend on the trusted-firm definition?

Race FE, dependent variable = margin_error:

Definition	is_candidate	is_pollster_self	is_other_firm
D1 hand-picked	−2.53 (p=0.007)	−0.91 (p=0.14)	−1.95 (p=0.002)
D2 top-10 volume	−2.61 (p=0.005)	−1.08 (p=0.09)	−1.83 (p=0.003)
D3 top-10 UF spread	−2.48 (p=0.008)	−0.75 (p=0.23)	−1.84 (p=0.003)
D4 low \|β\|	−2.42 (p=0.011)	−0.74 (p=0.24)	−1.81 (p=0.004)

All three bucket coefficients keep the same sign, similar magnitude, and the same significance pattern across all 4 trusted-firm definitions.

is_candidate: −2.42 to −2.61 (all p ≤ 0.011)
is_other_firm: −1.81 to −1.95 (all p ≤ 0.004)
is_pollster_self: −0.74 to −1.08 (none significant at the 5% level; p = 0.09 to 0.24)

The shell-bucket finding is independent of how we define trust. This is the cleanest robustness story we have for the H13 prediction.
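
The stability summary can be recomputed directly from the bucket-dummy table. The sketch below transcribes the table values by hand. The 5% threshold (`alpha=0.05`) is an assumption made for illustration, since the note does not state the cutoff behind "sig".

```python
# (coefficient, p-value) per definition, transcribed from the bucket-dummy table.
bucket = {
    "is_candidate": {
        "D1": (-2.53, 0.007), "D2": (-2.61, 0.005),
        "D3": (-2.48, 0.008), "D4": (-2.42, 0.011),
    },
    "is_pollster_self": {
        "D1": (-0.91, 0.14), "D2": (-1.08, 0.09),
        "D3": (-0.75, 0.23), "D4": (-0.74, 0.24),
    },
    "is_other_firm": {
        "D1": (-1.95, 0.002), "D2": (-1.83, 0.003),
        "D3": (-1.84, 0.003), "D4": (-1.81, 0.004),
    },
}

def stability(rows, alpha=0.05):
    """Summarise sign, range, and significance of one dummy across definitions."""
    coefs = [c for c, _ in rows.values()]
    pvals = [p for _, p in rows.values()]
    return {
        "same_sign": all(c < 0 for c in coefs) or all(c > 0 for c in coefs),
        "coef_range": (min(coefs), max(coefs)),
        "max_p": max(pvals),
        "all_significant": all(p < alpha for p in pvals),
        "none_significant": all(p >= alpha for p in pvals),
    }

summary = {name: stability(rows) for name, rows in bucket.items()}
for name, s in summary.items():
    print(name, s)
```

Under that threshold each dummy is either significant under every definition or under none, which is what "independent of how we define trust" means operationally here.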

Implications for the headline

The headline AN-082 / AN-085 / AN-086 sponsor-class contrasts are robust to trust definition. The shell finding stands.
The trusted-firm rank-concordance advantage is robust to 3 of 4 definitions. The +10 pp on calls_winner_first reproduces under hand-picked, UF-spread, and low-|β| definitions. The paper can claim this confidently.
The trusted-firm margin-error advantage should be downgraded. It is −2.02 pp in D1 but loses significance under D3 (−1.15, p = 0.08) and D4 (−0.98, p = 0.20) and reverses sign under D2 (+1.09, p = 0.08). The AN-085 statement that trusted-firm media polls "reduce margin error by 3.16 pp within race" is driven mostly by DATAFOLHA + QUAEST in the cleanest cells; it does not generalize.
β-based "trust" (D4) gives the strongest mean |error| advantage (−1.74, p<0.001). This is conceptually clean: firms that don't differentially slant for sponsors also produce lower-error polls overall. The two trust dimensions (rank-concordance and magnitude accuracy) align with low-β firms — but rank-concordance is the more robust headline.

The 9-cell AN-086 table used D1 throughout. The qualitative ranking holds under D3 and D4 (trusted-firm rows at the top, non-trusted firm × small-media-sponsor near the bottom). The specific magnitudes shift — under D4, trusted-firm rows lose their margin-error edge but keep their mean |error| edge. The calls_winner_first ordering is essentially invariant.

For the paper, I'd recommend reporting:

AN-086 table using D1 (clearest narrative)
AN-087 robustness table as an appendix or footnote showing the +10 pp on calls_winner_first holds under D3/D4
A caveat that margin_error effects are firm-list-sensitive

Caveats

D4 sample is small. AN-016 only identifies β for 22 firms total; the bottom-10 |β| set is dominated by smaller regional firms with mixed books. A larger β-identifiable universe would require a more lenient cutoff (e.g. lowering the n≥5 self-sponsored requirement).
D2 / D3 cutoffs at top-10 are arbitrary. Top-15 or top-20 would change set composition. AN-085's caveat about the diluting effect of adding more firms applies.
The "true trusted" set is unobservable. Each definition proxies for a different aspect: D1 ≈ industry name recognition, D2 ≈ market share, D3 ≈ national operations, D4 ≈ differential-slant track record. They give different but partially overlapping sets and partially overlapping results. The aggregate signal is "trust matters; the exact list is fuzzy."
β-based circularity. D4 firms have low |β| by construction. Using them to predict accuracy is conceptually separate (β measures sponsor differential, accuracy measures level), but a careful reader could argue the two share unobserved firm-quality variance. The β-based definition is cleanest as a robustness layer to D1, not as the primary cut.

Follow-ups

Pool D1 ∪ D3 ∪ D4 as an "any-definition trusted" set and re-run the AN-086 table. Probably the strongest defensible cut for the paper.
External-source list. ABEP (Associação Brasileira de Empresas de Pesquisa) maintains a membership directory. Cross-reference would give an independent industry-association trust signal — the cleanest external anchor possible.
2020-cycle β. Compute β on 2020 mayoral data, label trusted as bottom-quartile by 2020 |β|, then use that label on 2024 polls. Removes the post-hoc circularity in D4.
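
The 2020-cycle follow-up could be labeled along these lines. This is a hedged sketch only: the firm names and β values are made up, "bottom quartile" is implemented as the lowest quarter of firms ranked by |β|, and firms with no 2020 estimate are treated as not trusted, which is one possible choice rather than a settled rule.

```python
def bottom_quartile_by_abs_beta(beta_by_firm):
    """Return the firms in the lowest quarter of |beta| (at least one firm)."""
    # Rank by |beta|, then by name so ties break deterministically.
    ranked = sorted(beta_by_firm, key=lambda f: (abs(beta_by_firm[f]), f))
    k = max(1, len(ranked) // 4)
    return set(ranked[:k])

# Hypothetical 2020 estimates; names and values are placeholders.
beta_2020 = {
    "F1": 0.4, "F2": -0.1, "F3": 2.5, "F4": -1.8,
    "F5": 0.9, "F6": -3.0, "F7": 1.2, "F8": 0.05,
}
trusted_2020 = bottom_quartile_by_abs_beta(beta_2020)

# Apply the label, fixed before looking at 2024 outcomes, to 2024 polls.
# F9 has no 2020 estimate and therefore gets is_trusted = 0 (assumption).
polls_2024 = [{"firm": "F8"}, {"firm": "F3"}, {"firm": "F2"}, {"firm": "F9"}]
for p in polls_2024:
    p["is_trusted"] = int(p["firm"] in trusted_2020)

print(sorted(trusted_2020), [p["is_trusted"] for p in polls_2024])
```

Because the label is built only from 2020 data, the 2024 accuracy outcomes cannot influence which firms end up in the trusted set.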

Artifacts

Script: source/analysis/an-087-trusted-firm-robustness.py
Spec-level coefficients: build/table/an-087-trusted-firm-robustness.csv
Firm sets per definition: build/table/an-087-trusted-firm-robustness__sets.csv
Headline JSON: build/table/an-087-trusted-firm-robustness.json