id: an-093
hypothesis: shell-contratante
headline: "Final paper appendix table. Five buckets (major_media reference, small_media, candidate, pollster_self, other_firm) × four FE specs × three accuracy outcomes, all with a log_sample_size control. Under race FE (S1): candidate is 2.69 pp more accurate on margin than major-media (p<0.05); other_firm is 2.00 pp more accurate (p<0.10). Under firm + race FE (S3): all four non-major-media buckets show 1.8 to 3.8 pp lower margin error than major-media within firm × race (three of the four significant at p<0.10), but the major-media reference is thin (n=304, 4.3% of sample), so S3 selection is sharp and the magnitudes should be read with caution. S1 is the more interpretable spec."
type: synthesis-table-paper-ready
question: "Single consolidated appendix table for the paper, with major-media as the cleanest possible reference and sample size controlled."
tags: ["hyp:shell-contratante", "paper-ready", "spec-ladder", "synthesis", "headline"]
status: interpreted
status_date: 2026-06-17
confidence: green
created: 2026-06-17
script: source/analysis/an-093-paper-spec-ladder-final.py
target: build/table/an-093-paper-spec-ladder-final.csv

AN-093: Final paper appendix table

Supersedes AN-090 as the paper's appendix table. Five sponsor buckets with major-media as the cleanest available reference, a log_sample_size control in every spec, and four FE levels (S0 none, S1 race, S2 firm, S3 firm + race).
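The spec ladder described above can be sketched as dummy-variable OLS with race-clustered standard errors. This is a minimal sketch, not the an-093 script: the function names, the column layout, the CR1 small-sample factor and the synthetic data are all assumptions, and it does not reproduce whatever sample restrictions produce the smaller S1/S3 n. Major-media is the omitted bucket, so each bucket coefficient is a difference from major-media.

```python
import numpy as np

# Non-reference buckets; major_media is the omitted (reference) category.
BUCKETS = ["small_media", "candidate", "pollster_self", "other_firm"]
# Hypothetical spec ladder: (race FE?, firm FE?)
SPECS = {"S0": (False, False), "S1": (True, False), "S2": (False, True), "S3": (True, True)}


def indicator_matrix(labels, drop_first=False):
    """One 0/1 column per distinct label, optionally dropping the first level."""
    levels = sorted(set(labels))[1:] if drop_first else sorted(set(labels))
    return np.array([[1.0 if lab == lev else 0.0 for lev in levels] for lab in labels])


def fit_spec(y, bucket, log_n, race, firm, race_fe, firm_fe):
    """OLS of y on bucket dummies + log sample size (+ optional FE dummies).

    Returns (coef, se) for the four bucket dummies and log_sample, in that
    order, with CR1 standard errors clustered by race.
    """
    y = np.asarray(y, dtype=float)
    n = len(y)
    parts = [
        np.array([[1.0 if b == k else 0.0 for k in BUCKETS] for b in bucket]),
        np.asarray(log_n, dtype=float)[:, None],
    ]
    if race_fe:
        parts.append(indicator_matrix(race))  # full set absorbs the intercept
    if firm_fe:
        # Drop one firm level when race FE are present to avoid collinearity.
        parts.append(indicator_matrix(firm, drop_first=race_fe))
    if not (race_fe or firm_fe):
        parts.append(np.ones((n, 1)))  # S0: plain intercept
    X = np.hstack(parts)
    k = X.shape[1]
    xtx_inv = np.linalg.inv(X.T @ X)
    beta = xtx_inv @ (X.T @ y)
    resid = y - X @ beta
    clusters = np.asarray(race)
    groups = np.unique(clusters)
    meat = np.zeros((k, k))
    for g in groups:
        mask = clusters == g
        score = X[mask].T @ resid[mask]  # summed score for this race
        meat += np.outer(score, score)
    G = len(groups)
    scale = G / (G - 1) * (n - 1) / (n - k)  # CR1 small-sample factor
    vcov = scale * xtx_inv @ meat @ xtx_inv
    return beta[:5], np.sqrt(np.diag(vcov))[:5]


# Synthetic data: random race shifts plus a planted -2.0 pp candidate effect.
rng = np.random.default_rng(0)
n = 3000
race = rng.integers(0, 60, size=n)
firm = rng.integers(0, 25, size=n)
bucket = rng.choice(["major_media"] + BUCKETS, size=n)
log_n = np.log(rng.integers(400, 3000, size=n))
race_shift = rng.normal(0.0, 3.0, size=60)
y = race_shift[race] - 2.0 * (bucket == "candidate") + rng.normal(0.0, 1.0, size=n)

ladder = {name: fit_spec(y, bucket, log_n, race, firm, *flags) for name, flags in SPECS.items()}
for name, (coef, se) in ladder.items():
    print(name, np.round(coef, 2), np.round(se, 2))
```

Explicit dummies keep the sketch short and transparent; with thousands of polls and many firms, a within-transformation estimator that absorbs the FE would scale better.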

Bucket counts (n = 6,991)

Bucket	n	%	Description
major_media (reference)	304	4.3 %	GLOBO / FOLHA / ESTADÃO / large regional outlets
small_media	2,644	37.8 %	Small digital outlets / blogs
candidate	1,026	14.7 %	CPF / committee / party / party-name
pollster_self	1,464	20.9 %	Firm self-contracts
other_firm	1,553	22.2 %	Third-party / shell-suspect
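The bucket counts can be checked with a few lines of arithmetic: they should sum to 6,991, and major-media plus small-media should reproduce the 2,948-poll media reference that AN-090 used. The variable names are illustrative only.

```python
# Bucket counts from the table above.
counts = {
    "major_media": 304,
    "small_media": 2644,
    "candidate": 1026,
    "pollster_self": 1464,
    "other_firm": 1553,
}
total = sum(counts.values())
# Percent of the full sample, rounded to one decimal as in the table.
shares = {bucket: round(100 * n / total, 1) for bucket, n in counts.items()}
# AN-090 pooled all media-sponsored polls into a single reference bucket.
an090_media_reference = counts["major_media"] + counts["small_media"]
print(total, shares, an090_media_reference)
```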

Table

Standard errors clustered by race in parentheses. * p<0.10, ** p<0.05, *** p<0.01.

Margin error, pp (the headline outcome)

Bucket (ref = major_media)	S0 No FE	S1 Race FE	S2 Firm FE	S3 Firm + Race FE
small_media	−1.46 (2.48)	−0.14 (0.99)	−4.23 (3.33)	−2.23* (1.20)
candidate	−2.94 (2.53)	−2.69** (1.25)	−4.07 (3.35)	−3.81** (1.50)
pollster_self	−1.85 (2.43)	−0.99 (1.03)	−2.45 (3.29)	−1.82 (1.32)
other_firm	−5.08** (2.57)	−2.00* (1.09)	−6.97** (3.38)	−2.94** (1.27)
log_sample	−6.45*** (1.45)	−1.31 (0.90)	−7.70*** (1.35)	−0.37 (1.07)
n	6,693	5,521	6,633	5,424

Calls winner first (binary; coefficients are changes in probability)

Bucket (ref = major_media)	S0 No FE	S1 Race FE	S2 Firm FE	S3 Firm + Race FE
small_media	−0.012 (0.038)	−0.056 (0.040)	+0.033 (0.054)	−0.002 (0.065)
candidate	−0.032 (0.039)	−0.021 (0.048)	+0.069 (0.058)	+0.056 (0.074)
pollster_self	−0.036 (0.039)	−0.026 (0.042)	+0.073 (0.058)	+0.045 (0.070)
other_firm	−0.056 (0.038)	−0.071* (0.041)	+0.020 (0.053)	+0.008 (0.063)
log_sample	−0.061** (0.030)	−0.014 (0.036)	−0.119*** (0.034)	−0.025 (0.041)
n	6,991	5,750	6,929	5,655

Mean |error|, pp

Bucket (ref = major_media)	S0 No FE	S1 Race FE	S2 Firm FE	S3 Firm + Race FE
small_media	+1.35*** (0.50)	+0.90** (0.39)	−0.25 (0.66)	−1.03 (0.65)
candidate	+2.10*** (0.58)	+1.05* (0.54)	−0.18 (0.70)	−0.84 (0.66)
pollster_self	+1.42*** (0.51)	+1.01** (0.46)	+0.05 (0.69)	−0.94 (0.66)
other_firm	+1.15** (0.51)	+0.91** (0.44)	−0.49 (0.67)	−0.82 (0.67)
log_sample	−0.76** (0.34)	−0.56 (0.45)	−0.28 (0.40)	−0.62 (0.54)
n	6,991	5,750	6,929	5,655
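The star legend can be checked against the reported coefficient/SE pairs. A minimal sketch, assuming a two-sided normal approximation; the script may instead use a t distribution with degrees of freedom tied to the number of race clusters, which gives slightly larger p-values. The helper names are hypothetical.

```python
import math


def two_sided_p(coef, se):
    """Two-sided p-value for coef / se under a standard normal approximation."""
    return math.erfc(abs(coef / se) / math.sqrt(2))


def stars(coef, se):
    """Apply the table legend: * p<0.10, ** p<0.05, *** p<0.01."""
    p = two_sided_p(coef, se)
    if p < 0.01:
        return "***"
    if p < 0.05:
        return "**"
    if p < 0.10:
        return "*"
    return ""


# (coef, se, stars as reported) for four margin-error cells.
cells = [
    (-2.69, 1.25, "**"),   # candidate, S1
    (-2.00, 1.09, "*"),    # other_firm, S1
    (-1.82, 1.32, ""),     # pollster_self, S3
    (-6.45, 1.45, "***"),  # log_sample, S0
]
for coef, se, reported in cells:
    print(coef, se, round(two_sided_p(coef, se), 3), stars(coef, se), reported)
```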

Reading

Race FE (S1) — the headline spec

This is the cleanest interpretable comparison. Relative to major-media polls in the same race:

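candidate: 2.69 pp lower margin error (p<0.05) but 1.05 pp higher mean |error| (p<0.10). Within race, candidate-sponsored polls miss the candidates' levels by about 1 pp more yet get the spread between them closer. That combination fits distortions that move the leading candidates' shares in the same direction (for example, both overstated at the expense of minor candidates or undecideds), which leaves the margin roughly intact while inflating level error. A one-sided boost to one candidate at another's expense would instead widen the margin error.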
other_firm: 2.00 pp lower margin error (p<0.10), 0.91 pp higher mean |error| (p<0.05), and 7.1 pp less likely to call the winner (p<0.10). Margin and level errors move in the same directions as for candidate: the shell-bucket pattern.
small_media: 0.90 pp higher mean |error| (p<0.05), null on margin error and on calling the winner. The small-media bucket looks LIKE major-media on margin but is noisier on level.
pollster_self: 1.01 pp higher mean |error| (p<0.05), null otherwise. The signal is mixed: some pollster_self polls are showcase Datafolha releases (very accurate), and the rest drag the average error up.
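The S1 rows pair lower margin error with higher mean |error|. A toy three-candidate race shows how the two outcomes can move in opposite directions. The definitions below are assumptions for illustration (margin error as the absolute error on the A-minus-B spread, mean |error| as the average absolute error across candidates' shares); the script's exact definitions may differ.

```python
def margin_error(poll, result, a, b):
    """Absolute error on the a-minus-b margin, in pp (assumed definition)."""
    return abs((poll[a] - poll[b]) - (result[a] - result[b]))


def mean_abs_error(poll, result):
    """Average absolute error across candidates' shares, in pp (assumed definition)."""
    return sum(abs(poll[c] - result[c]) for c in result) / len(result)


result = {"A": 45.0, "B": 40.0, "C": 15.0}
same_direction = {"A": 48.0, "B": 43.0, "C": 9.0}  # both leaders overstated
one_sided = {"A": 48.0, "B": 37.0, "C": 15.0}       # A boosted at B's expense

for name, poll in [("same_direction", same_direction), ("one_sided", one_sided)]:
    print(name, margin_error(poll, result, "A", "B"), mean_abs_error(poll, result))
```

Under these definitions the same-direction shift leaves the margin exact while raising level error, and the one-sided boost does the reverse.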

Firm + Race FE (S3) — the most demanding spec, but interpret carefully

Under S3, every non-major-media bucket has 1.8 to 3.8 pp lower margin error than major-media within firm × race (small_media −2.23*, candidate −3.81**, pollster_self −1.82 and not significant, other_firm −2.94**). On mean |error|, all four buckets are now directionally NEGATIVE (more accurate than major-media within firm × race), though none of those four coefficients is significant.

This counterintuitive direction needs care. Two readings:

Selection on the thin reference. Major-media is only 304 polls (4.3 % of sample). Under firm + race FE, most firm × race cells contain no major-media-sponsored poll, so the contrast against the reference rests on a small, highly selected set of polls. Those polls likely concentrate in marquee races (Globo / Folha cover the high-profile contests). Race FE absorb each race's average difficulty, so the concern is not a simple level shift: the S3 coefficients describe sponsor differences in an unrepresentative subset of firms and races and need not generalize. If that drives the result, it is selection rather than a real "major-media polls are less accurate" effect.
Real signal. Major-media polls might genuinely make more decisive calls (publish a confident point estimate that the audience expects to be precise) and miss the margin more often. Smaller-stakes polls might be more cautious by default.

The two readings cannot be separated with these data. S1 (race FE only) is the more reliably interpretable spec for the paper appendix.
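One way to quantify how thin the S3 reference is: count the firm × race cells that contain a major-media poll alongside at least one poll from another bucket, and the polls in them. A minimal sketch with hypothetical field names and toy data; under additive firm + race FE, identification also draws on variation across cells, so this count is a rough diagnostic rather than an exact description of the estimator.

```python
from collections import defaultdict


def identifying_cells(polls, reference="major_media"):
    """Firm x race cells holding a reference poll and at least one other bucket."""
    buckets_by_cell = defaultdict(set)
    for poll in polls:
        buckets_by_cell[(poll["firm"], poll["race"])].add(poll["bucket"])
    return sorted(
        cell for cell, buckets in buckets_by_cell.items()
        if reference in buckets and len(buckets) > 1
    )


# Toy polls; field names are hypothetical.
polls = [
    {"firm": "F1", "race": "race_1", "bucket": "major_media"},
    {"firm": "F1", "race": "race_1", "bucket": "candidate"},
    {"firm": "F1", "race": "race_2", "bucket": "small_media"},
    {"firm": "F2", "race": "race_1", "bucket": "other_firm"},
    {"firm": "F2", "race": "race_3", "bucket": "major_media"},
    {"firm": "F2", "race": "race_3", "bucket": "major_media"},
]
cells = set(identifying_cells(polls))
n_identifying = sum((p["firm"], p["race"]) in cells for p in polls)
print(sorted(cells), n_identifying, len(polls))
```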

log_sample_size

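Negative on all three outcomes under S0. On margin error and mean |error| this means larger samples are more accurate cross-sectionally; on calls-winner-first it means larger samples are less likely to call the winner. Race FE absorb almost all of it (the S1 and S3 coefficients shrink and lose significance), so the sample-size correlation is mostly race composition: the signs suggest big samples cover important races where margins and levels are easier to hit but the winner is inherently harder to call. Under S2 (firm FE alone), which does not absorb race composition, the coefficient stays significant on margin error (−7.70) and calls-winner-first (−0.119) but not on mean |error| (−0.28). Within firm, larger samples have lower margin error and are less likely to call the winner, the same composition pattern as S0.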
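A small simulation, under an assumed data-generating process, shows how race composition alone can produce a strongly negative pooled (S0-style) sample-size slope that vanishes once race is partialled out (S1-style). It illustrates the mechanism; it is not an estimate from these data, and all parameters are made up.

```python
import numpy as np

rng = np.random.default_rng(1)
n_races, polls_per_race = 40, 30
race = np.repeat(np.arange(n_races), polls_per_race)

# Assumed composition: races with larger typical samples have smaller errors.
race_base = rng.normal(7.0, 0.6, size=n_races)  # race-level typical log(sample size)
race_error = 10.0 - 4.0 * (race_base - 7.0) + rng.normal(0.0, 0.5, size=n_races)

log_n = race_base[race] + rng.normal(0.0, 0.3, size=race.size)
margin_err = race_error[race] + rng.normal(0.0, 1.0, size=race.size)  # no within-race effect


def slope(x, y):
    """Simple OLS slope of y on x (with intercept)."""
    xc = x - x.mean()
    return float(xc @ (y - y.mean()) / (xc @ xc))


def demean_by(v, groups):
    """Subtract group means; for the slope this equals adding group dummies (FWL)."""
    means = np.bincount(groups, weights=v) / np.bincount(groups)
    return v - means[groups]


pooled = slope(log_n, margin_err)  # analogue of S0
within = slope(demean_by(log_n, race), demean_by(margin_err, race))  # analogue of S1
print(round(pooled, 2), round(within, 2))
```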

What the paper should report

Lead with the 4-bucket S1 result from AN-090 in the §Results headline. That's the most familiar framing with the broad "media" reference.
In the §Identification or §Robustness section, present AN-093 (this table) as the appendix. Show that the findings survive (a) using major-media as the reference and (b) controlling for sample size. Highlight the small_media coefficient as evidence that the media bucket is heterogeneous.
Caveat AN-093 S3 carefully. The thin major-media reference may inflate the S3 magnitudes via selection. The clean interpretation is S1; S3 is a robustness layer.

Tradeoffs vs AN-090

Dimension	AN-090 (4 buckets)	AN-093 (5 buckets)
Reference	media (all 2,948)	major_media (304)
Reference interpretation	mostly small-blog-sponsored	high-rep national outlets
Sample-size control	no	yes
Cleanest single spec	S1 race FE	S1 race FE
Reference cleanliness	weaker	stronger
Reference power	strong (n=2,948)	weaker (n=304)
S3 reliability	reasonable	suspicious (thin ref)
Paper role	headline §Results	appendix §Robustness
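For a rough sense of what "weaker" reference power means: in a plain two-group difference in means with iid outcomes of equal variance, the SE scales with sqrt(1/n_ref + 1/n_bucket). That is a simplifying assumption (FE and race clustering change the numbers), and the helper below is hypothetical.

```python
import math


def se_inflation(n_ref_new, n_ref_old, n_bucket):
    """Ratio of difference-in-means SEs when the reference shrinks (iid, equal variance)."""
    new = math.sqrt(1 / n_ref_new + 1 / n_bucket)
    old = math.sqrt(1 / n_ref_old + 1 / n_bucket)
    return new / old


# Reference sizes from the table above; bucket sizes from the counts table.
for bucket, n_bucket in [("candidate", 1026), ("pollster_self", 1464), ("other_firm", 1553)]:
    print(bucket, round(se_inflation(304, 2948, n_bucket), 2))
```

Under these assumptions, moving from the 2,948-poll media reference to the 304-poll major-media reference widens the bucket SEs by roughly 1.8 to 2.0 times.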

Both have value. Recommend keeping both: AN-090 for §Results, AN-093 for §Appendix. Each speaks to a different audience concern.

Caveats

The n=304 major-media reference is the load-bearing limitation: adequate for the race-FE comparisons in S1, but selection-sensitive for the within-firm, within-race comparisons in S3.
Major-media regex is judgment-based. Adding tier-2 regional outlets to "major" would shift the count up but contaminate the reference. The current list (national flagships + ~20 largest regional papers) is a defensible middle.
log_sample_size = log(declared sample size). Per AN-080, the operationally fielded sample may diverge from the declared one. Using declared size is conservative in the sense that, if the divergence behaves like classical measurement error, it biases the log_sample coefficient toward zero; it also means the control absorbs sample-size differences across buckets only partially.
AN-091's within-firm major-vs-small-media test was null. The AN-093 S3 small_media coefficient (−2.23, p<0.10) appears to contradict it, but AN-091's MIXED-firms sample was 29 firms and 677 polls, whereas AN-093 S3 uses the full 5,424-poll sample with sharp selection on major-media cells. The two tests are not directly comparable.
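The regex caveat can be made concrete with a deliberately partial sketch of the major/small media split. The pattern covers only the three flagships named in the bucket table; the real list (national flagships plus about 20 large regional papers) lives in the analysis script and is not reproduced here, so the pattern and function name are hypothetical.

```python
import re

# Hypothetical, partial pattern: only the flagships named in the bucket table.
MAJOR_MEDIA = re.compile(r"\b(?:GLOBO|FOLHA|ESTAD[AÃ]O)\b", re.IGNORECASE)


def media_tier(contractor_name):
    """Label a media contractor as major_media or small_media (sketch)."""
    return "major_media" if MAJOR_MEDIA.search(contractor_name) else "small_media"


for name in ["Folha de S.Paulo", "Rede Globo", "Estadão", "Blog do Interior", "Infoglobo"]:
    print(name, media_tier(name))
```

Even this toy version embeds judgment calls: the word boundaries keep a name such as "Infoglobo" out of the major bucket, and every added outlet moves polls from small_media into the reference.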

Artifacts

Script: source/analysis/an-093-paper-spec-ladder-final.py
CSV: build/table/an-093-paper-spec-ladder-final.csv
LaTeX (paper-ready): build/table/an-093-paper-spec-ladder-final.tex
Markdown: build/table/an-093-paper-spec-ladder-final.md
JSON: build/table/an-093-paper-spec-ladder-final.json

Related

AN-090 4-bucket spec ladder: headline version
AN-091 small-media vs major-media within firm
AN-092 sample-size robustness
AN-085 trusted-source decomposition
