id: an-093
hypothesis: shell-contratante
headline: "Final paper appendix table. Five buckets (major_media reference, small_media, candidate, pollster_self, other_firm) × four FE specs × three accuracy outcomes, all with log_sample_size control. Under race FE (S1): candidate is 2.69 pp more accurate on margin than major-media (p<0.05); other_firm 2.00 pp more accurate (p<0.10). Under firm + race FE (S3): all four non-major-media buckets show 1.8 to 3.8 pp lower margin error than major-media within firm × race, but the major-media reference is thin (n=304, 4.3% of sample), so S3 selection is sharp and the magnitudes should be read with caution. S1 is the more interpretable spec."
type: synthesis-table-paper-ready
question: "Single consolidated appendix table for the paper, with major-media as the cleanest possible reference and sample size controlled."
tags: ["hyp:shell-contratante", "paper-ready", "spec-ladder", "synthesis", "headline"]
status: interpreted
status_date: 2026-06-17
confidence: green
created: 2026-06-17
script: source/analysis/an-093-paper-spec-ladder-final.py
target: build/table/an-093-paper-spec-ladder-final.csv
AN-093: Final paper appendix table
Supersedes AN-090 for paper appendix purposes. Five buckets with major-media as the cleanest possible reference, log_sample_size control in every spec, four FE levels.
Bucket counts (n = 6,991)
| Bucket | n | % | Description |
|---|---|---|---|
| major_media (reference) | 304 | 4.3 % | GLOBO / FOLHA / ESTADÃO / large regional outlets |
| small_media | 2,644 | 37.8 % | Small digital outlets / blogs |
| candidate | 1,026 | 14.7 % | CPF / committee / party / party-name |
| pollster_self | 1,464 | 20.9 % | Firm self-contracts |
| other_firm | 1,553 | 22.2 % | Third-party / shell-suspect |
Table
Standard errors clustered at the race level in parentheses. * p<0.10, ** p<0.05, *** p<0.01.
Margin error (the headline outcome)
| Bucket (ref = major_media) | S0 No FE | S1 Race FE | S2 Firm FE | S3 Firm + Race FE |
|---|---|---|---|---|
| small_media | −1.46 (2.48) | −0.14 (0.99) | −4.23 (3.33) | −2.23* (1.20) |
| candidate | −2.94 (2.53) | −2.69** (1.25) | −4.07 (3.35) | −3.81** (1.50) |
| pollster_self | −1.85 (2.43) | −0.99 (1.03) | −2.45 (3.29) | −1.82 (1.32) |
| other_firm | −5.08** (2.57) | −2.00* (1.09) | −6.97** (3.38) | −2.94** (1.27) |
| log_sample | −6.45*** (1.45) | −1.31 (0.90) | −7.70*** (1.35) | −0.37 (1.07) |
| n | 6,693 | 5,521 | 6,633 | 5,424 |
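As a quick sanity check on the star convention, the coefficient and clustered SE in each cell can be turned into a two-sided p-value. The sketch below assumes a standard normal approximation for coef / SE; the actual script may use a t distribution with cluster-based degrees of freedom, so borderline cells could differ. The function names are illustrative, not taken from the script.

```python
import math

def two_sided_p(coef: float, se: float) -> float:
    """Two-sided p-value for coef/se under a standard normal approximation (an assumption)."""
    z = abs(coef / se)
    # erfc(z / sqrt(2)) equals 2 * (1 - Phi(z)) for the standard normal CDF Phi
    return math.erfc(z / math.sqrt(2.0))

def stars(p: float) -> str:
    """Map a p-value to the table's star convention."""
    if p < 0.01:
        return "***"
    if p < 0.05:
        return "**"
    if p < 0.10:
        return "*"
    return ""

# (coefficient, clustered SE) pairs copied from the margin-error table
cells = {
    "candidate S1": (-2.69, 1.25),
    "other_firm S1": (-2.00, 1.09),
    "small_media S1": (-0.14, 0.99),
    "log_sample S0": (-6.45, 1.45),
}

for label, (coef, se) in cells.items():
    p = two_sided_p(coef, se)
    print(f"{label}: z={coef / se:.2f}, p={p:.3f} {stars(p)}")
```

Under this approximation the four cells reproduce the stars printed in the table.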
Calls winner first
| Bucket (ref = major_media) | S0 No FE | S1 Race FE | S2 Firm FE | S3 Firm + Race FE |
|---|---|---|---|---|
| small_media | −0.012 (0.038) | −0.056 (0.040) | +0.033 (0.054) | −0.002 (0.065) |
| candidate | −0.032 (0.039) | −0.021 (0.048) | +0.069 (0.058) | +0.056 (0.074) |
| pollster_self | −0.036 (0.039) | −0.026 (0.042) | +0.073 (0.058) | +0.045 (0.070) |
| other_firm | −0.056 (0.038) | −0.071* (0.041) | +0.020 (0.053) | +0.008 (0.063) |
| log_sample | −0.061** (0.030) | −0.014 (0.036) | −0.119*** (0.034) | −0.025 (0.041) |
| n | 6,991 | 5,750 | 6,929 | 5,655 |
Mean |error|
| Bucket (ref = major_media) | S0 No FE | S1 Race FE | S2 Firm FE | S3 Firm + Race FE |
|---|---|---|---|---|
| small_media | +1.35*** (0.50) | +0.90** (0.39) | −0.25 (0.66) | −1.03 (0.65) |
| candidate | +2.10*** (0.58) | +1.05* (0.54) | −0.18 (0.70) | −0.84 (0.66) |
| pollster_self | +1.42*** (0.51) | +1.01** (0.46) | +0.05 (0.69) | −0.94 (0.66) |
| other_firm | +1.15** (0.51) | +0.91** (0.44) | −0.49 (0.67) | −0.82 (0.67) |
| log_sample | −0.76** (0.34) | −0.56 (0.45) | −0.28 (0.40) | −0.62 (0.54) |
| n | 6,991 | 5,750 | 6,929 | 5,655 |
Reading
Race FE (S1) — the headline spec
This is the cleanest interpretable comparison. Relative to major-media polls in the same race:
- candidate: 2.69 pp lower margin error (p<0.05); 1.05 pp higher mean |error| (p<0.10). The cross-race-controlled story is "candidate-sponsored polls miss the level by ~1 pp more but predict the spread better." This reads as the classic slant pattern: boosting one candidate symmetrically with understating another keeps the margin roughly right but inflates magnitude error.
- other_firm: 2.00 pp lower margin error (p<0.10), 0.91 pp higher mean |error| (p<0.05), 7.1 pp less likely to call the winner (p<0.10). Same direction as candidate — the shell-bucket pattern.
- small_media: 0.90 pp higher mean |error| (p<0.05), null on margin error and on calls-winner. The small-media bucket looks like major-media on margin but is noisier on level.
- pollster_self: 1.01 pp higher mean |error| (p<0.05), null otherwise. Mixed signal — some pollster_self polls are showcase Datafolha (very accurate); the average is dragged up by the rest.
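The readings above rely on margin error and mean |error| being able to move in opposite directions. The sketch below shows that with toy numbers. The metric definitions are assumptions (margin error as the absolute gap between polled and actual top-two margins, mean |error| as the average absolute miss across candidates); the script's exact definitions are not reproduced here, and the vote shares are invented for illustration.

```python
def margin_error(poll: dict, actual: dict) -> float:
    """Absolute difference between polled and actual margin of the actual top two (assumed definition)."""
    first, second = sorted(actual, key=actual.get, reverse=True)[:2]
    return abs((poll[first] - poll[second]) - (actual[first] - actual[second]))

def mean_abs_error(poll: dict, actual: dict) -> float:
    """Average absolute miss across all candidates (assumed definition)."""
    return sum(abs(poll[c] - actual[c]) for c in actual) / len(actual)

# Invented vote shares in percentage points
actual = {"A": 45, "B": 35, "C": 20}
poll_level_off = {"A": 49, "B": 39, "C": 12}   # both leaders overstated by the same amount
poll_margin_off = {"A": 47, "B": 33, "C": 20}  # levels close, spread too wide

for name, poll in [("level_off", poll_level_off), ("margin_off", poll_margin_off)]:
    print(name, margin_error(poll, actual), round(mean_abs_error(poll, actual), 2))
```

In this toy case the first poll gets the margin exactly right while missing levels badly, and the second does the reverse. It illustrates one way the two outcomes can diverge, not the specific mechanism behind any bucket's coefficients.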
Firm + Race FE (S3) — the most demanding spec, but interpret carefully
Under S3, every non-major-media bucket has 1.8 to 3.8 pp lower margin error than major-media within firm × race (small_media −2.23*, candidate −3.81**, pollster_self −1.82, other_firm −2.94**). On mean |error|, all four buckets are now directionally NEGATIVE (more accurate than major-media within firm × race).
This counterintuitive direction needs care. Two readings:
- Selection on the thin reference. Major-media is only 304 polls (4.3 % of sample). Under firm + race FE, most race × firm cells have no major-media-sponsored poll and get dropped; the surviving cells are highly selected. The major-media polls that survive in those cells may systematically go to tighter races (Globo / Folha cover the marquee races), inflating their margin error mechanically. If true, this is selection rather than a real "major-media polls are less accurate" effect.
- Real signal. Major-media polls might genuinely make more decisive calls (publish a confident point estimate that the audience expects to be precise) and miss the margin more often. Smaller-stakes polls might be more cautious by default.
The two readings can't be separated in this data. S1 (race FE only) is the more reliably interpretable spec for the paper appendix.
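One way to gauge how thin the S3 contrast is would be to count the firm × race cells that contain both a major-media poll and at least one poll from another bucket. The sketch below is a rough diagnostic under that assumption, not the estimator's exact identifying sample (additive firm and race FE can also draw on variation across connected cells). The field names and toy records are hypothetical.

```python
from collections import defaultdict

def contrast_cells(polls, reference="major_media"):
    """Return firm x race cells holding the reference bucket and at least one other bucket."""
    buckets_by_cell = defaultdict(set)
    for p in polls:
        buckets_by_cell[(p["firm"], p["race"])].add(p["bucket"])
    return {
        cell
        for cell, buckets in buckets_by_cell.items()
        if reference in buckets and len(buckets) > 1
    }

# Hypothetical toy records; real data would come from the analysis table
polls = [
    {"firm": "firm_a", "race": "race_1", "bucket": "major_media"},
    {"firm": "firm_a", "race": "race_1", "bucket": "candidate"},
    {"firm": "firm_a", "race": "race_2", "bucket": "small_media"},
    {"firm": "firm_b", "race": "race_1", "bucket": "other_firm"},
    {"firm": "firm_b", "race": "race_2", "bucket": "major_media"},
    {"firm": "firm_c", "race": "race_2", "bucket": "pollster_self"},
    {"firm": "firm_c", "race": "race_2", "bucket": "small_media"},
]

cells = contrast_cells(polls)
in_contrast = sum((p["firm"], p["race"]) in cells for p in polls)
print(f"{len(cells)} contrast cell(s), covering {in_contrast} of {len(polls)} polls")
```

Reporting this count alongside the S3 column would let a reader judge how much of the sample actually carries the within-cell major-media comparison.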
log_sample_size
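Negative on all three outcomes under S0. For margin error and mean |error| this means bigger samples are more accurate cross-sectionally; for calls-winner, where the outcome is the probability of calling the winner, the negative sign means larger-sample polls are less likely to call the winner. Race FE absorbs almost all of it (S1 and S3 coefficients shrink and lose significance). The sample-size correlation is mostly race-difficulty composition (big samples cover important races where the polls are inherently harder to call). The S2 (firm FE alone) coefficient is still significant for margin error and calls-winner, though not for mean |error|: within firm, larger samples have lower margin error.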
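The way race FE can absorb a cross-sectional sample-size slope can be sketched with a within-race demeaning. The toy data below is invented: two races differ in both typical log sample size and typical error, but within each race error is unrelated to sample size. The pooled slope is negative while the within-race slope is zero. This is only an illustration of the composition mechanism, not a reproduction of the S0 and S1 estimates.

```python
from collections import defaultdict

def ols_slope(xs, ys):
    """Slope of a one-regressor OLS fit with intercept."""
    mx = sum(xs) / len(xs)
    my = sum(ys) / len(ys)
    num = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    den = sum((x - mx) ** 2 for x in xs)
    return num / den

def demean_by_group(values, groups):
    """Subtract each group's mean from its members (the fixed-effects within transform)."""
    totals = defaultdict(lambda: [0.0, 0])
    for v, g in zip(values, groups):
        totals[g][0] += v
        totals[g][1] += 1
    return [v - totals[g][0] / totals[g][1] for v, g in zip(values, groups)]

# Invented data: (race, log sample size, margin error in pp)
rows = [
    ("race_1", 6.0, 9.0), ("race_1", 6.2, 10.0), ("race_1", 6.4, 9.0),
    ("race_2", 7.0, 4.0), ("race_2", 7.2, 5.0), ("race_2", 7.4, 4.0),
]
races = [r for r, _, _ in rows]
log_n = [x for _, x, _ in rows]
err = [y for _, _, y in rows]

pooled = ols_slope(log_n, err)
within = ols_slope(demean_by_group(log_n, races), demean_by_group(err, races))
print(f"pooled slope: {pooled:.2f}, within-race slope: {within:.2f}")
```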
What the paper should report
- Lead with the 4-bucket S1 result from AN-090 in the §Results headline. That's the most familiar framing with the broad "media" reference.
- In the §Identification or §Robustness section, present AN-093 (this table) as the appendix. Show that the findings survive (a) major-media as reference, (b) sample-size control. Highlight the small_media coefficient as evidence the media bucket is heterogeneous.
- Caveat AN-093 S3 carefully. The thin major-media reference may inflate the S3 magnitudes via selection. The clean interpretation is S1; S3 is a robustness layer.
Tradeoffs vs AN-090
| | AN-090 (4 buckets) | AN-093 (5 buckets) |
|---|---|---|
| Reference | media (all 2,948) | major_media (304) |
| Reference interpretation | mostly small-blog-sponsored | high-rep national outlets |
| Sample-size control | no | yes |
| Cleanest single spec | S1 race FE | S1 race FE |
| Reference cleanliness | weaker | stronger |
| Reference power | strong (n=2,948) | weaker (n=304) |
| S3 reliability | reasonable | suspicious (thin ref) |
| Paper role | headline §Results | appendix §Robustness |
Both have value. Recommend keeping both: AN-090 for §Results, AN-093 for §Appendix. Each speaks to a different audience concern.
Caveats
- n=304 major-media reference is the load-bearing limitation. Reasonable for cross-sectional comparisons (S1) but selection-sensitive for within-firm-within-race S3.
- Major-media regex is judgment-based. Adding tier-2 regional outlets to "major" would shift the count up but contaminate the reference. The current list (national flagships + ~20 largest regional papers) is a defensible middle.
- log_sample_size = log(declared sample size). Per AN-080, the operational fielded sample may diverge from the declared one. Using declared size is conservative.
- AN-091 within-firm major-vs-small-media test was null. AN-093 S3 small_media coefficient (−2.23, p<0.10) appears to contradict that, but AN-091's MIXED-firms sample was 29 firms × 677 polls; AN-093 S3 uses the full 5,424-poll sample with sharp selection on major-media-cells. The two tests are not directly comparable.
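Because log_sample_size enters in logs, its coefficient is easiest to read as the change associated with a proportional change in declared sample size. The sketch below converts a coefficient into the implied change from doubling the sample, assuming a natural log (the caveat above does not state the base; with base 10 the multiplier would be log10(2) instead). The coefficients are copied from the margin-error table.

```python
import math

def doubling_effect(beta: float) -> float:
    """Change in the outcome associated with doubling sample size, assuming natural-log sample size."""
    return beta * math.log(2.0)

# log_sample coefficients on margin error, from the table
for spec, beta in [("S0", -6.45), ("S1", -1.31), ("S2", -7.70), ("S3", -0.37)]:
    print(f"{spec}: doubling sample size ~ {doubling_effect(beta):+.2f} pp margin error")
```

Read this way, the S0 coefficient corresponds to roughly 4.5 pp lower margin error per doubling, while the S1 and S3 magnitudes are below 1 pp.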
Artifacts
- Script: source/analysis/an-093-paper-spec-ladder-final.py
- CSV: build/table/an-093-paper-spec-ladder-final.csv
- LaTeX (paper-ready): build/table/an-093-paper-spec-ladder-final.tex
- Markdown: build/table/an-093-paper-spec-ladder-final.md
- JSON: build/table/an-093-paper-spec-ladder-final.json