id: an-092
hypothesis: shell-contratante
headline: >
  Adding log_sample_size as a control barely moves the sponsor-bucket coefficients under any spec that includes race FE (S1, S3). The larger shifts occur under the specs without race FE (S0, S2), where log_sample_size soaks up race-level difficulty composition; the largest is is_candidate under S0. Race FE absorbs ~95% of the sample-size variation. Net: the AN-090 spec ladder is robust to sample-size control. The log_sample_size coefficient on margin_error is −6.4 pp/log-unit in S0 (p<0.001) but only −1.3 (ns) in S1 (race FE), consistent with sample size mostly proxying for race difficulty (big races attract big polls).
type: robustness
question: "Do the AN-090 sponsor-bucket coefficients survive controlling for sample size?"
tags: ["hyp:shell-contratante", "robustness", "sample-size", "spec-ladder"]
status: interpreted
status_date: 2026-06-17
confidence: green
created: 2026-06-17
script: source/analysis/an-092-spec-ladder-with-sample-size.py
target: build/table/an-092-spec-ladder-with-sample-size.csv

AN-092: Sample-size robustness of the AN-090 spec ladder

User flagged (2026-06-17) that AN-082→091 don't control for the declared sample size (QT_ENTREVISTADO). Polling theory says SE ∝ 1/√n, so larger samples should be mechanically more accurate. Candidate-linked polls have systematically smaller samples (bucket medians 326-393 vs 408 for media and 451 for pollster_self), so part of any "candidate-sponsored polls are less accurate" finding could be mechanical.

Sample sizes by bucket

Bucket	n polls	median sample	mean sample
pollster_self	1,464	451	540
media	2,948	408	540
party	61	400	474
other_firm	1,553	400	470
party_name	91	326	453
committee_prefeito	339	360	394
individual (CPF)	508	393	401

Candidate-linked buckets (CPF / committee / party-name) all sit below the media / pollster_self median. The confound is real.
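To get a feel for how large the purely mechanical part could be, here is a minimal sketch of the textbook sampling margin of error evaluated at the bucket medians from the table above. It assumes simple random sampling, a 95% interval, and a single candidate's share near 50%; none of these assumptions come from the AN-092 script, and the function name is illustrative only.

```python
import math

def margin_of_error_pp(n, p=0.5, z=1.96):
    """Textbook sampling margin of error in percentage points.

    Assumes simple random sampling; p=0.5 is the worst case and
    z=1.96 corresponds to a 95% interval. Both are assumptions here.
    """
    return 100 * z * math.sqrt(p * (1 - p) / n)

# Median declared sample sizes, copied from the bucket table above.
median_sample = {
    "party_name": 326,
    "committee_prefeito": 360,
    "individual (CPF)": 393,
    "media": 408,
    "pollster_self": 451,
}

for bucket, n in median_sample.items():
    print(f"{bucket:20s} n={n:4d}  MoE ~ {margin_of_error_pp(n):.2f} pp")
```

Under these assumptions the gap between a median committee poll and a median media poll is a few tenths of a percentage point on a single share, which is the scale the sample-size control is meant to absorb.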

Spec ladder comparison

For each (outcome × bucket × spec), the tables below show the coefficient WITHOUT vs WITH log_sample_size as a continuous control. SEs are in parentheses, cluster-robust at the race level. * p<0.10, ** p<0.05, *** p<0.01. Δ is the WITH coefficient minus the WITHOUT coefficient.

Margin error (the most-robust outcome from AN-090)

Bucket	Spec	β WITHOUT log_sample	β WITH log_sample	Δ
is_candidate	S0 No FE	+0.018 (1.075)	−1.620* (0.976)	−1.64
is_candidate	S1 Race FE	−2.572*** (0.932)	−2.569*** (0.932)	+0.003
is_candidate	S2 Firm FE	+1.008 (1.069)	−0.093 (1.048)	−1.10
is_candidate	S3 Firm+Race FE	−1.755* (0.974)	−1.754* (0.973)	+0.000
is_pollster_self	S0 No FE	−0.306 (0.898)	−0.542 (0.866)	−0.24
is_pollster_self	S1 Race FE	−0.841 (0.612)	−0.871 (0.613)	−0.03
is_pollster_self	S2 Firm FE	+1.372 (1.074)	+1.545 (1.035)	+0.17
is_pollster_self	S3 Firm+Race FE	+0.223 (0.776)	+0.232 (0.774)	+0.01
is_other_firm	S0 No FE	−3.009*** (0.841)	−3.768*** (0.788)	−0.76
is_other_firm	S1 Race FE	−1.867*** (0.618)	−1.884*** (0.620)	−0.02
is_other_firm	S2 Firm FE	−2.645*** (0.899)	−3.058*** (0.886)	−0.41
is_other_firm	S3 Firm+Race FE	−0.905 (0.617)	−0.909 (0.617)	−0.00

Reading: under race FE (S1) or firm + race FE (S3), the sample-size control changes the sponsor coefficients by less than 0.04 pp. The big shifts (Δ from −0.4 to −1.6) are all under specs that DON'T include race FE (S0, S2). Race FE absorbs 95+% of the sample-size variation because tight races attract big polls (sample size is endogenous to race difficulty).
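The note does not say how the "95+%" figure was computed. One plausible reading is the between-race share of the variance of log sample size, i.e. 1 minus within-race SS over total SS. The sketch below shows that calculation on made-up data; the sample sizes, race ids, and function name are hypothetical and are not taken from the TSE registry or the AN-092 script.

```python
import math
from collections import defaultdict

def share_absorbed_by_group(values, groups):
    """Share of total variation in `values` lying between groups.

    Computed as 1 - (within-group sum of squares / total sum of squares),
    which is what a full set of group fixed effects would absorb.
    """
    grand_mean = sum(values) / len(values)
    total_ss = sum((v - grand_mean) ** 2 for v in values)

    by_group = defaultdict(list)
    for v, g in zip(values, groups):
        by_group[g].append(v)

    within_ss = 0.0
    for members in by_group.values():
        group_mean = sum(members) / len(members)
        within_ss += sum((v - group_mean) ** 2 for v in members)

    return 1.0 - within_ss / total_ss

# Hypothetical toy data: three races, two polls each, with sample size
# varying a lot across races and only a little within a race.
sample_sizes = [400, 420, 1000, 1050, 2000, 1900]
race_ids = ["A", "A", "B", "B", "C", "C"]

log_sample = [math.log(n) for n in sample_sizes]
share = share_absorbed_by_group(log_sample, race_ids)
print(f"share of log-sample variation absorbed by race: {share:.3f}")
```

When almost all of the variation sits between races, as in this toy case, a within-race regression has very little sample-size variation left to work with, which is why the control cannot move the sponsor coefficients much under S1 or S3.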

Mean |error|

Bucket	Spec	β WITHOUT log_sample	β WITH log_sample	Δ
is_candidate	S0 No FE	+1.072*** (0.382)	+0.883** (0.389)	−0.19
is_candidate	S1 Race FE	+0.270 (0.460)	+0.275 (0.459)	+0.01
is_candidate	S3 Firm+Race FE	+0.110 (0.444)	+0.114 (0.445)	+0.01
is_other_firm	S0 No FE	+0.029 (0.276)	−0.062 (0.277)	−0.09
is_other_firm	S3 Firm+Race FE	+0.127 (0.288)	+0.119 (0.288)	−0.01

Same story: small shifts under the FE specs, slightly larger under the no-FE spec (S0). The raw candidate mean-error gap of +1.07 pp shrinks to +0.88 pp once sample size is controlled, so about 18% of the raw effect was the mechanical confound from candidate polls having smaller samples. Under race FE the effect is null either way.
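The "about 18%" figure is the proportional drop in the S0 is_candidate coefficient, using the unrounded values from the table. A small sketch of that arithmetic (the helper name is illustrative):

```python
def share_explained(beta_without, beta_with):
    """Fraction of the uncontrolled coefficient removed by adding the control."""
    return (beta_without - beta_with) / beta_without

# is_candidate, mean |error|, S0 No FE (values from the table above).
share = share_explained(beta_without=1.072, beta_with=0.883)
print(f"{share:.1%}")  # prints 17.6%
```

The ratio is only informative where the uncontrolled coefficient is clearly away from zero; for rows such as is_other_firm under S0 (+0.029) it would be unstable.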

Calls winner first

Bucket	Spec	β WITHOUT log_sample	β WITH log_sample	Δ
is_candidate	S0 No FE	−0.007 (0.021)	−0.021 (0.019)	−0.014
is_candidate	S1 Race FE	+0.027 (0.032)	+0.027 (0.032)	+0.000
is_candidate	S3 Firm+Race FE	+0.057 (0.035)	+0.057* (0.035)	+0.000
is_other_firm	S0 No FE	−0.039** (0.018)	−0.045** (0.018)	−0.007
is_other_firm	S3 Firm+Race FE	+0.010 (0.025)	+0.009 (0.024)	+0.000

Trivial shifts at every spec. Sample size barely affects calls_winner_first beyond what race FE already absorbs.

The log_sample_size coefficient itself

Outcome	S0 No FE	S1 Race FE	S2 Firm FE	S3 Firm+Race FE
Calls winner first	−0.060** (0.030)	−0.016 (0.037)	−0.120*** (0.034)	−0.025 (0.041)
Margin error	−6.410*** (1.478)	−1.311 (0.899)	−7.594*** (1.366)	−0.324 (1.068)
Mean |error|	−0.797** (0.348)	−0.531 (0.449)	−0.272 (0.400)	−0.599 (0.543)

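Sanity check on sign: the log_sample_size coefficient is negative in every cell. For margin error and mean |error| that is the expected direction: bigger samples are more accurate (lower margin error, lower mean |error|). For calls_winner_first a negative sign points the other way, which needs a separate explanation.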

The surprise: log_sample_size is NEGATIVE on poll_calls_winner_first under S0 and S2. Reading: larger polls cluster in TIGHTER races where calling the winner is harder, so the raw cross-sectional and within-firm correlations are negative. Under S1 (race FE) and S3 (firm + race FE) this selection is absorbed and the coefficient becomes near-zero.
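The selection story can be illustrated with a deliberately contrived toy example: two races, one tight and one easy, where bigger polls go to the tight race and winner calls are rarer there, but within each race sample size is unrelated to calling the winner. All values below are invented for illustration, and the within transformation is a stand-in for race FE, not the note's actual estimator.

```python
from collections import defaultdict

def slope(x, y):
    """OLS slope of y on x, with an intercept."""
    mean_x = sum(x) / len(x)
    mean_y = sum(y) / len(y)
    sxy = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    sxx = sum((a - mean_x) ** 2 for a in x)
    return sxy / sxx

def demean_by_group(values, groups):
    """Subtract each group's mean (the within transformation behind a group FE)."""
    sums = defaultdict(float)
    counts = defaultdict(int)
    for v, g in zip(values, groups):
        sums[g] += v
        counts[g] += 1
    return [v - sums[g] / counts[g] for v, g in zip(values, groups)]

# Hypothetical toy data: the tight race attracts larger polls (higher log
# sample size) and has fewer correct winner calls; within each race the two
# polls that differ in size have the same outcome.
race = ["tight"] * 4 + ["easy"] * 4
log_sample = [7.1, 6.9, 7.0, 7.0, 6.1, 5.9, 6.0, 6.0]
calls_winner = [0, 0, 1, 0, 1, 1, 1, 0]

pooled = slope(log_sample, calls_winner)
within = slope(demean_by_group(log_sample, race),
               demean_by_group(calls_winner, race))

print(f"pooled slope (no FE):     {pooled:+.3f}")
print(f"within-race slope (FE):   {within:+.3f}")
```

The pooled slope is negative purely because of which races the large polls are in; once race means are removed, nothing is left. This mirrors the S0/S2 versus S1/S3 pattern in the table, though the real data of course contain noise that this toy omits.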

What this means for AN-090 and the paper

The AN-090 spec ladder is robust to sample-size control. The headline findings, particularly the is_other_firm margin-error coefficient in S1 / S2 / S3, change by less than 0.5 pp when log_sample is added (at most 0.41 pp, in S2); only the no-FE spec S0 moves more (−0.76 pp).
Sample size is mostly a race-difficulty proxy in this data. It correlates strongly with margin_error / mean |error| / calls_winner_first cross-sectionally but is absorbed by race FE. The paper doesn't need a separate sample-size control alongside race FE; race FE does the work.
The candidate raw mean-error effect (+1.07 pp under S0) is partly mechanical sample size — about 18% of it. The honest statement is "candidate-sponsored polls are 0.88 pp less accurate than media polls at average sample size." The race-FE version is null either way.
The other_firm margin-error effect strengthens slightly under sample-size control (−3.01 → −3.77 in S0, −2.65 → −3.06 in S2). With sample size held constant, other_firm shells look even more anomalous. This is consistent with AN-085's finding that the shell signal is about who pays, not about poll quality.

Recommendation

Add log_sample_size to the headline regression specs as a matter of conservative practice. It barely moves the coefficients under race FE but it sharpens raw-mean interpretations and gives a defensible answer to "did you control for sample size?" The paper appendix table (AN-090) should be updated to include the WITH-log_sample version as the primary, with a footnote that it changes nothing under race FE.

Caveats

sample_size is the DECLARED sample size from the TSE registry, not the actual fielded sample. Operational deviations (the AN-080 finding) mean a firm that registers n=400 might field fewer / different respondents. We use the declared value because it's all we have.
The negative log_sample on calls_winner_first is a selection artifact, not a real precision effect. Don't cite the S0/S2 coefficient as evidence that larger polls are less accurate — they're not, they just go to tighter races.
Adding log_sample to S3 (firm + race FE) gains little: the race FE already captures most of the relevant sample-size variation. The log_sample coefficient in S3 is near zero (−0.03 on calls_winner_first; −0.32 on margin_error; −0.60 on mean |error|) and never significant.

Artifacts

Script: source/analysis/an-092-spec-ladder-with-sample-size.py
CSV (full coefficient matrix): build/table/an-092-spec-ladder-with-sample-size.csv
Markdown: build/table/an-092-spec-ladder-with-sample-size.md
Headline JSON: build/table/an-092-spec-ladder-with-sample-size.json
