---
id: an-092
hypothesis: shell-contratante
headline: "Adding log_sample_size as a control barely moves the sponsor-bucket coefficients under any spec with race FE (S1, S3). The largest shifts are under S0 (no FE) and S2 (firm FE only), where log_sample_size soaks up race-level difficulty composition. Race FE absorbs ~95% of the sample-size variation. Net: the AN-090 spec ladder is robust to sample-size control. The log_sample_size coefficient on margin_error is −6.4 pp per log-unit in S0 (p<0.001) but only −1.3 (ns) in S1 (race FE), consistent with sample size mostly proxying for race difficulty (big races attract big polls)."
type: robustness
question: "Do the AN-090 sponsor-bucket coefficients survive controlling for sample size?"
tags: ["hyp:shell-contratante", "robustness", "sample-size", "spec-ladder"]
status: interpreted
status_date: 2026-06-17
confidence: green
created: 2026-06-17
script: source/analysis/an-092-spec-ladder-with-sample-size.py
target: build/table/an-092-spec-ladder-with-sample-size.csv
---

AN-092: Sample-size robustness of the AN-090 spec ladder

User flagged (2026-06-17) that AN-082→091 don't control for the declared sample size (QT_ENTREVISTADO). Sampling theory says SE ∝ 1/√n, so larger samples should be mechanically more accurate. Candidate-linked polls have systematically smaller samples (medians of 326 to 393 for the party-name, committee and CPF buckets vs 408 for media), so part of any "candidate-sponsored polls are less accurate" finding could be mechanical.
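To put the 1/√n argument in numbers: under simple random sampling (an assumption; real polls add weighting and design effects), the standard error of a two-candidate margin p̂1 − p̂2 is √[(p1 + p2 − (p1 − p2)²)/n]. A minimal sketch comparing the committee and media medians, assuming a 50/50 race:

```python
import math


def margin_se_pp(p1: float, p2: float, n: int) -> float:
    """SE of the estimated margin p1 - p2, in percentage points.

    Assumes simple random sampling (multinomial shares), so
    Var(p1_hat - p2_hat) = (p1 + p2 - (p1 - p2)**2) / n.
    """
    var = (p1 + p2 - (p1 - p2) ** 2) / n
    return 100 * math.sqrt(var)


se_committee = margin_se_pp(0.5, 0.5, 360)  # committee_prefeito median sample
se_media = margin_se_pp(0.5, 0.5, 408)      # media median sample
print(f"n=360: {se_committee:.2f} pp, n=408: {se_media:.2f} pp, "
      f"ratio {se_committee / se_media:.3f}")
```

Under these assumptions the committee-sized poll's sampling SE on the margin is about 5.27 pp vs 4.95 pp for the media-sized poll, roughly 6.5% larger (√(408/360) ≈ 1.065).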

Sample sizes by bucket

| Bucket | n | Median sample | Mean sample |
|---|---|---|---|
| pollster_self | 1,464 | 451 | 540 |
| media | 2,948 | 408 | 540 |
| party | 61 | 400 | 474 |
| other_firm | 1,553 | 400 | 470 |
| party_name | 91 | 326 | 453 |
| committee_prefeito | 339 | 360 | 394 |
| individual (CPF) | 508 | 393 | 401 |

Candidate-linked buckets (CPF / committee / party-name) all sit below the media / pollster_self median. The confound is real.
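The bucket table is a plain group-by on the declared sample size. A minimal stdlib sketch, assuming one record per poll holding its sponsor bucket and QT_ENTREVISTADO value; the function name is hypothetical and the records are toy values, not the AN-092 data:

```python
from statistics import mean, median

# Toy poll records: (sponsor_bucket, declared sample size). Not the AN-092 data.
polls = [
    ("media", 400), ("media", 410), ("media", 600),
    ("committee_prefeito", 300), ("committee_prefeito", 360), ("committee_prefeito", 420),
]


def sample_size_by_bucket(records):
    """Return {bucket: (n_polls, median_sample, mean_sample)}."""
    sizes = {}
    for bucket, n in records:
        sizes.setdefault(bucket, []).append(n)
    return {b: (len(v), median(v), mean(v)) for b, v in sizes.items()}


for bucket, (k, med, avg) in sorted(sample_size_by_bucket(polls).items()):
    print(f"{bucket:20s} {k:3d} {med:6.0f} {avg:6.0f}")
```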

Spec ladder comparison

Each row shows, for one (outcome × bucket × spec) cell, the coefficient WITHOUT vs WITH log_sample_size as a continuous control, and Δ = β WITH − β WITHOUT. SEs in parentheses, cluster-robust at the race level. * p<0.10, ** p<0.05, *** p<0.01.
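The stars and the Δ column follow mechanically from the reported β and SE. A minimal sketch of that bookkeeping, with hypothetical helper names; it assumes a normal reference distribution for the two-sided p-value, which may differ from the script's cluster-robust reference in borderline cells:

```python
import math


def two_sided_p(beta: float, se: float) -> float:
    """Two-sided p-value for beta / se, assuming a standard normal reference."""
    z = abs(beta / se)
    return math.erfc(z / math.sqrt(2))


def stars(p: float) -> str:
    # Thresholds from the table note: * p<0.10, ** p<0.05, *** p<0.01
    if p < 0.01:
        return "***"
    if p < 0.05:
        return "**"
    if p < 0.10:
        return "*"
    return ""


def cell(beta: float, se: float) -> str:
    """Format one table cell as 'beta<stars> (se)'."""
    return f"{beta:+.3f}{stars(two_sided_p(beta, se))} ({se:.3f})"


# is_candidate, S0, margin error: (beta, SE) without and with log_sample_size
without, with_ = (0.018, 1.075), (-1.620, 0.976)
print(cell(*without), "|", cell(*with_), "| delta", f"{with_[0] - without[0]:+.2f}")
```

Because Δ here is computed from rounded β values, it can differ from the table's Δ in the last digit (e.g. is_candidate S3: −1.754 − (−1.755) = +0.001 vs the reported +0.000).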

Margin error (the most robust outcome in AN-090)

| Bucket | Spec | β WITHOUT log_sample_size | β WITH log_sample_size | Δ |
|---|---|---|---|---|
| is_candidate | S0 No FE | +0.018 (1.075) | −1.620* (0.976) | −1.64 |
| is_candidate | S1 Race FE | −2.572*** (0.932) | −2.569*** (0.932) | +0.003 |
| is_candidate | S2 Firm FE | +1.008 (1.069) | −0.093 (1.048) | −1.10 |
| is_candidate | S3 Firm+Race FE | −1.755* (0.974) | −1.754* (0.973) | +0.000 |
| is_pollster_self | S0 No FE | −0.306 (0.898) | −0.542 (0.866) | −0.24 |
| is_pollster_self | S1 Race FE | −0.841 (0.612) | −0.871 (0.613) | −0.03 |
| is_pollster_self | S2 Firm FE | +1.372 (1.074) | +1.545 (1.035) | +0.17 |
| is_pollster_self | S3 Firm+Race FE | +0.223 (0.776) | +0.232 (0.774) | +0.01 |
| is_other_firm | S0 No FE | −3.009*** (0.841) | −3.768*** (0.788) | −0.76 |
| is_other_firm | S1 Race FE | −1.867*** (0.618) | −1.884*** (0.620) | −0.02 |
| is_other_firm | S2 Firm FE | −2.645*** (0.899) | −3.058*** (0.886) | −0.41 |
| is_other_firm | S3 Firm+Race FE | −0.905 (0.617) | −0.909 (0.617) | −0.00 |

Reading: under race FE (S1) or firm + race FE (S3), the sample-size control changes the sponsor coefficients by less than 0.04 pp. The large shifts (|Δ| of 0.4 to 1.6 pp, for is_candidate and is_other_firm) all occur under specs without race FE (S0, S2). Race FE absorbs 95+% of the sample-size variation because tight races attract big polls: sample size is endogenous to race difficulty.
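One way to check the "95+% absorbed" figure is the R² from regressing log_sample_size on race dummies, which equals the between-race share of its total sum of squares. A minimal stdlib sketch; the function name, race_id and the toy values are illustrative, not AN-092 output:

```python
import math
from collections import defaultdict


def share_absorbed_by_fe(groups, x):
    """R^2 of x on one dummy per group: 1 - within-group SS / total SS.

    With race_id as the group and x = log_sample_size, this is the share of
    sample-size variation that race fixed effects absorb.
    """
    by_group = defaultdict(list)
    for g, v in zip(groups, x):
        by_group[g].append(v)
    grand_mean = sum(x) / len(x)
    total_ss = sum((v - grand_mean) ** 2 for v in x)  # must be > 0
    within_ss = 0.0
    for vals in by_group.values():
        m = sum(vals) / len(vals)
        within_ss += sum((v - m) ** 2 for v in vals)
    return 1 - within_ss / total_ss


# Toy data (not AN-092): two races, bigger polls in race B.
race_id = ["A", "A", "A", "B", "B", "B"]
sample_size = [300, 360, 400, 900, 1000, 1200]
log_n = [math.log(n) for n in sample_size]
print(f"{share_absorbed_by_fe(race_id, log_n):.1%} of log_sample_size variation is between races")
```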

Mean |error|

| Bucket | Spec | β WITHOUT log_sample_size | β WITH log_sample_size | Δ |
|---|---|---|---|---|
| is_candidate | S0 No FE | +1.072*** (0.382) | +0.883** (0.389) | −0.19 |
| is_candidate | S1 Race FE | +0.270 (0.460) | +0.275 (0.459) | +0.01 |
| is_candidate | S3 Firm+Race FE | +0.110 (0.444) | +0.114 (0.445) | +0.01 |
| is_other_firm | S0 No FE | +0.029 (0.276) | −0.062 (0.277) | −0.09 |
| is_other_firm | S3 Firm+Race FE | +0.127 (0.288) | +0.119 (0.288) | −0.01 |

Same story: small shifts under the FE specs, slightly larger ones in the raw (S0) comparison. The raw candidate mean-error gap of +1.07 pp (candidate polls are less accurate) shrinks to +0.88 pp once sample size is controlled, so about 18% of the raw effect reflects the mechanical confound that candidate polls have smaller samples. Under race FE the effect is null either way.
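The 18% figure is the proportional change in the S0 coefficient when the control is added. With the rounded table values, the arithmetic is:

```python
beta_without = 1.072  # is_candidate, S0, mean |error|, without log_sample_size
beta_with = 0.883     # same cell, with log_sample_size
mechanical_share = (beta_without - beta_with) / beta_without  # 0.189 / 1.072
print(f"{mechanical_share:.1%} of the raw S0 effect")
```

Reading this change as "the mechanical part" assumes log_sample_size is not also picking up other ways candidate polls differ from media polls.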

Calls winner first

| Bucket | Spec | β WITHOUT log_sample_size | β WITH log_sample_size | Δ |
|---|---|---|---|---|
| is_candidate | S0 No FE | −0.007 (0.021) | −0.021 (0.019) | −0.014 |
| is_candidate | S1 Race FE | +0.027 (0.032) | +0.027 (0.032) | +0.000 |
| is_candidate | S3 Firm+Race FE | +0.057 (0.035) | +0.057* (0.035) | +0.000 |
| is_other_firm | S0 No FE | −0.039** (0.018) | −0.045** (0.018) | −0.007 |
| is_other_firm | S3 Firm+Race FE | +0.010 (0.025) | +0.009 (0.024) | +0.000 |

Trivial shifts at every spec: the largest, −0.014 for is_candidate under S0, is under one SE (0.019). Sample size barely affects calls_winner_first beyond what race FE already absorbs.

The log_sample_size coefficient itself

| Outcome | S0 No FE | S1 Race FE | S2 Firm FE | S3 Firm+Race FE |
|---|---|---|---|---|
| Calls winner first | −0.060** (0.030) | −0.016 (0.037) | −0.120*** (0.034) | −0.025 (0.041) |
| Margin error | −6.410*** (1.478) | −1.311 (0.899) | −7.594*** (1.366) | −0.324 (1.068) |
| Mean \|error\| | −0.797** (0.348) | −0.531 (0.449) | −0.272 (0.400) | −0.599 (0.543) |

Sanity check on sign: the log_sample_size coefficient is negative in all twelve cells. For the two error outcomes this is the expected direction: bigger samples are more accurate (lower margin error, lower mean |error|).

The surprise: log_sample_size is NEGATIVE on poll_calls_winner_first under S0 and S2. Reading: larger polls cluster in TIGHTER races where calling the winner is harder, so the raw cross-sectional and within-firm correlations are negative. Under S1 (race FE) and S3 (firm + race FE) this selection is absorbed and the coefficient becomes near-zero.
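Race FE work through the within transformation: every variable is demeaned within its race, so only within-race variation in log_sample_size and in the sponsor dummies can move the coefficients, and any race-level selection (tight races drawing big polls) drops out. A minimal stdlib sketch of that estimator with one bucket dummy and one control; it is illustrative, not the AN-092 script, the function names are hypothetical, and it returns point estimates only (no clustered SEs):

```python
from collections import defaultdict


def demean_by_group(groups, v):
    """Subtract each group's mean (the within transformation behind race FE)."""
    acc = defaultdict(lambda: [0.0, 0])
    for g, x in zip(groups, v):
        acc[g][0] += x
        acc[g][1] += 1
    return [x - acc[g][0] / acc[g][1] for g, x in zip(groups, v)]


def within_ols_2(groups, y, x1, x2):
    """OLS slopes of y on (x1, x2) after absorbing group fixed effects.

    By Frisch-Waugh-Lovell these equal the x1, x2 coefficients from a
    regression with one dummy per group.
    """
    ty, t1, t2 = (demean_by_group(groups, v) for v in (y, x1, x2))
    s11 = sum(a * a for a in t1)
    s22 = sum(b * b for b in t2)
    s12 = sum(a * b for a, b in zip(t1, t2))
    s1y = sum(a * c for a, c in zip(t1, ty))
    s2y = sum(b * c for b, c in zip(t2, ty))
    det = s11 * s22 - s12 * s12  # zero if x1, x2 are collinear within groups
    b1 = (s22 * s1y - s12 * s2y) / det
    b2 = (s11 * s2y - s12 * s1y) / det
    return b1, b2


# Toy check: y = race effect + 2*is_cand - 3*log_n exactly, so the within
# estimator should recover (2, -3) whatever the race effects are.
race = ["A", "A", "A", "B", "B", "B"]
is_cand = [1, 0, 0, 1, 1, 0]
log_n = [5.8, 6.0, 6.1, 6.5, 6.3, 6.9]
effect = {"A": 1.0, "B": 4.0}
y = [effect[r] + 2 * c - 3 * n for r, c, n in zip(race, is_cand, log_n)]
print(within_ols_2(race, y, is_cand, log_n))
```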

What this means for AN-090 and the paper

  1. The AN-090 spec ladder is robust to sample-size control. The headline is_other_firm margin-error coefficient moves by less than 0.5 pp in S1, S2 and S3 (−0.02, −0.41, −0.00) when log_sample_size is added, and by 0.76 pp in S0. The is_candidate margin-error coefficient is unchanged under race FE (S1, S3) but shifts by 1.1 to 1.6 pp in S2 and S0.
  2. Sample size is mostly a race-difficulty proxy in this data. It correlates strongly with margin_error / mean |error| / calls_winner_first cross-sectionally but is absorbed by race FE. The paper doesn't need a separate sample-size control alongside race FE; race FE does the work.
  3. The candidate raw mean-error effect (+1.07 pp under S0) is partly a mechanical sample-size effect: about 18% of it. The honest statement is "candidate-sponsored polls are 0.88 pp less accurate than media polls at the same sample size." The race-FE version is null either way.
  4. The other_firm margin-error effect strengthens slightly under sample-size control (−3.01 → −3.77 in S0, −2.65 → −3.06 in S2). With sample size held constant, other_firm shells look even more anomalous. This is consistent with AN-085's finding that the shell signal is about who pays, not about poll quality.

Recommendation

Add log_sample_size to the headline regression specs as a matter of conservative practice. It barely moves the coefficients under race FE, but it sharpens the raw-mean interpretations and gives a defensible answer to "did you control for sample size?" The AN-090 appendix table in the paper should be updated to show the WITH-log_sample_size version as primary, with a footnote that it moves coefficients by less than 0.04 pp under race FE.
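A sketch of the recommended ladder as formula strings, with log_sample_size included by default. The column names race_id and firm_id, the helper name, and the assumption that the three bucket dummies enter jointly against an omitted reference category are illustrative, not taken from the script. Strings like these could be passed to statsmodels' `smf.ols(formula, data=df).fit(cov_type="cluster", cov_kwds={"groups": df["race_id"]})` for race-clustered SEs.

```python
# Hypothetical spec-ladder builder; column names are assumptions, not the script's.
BUCKETS = ["is_candidate", "is_pollster_self", "is_other_firm"]
FE_BY_SPEC = {
    "S0": [],                            # no FE
    "S1": ["C(race_id)"],                # race FE
    "S2": ["C(firm_id)"],                # firm FE
    "S3": ["C(firm_id)", "C(race_id)"],  # firm + race FE
}


def spec_formula(outcome: str, spec: str, with_log_sample: bool = True) -> str:
    """Patsy-style formula; log_sample_size is on by default, per the recommendation."""
    terms = list(BUCKETS)
    if with_log_sample:
        terms.append("log_sample_size")
    terms += FE_BY_SPEC[spec]
    return f"{outcome} ~ " + " + ".join(terms)


for spec in FE_BY_SPEC:
    print(spec, spec_formula("margin_error", spec))
```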

Caveats

Artifacts