id: an-092
hypothesis: shell-contratante
headline: >
  Adding log_sample_size as a control barely moves the sponsor-bucket coefficients under the race-FE specs (S1, S3).
  The largest shifts are under the specs without race FE (S0 and S2), where log_sample_size soaks up race-level difficulty composition.
  Race FE absorbs ~95% of the sample-size variation. Net: the AN-090 spec ladder is robust to sample-size control.
  The log_sample_size coefficient on margin_error is −6.4 pp/log-unit in S0 (p<0.001) but only −1.3 (ns) in S1 (race FE),
  confirming that sample size is mostly proxying for race difficulty (big races attract big polls).
type: robustness
question: "Do the AN-090 sponsor-bucket coefficients survive controlling for sample size?"
tags: ["hyp:shell-contratante", "robustness", "sample-size", "spec-ladder"]
status: interpreted
status_date: 2026-06-17
confidence: green
created: 2026-06-17
script: source/analysis/an-092-spec-ladder-with-sample-size.py
target: build/table/an-092-spec-ladder-with-sample-size.csv
AN-092: Sample-size robustness of the AN-090 spec ladder
User flagged (2026-06-17) that AN-082→091 don't control for the
declared sample size (QT_ENTREVISTADO). Polling theory says
SE ∝ 1/√n, so larger samples should be more accurate
mechanically. Candidate-linked polls have systematically smaller
samples (medians of 326-393 vs 408 for media and 451 for pollster_self), so part of any
"candidate-sponsored polls are less accurate" finding could be
mechanical.
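To get a feel for how large the mechanical channel could be, here is a minimal sketch of the textbook margin of error under simple random sampling. It assumes a proportion of 0.5 and a 95% normal critical value; the function name `margin_of_error_pp` is illustrative and not part of the AN-092 script, and real polls with quotas or weighting will not follow this formula exactly.

```python
import math

def margin_of_error_pp(n: int, p: float = 0.5, z: float = 1.96) -> float:
    """Textbook margin of error, in percentage points, for a simple random sample.

    Assumes a binomial proportion p and a normal critical value z (1.96 ~ 95%).
    """
    return 100.0 * z * math.sqrt(p * (1.0 - p) / n)

# Compare a typical candidate-linked sample size with the media median.
for n in (360, 408):
    print(n, round(margin_of_error_pp(n), 2))
```

Under these assumptions the gap between n=360 and n=408 is roughly 0.3 pp of sampling margin, which bounds how much of an accuracy difference the 1/√n channel alone could plausibly explain.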
Sample sizes by bucket
| Bucket | n | median sample | mean sample |
|---|---|---|---|
| pollster_self | 1,464 | 451 | 540 |
| media | 2,948 | 408 | 540 |
| party | 61 | 400 | 474 |
| other_firm | 1,553 | 400 | 470 |
| party_name | 91 | 326 | 453 |
| committee_prefeito | 339 | 360 | 394 |
| individual (CPF) | 508 | 393 | 401 |
Candidate-linked buckets (CPF / committee / party-name) all sit below the media / pollster_self median. The confound is real.
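The comparison can be checked directly from the medians in the table. The sketch below copies those medians and measures each candidate-linked bucket against the lower of the media and pollster_self medians; the variable names are illustrative only.

```python
# Median declared sample size per bucket, copied from the table above.
median_sample = {
    "pollster_self": 451,
    "media": 408,
    "party": 400,
    "other_firm": 400,
    "party_name": 326,
    "committee_prefeito": 360,
    "individual": 393,
}

candidate_linked = ("individual", "committee_prefeito", "party_name")

# Benchmark: the smaller of the two reference medians (media, pollster_self).
benchmark = min(median_sample["media"], median_sample["pollster_self"])

# Negative gap = the bucket's median sample is below the benchmark.
gaps = {bucket: median_sample[bucket] - benchmark for bucket in candidate_linked}
all_below = all(gap < 0 for gap in gaps.values())
print(gaps, all_below)
```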
Spec ladder comparison
For each (outcome × bucket × spec), shows the coefficient WITHOUT vs WITH log_sample_size as a continuous control. SE in parens, cluster-robust at race. * p<0.10, ** p<0.05, *** p<0.01.
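As a rough sketch of how the WITHOUT / WITH pairs could be specified, the helper below builds Patsy-style formula strings for the four specs. The column names `race_id` and `firm_id` and the function `build_formula` are assumptions for illustration; the note does not say how an-092-spec-ladder-with-sample-size.py names them, nor whether the bucket dummies enter jointly or one at a time.

```python
# Fixed-effect terms per spec; race_id and firm_id are hypothetical column names.
SPEC_FE_TERMS = {
    "S0": [],                            # no FE
    "S1": ["C(race_id)"],                # race FE
    "S2": ["C(firm_id)"],                # firm FE
    "S3": ["C(firm_id)", "C(race_id)"],  # firm + race FE
}

def build_formula(outcome, bucket_terms, spec, with_sample_size):
    """Return a formula string for one cell of the spec ladder."""
    terms = list(bucket_terms)
    if with_sample_size:
        terms.append("log_sample_size")
    terms.extend(SPEC_FE_TERMS[spec])
    return f"{outcome} ~ " + " + ".join(terms)

for spec in SPEC_FE_TERMS:
    for with_n in (False, True):
        print(build_formula("margin_error", ["is_candidate"], spec, with_n))
```

A string like this could then be passed to a formula-based OLS routine (for example statsmodels' `smf.ols(...).fit(cov_type="cluster", cov_kwds={"groups": ...})`) to get race-clustered standard errors, though whether the original script does so is not stated here.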
Margin error (the most-robust outcome from AN-090)
| Bucket | Spec | β WITHOUT log_sample | β WITH log_sample | Δ |
|---|---|---|---|---|
| is_candidate | S0 No FE | +0.018 (1.075) | −1.620* (0.976) | −1.64 |
| is_candidate | S1 Race FE | −2.572*** (0.932) | −2.569*** (0.932) | +0.003 |
| is_candidate | S2 Firm FE | +1.008 (1.069) | −0.093 (1.048) | −1.10 |
| is_candidate | S3 Firm+Race FE | −1.755* (0.974) | −1.754* (0.973) | +0.000 |
| is_pollster_self | S0 No FE | −0.306 (0.898) | −0.542 (0.866) | −0.24 |
| is_pollster_self | S1 Race FE | −0.841 (0.612) | −0.871 (0.613) | −0.03 |
| is_pollster_self | S2 Firm FE | +1.372 (1.074) | +1.545 (1.035) | +0.17 |
| is_pollster_self | S3 Firm+Race FE | +0.223 (0.776) | +0.232 (0.774) | +0.01 |
| is_other_firm | S0 No FE | −3.009*** (0.841) | −3.768*** (0.788) | −0.76 |
| is_other_firm | S1 Race FE | −1.867*** (0.618) | −1.884*** (0.620) | −0.02 |
| is_other_firm | S2 Firm FE | −2.645*** (0.899) | −3.058*** (0.886) | −0.41 |
| is_other_firm | S3 Firm+Race FE | −0.905 (0.617) | −0.909 (0.617) | −0.00 |
Reading: under race FE (S1) or firm + race FE (S3), the sample-size control changes the sponsor coefficients by less than 0.04 pp. The big shifts (|Δ| from 0.4 to 1.6, on is_candidate and is_other_firm) all occur under specs that DON'T include race FE (S0, S2). Race FE absorbs 95+% of the sample-size variation because tight races attract big polls (sample size is endogenous to race difficulty).
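The split between race-FE and non-race-FE specs can be re-derived from the rounded coefficients in the margin-error table. The sketch below recomputes Δ from the printed values, so the last digit may differ slightly from the table's Δ column, which was presumably computed before rounding.

```python
# (bucket, spec, beta_without, beta_with), copied from the margin-error table.
rows = [
    ("is_candidate", "S0", 0.018, -1.620),
    ("is_candidate", "S1", -2.572, -2.569),
    ("is_candidate", "S2", 1.008, -0.093),
    ("is_candidate", "S3", -1.755, -1.754),
    ("is_pollster_self", "S0", -0.306, -0.542),
    ("is_pollster_self", "S1", -0.841, -0.871),
    ("is_pollster_self", "S2", 1.372, 1.545),
    ("is_pollster_self", "S3", 0.223, 0.232),
    ("is_other_firm", "S0", -3.009, -3.768),
    ("is_other_firm", "S1", -1.867, -1.884),
    ("is_other_firm", "S2", -2.645, -3.058),
    ("is_other_firm", "S3", -0.905, -0.909),
]

RACE_FE_SPECS = {"S1", "S3"}

# Delta = coefficient WITH log_sample_size minus coefficient WITHOUT it.
deltas = {(b, s): round(with_n - without_n, 3) for b, s, without_n, with_n in rows}

max_abs_race_fe = max(abs(d) for (_, s), d in deltas.items() if s in RACE_FE_SPECS)
max_abs_no_race_fe = max(abs(d) for (_, s), d in deltas.items() if s not in RACE_FE_SPECS)
print(max_abs_race_fe, max_abs_no_race_fe)
```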
Mean |error|
| Bucket | Spec | β WITHOUT log_sample | β WITH log_sample | Δ |
|---|---|---|---|---|
| is_candidate | S0 No FE | +1.072*** (0.382) | +0.883** (0.389) | −0.19 |
| is_candidate | S1 Race FE | +0.270 (0.460) | +0.275 (0.459) | +0.01 |
| is_candidate | S3 Firm+Race FE | +0.110 (0.444) | +0.114 (0.445) | +0.01 |
| is_other_firm | S0 No FE | +0.029 (0.276) | −0.062 (0.277) | −0.09 |
| is_other_firm | S3 Firm+Race FE | +0.127 (0.288) | +0.119 (0.288) | −0.01 |
Same story: small shifts under the FE specs, slightly larger ones under S0 (no FE). The raw candidate mean-error gap of +1.07 pp shrinks to +0.88 pp once sample size is controlled, so about 18% of the raw effect was the mechanical confound from candidate polls having smaller samples. Under race FE the effect is null either way.
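The 18% figure follows from the two S0 is_candidate coefficients in the table; a quick check of the arithmetic:

```python
# S0 is_candidate coefficients on mean |error|, copied from the table above.
beta_without = 1.072  # no sample-size control
beta_with = 0.883     # with log_sample_size

# Share of the raw coefficient that disappears once sample size is controlled.
share_explained = (beta_without - beta_with) / beta_without
print(f"{share_explained:.1%}")
```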
Calls winner first
| Bucket | Spec | β WITHOUT log_sample | β WITH log_sample | Δ |
|---|---|---|---|---|
| is_candidate | S0 No FE | −0.007 (0.021) | −0.021 (0.019) | −0.014 |
| is_candidate | S1 Race FE | +0.027 (0.032) | +0.027 (0.032) | +0.000 |
| is_candidate | S3 Firm+Race FE | +0.057 (0.035) | +0.057* (0.035) | +0.000 |
| is_other_firm | S0 No FE | −0.039** (0.018) | −0.045** (0.018) | −0.007 |
| is_other_firm | S3 Firm+Race FE | +0.010 (0.025) | +0.009 (0.024) | +0.000 |
Trivial shifts at every spec. Sample size barely affects calls_winner_first beyond what race FE already absorbs.
The log_sample_size coefficient itself
| Outcome | S0 No FE | S1 Race FE | S2 Firm FE | S3 Firm+Race FE |
|---|---|---|---|---|
| Calls winner first | −0.060** (0.030) | −0.016 (0.037) | −0.120*** (0.034) | −0.025 (0.041) |
| Margin error | −6.410*** (1.478) | −1.311 (0.899) | −7.594*** (1.366) | −0.324 (1.068) |
| Mean \|error\| | −0.797** (0.348) | −0.531 (0.449) | −0.272 (0.400) | −0.599 (0.543) |
Sanity check on sign: log_sample_size is negative in every cell. For margin error and mean |error| that is the expected direction: bigger samples are more accurate.
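A per-log-unit coefficient is easier to read as the effect of doubling the sample. The sketch below assumes log_sample_size is the natural log of the declared sample size (the note does not state the base), in which case doubling n shifts the outcome by β · ln 2.

```python
import math

# Margin-error coefficients on log_sample_size, copied from the table above.
margin_error_coef = {"S0": -6.410, "S1": -1.311, "S2": -7.594, "S3": -0.324}

# Assumption: natural log, so doubling n adds ln(2) to log_sample_size.
doubling_effect_pp = {
    spec: beta * math.log(2) for spec, beta in margin_error_coef.items()
}

for spec, effect in doubling_effect_pp.items():
    print(spec, round(effect, 2))
```

Under that assumption, the S0 coefficient implies roughly −4.4 pp of margin error per doubling of sample size, against roughly −0.9 pp in S1.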
The surprise: log_sample_size is NEGATIVE on poll_calls_winner_first under S0 and S2. Reading: larger polls cluster in TIGHTER races where calling the winner is harder, so the raw cross-sectional and within-firm correlations are negative. Under S1 (race FE) and S3 (firm + race FE) this selection is absorbed and the coefficient becomes near-zero.
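The selection story can be illustrated with a toy example. The numbers below are entirely hypothetical (two races, two polls each) and are chosen so that sample size has no within-race relationship with the outcome; race FE is implemented as within-race demeaning, which gives the same slope as including race dummies.

```python
from statistics import mean

# Hypothetical toy data: (race, log_sample_size, calls_winner_first rate).
# The tight race attracts bigger polls AND is harder to call.
polls = [
    ("tight", 6.5, 0.5),
    ("tight", 7.0, 0.5),
    ("lopsided", 5.5, 0.9),
    ("lopsided", 6.0, 0.9),
]
races = [r for r, _, _ in polls]
x = [n for _, n, _ in polls]
y = [c for _, _, c in polls]

def ols_slope(xs, ys):
    """Slope of a one-regressor OLS fit with an intercept."""
    mx, my = mean(xs), mean(ys)
    sxx = sum((a - mx) ** 2 for a in xs)
    sxy = sum((a - mx) * (b - my) for a, b in zip(xs, ys))
    return sxy / sxx

def demean_by_race(values, groups):
    """Subtract each race's mean: the within (fixed-effects) transformation."""
    group_mean = {
        g: mean(v for v, h in zip(values, groups) if h == g) for g in set(groups)
    }
    return [v - group_mean[g] for v, g in zip(values, groups)]

pooled_slope = ols_slope(x, y)  # analogous to S0: no FE
within_slope = ols_slope(demean_by_race(x, races), demean_by_race(y, races))  # analogous to S1
print(pooled_slope, within_slope)
```

In this toy setup the pooled slope is negative purely because big polls sit in the hard race, while the within-race slope is zero, the same pattern as S0 versus S1 in the table.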
What this means for AN-090 and the paper
- The AN-090 spec ladder is robust to sample-size control. The headline findings, particularly the is_other_firm margin-error coefficient in S1 / S2 / S3, change by less than 0.5 pp under any specification when log_sample is added.
- Sample size is mostly a race-difficulty proxy in this data. It correlates strongly with margin_error / mean |error| / calls_winner_first cross-sectionally but is absorbed by race FE. The paper doesn't need a separate sample-size control alongside race FE; race FE does the work.
- The candidate raw mean-error effect (+1.07 pp under S0) is partly mechanical sample size — about 18% of it. The honest statement is "candidate-sponsored polls are 0.88 pp less accurate than media polls at average sample size." The race-FE version is null either way.
- The other_firm margin-error effect strengthens slightly under sample-size control (−3.01 → −3.77 in S0, −2.65 → −3.06 in S2). With sample size held constant, other_firm shells look even more anomalous. This is consistent with AN-085's finding that the shell signal is about who pays, not about poll quality.
Recommendation
Add log_sample_size to the headline regression specs as a matter of conservative practice. It barely moves the coefficients under race FE but it sharpens raw-mean interpretations and gives a defensible answer to "did you control for sample size?" The paper appendix table (AN-090) should be updated to include the WITH-log_sample version as the primary, with a footnote that it changes nothing under race FE.
Caveats
- sample_size is the DECLARED sample size from the TSE registry, not the actual fielded sample. Operational deviations (the AN-080 finding) mean a firm that registers n=400 might field fewer / different respondents. We use the declared value because it's all we have.
- The negative log_sample on calls_winner_first is a selection artifact, not a real precision effect. Don't cite the S0/S2 coefficient as evidence that larger polls are less accurate: they're not, they just go to tighter races.
- Adding log_sample to S3 (firm + race FE) gains little: race FE already captures most of the relevant sample-size variation. The log_sample coefficient in S3 is near-zero (−0.03 on calls_winner_first; −0.32 on margin_error; −0.60 on mean |error|) and never significant.
Artifacts
- Script: source/analysis/an-092-spec-ladder-with-sample-size.py
- CSV (full coefficient matrix): build/table/an-092-spec-ladder-with-sample-size.csv
- Markdown: build/table/an-092-spec-ladder-with-sample-size.md
- Headline JSON: build/table/an-092-spec-ladder-with-sample-size.json