id: an-122
hypothesis: shell-contratante
headline: "AN-102's null finding on shell-β is robust to the 2.6× sample expansion. With the shell-CNPJ list extended from AN-094's 14 CNPJs (n=620 cand-poll rows) to AN-121's 89 (n=1,612 rows), shell-β on |error| remains statistically indistinguishable from zero across every FE spec: S0 +0.03 (0.34), S1 −0.39 (0.36), S2 −0.29 (0.31), S3 −0.46 (0.31). Point estimates drift slightly negative as the SE tightens, but the t-statistic stays under 2 everywhere. AN-102's interpretation ('cover-vehicle polls look like media polls in their per-poll error distribution') survives. Implication: the cover-vehicle category does its work at the IDENTIFICATION margin (we can't link them to a candidate), not at a measurement margin (they don't produce detectably noisier polls); this is the iceberg-framing claim, not an additional mechanism claim."
type: robustness
status: done
status_date: 2026-06-21
confidence: yellow
created: 2026-06-21
script: source/analysis/an-122-shell-bucket-expanded.py
target: build/table/an-122-shell-bucket-expanded.csv
cited_in: []
design:
  sample: "cand_poll matched_share=1.0 panel, identical to AN-102. Sponsor buckets: media (reference) / candidate / pollster_self / other_firm (residual) / shell. Shell list source: AN-121 universe extension (89 CNPJs / 1,612 cand-poll rows in the analysis sample, up from AN-094's 14 CNPJs / 620 rows). 10 of the 14 AN-094 shells are in the AN-121 list (the 4 missing were promoted to media/pollster_other by the CNAE upgrade in poll.py); the 79 new CNPJs are smaller other_firm operators that meet the n_polls ≥ 5 + no-media/pollster-token rule."
  specification: "AN-102's spec verbatim: |error| at cand-poll level, OLS with bucket dummies + log_sample control, FE ladder (S0 no FE, S1 race FE, S2 race+cand FE, S3 race+cand+firm FE), cluster SE at muni. Side-by-side run on both shell lists in the same script."
  comparator: AN-102 (recorded shell-β null across all four specs)
notes: "Reproduces AN-102's coefficients exactly when run with the AN-094 list (sanity check passes). The expanded sample tightens shell-β SEs by 35–46% across specs (0.57 → 0.31 in S3) without changing the substantive conclusion. Other_firm-residual β on |error| was already null in AN-102; remains null here. AN-102's positive other_firm-residual finding was on PREDICTED bias (5-fold-CV GBM), not |error|; this analysis does not refit that outcome."

AN-122: AN-102 headline tables with expanded shell bucket

Question

AN-102 fit |error| regressions with sponsor buckets media (reference) / candidate / pollster_self / other_firm-residual / shell, where the shell list was AN-094's 14 hand-coded PROBABLE_SHELL CNPJs (620 cand-poll rows in the analysis sample). The headline finding was that shell-β was null across every FE spec (S0 +0.65 / S1 −0.14 / S2 +0.06 / S3 −0.10 pp, every $p > 0.2$), interpreted as "AN-094 shells are professional operators producing media-pattern polls to evade detection." The interpretation rested on n=620.

AN-121's universe extension brings the shell list to 89 CNPJs: 10 carried over from AN-094 plus 79 new ones, a net gain of 75 because 4 AN-094 CNPJs drop out. In the analysis sample these cover 1,612 cand-poll rows, 2.6× the AN-094 sample. This analysis refits AN-102's headline spec ladder with the expanded list and asks: does shell-β stay null at the larger sample, or does the expansion bring in less-sophisticated cover vehicles that the |error| spec catches?

Design

source/analysis/an-122-shell-bucket-expanded.py:

1. Reuse AN-102's panel builder and fit helpers verbatim.
2. Load two shell-CNPJ lists:
   - AN-094: 14 hand-coded PROBABLE_SHELL CNPJs from build/intermediate/other_firm_top25.csv.
   - AN-121: 89 rule-extended shell CNPJs from build/table/an-121-iceberg-universe/shell-cnpj-list.csv. Intersection is 10 CNPJs; 4 AN-094 CNPJs are promoted to media/pollster_other by the CNAE upgrade in poll.py and don't appear in the AN-121 list; 79 new CNPJs are in AN-121 only.
3. Build the AN-102 panel twice (once per shell list); other_firm protocols touched by any shell CNPJ in the active list are promoted to the shell bucket; the remaining other_firm protocols stay in the residual bucket.
4. Fit AN-102's spec ladder (S0–S3) on each panel; report side-by-side bucket coefficients.
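The re-assignment rule for other_firm protocols, and the overlap arithmetic that reconciles "79 new CNPJs" with a net gain of 75, can be sketched in a few lines. This is a minimal illustration, not the AN-102 panel builder: `Protocol`, `assign_bucket`, `build_buckets`, and `list_overlap` are hypothetical names, and the sketch assumes each protocol carries a base bucket plus the set of CNPJs attached to it. The CNPJ strings are placeholders sized to match the counts in the text.

```python
from dataclasses import dataclass


# Hypothetical record; the real panel builder's field names may differ.
@dataclass(frozen=True)
class Protocol:
    protocol_id: str
    base_bucket: str          # "media", "candidate", "pollster_self", or "other_firm"
    sponsor_cnpjs: frozenset  # every CNPJ attached to the protocol


def assign_bucket(protocol, shell_cnpjs):
    """Promote an other_firm protocol to "shell" if any of its CNPJs is on the active list."""
    if protocol.base_bucket == "other_firm" and protocol.sponsor_cnpjs & shell_cnpjs:
        return "shell"
    return protocol.base_bucket  # unmatched other_firm protocols stay in the residual


def build_buckets(protocols, shell_cnpjs):
    """Bucket map for one panel build; call once per shell list."""
    return {p.protocol_id: assign_bucket(p, shell_cnpjs) for p in protocols}


def list_overlap(old, new):
    """Overlap counts between two shell-CNPJ lists."""
    return {
        "both": len(old & new),
        "old_only": len(old - new),
        "new_only": len(new - old),
        "net_change": len(new) - len(old),
    }


if __name__ == "__main__":
    protocols = [
        Protocol("P1", "other_firm", frozenset({"A"})),
        Protocol("P2", "other_firm", frozenset({"B"})),
        Protocol("P3", "media", frozenset({"A"})),
    ]
    print(build_buckets(protocols, frozenset({"A"})))
    # {'P1': 'shell', 'P2': 'other_firm', 'P3': 'media'}

    # Placeholder CNPJs with the list sizes quoted in the text.
    an094 = {f"S{i}" for i in range(14)}
    an121 = {f"S{i}" for i in range(10)} | {f"N{i}" for i in range(79)}
    print(list_overlap(an094, an121))
    # {'both': 10, 'old_only': 4, 'new_only': 79, 'net_change': 75}
```

Only other_firm protocols can move, so a media protocol that happens to share a CNPJ with the shell list keeps its bucket, and 79 additions minus 4 dropouts gives the net +75.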

Results

Table: shell-β side-by-side (|error| at cand-poll level; coefficients in pp, muni-clustered SEs in parentheses)

Spec	AN-094 (n=620)	AN-121 (n=1,612)
S0 (no FE)	+0.65 (0.53)	+0.03 (0.34)
S1 (race FE)	−0.14 (0.55)	−0.39 (0.36)
S2 (race + cand FE)	+0.06 (0.53)	−0.29 (0.31)
S3 (race + cand + firm FE)	−0.10 (0.57)	−0.46 (0.31)

Every coefficient has $p > 0.14$.

Table: other_firm-residual β side-by-side (|error| at cand-poll level; coefficients in pp, muni-clustered SEs in parentheses)

Spec	AN-094 (n_resid=2,542)	AN-121 (n_resid=1,601)
S0 (no FE)	+0.03 (0.28)	+0.27 (0.34)
S1 (race FE)	−0.32 (0.32)	−0.10 (0.39)
S2 (race + cand FE)	−0.32 (0.28)	−0.13 (0.37)
S3 (race + cand + firm FE)	−0.28 (0.29)	−0.17 (0.35)

Every coefficient has $p > 0.25$.
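The p-value statements under the two tables can be rechecked from the printed coefficients and SEs. This is a minimal sketch assuming a two-sided normal approximation; the script's own p-values use the unrounded estimates and muni-clustered SEs (and possibly a t reference distribution), so values recomputed from the rounded table entries can differ in the second or third decimal.

```python
import math

# (coefficient, SE) pairs in pp, copied from the two tables, in S0..S3 order.
TABLES = {
    "shell": {
        "AN-094": [(0.65, 0.53), (-0.14, 0.55), (0.06, 0.53), (-0.10, 0.57)],
        "AN-121": [(0.03, 0.34), (-0.39, 0.36), (-0.29, 0.31), (-0.46, 0.31)],
    },
    "other_firm_residual": {
        "AN-094": [(0.03, 0.28), (-0.32, 0.32), (-0.32, 0.28), (-0.28, 0.29)],
        "AN-121": [(0.27, 0.34), (-0.10, 0.39), (-0.13, 0.37), (-0.17, 0.35)],
    },
}
SPECS = ("S0", "S1", "S2", "S3")


def t_and_p(coef, se):
    """t-statistic and two-sided p-value under a normal approximation."""
    t = coef / se
    return t, math.erfc(abs(t) / math.sqrt(2.0))


def smallest_p(bucket):
    """Smallest p-value for a bucket across both shell lists and all specs."""
    return min(t_and_p(c, s)[1] for rows in TABLES[bucket].values() for c, s in rows)


for bucket, lists in TABLES.items():
    for shell_list, rows in lists.items():
        for spec, (coef, se) in zip(SPECS, rows):
            t, p = t_and_p(coef, se)
            print(f"{bucket:20s} {shell_list} {spec}: t = {t:+.2f}, p = {p:.3f}")
```

On the rounded entries the smallest shell-β p-value (S3, AN-121 list) comes out near 0.14, so the quoted bound holds up to the rounding of the table; every |t| stays below 2 in both tables.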

(The other bucket coefficients, candidate and pollster_self, are essentially unchanged across the two shell-list versions; see build/table/an-122-shell-bucket-expanded.tex for the full four-bucket grid.)

Sanity check

Running the script with the AN-094 list reproduces AN-102's recorded coefficients to three decimals across all four specs and every bucket coefficient. Because the fit helpers are imported directly from AN-102, this checks the panel builder plus the data state, not the regression code.
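"Reproduces to three decimals" can be made mechanical. The helper below is hypothetical (it is not part of the AN-102 or AN-122 code): it compares a recorded coefficient table with a refit, both keyed by (spec, bucket), and fails on any missing or extra key or on any coefficient that differs after rounding. The values are toy numbers, not AN-102's recorded coefficients.

```python
def reproduces(recorded, refit, decimals=3):
    """Hypothetical check: same (spec, bucket) keys and equal coefficients after rounding."""
    if recorded.keys() != refit.keys():
        return False  # a spec or bucket went missing or appeared
    return all(round(recorded[key], decimals) == round(refit[key], decimals) for key in recorded)


# Toy values for illustration only.
recorded = {("S0", "shell"): 0.6512, ("S3", "shell"): -0.1034}
refit = {("S0", "shell"): 0.65118, ("S3", "shell"): -0.10341}
print(reproduces(recorded, refit))  # True
```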

Interpretation

Shell-β stays null under sample expansion. The 2.6× sample expansion (620 → 1,612 cand-poll rows) tightens shell-β standard errors by 35% (S1) to 46% (S3: 0.57 → 0.31) but does not pull the point estimates away from zero. The t-statistic on shell stays under 2 in every spec; the 95% CI on the most negative point estimate (S3, AN-121) is $[-1.07, +0.15]$, which excludes any shell-minus-media |error| gap above about +0.15 pp (a +0.6 pp degradation, for instance) but not smaller positive effects.
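The CI and the SE shrinkage can be recomputed from the shell-β table. This sketch assumes a normal critical value and, for the benchmark, that SEs scale as 1/√n in cand-poll rows; with muni clustering and fixed effects that scaling is only approximate, so the benchmark is a yardstick rather than a prediction of the script's SEs.

```python
import math

Z95 = 1.959963984540054  # two-sided 95% normal critical value


def ci95(coef, se):
    """Normal-approximation 95% confidence interval."""
    return coef - Z95 * se, coef + Z95 * se


# Shell-β SEs from the results table: (AN-094 list, AN-121 list).
SHELL_SE = {"S0": (0.53, 0.34), "S1": (0.55, 0.36), "S2": (0.53, 0.31), "S3": (0.57, 0.31)}

# Benchmark: SE proportional to 1/sqrt(n), ignoring clustering and fixed effects.
benchmark_shrink = 1 - math.sqrt(620 / 1612)

for spec, (se_old, se_new) in SHELL_SE.items():
    print(f"{spec}: SE shrinks {1 - se_new / se_old:.0%} (1/sqrt(n) benchmark: {benchmark_shrink:.0%})")

lo, hi = ci95(-0.46, 0.31)
print(f"S3, AN-121 list: 95% CI [{lo:+.2f}, {hi:+.2f}]")
```

On the rounded table values the shrinkage runs from 35% (S1) to 46% (S3) against a benchmark of about 38%, and the S3 interval comes out as [−1.07, +0.15].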

Reading. The cover-vehicle category, as defined by AN-121's rule, does not produce polls that look noisier than media polls on the per-cand-poll-row |error| outcome. AN-102's "professional shells produce media-pattern polls" interpretation survives the sample expansion. The 79 new CNPJs the AN-121 rule pulled in do not behave like less-sophisticated operators with measurable |error| degradation; at this margin they are indistinguishable from media polls.

What this does not test. The |error| outcome is a per-row absolute deviation from final share. If a shell sponsors a poll that over-states one specific candidate by +7 pp and the other rows stay put, that row's |error| rises by at most 7 pp, so the poll's mean |error| across ~5 cand-poll rows rises by at most ~1.4 pp. The S3 upper bound (+0.15 pp) would exclude a rise of that size if every shell poll carried it, but the |error| spec is blunt against weaker versions of the same mechanism: a tilt carried by only a minority of shell polls (at the 1.4 pp ceiling, about one slanting poll in ten keeps the bucket mean under +0.15 pp), or a tilt that partly offsets a poll's baseline error instead of adding to it. The test is thus underpowered for a sponsor-row mechanism diluted across the bucket. The identification of "sponsor's candidate" is itself unavailable for shell polls (that's the cover-vehicle problem), so we cannot run the same per-sponsor-row test that gives the +7 pp headline on candidate-linked polls. The null here is therefore consistent with EITHER (i) shells aren't slanting in measurable ways, OR (ii) they are slanting but we can't measure it without sponsor identification. The iceberg-framing claim does not depend on distinguishing these two: both readings imply the cover-vehicle category does its work at the identification margin.
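The dilution arithmetic for the +7 pp example, set against the S3 upper bound, is short enough to write out. This sketch keeps the paragraph's own simplification: only the over-stated candidate's row moves (other candidates' shares are held fixed, which real share renormalization would not do) and each poll has about five rows. The constants are copied from the text; the function names are illustrative.

```python
TILT_PP = 7.0        # per-row over-statement, the candidate-linked headline size
ROWS_PER_POLL = 5    # cand-poll rows per poll in the text's example
S3_UPPER_PP = 0.15   # upper end of the S3 (AN-121 list) 95% CI on shell-β


def mean_shift_ceiling(tilt_pp, rows_per_poll):
    """Largest rise in a poll's mean |error| when only one row moves.

    By the triangle inequality |e + t| - |e| <= t, so the tilted row adds at most
    tilt_pp and the other rows (held fixed) add nothing.
    """
    return tilt_pp / rows_per_poll


def max_slanting_share(upper_pp, tilt_pp, rows_per_poll):
    """Share of shell polls that could carry the ceiling-size shift while the
    bucket mean stays at or below upper_pp (all other shell polls unchanged)."""
    return upper_pp / mean_shift_ceiling(tilt_pp, rows_per_poll)


print(mean_shift_ceiling(TILT_PP, ROWS_PER_POLL))                                # 1.4
print(round(max_slanting_share(S3_UPPER_PP, TILT_PP, ROWS_PER_POLL), 3))          # 0.107
```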

Other_firm-residual β was also null on |error|. AN-102's positive other_firm-residual finding came from the PREDICTED bias (GBM, OOS) outcome, not from |error|. This analysis does not refit the GBM outcome. The |error| null on other_firm-residual is unchanged between AN-094 and AN-121 versions.

Confidence rationale (yellow). The reproduction-from-AN-102 sanity check passes to three decimals, so the panel builder and fit helpers behave as they did in AN-102. The null finding on shell-β is robust to sample expansion. What keeps the badge from green: (i) the |error| outcome is structurally underpowered for a per-sponsor-row mechanism, so the null doesn't prove shells aren't slanting; (ii) the AN-102 GBM-predicted-bias outcome is the more sensitive test and isn't rerun here; (iii) the AN-121 shell list itself is a precision-favoring floor (AN-094 calibration showed 69% recall), so the "less-sophisticated operators" among the 79 new CNPJs may themselves be a precision-tilted subset. Green would require re-running the GBM-predicted-bias spec on the AN-121 list, or a direct sponsor-row-level test on shell polls (which requires solving the identification problem the cover-vehicle category creates by definition).

Follow-ups

Re-run AN-102's GBM-predicted-bias spec with the AN-121 shell list (extension, high gain, ~30 min). AN-102's notable finding was that other_firm-residual lit up the GBM-predicted-bias outcome (+0.024 raw, p<0.01) while shell did not, suggesting the within-poll fingerprint of slant lives in the unaudited residual. With the AN-121 expansion the residual shrinks from 2,542 → 1,601 rows (the rule pulled high-volume CNPJs out into the shell bucket). Re-running the GBM-predicted-bias spec tells us whether the predicted-bias signal moved from the residual into the expanded shell bucket (= the AN-121 rule caught the slant signal) or stayed in the residual (= the GBM-predicted-bias signal is not driven by volume; the rule didn't help). Suggested script: an-NNN-shell-bucket-expanded-gbm.py.
Add a sentence to the iceberg appendix (writing, ~5 min). The appendix section currently states the iceberg-framing claim is at the identification margin; AN-122 supports this with a robust null on |error| at 2.6× sample. One sentence in the "Construction — Shell CNPJ" paragraph would close the loop: "The expanded shell bucket does not produce polls measurably noisier than media polls on the |error| outcome at the per-cand-poll-row level (AN-122; null across every FE spec at n=1,612), consistent with the iceberg-framing claim that cover-vehicle activity does its work at the identification margin rather than the per-poll noise-floor margin."
Power analysis: what sponsor-row effect would the |error| spec detect on shell polls? (blind spot, ~30 min). Bound the null. If the S3 CI on shell-β at n=1,612 rules out a per-sponsor-row effect of size X, that's an upper bound on the shell-bucket mechanism magnitude under the within-poll-tilt model. Useful for the paper's caveats section. Suggested approach: add a tilt to one row per shell poll, refit, and step the tilt size up (through the +7 pp headline size) to find the smallest tilt the spec detects at conventional significance.
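The proposed simulation can be prototyped before touching the real panel. This is a minimal sketch with loud assumptions, none taken from the AN-122 script: signed cand-poll errors are i.i.d. Normal(0, 3 pp), every poll has 5 rows, the shell bucket has 1,612 / 5 ≈ 322 polls against a hypothetical 1,000 media polls, and the "spec" is a plain shell-minus-media difference in poll-mean |error| with poll-level SEs (no race/cand/firm FE, no muni clustering). All constants and function names are illustrative. It returns the first tilt on a grid at which the gap exceeds 1.96 SEs in at least 80% of replications.

```python
import math
import random
import statistics

# Illustrative assumptions only; none of these constants come from the AN-122 script.
SIGMA_PP = 3.0                         # sd of the signed cand-poll error (assumed)
ROWS_PER_POLL = 5                      # rows per poll, as in the text's +7 pp example
N_SHELL_POLLS = 1612 // ROWS_PER_POLL  # 322 shell polls if every poll had 5 rows
N_MEDIA_POLLS = 1000                   # hypothetical media comparison group
Z_CRIT = 1.96


def poll_mean_abs_error(rng, tilt_pp):
    """Mean |error| over one poll's rows, with tilt_pp added to the first row's signed error."""
    errors = [rng.gauss(0.0, SIGMA_PP) for _ in range(ROWS_PER_POLL)]
    errors[0] += tilt_pp  # the over-stated candidate; the other rows are left as drawn
    return statistics.fmean(abs(e) for e in errors)


def mean_and_var(xs):
    """Sample mean and unbiased sample variance."""
    m = statistics.fmean(xs)
    return m, sum((x - m) ** 2 for x in xs) / (len(xs) - 1)


def shell_minus_media(tilt_pp, n_shell, n_media, seed):
    """Shell-minus-media gap in poll-mean |error| and its SE, treating polls as independent."""
    rng = random.Random(seed)
    shell_mean, shell_var = mean_and_var([poll_mean_abs_error(rng, tilt_pp) for _ in range(n_shell)])
    media_mean, media_var = mean_and_var([poll_mean_abs_error(rng, 0.0) for _ in range(n_media)])
    se = math.sqrt(shell_var / n_shell + media_var / n_media)
    return shell_mean - media_mean, se


def smallest_detected_tilt(tilts, n_shell, n_media, reps=100, power=0.8):
    """First tilt on the grid whose gap exceeds Z_CRIT SEs in at least `power` of the reps."""
    for tilt in tilts:
        hits = 0
        for seed in range(reps):
            gap, se = shell_minus_media(tilt, n_shell, n_media, seed)
            if gap > Z_CRIT * se:
                hits += 1
        if hits >= power * reps:
            return tilt
    return None  # no tilt on the grid reached the target power


if __name__ == "__main__":
    grid = [float(t) for t in range(8)]  # 0, 1, ..., 7 pp
    print(smallest_detected_tilt(grid, N_SHELL_POLLS, N_MEDIA_POLLS, reps=50))
```

Under these toy inputs a tilt on one row raises the poll-mean |error| by less than tilt/5, because |e + t| − |e| < t whenever the baseline error e is negative. Swapping the normal draws for the empirical error distribution, adding the S3 fixed effects, and clustering at muni is what the real follow-up would add.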