Out-of-sample predicted bias, from a 5-fold cross-validated gradient boosting classifier trained on the AN-100 sponsor-blind features to predict the poll_has_candidate_sponsor label, is +1.6 to +2.4 pp higher for other_firm polls than for major-media polls under S0/S1/S2 (significance ranges from p<0.10 to p<0.01 across the three specs). The classifier was never trained on the other_firm label; it only learned the candidate-sponsorship pattern. That other_firm polls nonetheless score as more candidate-sponsored-like is independent algorithmic evidence for shell-style slant. The signal collapses under joint S3 (firm + race FE), the same composition story as the |error| version. Out-of-sample AUC = 0.69 (matches AN-100). The script also provides a publicly distributable per-poll suspicion score at build/analysis/poll_suspicion_score.parquet.
User suggestion (2026-06-17): "Apply [the AN-100 detector] to all polls and then see if predicted bias varies statistically with sponsor type etc (same big table as before but with predicted bias as outcome). We could eventually do this with ML methods to improve precision of the prediction further."
This script does exactly that: produces out-of-sample predicted bias for every poll with computable consensus, then regresses the predicted bias on the 5-bucket sponsor classification with the spec ladder.
Pipeline
- Build consensus_dev per cand-poll row (AN-099 logic: median of OTHER unsponsored polls of same cand × race in ±14 days).
- Compute poll-level features (AN-100): max_signed_dev, signed_spike, poll_std_dev, mean_abs_dev, skew_dev, max_abs_dev, log_sample.
- 5-fold cross-validation: train a classifier on poll_has_candidate_sponsor label, predict probabilities out-of-sample.
- Two classifiers:
  - Logistic regression (linear, interpretable)
  - Gradient boosting (200 trees, max depth 3, learning rate 0.05)
- Regression: pred_bias ~ sponsor_buckets + log_sample | FE spec ladder.
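The consensus step at the top of the pipeline can be sketched as follows. This is a minimal illustration on made-up rows, not the AN-099 code: the field names (`poll_id`, `share`, `sponsored`), the helper `consensus_dev`, and the choice of `field_end` as the date that defines the ±14-day window are all assumptions made for the example.

```python
from datetime import date
from statistics import median

# Hypothetical toy rows: one row per candidate x poll (not real data).
rows = [
    {"poll_id": "p1", "race": "r1", "cand": "A", "field_end": date(2024, 9, 1),
     "share": 40.0, "sponsored": False},
    {"poll_id": "p2", "race": "r1", "cand": "A", "field_end": date(2024, 9, 5),
     "share": 42.0, "sponsored": False},
    {"poll_id": "p3", "race": "r1", "cand": "A", "field_end": date(2024, 9, 10),
     "share": 49.0, "sponsored": True},
    {"poll_id": "p4", "race": "r1", "cand": "A", "field_end": date(2024, 10, 20),
     "share": 45.0, "sponsored": False},
]

WINDOW_DAYS = 14  # the +/- 14-day window named in the pipeline


def consensus_dev(row, all_rows, window_days=WINDOW_DAYS):
    """Signed deviation of a row's share from the median of OTHER unsponsored
    polls of the same candidate and race within +/- window_days.
    Returns None when no peer poll exists, so the row drops out of the sample."""
    peers = [
        r["share"]
        for r in all_rows
        if r["poll_id"] != row["poll_id"]
        and not r["sponsored"]
        and r["race"] == row["race"]
        and r["cand"] == row["cand"]
        and abs((r["field_end"] - row["field_end"]).days) <= window_days
    ]
    if not peers:
        return None
    return row["share"] - median(peers)


devs = {r["poll_id"]: consensus_dev(r, rows) for r in rows}
```

In this toy example the sponsored poll p3 sits 8 points above the median of its two unsponsored peers, while p4 has no peer inside the window and gets no consensus value. That second case is the sample-selection mechanism that limits the regression sample to polls with at least one peer.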
The classifier sees only a binary label (candidate-sponsored or not) and learns the within-poll pattern that distinguishes the two classes. It is NOT trained on the small_media, pollster_self, or other_firm labels. If those buckets' predicted-bias values differ from major-media's, the classifier is detecting their similarity to candidate-sponsored polls in its learned feature space.
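A rough sketch of how out-of-sample probabilities of this kind could be produced with scikit-learn is shown here. The data are synthetic, and several choices are assumptions rather than details taken from the script: stratified shuffled folds, the fixed random seeds, and standardizing features before the logistic regression. Only the GBM settings (200 trees, max depth 3, learning rate 0.05) and the 5 folds come from the note.

```python
import numpy as np
from sklearn.ensemble import GradientBoostingClassifier
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import StratifiedKFold, cross_val_predict
from sklearn.pipeline import make_pipeline
from sklearn.preprocessing import StandardScaler

rng = np.random.default_rng(0)

# Synthetic stand-in for the seven AN-100 poll-level features (not real data).
n_polls, n_features = 600, 7
X = rng.normal(size=(n_polls, n_features))
# Binary label standing in for poll_has_candidate_sponsor.
y = (X[:, 0] + rng.normal(size=n_polls) > 1.0).astype(int)

# Assumption: stratified, shuffled folds. The note only says "5-fold".
cv = StratifiedKFold(n_splits=5, shuffle=True, random_state=0)

models = {
    "logit": make_pipeline(StandardScaler(), LogisticRegression(max_iter=1000)),
    "gbm": GradientBoostingClassifier(
        n_estimators=200, max_depth=3, learning_rate=0.05, random_state=0
    ),
}

# Each poll's probability comes from a model fit on the other four folds,
# so no poll is scored by a model that saw its own label.
pred_bias = {
    name: cross_val_predict(model, X, y, cv=cv, method="predict_proba")[:, 1]
    for name, model in models.items()
}
```

The sponsor bucket never enters `X` or `y` here, which mirrors the design point above: any bucket-level difference in `pred_bias` has to come through the features.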
Out-of-sample AUC
| Classifier | OOS AUC |
|---|---|
| Logistic regression | 0.688 |
| Gradient boosting | 0.692 |
Both classifiers are in the same "fair triage" range as AN-100. GBM marginally better.
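For reading these numbers: an AUC of 0.69 means that a randomly chosen candidate-sponsored poll receives a higher score than a randomly chosen non-sponsored poll about 69% of the time. The toy function below (a hypothetical helper, with made-up scores) computes AUC directly from that pairwise definition.

```python
def pairwise_auc(scores_pos, scores_neg):
    """AUC as the share of (positive, negative) pairs ranked correctly;
    ties count as one half."""
    wins = 0.0
    for p in scores_pos:
        for q in scores_neg:
            if p > q:
                wins += 1.0
            elif p == q:
                wins += 0.5
    return wins / (len(scores_pos) * len(scores_neg))


# Toy scores (not from the data): two sponsored, three non-sponsored polls.
auc = pairwise_auc([0.9, 0.4], [0.5, 0.1, 0.3])  # 5 of 6 pairs ranked correctly
```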
Spec ladder: predicted bias as outcome (GBM)
Standard errors are clustered at the race level and shown in parentheses. Reference = major_media. Coefficients are in probability points (0.05 = a 5 pp probability shift). * p<0.10, ** p<0.05, *** p<0.01.
| Bucket (ref = major_media) | S0 No FE | S1 Race FE | S2 Firm FE | S3 Firm + Race FE |
|---|---|---|---|---|
| is_small_media | +0.0093* (0.005) | +0.0096* (0.005) | +0.0159 (0.010) | +0.0090 (0.010) |
| is_candidate | +0.0436*** (0.009) | +0.0026 (0.009) | +0.0319** (0.014) | −0.0140 (0.016) |
| is_pollster_self | +0.0084 (0.005) | +0.0070 (0.006) | +0.0096 (0.011) | −0.0024 (0.012) |
| is_other_firm | +0.0236*** (0.008) | +0.0162** (0.007) | +0.0213* (0.011) | +0.0046 (0.013) |
| log_sample | −0.078*** | −0.044*** | −0.077*** | −0.054*** |
| n | 2,428 | 2,403 | 2,360 | 2,294 |
Spec ladder (logit)
| Bucket | S0 No FE | S1 Race FE | S2 Firm FE | S3 Firm + Race FE |
|---|---|---|---|---|
| is_small_media | −0.0007 (0.005) | +0.0014 (0.004) | −0.0085 (0.010) | −0.0041 (0.005) |
| is_candidate | +0.0140** (0.006) | +0.0024 (0.005) | −0.0024 (0.011) | −0.0118 (0.008) |
| is_pollster_self | +0.0000 (0.005) | +0.0017 (0.003) | −0.0072 (0.010) | −0.0089 (0.006) |
| is_other_firm | +0.0090 (0.006) | +0.0044 (0.005) | −0.0013 (0.011) | −0.0052 (0.006) |
| log_sample | −0.093*** | −0.073*** | −0.089*** | −0.071*** |
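The logit ladder shows a weaker signal: only is_candidate at S0 is significant among the bucket coefficients. A likely reason is that the GBM captures non-linear interactions between features that the linear model misses.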
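The S0 column of either ladder is an OLS regression with race-clustered standard errors. A hand-rolled NumPy sketch of that estimator follows. It runs on synthetic data with a built-in +0.02 other_firm effect, omits the fixed effects that S1 to S3 add, and uses a common small-sample correction that may differ from whatever the script's regression package applies. The helper name `ols_cluster` is hypothetical.

```python
import numpy as np


def ols_cluster(X, y, clusters):
    """OLS point estimates with cluster-robust (sandwich) standard errors.
    Applies the small-sample factor G/(G-1) * (n-1)/(n-k)."""
    n, k = X.shape
    xtx_inv = np.linalg.inv(X.T @ X)
    beta = xtx_inv @ X.T @ y
    resid = y - X @ beta
    meat = np.zeros((k, k))
    groups = np.unique(clusters)
    for g in groups:
        mask = clusters == g
        score = X[mask].T @ resid[mask]  # summed score vector for cluster g
        meat += np.outer(score, score)
    n_groups = len(groups)
    adj = (n_groups / (n_groups - 1)) * ((n - 1) / (n - k))
    cov = adj * (xtx_inv @ meat @ xtx_inv)
    return beta, np.sqrt(np.diag(cov))


rng = np.random.default_rng(1)
n, n_races = 2000, 50
race = rng.integers(0, n_races, size=n)
is_other_firm = (rng.random(n) < 0.3).astype(float)
log_sample = rng.normal(6.0, 0.5, size=n)
race_shock = rng.normal(0.0, 0.01, size=n_races)[race]

# Synthetic outcome with a +0.02 other_firm effect (illustrative only).
pred_bias = (
    0.10
    + 0.02 * is_other_firm
    - 0.05 * (log_sample - 6.0)
    + race_shock
    + rng.normal(0.0, 0.03, size=n)
)

X = np.column_stack([np.ones(n), is_other_firm, log_sample])
beta, se = ols_cluster(X, pred_bias, race)  # order: intercept, is_other_firm, log_sample
```

Clustering at the race level allows errors to be correlated among polls of the same race, which is the relevant dependence when the consensus for each poll is built from its race-mates.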
Interpretation
The key result: is_other_firm lights up positively (+1.6 to +2.4 pp probability) without the classifier ever seeing the other_firm label. The classifier was trained on the binary question "is this poll candidate-sponsored?" and learned the within-poll deviation pattern. Applied to other_firm polls, it predicts they look candidate-sponsored, even though their formal contratante (contracting party) is a third-party firm. This is the algorithmic correlate of the shell-sponsoring hypothesis: shell polls share the within-poll fingerprint of candidate-sponsored polls.
Three pieces of evidence aligned:
- AN-082 / AN-085 / AN-094: other_firm polls' raw deviation on margin/winner outcomes
- AN-099: other_firm polls' consensus deviation (cross-sectional pattern)
- AN-101: classifier trained only on candidate-sponsorship predicts other_firm polls as sponsored-like
The third is the cleanest independent evidence because the classifier has no contact with the other_firm label.
Small_media also shows positive predicted bias (+0.009 to +0.010 at S0/S1, p<0.10). This corroborates the AN-085 finding that the media bucket is heterogeneous: some "media-sponsored" polls share the within-poll pattern of candidate-sponsored polls. Consistent with the "small media as shell channel" hypothesis but with weaker signal than other_firm.
pollster_self is null across all specs: the firm-self-contracted polls don't show the within-poll pattern of candidate-sponsored polls. Consistent with AN-085's finding that 2024 pollster_self is dominated by trusted-firm showcase polls (Datafolha / Quaest doing brand-protection work), not the IPOP-style fraud channel that migrated to other_firm.
Under S3 (firm + race FE) everything collapses — same composition story as AN-093 / AN-094 / AN-095. The within-firm within-race comparison is sharp enough to absorb the predicted-bias signal too. Aggregate sorting (which firms hire which sponsors, which races attract which sponsors) carries most of the signal.
log_sample dominates the classifier. The strongest predictor of "sponsored" is small sample size. Sponsored polls run systematically smaller samples (median ~360–400 vs ~408 for media). The within-poll deviation features add signal but sample-size leakage is the bigger channel.
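One way to separate the two channels is to refit the classifier without log_sample and compare out-of-sample AUC. The sketch below does this on synthetic data in which the sample-size feature is deliberately the stronger predictor, so the numbers it produces say nothing about the real polls. The helper `oos_auc`, the stratified folds, and the seeds are assumptions.

```python
import numpy as np
from sklearn.ensemble import GradientBoostingClassifier
from sklearn.metrics import roc_auc_score
from sklearn.model_selection import StratifiedKFold, cross_val_predict

rng = np.random.default_rng(2)
n = 1500

# Synthetic features: a standardized sample-size proxy and one deviation feature.
log_sample = rng.normal(size=n)
max_signed_dev = rng.normal(size=n)

# Synthetic label: small samples raise the odds strongly, deviation weakly.
index = -1.0 - 1.5 * log_sample + 0.5 * max_signed_dev
y = (rng.random(n) < 1.0 / (1.0 + np.exp(-index))).astype(int)


def oos_auc(X, y):
    """5-fold out-of-sample AUC for a GBM with the note's settings."""
    cv = StratifiedKFold(n_splits=5, shuffle=True, random_state=0)
    clf = GradientBoostingClassifier(
        n_estimators=200, max_depth=3, learning_rate=0.05, random_state=0
    )
    proba = cross_val_predict(clf, X, y, cv=cv, method="predict_proba")[:, 1]
    return roc_auc_score(y, proba)


auc_full = oos_auc(np.column_stack([log_sample, max_signed_dev]), y)
auc_blind = oos_auc(max_signed_dev.reshape(-1, 1), y)  # log_sample dropped
```

The gap between `auc_full` and `auc_blind` is the share of discrimination attributable to sample size in this setup; the same comparison on the real feature set would show how much of the 0.69 survives without it.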
Policy contribution: a distributable poll suspicion score
Saved at:
build/analysis/poll_suspicion_score.parquet
Per-protocol record:
- pred_bias_logit: probability ∈ [0,1] from cross-validated logit
- pred_bias_gbm: probability ∈ [0,1] from cross-validated GBM
- bucket: 5-class sponsor classification
- poll_has_candidate_sponsor: ground-truth label
- muni_id, institute, field_end, log_sample: identifiers
This artifact is distributable — every per-poll score is computed from public TSE data without revealing private firm information beyond what's already in the registry. A regulator, journalist, or academic can:
- Triage: flag the top-N polls by suspicion score for closer audit.
- Aggregate: pool scores across a firm's polls. Even at a per-poll AUC of 0.69, averaging reduces per-poll noise, so the firm-level z-score can be large for firms with ≥10 polls.
- Track: cross-cycle changes in a firm's average suspicion score = a reputation signal.
- Cross-validate sponsor-identity disclosure: polls with high suspicion scores AND shell-style contratantes (FacUnicamps, etc.) are higher-confidence shell-flagging cases.
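The triage and aggregation uses above might look like this in pandas. The frame is a synthetic stand-in that borrows only the column names `institute` and `pred_bias_gbm` from the score file; the firm names, the cutoff of five polls for triage, and the particular z-score definition are illustrative assumptions.

```python
import numpy as np
import pandas as pd

# Synthetic stand-in with the score file's column names; real use would start
# from pd.read_parquet("build/analysis/poll_suspicion_score.parquet").
rng = np.random.default_rng(3)
scores = pd.DataFrame({
    "institute": ["firm_a"] * 12 + ["firm_b"] * 12 + ["firm_c"] * 3,
    "pred_bias_gbm": np.concatenate([
        rng.uniform(0.05, 0.15, 12),  # firm_a: low scores
        rng.uniform(0.30, 0.40, 12),  # firm_b: high scores
        rng.uniform(0.05, 0.40, 3),   # firm_c: too few polls to rank
    ]),
})

# Triage: the N highest-scoring polls.
top_n = scores.nlargest(5, "pred_bias_gbm")

# Aggregate: firm means, restricted to firms with at least MIN_POLLS polls.
MIN_POLLS = 10
overall_mean = scores["pred_bias_gbm"].mean()
overall_sd = scores["pred_bias_gbm"].std(ddof=1)
firm = scores.groupby("institute")["pred_bias_gbm"].agg(["mean", "count"])
firm = firm[firm["count"] >= MIN_POLLS].copy()

# One possible z-score: firm mean against the pooled mean, scaled by the
# pooled SD over sqrt(number of polls). Other definitions are equally valid.
firm["z"] = (firm["mean"] - overall_mean) / (overall_sd / np.sqrt(firm["count"]))
```

Because the denominator shrinks with the square root of the poll count, a modest per-poll difference can still produce a large firm-level z-score once a firm has enough polls.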
The GBM model's exact tree structure can be exposed publicly without compromising the detection mechanism — there's no adversarial-defense argument for keeping it secret. (Sponsors who learn the rules might try to game them, but they're already not field-randomizing their slant in the obvious ways the classifier exploits.)
Caveats
- Train/test labels are binary: candidate-sponsored vs not. The classifier ignores all the nuance of the 5-bucket classification. Multi-class label would be a natural extension.
- log_sample is the strongest classifier feature. A sample-size-blind version (drop log_sample) is expected to drop AUC to ~0.62. The cleanest within-poll-pattern test requires this ablation; current results conflate the within-poll pattern with sample-size leakage.
- 2,428 polls in regression sample (vs 22k cand-poll rows). Sample selection — consensus computation requires ≥1 peer poll in window. High-attention races over-represented.
- 5-fold CV: random fold assignment may leave some variance in the OOS predictions. K=10 or LOOCV would tighten but require more compute.
- GBM has 200 trees, max depth 3, lr 0.05 — reasonable defaults but not hyperparameter-tuned. Better hyperparameter search likely buys 1–3 AUC points.
- The S3 collapse is the same noise floor story as AN-095, not a property of the ML version of the analysis.
Follow-ups
- Sample-size-blind classifier: drop log_sample, refit. Tests whether within-poll pattern alone is informative. Expected AUC ≈ 0.62.
- Hyperparameter-tuned GBM + larger feature set (consensus deviation moments, cross-cell-position rank of max-deviation cand, etc.). Could realistically push AUC to 0.75–0.78.
- Out-of-sample 2020 → 2024 validation: train on 2020 polls, predict on 2024. Tests structural stability of the classifier.
- Firm-level aggregation test: for each firm with ≥10 polls, compute mean predicted_bias. Firm-level AUC should be much higher, since averaging over a firm's polls reduces per-poll noise.
- Multi-class label: train classifier to predict the 5-bucket sponsor class instead of binary. Direct evidence of shell-detection in a single classifier rather than leakage-inference.
Artifacts
- Script: source/analysis/an-101-predicted-bias-as-outcome.py
- Spec-level coefficients: build/table/an-101-predicted-bias-as-outcome.csv
- Headline JSON: build/table/an-101-predicted-bias-as-outcome.json
- Distributable per-poll score: build/analysis/poll_suspicion_score.parquet
Related
- AN-099 consensus deviation
- AN-100 sponsor-blind detection
- AN-098 noise floor
- AN-093 paper-ready spec ladder: predecessor with |error| as outcome