Out-of-sample predicted bias (from 5-fold CV gradient boosting trained on AN-100 sponsor-blind features → poll_has_candidate_sponsor label) shows other_firm polls have +1.6 to +2.4 pp higher predicted-bias probability than major-media polls under S0/S1/S2 (p<0.10 to p<0.01). The classifier was never trained on the other_firm label; it only learned the candidate-sponsorship pattern. The fact that other_firm polls light up as positively biased is independent algorithmic evidence for shell-style slant. The signal collapses under joint S3 (firm + race FE) — same composition story as the |error| version. AUC of out-of-sample classification = 0.69 (matches AN-100). Provides a publicly distributable per-poll suspicion score at build/analysis/poll_suspicion_score.parquet.

Confidence: green
Type: ml-policy-mechanism
Script: source/analysis/an-101-predicted-bias-as-outcome.py
Target: build/table/an-101-predicted-bias-as-outcome.csv
Status: interpreted · 2026-06-17
Created: 2026-06-17

User suggestion (2026-06-17): "Apply [the AN-100 detector] to all polls and then see if predicted bias varies statistically with sponsor type etc (same big table as before but with predicted bias as outcome). We could eventually do this with ML methods to improve precision of the prediction further."

This script does exactly that: produces out-of-sample predicted bias for every poll with computable consensus, then regresses the predicted bias on the 5-bucket sponsor classification with the spec ladder.

Pipeline

  1. Build consensus_dev per candidate-poll row (AN-099 logic: deviation from the median of OTHER unsponsored polls of the same candidate × race within ±14 days).
  2. Compute poll-level features (AN-100): max_signed_dev, signed_spike, poll_std_dev, mean_abs_dev, skew_dev, max_abs_dev, log_sample.
  3. 5-fold cross-validation: train a classifier on poll_has_candidate_sponsor label, predict probabilities out-of-sample.
  4. Two classifiers:
    • Logistic regression (linear, interpretable)
    • Gradient boosting (200 trees, max depth 3, learning rate 0.05)
  5. Regression: pred_bias ~ sponsor_buckets + log_sample | FE spec ladder.
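Steps 1 and 2 can be sketched in plain Python as follows. This is a minimal illustration, not the AN-099/AN-100 code: the row fields (`poll_id`, `race`, `cand`, `share`, `date`, `unsponsored`) are hypothetical, and the feature formulas are guesses from the feature names. signed_spike and skew_dev are left out because their definitions are not given here.

```python
import math
from datetime import date
from statistics import mean, median, pstdev

def consensus_dev(row, rows, window_days=14):
    """Signed deviation of one candidate-poll row from the median share in
    OTHER unsponsored polls of the same candidate and race within
    +/- window_days. Returns None when no comparison poll exists."""
    peers = [
        r["share"] for r in rows
        if r["poll_id"] != row["poll_id"]          # leave the poll itself out
        and r["unsponsored"]
        and r["race"] == row["race"]
        and r["cand"] == row["cand"]
        and abs((r["date"] - row["date"]).days) <= window_days
    ]
    return row["share"] - median(peers) if peers else None

def poll_features(devs, sample_size):
    """Collapse one poll's candidate-level deviations into poll-level features
    (assumed definitions, inferred from the feature names only)."""
    devs = [d for d in devs if d is not None]
    return {
        "max_signed_dev": max(devs),
        "max_abs_dev": max(abs(d) for d in devs),
        "mean_abs_dev": mean([abs(d) for d in devs]),
        "poll_std_dev": pstdev(devs),
        "log_sample": math.log(sample_size),
    }

def make(poll_id, cand, share, day, unsponsored):
    # Hypothetical row layout for the toy example.
    return {"poll_id": poll_id, "race": "X", "cand": cand, "share": share,
            "date": day, "unsponsored": unsponsored}

# Toy data: three unsponsored polls, one sponsored poll D, and one
# unsponsored poll E that falls outside every 14-day window.
rows = [
    make("A", "c1", 40, date(2024, 9, 1), True),
    make("A", "c2", 35, date(2024, 9, 1), True),
    make("B", "c1", 42, date(2024, 9, 5), True),
    make("B", "c2", 33, date(2024, 9, 5), True),
    make("C", "c1", 44, date(2024, 9, 10), True),
    make("C", "c2", 31, date(2024, 9, 10), True),
    make("D", "c1", 50, date(2024, 9, 8), False),
    make("D", "c2", 28, date(2024, 9, 8), False),
    make("E", "c1", 41, date(2024, 10, 20), True),
]

devs_d = [consensus_dev(r, rows) for r in rows if r["poll_id"] == "D"]
feats_d = poll_features(devs_d, sample_size=400)
dev_e = consensus_dev(rows[-1], rows)  # no peers in window
```

In the toy data, poll D sits 8 points above consensus for c1 and 5 points below for c2. Poll E has no computable consensus and would drop out of the sample.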

The classifier sees only a binary label (candidate-sponsored or not) and learns the within-poll pattern that distinguishes the two classes. It is NOT trained on the small_media, pollster_self, or other_firm labels. If those buckets' predicted-bias values differ from major_media's, the classifier is detecting their similarity to candidate-sponsored polls in its learned feature space.
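The out-of-sample logic can be illustrated with a small stdlib-only sketch. Every poll is scored by a model fitted on the other folds, and the sponsor bucket is used only afterwards to average the scores. The hand-rolled logistic regression is a toy stand-in for the two classifiers in step 4. The fold assignment (shuffled, unstratified), the synthetic data, and all function names are assumptions made for illustration.

```python
import math
import random

def predict_one(w, x):
    """Logistic probability for one feature vector; w[0] is the intercept."""
    z = w[0] + sum(wj * xj for wj, xj in zip(w[1:], x))
    return 1.0 / (1.0 + math.exp(-z))

def fit_logit(X, y, lr=0.1, epochs=300):
    """Batch gradient descent on the logistic log-loss (toy stand-in model)."""
    w = [0.0] * (len(X[0]) + 1)
    for _ in range(epochs):
        grad = [0.0] * len(w)
        for xi, yi in zip(X, y):
            err = predict_one(w, xi) - yi
            grad[0] += err
            for j, xj in enumerate(xi):
                grad[j + 1] += err * xj
        w = [wj - lr * g / len(X) for wj, g in zip(w, grad)]
    return w

def out_of_fold_scores(X, y, k=5, seed=0):
    """Score every row with a model fitted on the other k-1 folds only."""
    idx = list(range(len(X)))
    random.Random(seed).shuffle(idx)
    scores = [None] * len(X)
    for f in range(k):
        held_out = set(idx[f::k])
        train = [i for i in idx if i not in held_out]
        w = fit_logit([X[i] for i in train], [y[i] for i in train])
        for i in held_out:
            scores[i] = predict_one(w, X[i])
    return scores

# Synthetic data only. The label y is 1 for "candidate" rows and 0 otherwise;
# the bucket name is never passed to the model.
rng = random.Random(42)
X, y, bucket = [], [], []
for name, label, shift in [("candidate", 1, 1.0),
                           ("major_media", 0, -1.0),
                           ("other_firm", 0, -1.0)]:
    for _ in range(20):
        X.append([rng.gauss(shift, 0.5), rng.gauss(0.0, 1.0)])
        y.append(label)
        bucket.append(name)

scores = out_of_fold_scores(X, y)

# Buckets enter only here, as a post-hoc grouping of the scores.
mean_by_bucket = {
    b: sum(s for s, bb in zip(scores, bucket) if bb == b) / bucket.count(b)
    for b in set(bucket)
}
```

In the toy data, other_firm is drawn from the same distribution as major_media, so the sketch shows the mechanics only and does not reproduce the finding.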

Out-of-sample AUC

Classifier          | OOS AUC
Logistic regression | 0.688
Gradient boosting   | 0.692

Both classifiers are in the same "fair triage" range as AN-100. GBM marginally better.
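For readers less used to AUC: it is the probability that a randomly chosen positive (here, a candidate-sponsored poll) receives a higher score than a randomly chosen negative, with ties counted as one half. The pairwise sketch below computes it directly. The function name and toy numbers are illustrative and not taken from the script.

```python
def pairwise_auc(scores, labels):
    """AUC as the share of (positive, negative) pairs in which the positive
    has the higher score; ties count as half a win."""
    pos = [s for s, l in zip(scores, labels) if l == 1]
    neg = [s for s, l in zip(scores, labels) if l == 0]
    if not pos or not neg:
        raise ValueError("need at least one positive and one negative")
    wins = sum(
        1.0 if p > n else 0.5 if p == n else 0.0
        for p in pos
        for n in neg
    )
    return wins / (len(pos) * len(neg))

# Toy example: 2 positives, 3 negatives -> 6 pairs, 4.5 wins.
toy_auc = pairwise_auc([0.9, 0.4, 0.6, 0.2, 0.4], [1, 1, 0, 0, 0])
```

Read this way, an AUC of about 0.69 means the classifier ranks the candidate-sponsored poll above the other poll in roughly 69% of such pairs.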

Spec ladder: predicted bias as outcome (GBM)

Cluster SE at race. Reference = major_media. Coefficients are in probability-points (0.05 = a 5 pp probability shift).

Bucket (ref = major_media) | S0 No FE           | S1 Race FE        | S2 Firm FE        | S3 Firm + Race FE
is_small_media             | +0.0093* (0.005)   | +0.0096* (0.005)  | +0.0159 (0.010)   | +0.0090 (0.010)
is_candidate               | +0.0436*** (0.009) | +0.0026 (0.009)   | +0.0319** (0.014) | −0.0140 (0.016)
is_pollster_self           | +0.0084 (0.005)    | +0.0070 (0.006)   | +0.0096 (0.011)   | −0.0024 (0.012)
is_other_firm              | +0.0236*** (0.008) | +0.0162** (0.007) | +0.0213* (0.011)  | +0.0046 (0.013)
log_sample                 | −0.078***          | −0.044***         | −0.077***         | −0.054***
n                          | 2,428              | 2,403             | 2,360             | 2,294
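One way to write the four columns as a single specification is sketched below. The symbols $\mu$, $\phi$ and the index maps $r(i)$, $f(i)$ are notation introduced here for illustration and do not come from the script.

```latex
\[
\widehat{p}_i \;=\; \alpha
  \;+\; \sum_{b \in \mathcal{B}} \beta_b \,\mathbf{1}[\mathrm{bucket}_i = b]
  \;+\; \gamma \,\mathrm{log\_sample}_i
  \;+\; \mu_{r(i)} \;+\; \phi_{f(i)} \;+\; \varepsilon_i ,
\qquad
\mathcal{B} = \{\text{small\_media},\ \text{candidate},\ \text{pollster\_self},\ \text{other\_firm}\}
\]
% \widehat{p}_i : out-of-sample predicted probability for poll i
% \mu_{r(i)}    : race fixed effect, included in S1 and S3 only
% \phi_{f(i)}   : firm fixed effect, included in S2 and S3 only
% S0 includes neither; major_media is the omitted bucket.
% Standard errors are clustered by race.
```

Under this reading, the S0 entry +0.0236 for is_other_firm says that other_firm polls have a predicted probability 2.36 pp higher than major_media polls, holding log_sample fixed.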

Spec ladder (logit)

Bucket           | S0 No FE         | S1 Race FE      | S2 Firm FE      | S3 Firm + Race FE
is_small_media   | −0.0007 (0.005)  | +0.0014 (0.004) | −0.0085 (0.010) | −0.0041 (0.005)
is_candidate     | +0.0140** (0.006) | +0.0024 (0.005) | −0.0024 (0.011) | −0.0118 (0.008)
is_pollster_self | +0.0000 (0.005)  | +0.0017 (0.003) | −0.0072 (0.010) | −0.0089 (0.006)
is_other_firm    | +0.0090 (0.006)  | +0.0044 (0.005) | −0.0013 (0.011) | −0.0052 (0.006)
log_sample       | −0.093***        | −0.073***       | −0.089***       | −0.071***

The logit shows a weaker signal, which is consistent with the GBM capturing non-linear interactions between features that the linear model misses.

Interpretation

The key result: is_other_firm lights up positively under the GBM (+1.6 to +2.4 pp probability across S0 to S2) without the classifier ever seeing the other_firm label. The classifier was trained on the binary question "is this poll candidate-sponsored?" and learned the within-poll deviation pattern. Applied to other_firm polls, it predicts they look candidate-sponsored, even though their formal contratante (the contracting party on record) is a third-party firm. This is the algorithmic correlate of the shell-sponsoring hypothesis: shell polls share the within-poll fingerprint of candidate-sponsored polls.

Three pieces of evidence aligned:

  1. AN-082 / AN-085 / AN-094: other_firm polls' raw deviation on margin/winner outcomes
  2. AN-099: other_firm polls' consensus deviation (cross-sectional pattern)
  3. AN-101: classifier trained only on candidate-sponsorship predicts other_firm polls as sponsored-like

The third is the cleanest independent evidence because the classifier has no contact with the other_firm label.

small_media also shows positive predicted bias under the GBM (+0.009 to +0.010 at S0/S1, p<0.10), though the estimate is not significant at S2 or S3. This corroborates the AN-085 finding that the media bucket is heterogeneous: some "media-sponsored" polls share the within-poll pattern of candidate-sponsored polls. Consistent with the "small media as shell channel" hypothesis but with weaker signal than other_firm.

pollster_self is null across all specs: the firm-self-contracted polls don't show the within-poll pattern of candidate-sponsored polls. Consistent with AN-085's finding that 2024 pollster_self is dominated by trusted-firm showcase polls (Datafolha / Quaest doing brand-protection work), not the IPOP-style fraud channel that migrated to other_firm.

Under S3 (firm + race FE) everything collapses — same composition story as AN-093 / AN-094 / AN-095. The within-firm within-race comparison is sharp enough to absorb the predicted-bias signal too. Aggregate sorting (which firms hire which sponsors, which races attract which sponsors) carries most of the signal.

log_sample dominates the classifier. The strongest predictor of "sponsored" is small sample size. Sponsored polls run systematically smaller samples (median ~360–400 vs ~408 for media). The within-poll deviation features add signal but sample-size leakage is the bigger channel.

Policy contribution: a distributable poll suspicion score

Saved at: build/analysis/poll_suspicion_score.parquet

Per-protocol record:

This artifact is distributable — every per-poll score is computed from public TSE data without revealing private firm information beyond what's already in the registry. A regulator, journalist, or academic can:

  1. Triage: flag the top-N polls by suspicion score for closer audit.
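  2. Aggregate: pool scores across a firm's polls. Even at a per-poll AUC of 0.69, averaging over n polls shrinks the noise in a firm's mean score by roughly √n (if polls are treated as independent), so the firm-level z-score can be large for firms with ≥10 polls.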
  3. Track: cross-cycle changes in a firm's average suspicion score = a reputation signal.
  4. Cross-validate sponsor-identity disclosure: polls with high suspicion scores AND shell-style contratantes (FacUnicamps, etc.) are higher-confidence shell-flagging cases.
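The aggregation idea in item 2 can be sketched as below. The schema of poll_suspicion_score.parquet is not given here, so the input is assumed to be a list of hypothetical `(firm, score)` pairs. The z-score treats a firm's polls as independent draws, a strong assumption that probably overstates significance when a firm's polls cluster in the same races.

```python
import math
from collections import defaultdict
from statistics import mean, pstdev

def firm_z_scores(polls, min_polls=10):
    """z-score of each firm's mean suspicion score against the all-poll mean,
    using sigma / sqrt(n) as the standard error (independence assumed).
    Firms with fewer than min_polls polls are skipped."""
    all_scores = [score for _, score in polls]
    mu, sigma = mean(all_scores), pstdev(all_scores)
    if sigma == 0:
        raise ValueError("scores have zero variance; z-scores are undefined")
    by_firm = defaultdict(list)
    for firm, score in polls:
        by_firm[firm].append(score)
    return {
        firm: (mean(scores) - mu) / (sigma / math.sqrt(len(scores)))
        for firm, scores in by_firm.items()
        if len(scores) >= min_polls
    }

# Toy input: firm A scores high, firm B low, firm C has too few polls.
toy_polls = (
    [("A", 0.30)] * 10
    + [("B", 0.10)] * 10
    + [("C", 0.20)] * 3
)
z_by_firm = firm_z_scores(toy_polls)
```

A per-poll gap that looks small next to the overall spread of scores can still yield a sizeable firm-level z once ten polls are pooled, because the standard error of the mean falls with the square root of the number of polls.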

The GBM model's exact tree structure can be exposed publicly without compromising the detection mechanism — there's no adversarial-defense argument for keeping it secret. (Sponsors who learn the rules might try to game them, but they're already not field-randomizing their slant in the obvious ways the classifier exploits.)

Caveats

Follow-ups

  1. Sample-size-blind classifier: drop log_sample, refit. Tests whether within-poll pattern alone is informative. Expected AUC ≈ 0.62.
  2. Hyperparameter-tuned GBM + larger feature set (consensus deviation moments, cross-cell-position rank of max-deviation cand, etc.). Could realistically push AUC to 0.75–0.78.
  3. Out-of-sample 2020 → 2024 validation: train on 2020 polls, predict on 2024. Tests structural stability of the classifier.
  4. Firm-level aggregation test: for each firm with ≥10 polls, compute mean predicted_bias. Firm-level AUC is expected to be much higher, because pooling a firm's polls averages out per-poll noise.
  5. Multi-class label: train classifier to predict the 5-bucket sponsor class instead of binary. Direct evidence of shell-detection in a single classifier rather than leakage-inference.

Artifacts