Strict-blind detection ceiling (no firm identity, no state, no municipality, no race aggregates — only poll results + election results + peer polls + dates + log_sample) achieves OOS AUC 0.742 with XGBoost on 28 features. Firm identity adds 0.17 AUC (AN-103's 0.911 ceiling). The strict-blind classifier confirms two structural findings: (i) is_other_firm survives joint firm + race FE on the spec ladder (+0.013, p<0.05) — the unaudited shell-like tail robustly shows a sponsored-poll fingerprint that does NOT rely on knowing the firm; (ii) is_shell is significantly NEGATIVE at S0 (−0.055\*\*\*) and S1 (−0.023\*\*\*) — the AN-094 identified shells defeat blind detection regardless of classifier sophistication, hardening the 'professional evasion' interpretation. Top features: log_sample, n_peers, poll_std_dev, max_abs_dev, days_to_election_sq — consensus-deviation features collectively dominate.

Confidence: green
Type: ml-ablation
Script: source/analysis/an-104-strict-blind-detection.py
Target: build/table/an-104-strict-blind-detection.csv
Status: interpreted · 2026-06-17
Created: 2026-06-17

User direction (2026-06-17): build a detector using only poll results, election results, other polls' results, and dates of polling. NOT pollster identity, NOT state, NOT municipality.

Strict-blind feature set (28 features)

| Family | Count | Features |
|---|---|---|
| A. Error vs final results | 8 | mean_abs_error, max_abs_error, max_signed_error, min_signed_error, error_std, error_skew, error_concentration, n_cands_miss_5pp |
| B. Consensus deviation vs peer polls | 10 | max_signed_dev, max_abs_dev, signed_spike, abs_spike, poll_std_dev, mean_abs_dev, skew_dev, max_dev_rank, n_peers, peer_std |
| C. Within-poll distribution shape | 7 | n_cand_rows, herfindahl_shares, top1_share, top2_margin_poll, n_cands_at_zero, share_std, share_skew |
| D. Poll-intrinsic metadata | 3 | log_sample, days_to_election, days_to_election_sq |
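
A minimal sketch of how a few Family B and Family C features might be computed from one poll and its peer polls. The exact definitions used in an-104-strict-blind-detection.py are not given in this note, so the formulas below (per-candidate deviation from the peer mean, Herfindahl index on normalized shares) are assumptions, and the helper names `within_poll_shape` and `consensus_deviation` are hypothetical.

```python
from statistics import mean, pstdev


def within_poll_shape(shares):
    """Family C style features from one poll's candidate shares (percent).

    ASSUMPTION: shares are normalized to sum to 1 before the Herfindahl index.
    """
    total = sum(shares)
    fractions = [s / total for s in shares]
    ranked = sorted(shares, reverse=True)
    return {
        "n_cand_rows": len(shares),
        "herfindahl_shares": sum(f * f for f in fractions),
        "top1_share": ranked[0],
        "top2_margin_poll": ranked[0] - ranked[1] if len(ranked) > 1 else ranked[0],
        "n_cands_at_zero": sum(1 for s in shares if s == 0),
    }


def consensus_deviation(poll, peers):
    """Family B style features: one poll against the mean of its peer polls.

    poll  : dict mapping candidate -> share
    peers : list of dicts with the same shape (other polls of the same race)
    ASSUMPTION: deviation = poll share minus the unweighted peer mean.
    """
    devs = []
    for cand, share in poll.items():
        peer_vals = [p[cand] for p in peers if cand in p]
        if peer_vals:
            devs.append(share - mean(peer_vals))
    if not devs:
        # No overlapping candidates: only the peer count is defined.
        return {"n_peers": len(peers)}
    return {
        "n_peers": len(peers),
        "max_signed_dev": max(devs),
        "max_abs_dev": max(abs(d) for d in devs),
        "mean_abs_dev": mean(abs(d) for d in devs),
        "poll_std_dev": pstdev(devs),
    }
```

None of these inputs require the firm, the state, or the municipality, which is the point of the strict-blind restriction.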

Dropped vs AN-103 (9 features): firm_share_candidate_work, firm_has_beta_estimate, firm_total_polls, firm_n_ufs, firm_mean_sample, firm_abs_beta, race_n_polls, race_week_n_polls, n_candidates_in_race.

Model comparison

Y1: poll_has_candidate_sponsor (n=6,710, 15.3% positive)

| Model | AUC | Log-loss | Avg Precision |
|---|---|---|---|
| Logistic L2 | 0.685 | 0.404 | 0.248 |
| Random Forest | 0.730 | 0.383 | 0.326 |
| XGBoost | 0.742 | 0.380 | 0.340 |
| LightGBM | 0.736 | 0.396 | 0.327 |

Y2: cand-sponsored OR shell-touched (n=6,710, 18.7% positive)

| Model | AUC | Log-loss | Avg Precision |
|---|---|---|---|
| Logistic L2 | 0.695 | 0.450 | 0.307 |
| Random Forest | 0.716 | 0.434 | 0.380 |
| XGBoost | 0.732 | 0.425 | 0.417 |
| LightGBM | 0.723 | 0.442 | 0.404 |

XGBoost wins both targets, at 0.742 (Y1) and 0.732 (Y2). Its advantage over logistic regression is larger here (0.057 AUC on Y1, 0.037 on Y2, i.e. roughly 4 to 6 points) than in AN-103 (~1 point): non-linear interactions among within-poll features carry more of the signal when firm aggregates are removed.
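
For readers who want to check the metrics in the tables above, here is a dependency-free sketch of AUC and log-loss. It assumes binary 0/1 labels and out-of-sample predicted probabilities. The function names are hypothetical and this is not the evaluation code from the script.

```python
import math


def roc_auc(y_true, scores):
    """AUC as the share of (positive, negative) pairs ranked correctly.

    Ties count as half a win. Runs in O(n_pos * n_neg), which is fine for a
    few thousand polls.
    """
    pos = [s for y, s in zip(y_true, scores) if y == 1]
    neg = [s for y, s in zip(y_true, scores) if y == 0]
    if not pos or not neg:
        raise ValueError("need at least one positive and one negative")
    wins = 0.0
    for p in pos:
        for n in neg:
            if p > n:
                wins += 1.0
            elif p == n:
                wins += 0.5
    return wins / (len(pos) * len(neg))


def log_loss(y_true, probs, eps=1e-15):
    """Mean negative log-likelihood of the labels under the predictions."""
    total = 0.0
    for y, p in zip(y_true, probs):
        p = min(max(p, eps), 1.0 - eps)  # clip to avoid log(0)
        total += -(y * math.log(p) + (1 - y) * math.log(1.0 - p))
    return total / len(y_true)


def base_rate_log_loss(pos_rate):
    """Log-loss of a constant prediction equal to the positive rate."""
    return -(pos_rate * math.log(pos_rate)
             + (1.0 - pos_rate) * math.log(1.0 - pos_rate))
```

Under these definitions, a constant prediction at the Y1 base rate of 15.3% scores a log-loss of about 0.43, which gives a reference point for the 0.380 to 0.404 range in the Y1 table.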

Detection AUC progression — the honest range

| Analysis | Features | Method | AUC |
|---|---|---|---|
| AN-100 / 101 (sponsor-blind 7 features) | within-poll consensus + log_sample | Logit / GBM | 0.69 |
| AN-104 strict-blind (28 features) | + error vs final + within-poll distribution | XGBoost | 0.742 |
| AN-103 full (37 features) | + firm + race aggregates | LightGBM | 0.911 |

For the §Policy story, 0.74 is the honest "what public data alone achieves" number. The 0.17 gap to AN-103's 0.91 is what firm-identity adds — useful for adversarial / cumulative audit but cannot be deployed via public poll registry alone.

Top features (strict-blind, XGBoost)

| Rank | Feature | Importance |
|---|---|---|
| 1 | log_sample | 0.072 |
| 2 | n_peers | 0.056 |
| 3 | poll_std_dev | 0.052 |
| 4 | max_abs_dev | 0.045 |
| 5 | days_to_election_sq | 0.045 |
| 6 | mean_abs_dev | 0.044 |
| 7 | min_signed_error | 0.042 |
| 8 | max_signed_error | 0.042 |
| 9 | signed_spike | 0.039 |
| 10 | max_signed_dev | 0.035 |

Top 5 collectively = 27%. Consensus-deviation features (n_peers, poll_std_dev, max_abs_dev, mean_abs_dev, signed_spike, max_signed_dev) account for 6 of the top 10 positions. The strict-blind classifier IS using genuine within-poll and cross-poll signals, not firm-identity proxies. log_sample is #1 but its 0.07 share is far below the AN-103 firm-aggregate top 2 (0.19 + 0.16 = 0.35).
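
The arithmetic behind these shares can be reproduced from the importance values and the feature-family assignment listed in this note. The snippet is only a worked check of those published numbers and does not recompute importances from the model.

```python
# Top-10 importances exactly as reported in this note.
importance = {
    "log_sample": 0.072,
    "n_peers": 0.056,
    "poll_std_dev": 0.052,
    "max_abs_dev": 0.045,
    "days_to_election_sq": 0.045,
    "mean_abs_dev": 0.044,
    "min_signed_error": 0.042,
    "max_signed_error": 0.042,
    "signed_spike": 0.039,
    "max_signed_dev": 0.035,
}

# Family letters follow the strict-blind feature set (A, B, C, D).
family = {
    "log_sample": "D",
    "days_to_election_sq": "D",
    "n_peers": "B",
    "poll_std_dev": "B",
    "max_abs_dev": "B",
    "mean_abs_dev": "B",
    "signed_spike": "B",
    "max_signed_dev": "B",
    "min_signed_error": "A",
    "max_signed_error": "A",
}

ranked = sorted(importance.values(), reverse=True)
top5_share = sum(ranked[:5])
top10_share = sum(ranked)

family_counts = {}
family_share = {}
for feat, imp in importance.items():
    fam = family[feat]
    family_counts[fam] = family_counts.get(fam, 0) + 1
    family_share[fam] = family_share.get(fam, 0.0) + imp

print(f"top 5 share: {top5_share:.3f}, top 10 share: {top10_share:.3f}")
print("top-10 counts by family:", family_counts)
```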

Spec ladder with strict-blind XGBoost predictions

Reference = media. Cluster SE at race.

| Bucket | S0 No FE | S1 Race FE | S2 Firm FE | S3 Firm + Race FE |
|---|---|---|---|---|
| is_candidate | +0.090*** (0.006) | +0.006 (0.007) | +0.040*** (0.006) | +0.004 (0.008) |
| is_pollster_self | −0.007* (0.004) | −0.003 (0.004) | −0.006 (0.006) | −0.012** (0.005) |
| is_other_firm | +0.016*** (0.005) | +0.006 (0.005) | +0.007 (0.005) | +0.013** (0.006) |
| is_shell | −0.055*** (0.008) | −0.023*** (0.008) | −0.007 (0.009) | −0.007 (0.009) |
| log_sample | −0.136*** (0.005) | −0.049*** (0.007) | −0.141*** (0.007) | −0.053*** (0.009) |
| n | 6,710 | 5,481 | 6,645 | 5,384 |
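
To illustrate why a coefficient can move between the no-FE and race-FE columns, here is a deliberately simplified sketch of a one-regressor fixed-effects estimate via within-group demeaning. It is an assumption-laden toy: the ladder above uses several bucket dummies plus log_sample with race-clustered standard errors, none of which this snippet reproduces, and `within_slope` is a hypothetical helper.

```python
from collections import defaultdict


def within_slope(y, x, groups):
    """Slope of y on x after subtracting group means from both.

    With a single regressor this equals the one-way fixed-effects estimate.
    Passing the same group label for every row gives the pooled OLS slope.
    """
    sums = defaultdict(lambda: [0.0, 0.0, 0])  # group -> [sum_y, sum_x, count]
    for yi, xi, g in zip(y, x, groups):
        s = sums[g]
        s[0] += yi
        s[1] += xi
        s[2] += 1
    num = den = 0.0
    for yi, xi, g in zip(y, x, groups):
        sum_y, sum_x, n = sums[g]
        y_dev = yi - sum_y / n
        x_dev = xi - sum_x / n
        num += x_dev * y_dev
        den += x_dev * x_dev
    if den == 0:
        raise ValueError("x has no variation within groups")
    return num / den


# Toy data (invented for illustration): predicted scores for 5 polls in 2 races.
score = [0.10, 0.20, 0.20, 0.40, 0.50]
is_shell = [1, 0, 0, 1, 0]
race = ["r1", "r1", "r1", "r2", "r2"]

pooled = within_slope(score, is_shell, ["all"] * len(score))
with_race_fe = within_slope(score, is_shell, race)
```

In this toy data the pooled slope is −0.05 while the within-race slope is −0.10, because the two races have different baseline scores and different shares of shell polls. The same mechanism is one reason the S0 and S1 columns can differ.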

Three findings sharper than the AN-103 version because the classifier doesn't have firm-aggregate features absorbing the firm dimension:

1. is_other_firm survives joint firm + race FE

+0.013, p<0.05 at S3. The strict-blind classifier, trained without any firm-aggregate features, still assigns higher predicted bias to other_firm polls than to media polls within the same firm × race. This is the strongest within-firm, within-race shell signal produced in any analysis so far. It confirms that the within-poll fingerprint of slant in the unaudited other_firm tier is real and not driven by firm-specific composition effects.

2. is_shell is significantly NEGATIVE at S0 and S1

S0: −0.055***. S1 (race FE): −0.023***. The coefficient is not significant once firm FE enter (−0.007 at both S2 and S3). Even a fully blind classifier built on within-poll signals scores AN-094-audited shells as LESS sponsored-like than media polls. The shell architecture defeats blind detection across three detector regimes.

The "professional evasion" reading hardens. AN-094-audited shells achieve structural mimicry of media polls (typical sample sizes, distributed within-poll deviations, no L-shape spike) that no statistical detector — weak or strong, blind or with firm info — can crack. The CNPJ-side classification (AN-094) is the binding identification method.

3. is_candidate at S3 is null

S3: +0.004, ns. With a strict-blind classifier, within firm × race candidate-sponsored polls don't have a within-poll fingerprint distinct from media polls of the same firm × race. This is the cleanest empirical confirmation of the AN-098 noise floor argument: within (cand × race × firm), the natural variance of poll error is wide enough to absorb the +7 pp slant in |error| terms. The slant is real and precisely measurable (AN-096 +7 pp signed error) but doesn't translate to a statistically-detectable within-poll pattern shift.

Substantive synthesis for the paper §Policy

Three regimes:

| Regime | Features | AUC | Captures |
|---|---|---|---|
| Blind statistical (AN-104) | Public poll data only | 0.742 | Unaudited shell-like tail + most candidate-sponsored polls |
| Firm-augmented statistical (AN-103) | + firm aggregates | 0.911 | Above + firm-identity signal (requires firm-level pooling) |
| CNPJ-side audit (AN-094, other session) | Sponsor CNPJ + CNAE + capital | qualitative | Professional shells the statistical detector cannot reach |

The strict-blind regime (AN-104) is what a regulator, journalist, or academic with only public TSE data can build today. AUC 0.742 is "fair triage" territory: at a 10% threshold, it catches ~30-40% of sponsored polls with a ~5% false-positive rate. That is useful for prioritizing audits, not for declaring individual polls slanted.
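
A sketch of how such triage numbers could be computed from out-of-sample scores. It assumes that "10% threshold" means flagging the top 10% of polls by predicted score, which this note does not state explicitly. Both function names are hypothetical.

```python
def triage_metrics(y_true, scores, flag_fraction=0.10):
    """Recall, false-positive rate and precision when the top share is flagged.

    ASSUMPTION: the threshold is a share of polls, ranked by score.
    """
    n = len(scores)
    n_flag = max(1, round(flag_fraction * n))
    order = sorted(range(n), key=lambda i: scores[i], reverse=True)
    flagged = order[:n_flag]
    tp = sum(1 for i in flagged if y_true[i] == 1)
    fp = n_flag - tp
    n_pos = sum(y_true)
    n_neg = n - n_pos
    return {
        "recall": tp / n_pos,
        "false_positive_rate": fp / n_neg,
        "precision": tp / n_flag,
    }


def implied_fpr(pos_rate, flag_fraction, recall):
    """False-positive rate implied by a flagged share and a recall level."""
    true_pos_share = recall * pos_rate          # flagged AND sponsored
    false_pos_share = flag_fraction - true_pos_share
    return false_pos_share / (1.0 - pos_rate)
```

As a consistency check under that assumption: with the Y1 positive rate of 15.3% and 10% of polls flagged, recall between 30% and 40% implies a false-positive rate between roughly 4.6% and 6.4%, in line with the ~5% quoted above.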

The shell evasion is detectable only by CNPJ-side audit. Blind statistical detection has a sophistication ceiling that AN-094-style shells deliberately operate above. Both detection mechanisms are needed.

Honest framing of what changed across the chain

The ~0.05 jump from AN-101 → AN-104 (0.69 → 0.742) comes from comprehensive feature engineering (error vs final, within-poll distribution shape, multiple consensus measures). The 0.17 jump from AN-104 → AN-103 (0.742 → 0.911) is the firm-identity bonus.

Update (2026-06-17): AN-104 → AN-106 correction

After AN-105 / AN-106, the 0.742 AUC reported here is now known to include ~0.05 AUC of race-attention proxy signal. Within-race linear demeaning (AN-105 Mode B) drops to 0.693; cross-fitted non-linear residualization (AN-106 DML) gives 0.717. Features like n_peers (R² = 0.78) and log_sample (R² = 0.83) are mostly race-explained — once race characteristics are properly residualized, the within-poll detection signal alone gives AUC ≈ 0.72. The honest "blind detection from public data" ceiling is therefore 0.72, not 0.74. The is_shell S0 coefficient of −0.055*** here shrinks to −0.015** under AN-106's proper race control — the shell "professional evasion" finding holds but is modest, not dramatic. See AN-106 for the corrected headline numbers used in paper/appendix_ml_detection.tex.
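
A minimal sketch of the within-race linear demeaning idea and of a "share of variance explained by race" measure. It assumes the R² is computed against race means (equivalent to regressing the feature on race dummies); the R² values quoted above may have been computed differently, for example against race characteristics. The cross-fitted non-linear residualization of AN-106 is not reproduced here, and `race_r2_and_residuals` is a hypothetical helper.

```python
from collections import defaultdict


def race_r2_and_residuals(values, races):
    """Return (R² of the feature on race means, within-race demeaned feature)."""
    by_race = defaultdict(list)
    for v, r in zip(values, races):
        by_race[r].append(v)
    race_mean = {r: sum(vs) / len(vs) for r, vs in by_race.items()}
    grand_mean = sum(values) / len(values)

    # Demeaned feature: what is left once the race-level mean is removed.
    residuals = [v - race_mean[r] for v, r in zip(values, races)]

    ss_total = sum((v - grand_mean) ** 2 for v in values)
    ss_within = sum(e * e for e in residuals)
    r2 = 1.0 - ss_within / ss_total if ss_total > 0 else 0.0
    return r2, residuals


# Toy example (invented values): a feature that differs mostly between races.
n_peers = [1, 3, 10, 12]
race = ["a", "a", "b", "b"]
r2, n_peers_within = race_r2_and_residuals(n_peers, race)
```

A feature with a high R² on race carries little poll-specific information once demeaned, which is why a classifier trained on the residuals is expected to score a lower AUC than one trained on the raw features.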

Caveats

Follow-ups

  1. Hyperparameter tuning on the strict-blind XGBoost. Could push to 0.76–0.78.
  2. Multi-cycle pooling (2020 + 2022 + 2024). Doubles n, enables train-2020 → test-2024 generalization test.
  3. SHAP attribution with strict-blind XGBoost — cleaner than tree importance for paper purposes.
  4. Add the suspicion score (strict-blind version) to the distributable artifact. Combined with the firm-augmented AN-103 score, the parquet has both regimes' OOS predictions for downstream / external use.

Artifacts