Strict-blind detection ceiling (no firm identity, no state, no municipality, no race aggregates; only poll results, election results, peer polls, dates, and log_sample) achieves OOS AUC 0.742 with XGBoost on 28 features. Firm identity adds 0.17 AUC (AN-103's 0.911 ceiling). The strict-blind classifier confirms two structural findings: (i) is_other_firm survives joint firm + race FE on the spec ladder (+0.013, p<0.05), so the unaudited shell-like tail robustly shows a sponsored-poll fingerprint that does NOT rely on knowing the firm; (ii) is_shell is significantly NEGATIVE at S0 (−0.055\*\*\*) and S1 (−0.023\*\*\*), so the AN-094-identified shells defeat blind detection regardless of classifier sophistication, hardening the 'professional evasion' interpretation. Top features: log_sample, n_peers, poll_std_dev, max_abs_dev, days_to_election_sq; consensus-deviation features collectively dominate (6 of the top 10).
User direction (2026-06-17): build a detector using only poll results, election results, other polls' results, and dates of polling. NOT pollster identity, NOT state, NOT municipality.
Strict-blind feature set (28 features)
| Family | Count | Features |
|---|---|---|
| A. Error vs final results | 8 | mean_abs_error, max_abs_error, max_signed_error, min_signed_error, error_std, error_skew, error_concentration, n_cands_miss_5pp |
| B. Consensus deviation vs peer polls | 10 | max_signed_dev, max_abs_dev, signed_spike, abs_spike, poll_std_dev, mean_abs_dev, skew_dev, max_dev_rank, n_peers, peer_std |
| C. Within-poll distribution shape | 7 | n_cand_rows, herfindahl_shares, top1_share, top2_margin_poll, n_cands_at_zero, share_std, share_skew |
| D. Poll-intrinsic metadata | 3 | log_sample, days_to_election, days_to_election_sq |
Dropped vs AN-103 (9 features): firm_share_candidate_work, firm_has_beta_estimate, firm_total_polls, firm_n_ufs, firm_mean_sample, firm_abs_beta, race_n_polls, race_week_n_polls, n_candidates_in_race.
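To make families A–C concrete, here is a minimal sketch that computes an illustrative subset of them for a single poll. It works from three arrays: the poll's candidate shares, the official result, and the other polls of the same race. The function name `strict_blind_features`, the input layout, and the exact definitions are assumptions read off the feature names, not the AN-104 code. The assumed definitions include consensus as the plain mean of peer polls, a strict > 5 pp cutoff for `n_cands_miss_5pp`, and family C computed on shares renormalised to sum to 1.

```python
import numpy as np


def strict_blind_features(poll_shares, final_shares, peer_shares):
    """Illustrative subset of the strict-blind features for ONE poll (hypothetical helper).

    poll_shares  : (k,)   this poll's candidate shares, in pp
    final_shares : (k,)   official result for the same k candidates, in pp
    peer_shares  : (m, k) other polls of the same race, m >= 1
    Definitions are assumptions inferred from the feature names.
    """
    poll = np.asarray(poll_shares, dtype=float)
    final = np.asarray(final_shares, dtype=float)
    peers = np.atleast_2d(np.asarray(peer_shares, dtype=float))

    # Family A: error vs final results (signed error = poll share - result)
    err = poll - final
    fam_a = {
        "mean_abs_error": np.abs(err).mean(),
        "max_abs_error": np.abs(err).max(),
        "max_signed_error": err.max(),
        "min_signed_error": err.min(),
        "error_std": err.std(),
        "n_cands_miss_5pp": int((np.abs(err) > 5).sum()),  # assumed strict > 5 pp
    }

    # Family B: deviation from the peer consensus (assumed: plain mean of peer polls)
    dev = poll - peers.mean(axis=0)
    fam_b = {
        "max_signed_dev": dev.max(),
        "max_abs_dev": np.abs(dev).max(),
        "mean_abs_dev": np.abs(dev).mean(),
        "n_peers": peers.shape[0],
        "peer_std": peers.std(axis=0).mean(),  # assumed: mean per-candidate spread across peers
    }

    # Family C: within-poll shape, on shares renormalised to sum to 1 (assumption)
    p = poll / poll.sum()
    top = np.sort(p)[::-1]
    fam_c = {
        "n_cand_rows": poll.size,
        "herfindahl_shares": float((p**2).sum()),
        "top1_share": top[0],
        "top2_margin_poll": top[0] - top[1] if top.size > 1 else top[0],
        "n_cands_at_zero": int((poll == 0).sum()),
        "share_std": p.std(),
    }
    return {**fam_a, **fam_b, **fam_c}


feats = strict_blind_features(
    poll_shares=[40, 35, 25],
    final_shares=[38, 40, 22],
    peer_shares=[[39, 37, 24], [41, 36, 23]],
)
```

Stacking one such dict per poll gives the feature matrix. The spike, skew, rank, and date features would follow the same per-poll pattern.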
Model comparison
Y1: poll_has_candidate_sponsor (n=6,710, 15.3% positive)
| Model | AUC | Log-loss | Avg Precision |
|---|---|---|---|
| Logistic L2 | 0.685 | 0.404 | 0.248 |
| Random Forest | 0.730 | 0.383 | 0.326 |
| XGBoost | 0.742 | 0.380 | 0.340 |
| LightGBM | 0.736 | 0.396 | 0.327 |
Y2: cand-sponsored OR shell-touched (n=6,710, 18.7% positive)
| Model | AUC | Log-loss | Avg Precision |
|---|---|---|---|
| Logistic L2 | 0.695 | 0.450 | 0.307 |
| Random Forest | 0.716 | 0.434 | 0.380 |
| XGBoost | 0.732 | 0.425 | 0.417 |
| LightGBM | 0.723 | 0.442 | 0.404 |
XGBoost wins both targets at 0.742 / 0.732, and also has the best log-loss and average precision on each. The gradient boosting advantage over logistic regression is bigger here (~4–6 AUC points: 5.7 on Y1, 3.7 on Y2) than in AN-103 (~1 point): non-linear interactions among within-poll features carry more of the signal when firm aggregates are removed.
Detection AUC progression — the honest range
| Analysis | Features | Method | AUC |
|---|---|---|---|
| AN-100 / 101 (sponsor-blind 7 features) | within-poll consensus + log_sample | Logit / GBM | 0.69 |
| AN-104 strict-blind (28 features) | + error vs final + within-poll distribution | XGBoost | 0.742 |
| AN-103 full (37 features) | + firm + race aggregates | LightGBM | 0.911 |
For the §Policy story, 0.74 is the honest "what public data alone achieves" number. The 0.17 gap to AN-103's 0.91 is what firm identity adds: useful for adversarial or cumulative audits, but not deployable from the public poll registry alone.
Top features (strict-blind, XGBoost)
| Rank | Feature | Importance |
|---|---|---|
| 1 | log_sample | 0.072 |
| 2 | n_peers | 0.056 |
| 3 | poll_std_dev | 0.052 |
| 4 | max_abs_dev | 0.045 |
| 5 | days_to_election_sq | 0.045 |
| 6 | mean_abs_dev | 0.044 |
| 7 | min_signed_error | 0.042 |
| 8 | max_signed_error | 0.042 |
| 9 | signed_spike | 0.039 |
| 10 | max_signed_dev | 0.035 |
Top 5 collectively = 27%. Consensus-deviation features (n_peers, poll_std_dev, max_abs_dev, mean_abs_dev, signed_spike, max_signed_dev) account for 6 of the top 10 positions. The strict-blind classifier IS using genuine within-poll and cross-poll signals, not firm-identity proxies. log_sample is #1 but its 0.07 share is far below the AN-103 firm-aggregate top 2 (0.19 + 0.16 = 0.35).
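The shares quoted above can be checked directly against the top-10 table. The sketch below recomputes the top-5 share and counts the family-B (consensus-deviation) features in the top 10. Family B is taken exactly as listed in the feature-set table.

```python
# Tree importances from the top-10 table (shares of total importance over all 28 features)
importance = {
    "log_sample": 0.072, "n_peers": 0.056, "poll_std_dev": 0.052,
    "max_abs_dev": 0.045, "days_to_election_sq": 0.045, "mean_abs_dev": 0.044,
    "min_signed_error": 0.042, "max_signed_error": 0.042, "signed_spike": 0.039,
    "max_signed_dev": 0.035,
}
# Family B (consensus deviation vs peer polls), as listed in the feature-set table
family_b = {"max_signed_dev", "max_abs_dev", "signed_spike", "abs_spike", "poll_std_dev",
            "mean_abs_dev", "skew_dev", "max_dev_rank", "n_peers", "peer_std"}

ranked = sorted(importance, key=importance.get, reverse=True)
top5_share = sum(importance[f] for f in ranked[:5])
n_consensus_top10 = sum(f in family_b for f in ranked[:10])
consensus_share_top10 = sum(importance[f] for f in ranked[:10] if f in family_b)
print(f"top-5 share {top5_share:.3f}; consensus features in top 10: {n_consensus_top10} "
      f"holding {consensus_share_top10:.3f}")
```

On these numbers, the six consensus features together hold 0.271 of total importance, against 0.072 for log_sample.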
Spec ladder with strict-blind XGBoost predictions
Reference bucket = media. Standard errors clustered by race.
| Bucket | S0 No FE | S1 Race FE | S2 Firm FE | S3 Firm + Race FE |
|---|---|---|---|---|
| is_candidate | +0.090*** (0.006) | +0.006 (0.007) | +0.040*** (0.006) | +0.004 (0.008) |
| is_pollster_self | −0.007* (0.004) | −0.003 (0.004) | −0.006 (0.006) | −0.012** (0.005) |
| is_other_firm | +0.016*** (0.005) | +0.006 (0.005) | +0.007 (0.005) | +0.013** (0.006) |
| is_shell | −0.055*** (0.008) | −0.023*** (0.008) | −0.007 (0.009) | −0.007 (0.009) |
| log_sample | −0.136*** (0.005) | −0.049*** (0.007) | −0.141*** (0.007) | −0.053*** (0.009) |
| n | 6,710 | 5,481 | 6,645 | 5,384 |
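Each cell of the spec ladder regresses the predicted score on sponsor-bucket dummies (media omitted) plus log_sample. Race and firm fixed effects are optional, and standard errors are clustered by race. Below is a NumPy-only sketch that uses FE dummies and a CR1 cluster-robust sandwich. The helper names, the synthetic data, and the true effect sizes in the demo are hypothetical. The sketch also does not reproduce whatever sample restrictions yield the smaller n in S1–S3.

```python
import numpy as np

BUCKETS = {"is_candidate": "candidate", "is_pollster_self": "pollster_self",
           "is_other_firm": "other_firm", "is_shell": "shell"}  # reference level: "media"


def fe_dummies(codes):
    """One-hot encode a categorical array, dropping the first level (absorbed by the intercept)."""
    _, inv = np.unique(codes, return_inverse=True)
    return np.eye(inv.max() + 1)[inv][:, 1:]


def ols_cluster(y, X, clusters):
    """OLS coefficients with CR1 cluster-robust standard errors."""
    n, k = X.shape
    xtx_inv = np.linalg.pinv(X.T @ X)
    beta = xtx_inv @ (X.T @ y)
    resid = y - X @ beta
    meat = np.zeros((k, k))
    groups = np.unique(clusters)
    for g in groups:
        in_g = clusters == g
        score = X[in_g].T @ resid[in_g]  # cluster score vector
        meat += np.outer(score, score)
    n_g = len(groups)
    scale = n_g / (n_g - 1) * (n - 1) / (n - k)  # CR1 small-sample correction
    vcov = scale * xtx_inv @ meat @ xtx_inv
    return beta, np.sqrt(np.diag(vcov))


def spec_ladder_cell(pred, bucket, log_sample, race, firm, race_fe, firm_fe):
    """Predicted score on bucket dummies + log_sample, optional FE; SEs clustered by race."""
    cols = [np.ones(len(pred))]
    cols += [(bucket == level).astype(float) for level in BUCKETS.values()]
    cols.append(log_sample)
    X = np.column_stack(cols)
    if race_fe:
        X = np.column_stack([X, fe_dummies(race)])
    if firm_fe:
        X = np.column_stack([X, fe_dummies(firm)])
    beta, se = ols_cluster(pred, X, race)
    names = list(BUCKETS) + ["log_sample"]
    return {name: (beta[i + 1], se[i + 1]) for i, name in enumerate(names)}


# Synthetic demo (hypothetical data): true is_other_firm effect +0.02, log_sample effect -0.05
rng = np.random.default_rng(0)
n, n_race, n_firm = 3000, 150, 40
race = rng.integers(0, n_race, n)
firm = rng.integers(0, n_firm, n)
bucket = rng.choice(["media", "candidate", "pollster_self", "other_firm", "shell"], n)
log_sample = rng.normal(6.5, 0.6, n)
pred = (0.6 + 0.02 * (bucket == "other_firm") - 0.05 * log_sample
        + rng.normal(0, 0.05, n_race)[race] + rng.normal(0, 0.05, n))

s1 = spec_ladder_cell(pred, bucket, log_sample, race, firm, race_fe=True, firm_fe=False)
s3 = spec_ladder_cell(pred, bucket, log_sample, race, firm, race_fe=True, firm_fe=True)
```

Explicit dummies keep the sketch transparent. For many firms or races, absorbing the FE by iterative demeaning is the usual scalable alternative.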
Three findings are sharper than in the AN-103 version, because the classifier has no firm-aggregate features to absorb the firm dimension:
1. is_other_firm survives joint firm + race FE
+0.013, p<0.05 at S3. The strict-blind classifier, trained without any firm-aggregate features, still assigns higher predicted bias to other_firm polls than to media polls within the same firm × race. This is the strongest within-firm, within-race shell signal we have produced in any analysis. It confirms that the within-poll fingerprint of slant in the unaudited other_firm tier is real and not driven by firm-specific composition effects.
2. is_shell is significantly NEGATIVE at S0 and S1
S0: −0.055***. S1 (race FE): −0.023***. Even a fully blind classifier built on within-poll signals identifies AN-094-audited shells as LESS sponsored-like than media polls. The shell architecture defeats blind detection across three detector regimes:
- AN-100 / AN-101 weak classifier: null
- AN-103 strong classifier with firm leak: −0.072*** at S0
- AN-104 strict-blind strong classifier: −0.055*** at S0
The "professional evasion" reading hardens. AN-094-audited shells achieve structural mimicry of media polls (typical sample sizes, distributed within-poll deviations, no L-shape spike) that no statistical detector — weak or strong, blind or with firm info — can crack. The CNPJ-side classification (AN-094) is the binding identification method.
3. is_candidate at S3 is null
S3: +0.004, ns. With a strict-blind classifier, within firm × race candidate-sponsored polls don't have a within-poll fingerprint distinct from media polls of the same firm × race. This is the cleanest empirical confirmation of the AN-098 noise floor argument: within (cand × race × firm), the natural variance of poll error is wide enough to absorb the +7 pp slant in |error| terms. The slant is real and precisely measurable (AN-096 +7 pp signed error) but doesn't translate to a statistically-detectable within-poll pattern shift.
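A toy simulation illustrates the noise-floor argument. Sponsored polls carry a +7 pp mean shift in signed error, media polls carry none, and both share the same per-poll noise. The noise standard deviations below are assumptions, not AN-098 estimates. For each noise level, the sketch reports how well |error| separates the two groups (AUC) and how precisely pooling polls recovers the mean signed-error gap.

```python
import numpy as np


def auc(pos, neg):
    """Mann-Whitney AUC: P(pos > neg) + 0.5 * P(pos == neg), via sorted ranks."""
    neg = np.sort(neg)
    below = np.searchsorted(neg, pos, side="left")
    below_or_tied = np.searchsorted(neg, pos, side="right")
    return float(((below + below_or_tied) / 2).mean() / neg.size)


def noise_floor_toy(slant_pp=7.0, noise_sd_pp=10.0, n=2000, seed=0):
    """Toy: sponsored polls' signed error is shifted by `slant_pp`; media polls' is not.

    Returns (AUC of |error| for telling sponsored from media polls,
             pooled estimate of the mean signed-error gap in pp).
    """
    rng = np.random.default_rng(seed)
    err_media = rng.normal(0.0, noise_sd_pp, n)
    err_sponsored = rng.normal(slant_pp, noise_sd_pp, n)
    return (auc(np.abs(err_sponsored), np.abs(err_media)),
            err_sponsored.mean() - err_media.mean())


# Assumed noise levels (pp), not AN-098 estimates
toy = {sd: noise_floor_toy(noise_sd_pp=sd) for sd in (3.0, 10.0, 30.0)}
for sd, (auc_abs, gap) in toy.items():
    print(f"noise sd {sd:>4} pp: |error| AUC = {auc_abs:.3f}, pooled signed gap = {gap:+.2f} pp")
```

When the noise is wide, |error| barely separates the groups for any single poll. Averaging signed error over many polls still recovers the slant, with standard error shrinking as σ/√n. The same mechanism reconciles a precise AN-096 estimate with a null S3 coefficient.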
Substantive synthesis for the paper §Policy
Three regimes:
| Regime | Features | AUC | Captures |
|---|---|---|---|
| Blind statistical (AN-104) | Public poll data only | 0.742 | Unaudited shell-like tail + most candidate-sponsored polls |
| Firm-augmented statistical (AN-103) | + firm aggregates | 0.911 | Above + firm-identity signal (requires firm-level pooling) |
| CNPJ-side audit (AN-094, other session) | Sponsor CNPJ + CNAE + capital | qualitative | Professional shells the statistical detector cannot reach |
The strict-blind regime (AN-104) is what a regulator, journalist, or academic with only public TSE data can build today. AUC 0.742 is "fair triage" territory: flagging the top 10% of polls by predicted score catches ~30–40% of sponsored polls at a ~5% false-positive rate. Useful for prioritizing audits, not for declaring individual polls slanted.
The shell evasion is detectable only by CNPJ-side audit. Blind statistical detection has a sophistication ceiling that AN-094-style shells deliberately operate above. Both detection mechanisms are needed.
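Simple arithmetic checks the triage numbers. This reads the "10% threshold" as flagging the top decile of polls by predicted score, which is an assumption about how the threshold was set. With n = 6,710 and 15.3% positives (Y1), a given recall at that flag rate fixes the false-positive rate. The helper names are hypothetical.

```python
import numpy as np


def triage_at_top_fraction(scores, labels, frac=0.10):
    """Flag the top `frac` of polls by score; return (recall, false-positive rate, precision)."""
    scores = np.asarray(scores, dtype=float)
    labels = np.asarray(labels, dtype=bool)
    k = int(round(frac * scores.size))
    flagged = np.zeros(scores.size, dtype=bool)
    flagged[np.argsort(-scores, kind="stable")[:k]] = True
    tp = int((flagged & labels).sum())
    fp = int((flagged & ~labels).sum())
    return tp / labels.sum(), fp / (~labels).sum(), tp / k


def implied_fpr(n, prevalence, frac, recall):
    """False-positive rate implied by flagging `frac` of n polls at a given recall."""
    positives = prevalence * n
    true_pos = recall * positives
    return (frac * n - true_pos) / (n - positives)


# Y1 sample: n = 6,710 polls, 15.3% positive; top decile flagged
fpr_at_40 = implied_fpr(6710, 0.153, 0.10, recall=0.40)
fpr_at_30 = implied_fpr(6710, 0.153, 0.10, recall=0.30)
print(f"recall 40% -> FPR {fpr_at_40:.3f}; recall 30% -> FPR {fpr_at_30:.3f}")
```

At the top decile, a recall of 30–40% implies a false-positive rate of roughly 4.6–6.4%. That range is consistent with the ~5% quoted above. `triage_at_top_fraction` gives the same three quantities directly from the OOS predictions.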
Honest framing of what changed across the chain
- AN-100 / AN-101 (7 features): AUC 0.69. Within-poll only.
- AN-104 (28 features, strict blind): AUC 0.742. Within-poll error vs final + consensus + within-poll distribution + date + sample. The PRACTICAL ceiling for public-data detection.
- AN-103 (37 features, firm-augmented): AUC 0.911. The THEORETICAL ceiling if firm aggregation is allowed (i.e., pooling across a firm's many polls for reputation signals).
The ~0.05 jump from AN-101 → AN-104 (0.69 → 0.742) comes from comprehensive feature engineering (error vs final, within-poll distribution shape, multiple consensus measures). The ~0.17 jump from AN-104 → AN-103 (0.742 → 0.911) is the firm-identity bonus.
Update (2026-06-17): AN-104 → AN-106 correction
After AN-105 / AN-106, the 0.742 AUC reported here is known to include ~0.05 AUC of race-attention proxy signal. Within-race linear demeaning (AN-105 Mode B) drops the AUC to 0.693; cross-fitted non-linear residualization (AN-106 DML) gives 0.717. Features like n_peers (R² = 0.78) and log_sample (R² = 0.83) are mostly race-explained: once race characteristics are properly residualized, the within-poll detection signal alone gives AUC ≈ 0.72. The honest "blind detection from public data" ceiling is therefore 0.72, not 0.74. The is_shell S0 coefficient of −0.055*** reported here shrinks to −0.015** under AN-106's proper race control; the shell "professional evasion" finding holds but is modest, not dramatic. See AN-106 for the corrected headline numbers used in paper/appendix_ml_detection.tex.
Caveats
- 6,710 polls in the analysis sample. Polls without peer polls for consensus computation are dropped (~3k lost from the matched-share sample), so high-attention races are over-represented.
- AUC 0.74 is "fair" not "excellent." Many false positives and false negatives. The detector triages; it doesn't declare.
- Default hyperparameters. Bayesian optimization with nested CV could plausibly add 0.5–1.5 AUC points.
- 2024 only. Cross-cycle generalization untested.
Follow-ups
- Hyperparameter tuning on the strict-blind XGBoost. Could push to 0.76–0.78.
- Multi-cycle pooling (2020 + 2022 + 2024). Doubles n, enables train-2020 → test-2024 generalization test.
- SHAP attribution with strict-blind XGBoost — cleaner than tree importance for paper purposes.
- Add the suspicion score (strict-blind version) to the distributable artifact. Combined with the firm-augmented AN-103 score, the parquet would then carry both regimes' OOS predictions for downstream / external use.
Artifacts
- Script: source/analysis/an-104-strict-blind-detection.py
- Model comparison: build/table/an-104-strict-blind-detection.csv
- Feature importance: build/table/an-104-strict-blind-importance.csv
- Headline JSON: build/table/an-104-strict-blind-detection.json
- Distributable: build/analysis/poll_ml_predictions_blind.parquet (strict-blind OOS predictions per poll, two targets, four models; 0.742 AUC = the public-data ceiling before the AN-106 correction)
Related
- AN-100 sponsor-blind detection — original 7-feature blind detector
- AN-101 predicted bias as outcome
- AN-103 full ML pipeline (with firm features) — superseded ceiling
- AN-094 (other session) shell audit — CNPJ-side audit
- AN-098 noise floor