AN-104: Strict-blind detection

The strict-blind detection ceiling (no firm identity, no state, no municipality, no race aggregates; only poll results, election results, peer polls, dates, and log_sample) is an OOS AUC of 0.742 with XGBoost on 28 features. Firm identity adds a further 0.17 AUC (AN-103's ceiling is 0.911). The strict-blind classifier confirms two structural findings: (i) is_other_firm survives joint firm + race FE on the spec ladder (+0.013, p<0.05), so the unaudited shell-like tail robustly shows a sponsored-poll fingerprint that does NOT rely on knowing the firm; (ii) is_shell is significantly NEGATIVE at S0 (−0.055***) and S1 (−0.023***), so the AN-094-identified shells defeat blind detection regardless of classifier sophistication, hardening the 'professional evasion' interpretation. Top features: log_sample, n_peers, poll_std_dev, max_abs_dev, days_to_election_sq. Consensus-deviation features collectively dominate.

Hypothesis: H13: Shell-contratante (shell-sponsor) polls show larger residual β
Confidence: green
Type: ml-ablation

Script: source/analysis/an-104-strict-blind-detection.py
Target: build/table/an-104-strict-blind-detection.csv
Status: interpreted · 2026-06-17
Created: 2026-06-17

User direction (2026-06-17): build a detector using only poll results, election results, other polls' results, and dates of polling. NOT pollster identity, NOT state, NOT municipality.

Strict-blind feature set (28 features)

Family	Count	Features
A. Error vs final results	8	mean_abs_error, max_abs_error, max_signed_error, min_signed_error, error_std, error_skew, error_concentration, n_cands_miss_5pp
B. Consensus deviation vs peer polls	10	max_signed_dev, max_abs_dev, signed_spike, abs_spike, poll_std_dev, mean_abs_dev, skew_dev, max_dev_rank, n_peers, peer_std
C. Within-poll distribution shape	7	n_cand_rows, herfindahl_shares, top1_share, top2_margin_poll, n_cands_at_zero, share_std, share_skew
D. Poll-intrinsic metadata	3	log_sample, days_to_election, days_to_election_sq

Dropped vs AN-103 (9 features): firm_share_candidate_work, firm_has_beta_estimate, firm_total_polls, firm_n_ufs, firm_mean_sample, firm_abs_beta, race_n_polls, race_week_n_polls, n_candidates_in_race.
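
To make the feature families concrete, here is a minimal sketch of how a few family C features and one family B feature might be computed for a single poll. The exact definitions used in an-104-strict-blind-detection.py are not given in this note, so the formulas below (shares as fractions, Herfindahl as the sum of squared shares, deviation measured against the peer mean) are assumptions inferred from the feature names, and the toy poll and peer values are invented for illustration.

```python
from statistics import mean, pstdev

def shape_features(shares):
    """Within-poll distribution shape for one poll (shares as fractions).

    Assumed definitions; the source script may differ.
    """
    ranked = sorted(shares, reverse=True)
    return {
        "n_cand_rows": len(shares),
        "herfindahl_shares": sum(s * s for s in shares),
        "top1_share": ranked[0],
        "top2_margin_poll": ranked[0] - ranked[1] if len(ranked) > 1 else ranked[0],
        "n_cands_at_zero": sum(1 for s in shares if s == 0),
        "share_std": pstdev(shares),
    }

def max_abs_dev(poll, peer_polls):
    """Largest absolute gap between this poll and the peer mean, per candidate."""
    devs = []
    for cand, share in poll.items():
        peers = [p[cand] for p in peer_polls if cand in p]
        if peers:  # candidates with no peer reading are skipped
            devs.append(abs(share - mean(peers)))
    return max(devs) if devs else None

# Hypothetical poll and two hypothetical peer polls for the same race.
poll = {"A": 0.50, "B": 0.30, "C": 0.20}
peers = [{"A": 0.40, "B": 0.35, "C": 0.25}, {"A": 0.44, "B": 0.33, "C": 0.23}]

feats = shape_features(list(poll.values()))
dev = max_abs_dev(poll, peers)
```

Nothing here touches firm, state, or municipality: every input is a poll result or a peer poll result, which is the constraint the strict-blind set is built around.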

Model comparison

Y1: poll_has_candidate_sponsor (n=6,710, 15.3% positive)

Model	AUC	Log-loss	Avg Precision
Logistic L2	0.685	0.404	0.248
Random Forest	0.730	0.383	0.326
XGBoost	0.742	0.380	0.340
LightGBM	0.736	0.396	0.327

Y2: cand-sponsored OR shell-touched (n=6,710, 18.7% positive)

Model	AUC	Log-loss	Avg Precision
Logistic L2	0.695	0.450	0.307
Random Forest	0.716	0.434	0.380
XGBoost	0.732	0.425	0.417
LightGBM	0.723	0.442	0.404

XGBoost wins both targets at 0.742 / 0.732. The gradient-boosting advantage over logistic regression is bigger here (about 6 AUC points on Y1 and 4 on Y2) than in AN-103 (~1 point): non-linear interactions among within-poll features carry more of the signal when firm aggregates are removed.
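
For readers who want to check the three columns of the comparison tables against the saved OOS predictions, the following is a dependency-free sketch of the metrics. It is not the code from the analysis script; the function names are illustrative, ties in average precision are handled naively, and the toy labels and scores are invented.

```python
import math

def roc_auc(y, p):
    """AUC as the share of (positive, negative) pairs ranked correctly."""
    pos = [s for s, t in zip(p, y) if t == 1]
    neg = [s for s, t in zip(p, y) if t == 0]
    wins = sum(1.0 if a > b else 0.5 if a == b else 0.0
               for a in pos for b in neg)
    return wins / (len(pos) * len(neg))

def log_loss(y, p, eps=1e-15):
    """Mean negative log-likelihood of the labels under the predictions."""
    total = 0.0
    for t, s in zip(y, p):
        s = min(max(s, eps), 1 - eps)  # clip to avoid log(0)
        total -= t * math.log(s) + (1 - t) * math.log(1 - s)
    return total / len(y)

def average_precision(y, p):
    """Mean of precision@k taken at the rank of each positive."""
    order = sorted(range(len(y)), key=lambda i: p[i], reverse=True)
    hits, total = 0, 0.0
    for rank, i in enumerate(order, start=1):
        if y[i] == 1:
            hits += 1
            total += hits / rank
    return total / hits  # assumes at least one positive label

# Toy out-of-sample predictions (hypothetical values).
y_true = [0, 0, 1, 1]
y_score = [0.1, 0.4, 0.35, 0.8]

auc = roc_auc(y_true, y_score)
ll = log_loss(y_true, y_score)
ap = average_precision(y_true, y_score)
```

Average precision depends on the positive rate, so the Y1 and Y2 columns (15.3% vs 18.7% positive) are not directly comparable on that metric, whereas AUC is.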

Detection AUC progression — the honest range

Analysis	Features	Method	AUC
AN-100 / 101 (sponsor-blind 7 features)	within-poll consensus + log_sample	Logit / GBM	0.69
AN-104 strict-blind (28 features)	+ error vs final + within-poll distribution	XGBoost	0.742
AN-103 full (37 features)	+ firm + race aggregates	LightGBM	0.911

For the §Policy story, 0.74 is the honest "what public data alone achieves" number. The 0.17 gap to AN-103's 0.91 is what firm identity adds: useful for adversarial / cumulative audit, but not deployable from the public poll registry alone.

Top features (strict-blind, XGBoost)

Rank	Feature	Importance
1	log_sample	0.072
2	n_peers	0.056
3	poll_std_dev	0.052
4	max_abs_dev	0.045
5	days_to_election_sq	0.045
6	mean_abs_dev	0.044
7	min_signed_error	0.042
8	max_signed_error	0.042
9	signed_spike	0.039
10	max_signed_dev	0.035

Top 5 collectively = 27%. Consensus-deviation features (n_peers, poll_std_dev, max_abs_dev, mean_abs_dev, signed_spike, max_signed_dev) account for 6 of the top 10 positions. The strict-blind classifier IS using genuine within-poll and cross-poll signals, not firm-identity proxies. log_sample is #1 but its 0.07 share is far below the AN-103 firm-aggregate top 2 (0.19 + 0.16 = 0.35).
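
The two summary figures above (top-5 share and consensus-deviation count) can be re-derived from the table. This sketch uses the rounded importances as printed and the family B membership from the feature-set table; variable names are illustrative.

```python
# Top-10 importances as printed in the table (rounded to 3 decimals).
importance = {
    "log_sample": 0.072, "n_peers": 0.056, "poll_std_dev": 0.052,
    "max_abs_dev": 0.045, "days_to_election_sq": 0.045,
    "mean_abs_dev": 0.044, "min_signed_error": 0.042,
    "max_signed_error": 0.042, "signed_spike": 0.039,
    "max_signed_dev": 0.035,
}

# Family B (consensus deviation vs peer polls) from the feature-set table.
family_b = {
    "max_signed_dev", "max_abs_dev", "signed_spike", "abs_spike",
    "poll_std_dev", "mean_abs_dev", "skew_dev", "max_dev_rank",
    "n_peers", "peer_std",
}

ranked = sorted(importance, key=importance.get, reverse=True)
top5_share = sum(importance[f] for f in ranked[:5])
n_consensus_in_top10 = sum(1 for f in ranked[:10] if f in family_b)

print(f"top-5 share: {top5_share:.3f}")
print(f"family B features in top 10: {n_consensus_in_top10}")
```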

Spec ladder with strict-blind XGBoost predictions

Reference category: media polls. Standard errors clustered at the race level.

Bucket	S0 No FE	S1 Race FE	S2 Firm FE	S3 Firm + Race FE
is_candidate	+0.090*** (0.006)	+0.006 (0.007)	+0.040*** (0.006)	+0.004 (0.008)
is_pollster_self	−0.007* (0.004)	−0.003 (0.004)	−0.006 (0.006)	−0.012** (0.005)
is_other_firm	+0.016*** (0.005)	+0.006 (0.005)	+0.007 (0.005)	+0.013** (0.006)
is_shell	−0.055*** (0.008)	−0.023*** (0.008)	−0.007 (0.009)	−0.007 (0.009)
log_sample	−0.136*** (0.005)	−0.049*** (0.007)	−0.141*** (0.007)	−0.053*** (0.009)
n	6,710	5,481	6,645	5,384
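
The step from S0 to S1 in the ladder can be illustrated with a deliberately simplified sketch: one sponsor dummy instead of four, no log_sample control, and a cluster-robust standard error without small-sample correction. The actual specification in the script is richer, and the toy data below are invented so that the raw gap is mostly a between-race composition effect.

```python
from collections import defaultdict
import math

def demean_by(values, groups):
    """Subtract each group's mean (the within transformation for group FE)."""
    sums, counts = defaultdict(float), defaultdict(int)
    for v, g in zip(values, groups):
        sums[g] += v
        counts[g] += 1
    return [v - sums[g] / counts[g] for v, g in zip(values, groups)]

def slope_and_cluster_se(x, y, clusters):
    """OLS slope of y on x (both already centered) with a cluster-robust SE."""
    sxx = sum(a * a for a in x)
    b = sum(a * c for a, c in zip(x, y)) / sxx
    score = defaultdict(float)  # per-cluster sum of x * residual
    for a, c, g in zip(x, y, clusters):
        score[g] += a * (c - b * a)
    se = math.sqrt(sum(s * s for s in score.values())) / sxx
    return b, se

# Hypothetical polls: race A has high predicted scores and few shells,
# race B has low predicted scores and many shells.
race = ["A"] * 4 + ["B"] * 4
is_shell = [0, 0, 0, 1, 0, 1, 1, 1]
pred = [0.30, 0.34, 0.32, 0.30, 0.10, 0.08, 0.12, 0.10]

# S0: no FE (centering on the grand mean is equivalent to an intercept).
everyone = ["all"] * len(race)
b_s0, se_s0 = slope_and_cluster_se(
    demean_by(is_shell, everyone), demean_by(pred, everyone), race)

# S1: race FE via within-race demeaning.
b_s1, se_s1 = slope_and_cluster_se(
    demean_by(is_shell, race), demean_by(pred, race), race)

print(f"S0 coefficient: {b_s0:+.3f}")
print(f"S1 coefficient: {b_s1:+.3f}")
```

With only two clusters the standard errors in this toy are not meaningful; they are computed only to show where the race-level clustering enters.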

Three findings are sharper than in the AN-103 version because the classifier has no firm-aggregate features absorbing the firm dimension:

1. `is_other_firm` survives joint firm + race FE

+0.013, p<0.05 at S3. The strict-blind classifier, trained without any firm-aggregate features, still assigns a higher predicted bias to other_firm polls than to media polls within the same firm × race. This is the strongest within-firm-within-race shell-signal we have produced in any analysis. It confirms that the within-poll fingerprint of slant in the unaudited other_firm tier is real and not driven by firm-specific composition effects.

2. `is_shell` is significantly NEGATIVE at S0 and S1

S0: −0.055***. S1 (race FE): −0.023***. Even a fully blind classifier built on within-poll signals identifies AN-094-audited shells as LESS sponsored-like than media polls. The shell architecture defeats blind detection across three detector regimes:

AN-100 / AN-101 weak classifier: null
AN-103 strong classifier with firm leak: −0.072*** at S0
AN-104 strict-blind strong classifier: −0.055*** at S0

The "professional evasion" reading hardens. AN-094-audited shells achieve structural mimicry of media polls (typical sample sizes, distributed within-poll deviations, no L-shape spike) that none of the statistical detectors tried here (weak or strong, blind or with firm info) can crack. The CNPJ-side classification (AN-094) is the binding identification method.

3. `is_candidate` at S3 is null

S3: +0.004, ns. With a strict-blind classifier, candidate-sponsored polls have no within-poll fingerprint distinct from media polls of the same firm × race. This is the cleanest empirical confirmation of the AN-098 noise floor argument: within (cand × race × firm), the natural variance of poll error is wide enough to absorb the +7 pp slant in |error| terms. The slant is real and precisely measurable (AN-096 +7 pp signed error) but doesn't translate to a statistically detectable within-poll pattern shift.

Substantive synthesis for the paper §Policy

Three regimes:

Regime	Features	AUC	Captures
Blind statistical (AN-104)	Public poll data only	0.742	Unaudited shell-like tail + most candidate-sponsored polls
Firm-augmented statistical (AN-103)	+ firm aggregates	0.911	Above + firm-identity signal (requires firm-level pooling)
CNPJ-side audit (AN-094, other session)	Sponsor CNPJ + CNAE + capital	qualitative	Professional shells the statistical detector cannot reach

The strict-blind regime (AN-104) is what a regulator, journalist, or academic with only public TSE data can build today. AUC 0.742 is "fair triage" territory: at 10% threshold, catch ~30-40% of sponsored polls with ~5% false-positive rate. Useful for prioritizing audits, not for declaring individual polls slanted.
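
The triage figures can be sanity-checked with a small sketch. It assumes that "10% threshold" means flagging the top 10% of polls by predicted score, which this note does not state explicitly. Under that reading and the Y1 base rate of 15.3%, a recall of 30-40% implies a false-positive rate of roughly 4.6% to 6.4%, in line with the ~5% quoted. The function names and toy data are illustrative.

```python
def triage_rates(y, p, flag_rate=0.10):
    """Recall and false-positive rate when the top `flag_rate` share is flagged."""
    n_flag = max(1, round(flag_rate * len(y)))
    order = sorted(range(len(y)), key=lambda i: p[i], reverse=True)
    flagged = order[:n_flag]
    tp = sum(1 for i in flagged if y[i] == 1)
    fp = len(flagged) - tp
    n_pos = sum(y)
    return tp / n_pos, fp / (len(y) - n_pos)

def implied_fpr(flag_rate, base_rate, recall):
    """FPR implied by a flag rate, a positive base rate, and a recall."""
    return (flag_rate - recall * base_rate) / (1 - base_rate)

# Toy sample of 20 polls with 3 sponsored ones (hypothetical scores).
y_toy = [1, 0, 1] + [0] * 16 + [1]
p_toy = [0.9, 0.8, 0.7] + [0.5 - 0.02 * k for k in range(16)] + [0.05]
recall_toy, fpr_toy = triage_rates(y_toy, p_toy, flag_rate=0.10)

# Consistency check against the figures quoted in the text.
fpr_low_recall = implied_fpr(0.10, 0.153, 0.30)
fpr_mid_recall = implied_fpr(0.10, 0.153, 0.35)
fpr_high_recall = implied_fpr(0.10, 0.153, 0.40)
```

Most flagged polls being true positives at this flag rate is what makes the score usable for prioritizing audits, while the 60-70% of sponsored polls left unflagged is why it cannot clear any individual poll.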

The shell evasion is detectable only by CNPJ-side audit. Blind statistical detection has a sophistication ceiling that AN-094-style shells deliberately operate above. Both detection mechanisms are needed.

Honest framing of what changed across the chain

AN-100 / AN-101 (7 features): AUC 0.69. Within-poll only.
AN-104 (28 features, strict blind): AUC 0.742. Within-poll + error vs final + consensus + within-poll distribution + date + sample. The PRACTICAL ceiling for public-data detection.
AN-103 (37 features, firm-augmented): AUC 0.911. The THEORETICAL ceiling if firm aggregation is allowed (i.e., pooling across a firm's many polls for reputation signals).

The roughly 0.05 jump from AN-101 → AN-104 (0.69 to 0.742) is from comprehensive feature engineering (error vs final, within-poll distribution shape, multiple consensus measures). The 0.17 jump from AN-104 → AN-103 is the firm-identity bonus.

Update (2026-06-17): AN-104 → AN-106 correction

After AN-105 / AN-106, the 0.742 AUC reported here is now known to include ~0.05 AUC of race-attention proxy signal. Within-race linear demeaning (AN-105 Mode B) drops to 0.693; cross-fitted non-linear residualization (AN-106 DML) gives 0.717. Features like n_peers (R² = 0.78) and log_sample (R² = 0.83) are mostly race-explained — once race characteristics are properly residualized, the within-poll detection signal alone gives AUC ≈ 0.72. The honest "blind detection from public data" ceiling is therefore 0.72, not 0.74. The is_shell S0 coefficient of −0.055*** here shrinks to −0.015** under AN-106's proper race control — the shell "professional evasion" finding holds but is modest, not dramatic. See AN-106 for the corrected headline numbers used in paper/appendix_ml_detection.tex.
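
To show what "mostly race-explained" means for a feature such as n_peers, here is a minimal sketch of within-race demeaning and the associated R². It assumes the quoted R² values come from regressing each feature on race indicators, which this note does not spell out, and it covers only the linear demeaning idea behind AN-105 Mode B, not AN-106's cross-fitted non-linear residualization. The toy values are invented.

```python
from collections import defaultdict

def race_r2_and_residual(values, races):
    """Share of a feature's variance explained by race, plus the residual.

    R^2 = 1 - SS_within / SS_total, i.e. the fit of a regression of the
    feature on race dummies. The residual is the within-race demeaned feature.
    """
    sums, counts = defaultdict(float), defaultdict(int)
    for v, r in zip(values, races):
        sums[r] += v
        counts[r] += 1
    grand = sum(values) / len(values)
    resid = [v - sums[r] / counts[r] for v, r in zip(values, races)]
    ss_within = sum(e * e for e in resid)
    ss_total = sum((v - grand) ** 2 for v in values)
    return 1 - ss_within / ss_total, resid

# Hypothetical n_peers values: a high-attention race and a low-attention race.
races = ["A", "A", "A", "B", "B", "B"]
n_peers = [10, 12, 11, 2, 3, 1]

r2, n_peers_within = race_r2_and_residual(n_peers, races)
print(f"R^2 of n_peers on race: {r2:.3f}")
print("within-race residual:", n_peers_within)
```

A classifier trained on the raw feature can separate races from each other; one trained on the residual can only use how a poll differs from other polls of the same race, which is the signal the corrected ceiling is meant to isolate.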

Caveats

6,710 polls in the analysis sample. Polls without peer polls for consensus computation are dropped (~3k lost from matched-share sample). High-attention races over-represented.
AUC 0.74 is "fair" not "excellent." Many false positives and false negatives. The detector triages; it doesn't declare.
Default hyperparameters. Bayesian optimization with nested CV would add 0.5–1.5 AUC points.
2024 only. Cross-cycle generalization untested.

Follow-ups

Hyperparameter tuning on the strict-blind XGBoost. Could push to 0.76–0.78.
Multi-cycle pooling (2020 + 2022 + 2024). Doubles n, enables train-2020 → test-2024 generalization test.
SHAP attribution with strict-blind XGBoost — cleaner than tree importance for paper purposes.
Add the suspicion score (strict-blind version) to the distributable artifact. Combined with the firm-augmented AN-103 score, the parquet has both regimes' OOS predictions for downstream / external use.

Artifacts

Script: source/analysis/an-104-strict-blind-detection.py
Model comparison: build/table/an-104-strict-blind-detection.csv
Feature importance: build/table/an-104-strict-blind-importance.csv
Headline JSON: build/table/an-104-strict-blind-detection.json
Distributable: build/analysis/poll_ml_predictions_blind.parquet (strict-blind OOS predictions per poll, two targets, four models — 0.742 AUC = the public-data ceiling)

Related

AN-100 sponsor-blind detection: original 7-feature blind detector
AN-101 predicted bias as outcome
AN-103 full ML pipeline (with firm features): superseded ceiling
AN-094 (other session) shell audit: CNPJ-side audit
AN-098 noise floor
