User's concern (race-proxy leak in AN-104) confirmed. Mode A (7 theoretical slant signals, no race-attention features) gives OOS AUC 0.614 — barely above chance. Mode B (28 AN-104 features demeaned within race) gives 0.693 — basically the AN-101 baseline. The 'genuine within-poll slant signature' detection ceiling is ~0.69 (within-race signal only); AN-104's 0.742 included ~0.05 AUC worth of race-attention proxy leak. Critical change: under within-race demeaning, is_other_firm survives only weakly (+0.006 ns at S3) — the 'shell-like tail signal' in AN-104 was partly race-proxy. Shells light up POSITIVELY under pure theoretical features (+0.016*** at S0 of Mode A) — opposite sign from AN-104's negative — because race-attention features no longer absorb shell variation. Honest §Policy framing: blind detection ceiling is 0.69, well below earlier claims; the genuine within-poll slant fingerprint is weak (~0.61 with theoretical features only).
User's concern (2026-06-17): the AN-104 0.742 AUC might be detecting "race characteristics that attract sponsorship" rather than slant signatures. Two purifications:
- Mode A: hand-curated theoretical slant signals only
- Mode B: within-race demeaning of the AN-104 feature set
Both purify against race-attention proxy leak.
Mode A: theoretical slant signals (7 features)
Hand-curated features, each with direct slant interpretation:
| Feature | Theoretical interpretation |
|---|---|
| signed_spike_z | L-shape spike scaled by within-poll noise |
| l_shape_ratio | max-deviation concentration: max(abs(dev)) / mean(abs(dev)) |
| top1_signed_inflation | how much top cand boosted vs consensus |
| max_dev_is_top_ranked | is the max-deviation cand the poll's #1? |
| skew_dev | positive skew = one outlier high (slant signature) |
| max_pos_dev | magnitude of biggest positive deviation |
| n_cands_at_zero | count of minor cands at near-zero (honest pattern) |
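A minimal sketch of how six of the seven Mode A features could be computed for a single poll. It assumes dev is the poll share minus a consensus share per candidate, in percentage points. The function name, the zero_tol threshold and the example numbers are illustrative assumptions, not the AN-105 script. signed_spike_z is omitted because the within-poll noise scale is not defined in this note.

```python
import numpy as np

def slant_features(poll_share, consensus_share, zero_tol=1.0):
    """Illustrative within-poll slant features for one poll (hypothetical helper)."""
    poll = np.asarray(poll_share, dtype=float)
    cons = np.asarray(consensus_share, dtype=float)
    dev = poll - cons                      # assumed definition of "deviation"
    abs_dev = np.abs(dev)
    top = int(np.argmax(poll))             # the poll's top-ranked candidate
    centered = dev - dev.mean()
    sd = centered.std()
    mean_abs = abs_dev.mean()
    return {
        # max|dev| / mean|dev|, as in the feature table
        "l_shape_ratio": float(abs_dev.max() / mean_abs) if mean_abs > 0 else 0.0,
        # signed deviation of the top-ranked candidate
        "top1_signed_inflation": float(dev[top]),
        # is the largest absolute deviation on the top-ranked candidate?
        "max_dev_is_top_ranked": bool(np.argmax(abs_dev) == top),
        # population skewness of the deviations
        "skew_dev": float((centered ** 3).mean() / sd ** 3) if sd > 0 else 0.0,
        # largest positive deviation, floored at zero
        "max_pos_dev": float(max(dev.max(), 0.0)),
        # candidates at or below zero_tol points (assumed threshold)
        "n_cands_at_zero": int((poll <= zero_tol).sum()),
    }

# Made-up poll: the leader sits 7 points above consensus, everyone else below.
feats = slant_features([45.0, 30.0, 20.0, 0.5, 0.5],
                       [38.0, 33.0, 22.0, 3.0, 4.0])
print(feats)
```

In this made-up poll the largest deviation lands on the top-ranked candidate and the skew is positive, which is the L-shape pattern the table describes.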
Results
| Model | AUC | Log-loss | AP |
|---|---|---|---|
| Logistic L2 | 0.602 | 0.416 | 0.198 |
| XGBoost | 0.614 | 0.419 | 0.201 |
| LightGBM | 0.610 | 0.434 | 0.198 |
AUC 0.614 — barely above chance. Pure theoretical signals alone are weak detectors. The slant fingerprint exists (the +7 pp slant is precisely identified in AN-096) but it does not manifest as a tightly-detectable individual-poll pattern.
Mode B: within-race demeaning (28 features)
Take the AN-104 strict-blind feature set. For each feature f: f'(p) = f(p) − mean(f for polls in same race(p))
Forces the classifier to use ONLY within-race variation. Race-level signal is absorbed before training.
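The demeaning step can be sketched with a pandas groupby. The column names (race_id and the two feature columns) and the toy values are assumptions for illustration; the actual AN-105 feature table is not shown in this note.

```python
import pandas as pd

def demean_within_race(df, feature_cols, race_col="race_id"):
    """Return a copy of df with each feature minus its within-race mean."""
    out = df.copy()
    # transform("mean") broadcasts each race's mean back to its polls
    race_means = df.groupby(race_col)[feature_cols].transform("mean")
    out[feature_cols] = df[feature_cols] - race_means
    return out

# Toy example: two races, two features (values are made up).
polls = pd.DataFrame({
    "race_id": ["A", "A", "B", "B", "B"],
    "l_shape_ratio": [2.0, 4.0, 1.0, 2.0, 3.0],
    "max_pos_dev": [5.0, 7.0, 1.0, 1.0, 4.0],
})
demeaned = demean_within_race(polls, ["l_shape_ratio", "max_pos_dev"])
print(demeaned)
```

After the transform every feature has mean zero inside each race, so any remaining predictive power has to come from how a poll differs from its peers in the same race. The sketch computes race means over all rows passed in; whether AN-105 computes them inside each training fold is not stated here.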
Results
| Model | AUC | Log-loss | AP |
|---|---|---|---|
| Logistic L2 | 0.526 | 0.429 | 0.164 |
| XGBoost | 0.692 | 0.399 | 0.277 |
| LightGBM | 0.693 | 0.417 | 0.276 |
AUC 0.693 — back to the AN-101 baseline. Within-race demeaning removes ~0.05 AUC of the AN-104 advantage over the basic feature set. That's the race-attention proxy leak.
Detection AUC progression — honest version
| Analysis | Features | AUC |
|---|---|---|
| AN-100 / AN-101 | 7 basic within-poll features | 0.69 |
| AN-105 Mode A | 7 theoretical slant signals | 0.614 |
| AN-105 Mode B | 28 AN-104 features, race-demeaned | 0.693 |
| AN-104 (race-proxy leak present) | 28 raw features | 0.742 |
| AN-103 (firm-aggregate leak) | + firm features | 0.911 |
The honest "blind detection ceiling from public data" is AUC 0.69, not 0.74. AN-104's 0.05 advantage was race-proxy signal — popular races have more peer polls, late polls, more cands → those features correlate with sponsor mix.
The honest "pure within-poll slant signal" is AUC 0.61. The slant fingerprint exists but is not strongly identifiable in a small interpretable feature set.
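The gaps quoted in this section follow directly from the headline AUCs in the progression table. A short arithmetic check, using only those reported values (the dictionary keys are shorthand labels, not identifiers from the scripts):

```python
# Headline out-of-sample AUCs as reported in the progression table.
auc = {
    "AN-105 Mode A": 0.614,
    "AN-105 Mode B": 0.693,
    "AN-104 raw": 0.742,
    "AN-103 firm-augmented": 0.911,
}

baseline = auc["AN-105 Mode B"]  # race-demeaned detector

race_proxy_gap = round(auc["AN-104 raw"] - baseline, 3)
firm_proxy_gap = round(auc["AN-103 firm-augmented"] - baseline, 3)
theory_shortfall = round(baseline - auc["AN-105 Mode A"], 3)

print(race_proxy_gap, firm_proxy_gap, theory_shortfall)
```

The first two differences round to the 0.05 and 0.22 figures used for the race-proxy and firm-identity inflation.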
Spec ladder — Mode B (within-race demeaned predictions)
This is the methodologically clean version. Reference = media.
| Bucket | S0 No FE | S1 Race FE | S2 Firm FE | S3 Firm + Race FE |
|---|---|---|---|---|
| is_candidate | +0.066*** | +0.020** | +0.044*** | +0.014* |
| is_pollster_self | −0.007* | −0.003 | +0.006 | +0.006 |
| is_other_firm | +0.011** | +0.007 | +0.001 | +0.006 |
| is_shell | +0.020*** | −0.011 | +0.001 | −0.000 |
Changes vs AN-104 (raw 28-feature spec):
- is_other_firm at S3: AN-104 +0.013** → AN-105 Mode B +0.006 (ns). The AN-104 "shell-tail signal survives joint FE" finding was partly race-proxy. Under within-race demeaning, the within-firm-within-race other_firm signal weakens substantially.
- is_shell flips sign: AN-104 S0 −0.055*** → AN-105 Mode B S0 +0.020***. With race-attention features removed, shells weakly DO show up positive on within-poll signals at S0. Still null under any FE.
- is_candidate survives joint FE at +0.014* (p<0.10). Real but marginal signal.
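A sketch of why a bucket coefficient can shrink between S0 and S1. It assumes the ladder regresses the detector's out-of-sample score on sponsor-bucket dummies (media as the omitted reference) and that race FE are absorbed by the within transformation; the note does not spell out the estimator, so this is illustrative only. The data are synthetic, with a true within-race candidate effect of 0.01 and candidate-sponsored polls concentrated in high-score races.

```python
import numpy as np

def ols_slopes(y, X):
    """OLS with an intercept; returns the slope coefficients only."""
    X1 = np.column_stack([np.ones(len(y)), X])
    beta, *_ = np.linalg.lstsq(X1, y, rcond=None)
    return beta[1:]

def demean_by_group(a, groups):
    """Subtract group means from each column of a 2-D array."""
    a = np.asarray(a, dtype=float)
    _, inv = np.unique(groups, return_inverse=True)
    counts = np.bincount(inv)
    out = a.copy()
    for j in range(a.shape[1]):
        group_means = np.bincount(inv, weights=a[:, j]) / counts
        out[:, j] -= group_means[inv]
    return out

rng = np.random.default_rng(0)
n, n_races = 4000, 200
race = rng.integers(0, n_races, size=n)
race_effect = rng.normal(0.0, 0.05, size=n_races)[race]

# Candidate-sponsored polls are more likely in races with a high race effect.
p_cand = 1.0 / (1.0 + np.exp(-20.0 * race_effect))
is_candidate = (rng.random(n) < p_cand).astype(float)

# Synthetic detector score: race-level component plus a small true bucket effect.
score = 0.15 + race_effect + 0.01 * is_candidate + rng.normal(0.0, 0.02, size=n)

# S0: no fixed effects.
s0 = ols_slopes(score, is_candidate[:, None])[0]

# S1: race FE via demeaning both sides within race.
x_within = demean_by_group(is_candidate[:, None], race)
y_within = demean_by_group(score[:, None], race)[:, 0]
s1 = ols_slopes(y_within, x_within)[0]

print(f"S0 coefficient: {s0:+.3f}")
print(f"S1 coefficient: {s1:+.3f}")
```

The S0 coefficient picks up the race-level component along with the bucket effect, while the S1 coefficient sits close to the planted 0.01. That is the same mechanism by which a race-proxy signal can inflate a no-FE coefficient.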
Spec ladder — Mode A (theoretical signals predictions)
| Bucket | S0 No FE | S1 Race FE | S2 Firm FE | S3 Firm + Race FE |
|---|---|---|---|---|
| is_candidate | +0.022*** | −0.002 | +0.019*** | +0.002 |
| is_pollster_self | −0.004 | +0.002 | +0.003 | +0.001 |
| is_other_firm | +0.008** | −0.000 | +0.007* | +0.002 |
| is_shell | +0.016*** | +0.003 | +0.015* | +0.008 |
Pure theoretical-feature predictions give modest positive coefficients for candidate, other_firm, AND shell at S0 and S2. Under theoretically-justified slant signals alone, shells DO light up weakly positive (+0.016*** at S0). The opposite sign from AN-104 confirms that AN-104's negative-shell coefficient was driven by race-attention features absorbing shell variation. With those removed, shells show a small positive slant-signature signal but it collapses under any FE — same noise-floor pattern.
Substantive synthesis: what the purifications tell us
The genuine within-poll slant signal is weak at the individual-poll level. Mode A AUC 0.61, Mode B AUC 0.69 — both modest. The slant of +7 pp is real (AN-096) but spread across natural noise in a way that defeats individual-poll classification (AN-098 noise floor).
Race-attention features were doing real work in AN-104. The 0.05 AUC gap between AN-104 raw and AN-105 Mode B is genuine signal — but it's signal about WHICH RACE you're in, not about whether the poll is slanted. For policy purposes that's a distinction with a difference.
The shell signal is unstable across detector regimes. AN-104 raw: shells look LESS sponsored (−0.055***) at S0. AN-105 Mode A (theory): shells look MORE sponsored (+0.016***) at S0. AN-105 Mode B (demeaned): shells look slightly more sponsored (+0.020***) at S0. The flip reveals that AN-104's negative shell signal was a race-feature absorption artifact, not a "shells evade detection" effect.
The 'professional evasion' interpretation needs refinement. AN-104's hardened "shells don't ping" story was overstated. Under purified detectors, shells show a small positive but marginal slant signal. The structural mimicry is real but not absolute. Shells are weakly detectable above noise; the AN-094 CNPJ-side audit remains the higher-power identification tool.
The is_other_firm signal weakens substantially. AN-104 reported that the "other_firm tail survives joint FE", but that result was partly race-proxy. The genuine within-race signal is +0.006 at S3 (ns): directional but not statistically distinguishable from zero. Any within-poll fingerprint of unaudited shell-style polls is therefore small at best.
Honest framing for the paper §Policy
Three detection regimes, with corrected AUC numbers:
| Regime | Features | AUC | Interpretation |
|---|---|---|---|
| Theoretical signatures alone | 7 hand-curated | 0.61 | Slant fingerprint is real but small in individual polls |
| Strict-blind within-race | 28 demeaned features | 0.69 | Genuine within-poll detection from public data |
| Strict-blind raw (AN-104) | 28 features | 0.74 | Inflated 0.05 by race-attention proxy |
| Firm-augmented (AN-103) | + firm aggregates | 0.91 | Inflated 0.22 by firm-identity proxy |
The defensible "what public-data blind detection achieves" number for the paper is 0.69, not 0.74. The 0.69 ceiling is tight: it represents the within-race within-poll signal that isn't a race-type proxy. Above-poll-level aggregation (firm-level, AN-103) buys more AUC but at the cost of conflating "identifying slanted polls" with "identifying firms that take candidate work."
The blind detector at AUC 0.69 is still in "fair triage" territory. Useful for flagging polls for closer audit, not for declaring individual polls slanted.
Implication for the AN-094 CNPJ-side approach
The fact that purified blind classifiers struggle (AUC 0.61–0.69) reinforces the CNPJ-side audit's necessity, not its sufficiency. Two complementary detection mechanisms:
- Blind statistical: 0.61–0.69 AUC, useful for triage
- CNPJ-side audit (AN-094): high-precision identification of professional shells; lower coverage but higher confidence
For a §Policy proposal, both should be deployed together. The 0.69 blind detector flags polls for audit; the CNPJ-side audit (capital social, CNAE, web presence) provides high-confidence shell flagging for the polls the statistical detector cannot distinguish.
Caveats
- Mode A's 7 features may be sub-optimal. Hand-curation is not exhaustive. Better-engineered theoretical features could push AUC up modestly (maybe to 0.65). But the qualitative point — theoretical slant signals are weak detectors — holds.
- Within-race demeaning loses signal too. Some race-level features (e.g. log_sample shape, days_to_election) carry real information that's also race-correlated. The Mode B AUC of 0.69 is a lower bound; the true "genuine within-poll signal" is probably between 0.69 and 0.74. The exact apportionment between race-proxy and within-poll signal is not fully identifiable.
- The shell-coefficient sign flips are striking but follow from the feature-set design, not data artifacts. AN-104's negative was real (driven by sample-size + race-attention features); AN-105's positive is real (driven by theoretical signals). Both are honest views.
Follow-ups
- Cross-validate Mode A and Mode B AUCs with cross-cycle data (2020 + 2022) to confirm the 0.61 / 0.69 ceilings.
- Better-engineered theoretical features. SHAP analysis of AN-104 LightGBM to find which features are doing the work, then design hand-curated versions of those signals.
- A "double-machine-learning" version: predict poll-features from race characteristics first (a separate model), use the residuals as input to the bias classifier. Methodologically cleaner than within-race demeaning for non-linear race effects.
- Within-firm demeaning as a parallel test. Removes firm-level signal in the same way Mode B removes race signal.
- Update the §Policy framing in paper with the corrected 0.69 ceiling and the "blind statistical + CNPJ-side audit" complementary mechanism story.
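The double-machine-learning follow-up above can be sketched as cross-fitted residualization. This is a rough sketch under stated assumptions, not a planned implementation: a cubic polynomial fit stands in for the "separate model", race_x is a hypothetical scalar race characteristic, and the data are synthetic with a purely non-linear dependence on race_x.

```python
import numpy as np

def crossfit_residuals(feature, race_x, n_folds=5, degree=3, seed=0):
    """Out-of-fold residuals of a poll feature given a race characteristic.

    A polynomial least-squares fit is used as a stand-in for a flexible
    first-stage learner (an assumption of this sketch).
    """
    feature = np.asarray(feature, dtype=float)
    race_x = np.asarray(race_x, dtype=float)
    n = len(feature)
    # Columns 1, x, x^2, ..., x^degree
    basis = np.vander(race_x, degree + 1, increasing=True)
    # Balanced random fold assignment
    folds = np.random.default_rng(seed).permutation(n) % n_folds
    resid = np.empty(n)
    for k in range(n_folds):
        held_out = folds == k
        # Fit on the other folds, predict on the held-out fold
        beta, *_ = np.linalg.lstsq(basis[~held_out], feature[~held_out], rcond=None)
        resid[held_out] = feature[held_out] - basis[held_out] @ beta
    return resid

rng = np.random.default_rng(1)
race_x = rng.uniform(-1.0, 1.0, size=2000)            # hypothetical race characteristic
feature = race_x ** 2 + rng.normal(0.0, 0.1, 2000)    # non-linear race dependence

resid = crossfit_residuals(feature, race_x)

corr_before = np.corrcoef(feature, race_x ** 2)[0, 1]
corr_after = np.corrcoef(resid, race_x ** 2)[0, 1]
print(f"corr with race_x^2 before: {corr_before:.2f}, after: {corr_after:.2f}")
```

The residuals would then replace the raw features as classifier inputs. With real data the folds would probably need to be split by race rather than by poll, so that polls from one race never sit on both sides of a fit.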
Artifacts
- Script: source/analysis/an-105-theoretical-slant-detection.py
- Model comparison: build/table/an-105-theoretical-slant-detection.csv
- Headline JSON: build/table/an-105-theoretical-slant-detection.json
Related
- AN-104 strict-blind detection (race-proxy leak) — corrected here
- AN-103 full ML pipeline — firm-leak ceiling
- AN-098 noise floor — why individual-poll detection is hard
- AN-094 (other session) shell audit — CNPJ-side method