Double machine learning (race-confounder residualization via cross-fitted XGBoost regression, then an XGBoost classifier on the residuals) achieves OOS AUC 0.717, between AN-105 Mode B linear demeaning (0.693) and AN-104 raw (0.742). Non-linear race effects are worth +0.024 AUC over the linear demean; the remaining +0.025 up to AN-104 raw is race-correlated signal that DML cannot remove (race confounders are imperfect proxies). The residualization R² shows that the race-leakiest features are log_sample (0.825), n_peers (0.775), error_concentration (0.693), and n_cand_rows (0.690); those were carrying mostly race signal. In the spec ladder, the is_shell coefficient flips back to negative at S0 (−0.015**) under DML, the substantively cleanest 'shells mimic media' finding. is_other_firm is borderline positive at S1 (+0.008*) and weakens to +0.007 (ns) at S3.

Confidence: green
Type: ml-double-ml
Script: source/analysis/an-106-double-ml-detection.py
Target: build/table/an-106-double-ml-detection.csv
Status: interpreted · 2026-06-17
Created: 2026-06-17

User-flagged follow-up #3 from AN-105: linear within-race demeaning may miss non-linear race effects. The cleaner approach is double-machine-learning (Robinson 1988 / Chernozhukov et al. 2018 style):

  1. Stage 1: for each feature f, train a flexible model predicting f from race-level confounders (XGBoost regression, 5-fold OOS). → residual_f = f − f̂(race_confounders)
  2. Stage 2: train bias classifier on residuals.

This handles non-linear interactions of race characteristics that within-race linear demeaning misses.
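The two stages above can be sketched as follows. This is a minimal illustration, not the AN-106 script: it swaps in scikit-learn's gradient boosting for XGBoost, runs on synthetic data, and every name in it (`residualize`, `oos_auc`, `Z`, `f_leaky`, `f_within`) is hypothetical.

```python
import numpy as np
from sklearn.ensemble import GradientBoostingClassifier, GradientBoostingRegressor
from sklearn.metrics import roc_auc_score
from sklearn.model_selection import KFold, StratifiedKFold, cross_val_predict


def residualize(features, confounders, n_splits=5, seed=0):
    """Stage 1: cross-fitted residuals and out-of-sample R² per feature column."""
    cv = KFold(n_splits=n_splits, shuffle=True, random_state=seed)
    residuals = np.empty(features.shape, dtype=float)
    r2 = np.empty(features.shape[1])
    for j in range(features.shape[1]):
        f = features[:, j]
        model = GradientBoostingRegressor(n_estimators=50, random_state=seed)
        # Each row is predicted by a model fitted on the other folds only.
        f_hat = cross_val_predict(model, confounders, f, cv=cv)
        residuals[:, j] = f - f_hat
        r2[j] = 1.0 - np.sum((f - f_hat) ** 2) / np.sum((f - f.mean()) ** 2)
    return residuals, r2


def oos_auc(residuals, labels, n_splits=5, seed=0):
    """Stage 2: out-of-sample AUC of a classifier trained on the residuals."""
    cv = StratifiedKFold(n_splits=n_splits, shuffle=True, random_state=seed)
    model = GradientBoostingClassifier(n_estimators=50, random_state=seed)
    proba = cross_val_predict(model, residuals, labels, cv=cv, method="predict_proba")
    return roc_auc_score(labels, proba[:, 1])


# Synthetic stand-in data (hypothetical): 3 race-level confounders, 2 features.
rng = np.random.default_rng(0)
n = 600
Z = rng.normal(size=(n, 3))
f_leaky = np.sin(Z[:, 0]) + Z[:, 1] ** 2 + 0.1 * rng.normal(size=n)  # mostly race-driven
f_within = rng.normal(size=n)  # unrelated to the confounders
X = np.column_stack([f_leaky, f_within])
y = (f_within + 0.5 * rng.normal(size=n) > 0.8).astype(int)

residuals, r2 = residualize(X, Z)
auc = oos_auc(residuals, y)
print("OOS R² per feature:", np.round(r2, 3))
print("OOS AUC on residuals:", round(auc, 3))
```

Because R² is computed out of sample, a feature the confounders cannot predict can score slightly below zero. The source does not say how folds were formed; since confounders are constant within a race, splitting folds by race (for example with `GroupKFold`) may be the safer choice on real data.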

Race-level confounders (35 columns)

R² of residualization per feature

R² measures how much of each feature's variance the race confounders explain out of sample. A higher R² means the feature was mostly race-predicted; the residual is the genuine within-poll signal.

| Feature | R² | Race-leak interpretation |
|---|---|---|
| log_sample | 0.825 | Almost entirely race-predicted |
| n_peers | 0.775 | Heavily race-predicted |
| error_concentration | 0.693 | Mostly race-predicted |
| n_cand_rows | 0.690 | Mostly race-predicted |
| herfindahl_shares | 0.432 | Moderately |
| top1_share | 0.375 | Moderately |
| days_to_election | 0.331 | Moderately |
| n_cands_at_zero | 0.300 | Moderate |
| peer_std | 0.288 | Moderate |
| share_skew | 0.281 | Moderate |
| signed_spike | 0.237 | Mostly within-poll |
| poll_std_dev | 0.235 | Mostly within-poll |
| mean_abs_dev | 0.228 | Mostly within-poll |
| max_signed_dev | 0.152 | Mostly within-poll |
| skew_dev | −0.037 | Pure within-poll |

The race-leakiest features were exactly the suspected ones: log_sample (race investment), n_peers (race attention), error_concentration / n_cand_rows (race structure). By R², these were roughly 69–83% race signal. The within-poll consensus features (signed_spike, poll_std_dev, mean_abs_dev, max_signed_dev, skew_dev) are >75% within-poll signal; those are the cleanest slant fingerprints.
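The within-poll shares quoted above follow from the R² column as 1 − R². A small sketch of that arithmetic, using the values from the table; the 0.69 and 0.25 cut-offs and the clipping of negative R² to zero are illustrative assumptions, not thresholds stated in the analysis.

```python
# R² of residualization per feature, copied from the table above.
r2 = {
    "log_sample": 0.825, "n_peers": 0.775, "error_concentration": 0.693,
    "n_cand_rows": 0.690, "herfindahl_shares": 0.432, "top1_share": 0.375,
    "days_to_election": 0.331, "n_cands_at_zero": 0.300, "peer_std": 0.288,
    "share_skew": 0.281, "signed_spike": 0.237, "poll_std_dev": 0.235,
    "mean_abs_dev": 0.228, "max_signed_dev": 0.152, "skew_dev": -0.037,
}

# Share of variance left after residualization; a negative out-of-sample R²
# is treated as "nothing explained" (assumption).
within_share = {name: 1.0 - max(value, 0.0) for name, value in r2.items()}

race_leaky = [name for name, value in r2.items() if value >= 0.69]
within_poll = [name for name, value in r2.items() if value < 0.25]

for name in race_leaky + within_poll:
    print(f"{name}: R² = {r2[name]:+.3f}, within-poll share = {within_share[name]:.1%}")
```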

Classifier results

5-fold CV OOS on the 28 race-residualized features:

| Model | AUC | Log-loss | AP |
|---|---|---|---|
| XGBoost | 0.717 | 0.390 | 0.312 |
| LightGBM | 0.709 | 0.408 | 0.309 |

Detection AUC progression — complete table

| Method | Features | AUC | Reading |
|---|---|---|---|
| AN-100 / AN-101 | 7 basic | 0.69 | Baseline |
| AN-105 Mode A | 7 theoretical signals | 0.614 | Pure slant signatures, weak |
| AN-105 Mode B | 28 features, linear demean | 0.693 | Linear within-race signal |
| AN-106 DML | 28 features, non-linear residualize | 0.717 | + non-linear race effects |
| AN-104 raw | 28 features, unfiltered | 0.742 | (Partly race-proxy) |
| AN-103 full | + firm aggregates | 0.911 | (+ firm-identity leak) |

Three-way decomposition of the AN-104 to AN-101 gap:
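The increments implied by the progression table can be tallied directly. The sketch below is one plausible reading of the three-way split, derived only from the AUCs reported there; the step labels are an interpretation, and the baseline is reported to two decimals only, so the first increment is approximate.

```python
# AUCs copied from the progression table; step labels are an interpretation.
auc = {
    "AN-100/101 baseline": 0.690,
    "AN-105 Mode B (linear demean)": 0.693,
    "AN-106 DML": 0.717,
    "AN-104 raw": 0.742,
}

names = list(auc)
steps = {
    f"{lower} -> {upper}": round(auc[upper] - auc[lower], 3)
    for lower, upper in zip(names, names[1:])
}
total = round(auc[names[-1]] - auc[names[0]], 3)

for label, delta in steps.items():
    print(f"{label}: {delta:+.3f}")
print(f"total gap: {total:+.3f}")
```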

Spec ladder with DML predictions

Reference category: media. Standard errors are clustered by race.

| Bucket | S0 (No FE) | S1 (Race FE) | S2 (Firm FE) | S3 (Firm + Race FE) |
|---|---|---|---|---|
| is_candidate | +0.071*** | −0.004 | +0.033*** | −0.004 |
| is_pollster_self | −0.000 | +0.000 | +0.002 | −0.004 |
| is_other_firm | +0.011** | +0.008* | +0.003 | +0.007 |
| is_shell | −0.015** | −0.006 | +0.008 | +0.011 |
| log_sample | −0.118*** | +0.022*** | −0.123*** | +0.009 |
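To show why a coefficient can be sizeable at S0 and vanish at S1, here is a minimal NumPy sketch of the first two rungs: OLS of a score on a bucket dummy, without and with race fixed effects, with standard errors clustered by race. It assumes the dependent variable is the classifier's predicted score (the source does not spell this out), uses synthetic data, and applies the plain CR0 sandwich estimator with no small-sample correction; `demean_within` and `ols_cluster` are hypothetical helpers.

```python
import numpy as np


def demean_within(values, groups):
    """Within transformation: subtract each group's mean (absorbs group fixed effects)."""
    values = np.asarray(values, dtype=float)
    out = values.copy()
    for g in np.unique(groups):
        mask = groups == g
        out[mask] -= values[mask].mean(axis=0)
    return out


def ols_cluster(y, X, clusters):
    """OLS coefficients with cluster-robust (CR0 sandwich) standard errors."""
    bread = np.linalg.inv(X.T @ X)
    beta = bread @ X.T @ y
    resid = y - X @ beta
    meat = np.zeros((X.shape[1], X.shape[1]))
    for g in np.unique(clusters):
        mask = clusters == g
        cluster_score = X[mask].T @ resid[mask]  # summed score within one cluster
        meat += np.outer(cluster_score, cluster_score)
    se = np.sqrt(np.diag(bread @ meat @ bread))
    return beta, se


# Synthetic stand-in (hypothetical): the outcome depends on the race only,
# but the bucket dummy is more common in high-outcome races.
rng = np.random.default_rng(1)
n_races, per_race = 200, 10
race = np.repeat(np.arange(n_races), per_race)
race_effect = rng.normal(size=n_races)[race]
bucket = (rng.random(race.size) < 1 / (1 + np.exp(-race_effect))).astype(float)
pred = 0.5 * race_effect + 0.1 * rng.normal(size=race.size)

# S0: no fixed effects (intercept + dummy).
X0 = np.column_stack([np.ones(race.size), bucket])
beta_s0, se_s0 = ols_cluster(pred, X0, race)

# S1: race fixed effects via within-race demeaning of outcome and regressor.
X1 = demean_within(bucket, race).reshape(-1, 1)
beta_s1, se_s1 = ols_cluster(demean_within(pred, race), X1, race)

print(f"S0 bucket coefficient: {beta_s0[1]:+.3f} (SE {se_s0[1]:.3f})")
print(f"S1 bucket coefficient: {beta_s1[0]:+.3f} (SE {se_s1[0]:.3f})")
```

In this toy setup the S0 coefficient is driven entirely by which races the bucket appears in, so it collapses once race fixed effects are absorbed. That is the pattern to look for when comparing the S0 and S1 columns above.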

Cross-method comparison of key coefficients

| Method | is_other_firm (S1) | is_shell (S0) |
|---|---|---|
| AN-103 (full, GBM, firm-leak) | +0.012 | −0.005 |
| AN-104 raw (28 features) | +0.006 | −0.055*** |
| AN-105 Mode A (7 theoretical) | −0.000 | +0.016*** |
| AN-105 Mode B (linear demean) | +0.007 | +0.020*** |
| AN-106 DML (non-linear) | +0.008* | −0.015** |

The is_shell sign flips across methods are diagnostic:

The DML version is the most methodologically defensible "do shells look sponsored?" answer: −0.015 at S0, null under FE. Under proper non-linear race control, shells DO mimic media polls — but the magnitude (−0.015) is much smaller than AN-104's −0.055 suggested. The shell "professional evasion" interpretation is real but modest, not dramatic.

The is_other_firm signal also weakens further: from +0.013** at S3 in AN-104 raw to +0.007 (ns) at S3 in AN-106 DML. The genuine within-firm-within-race shell-style signal in the unaudited tail is small.

Substantive synthesis

The DML analysis converges on three honest claims:

  1. Genuine blind detection from public TSE data: AUC ~0.69–0.72. The 0.69 (linear) and 0.72 (non-linear) bracket the honest "what within-poll signal achieves" range. The AN-104 0.742 was inflated by ~0.05 AUC (0.742 vs. the 0.693 linear floor) of race-proxy correlation that even DML can't fully remove.

  2. Shells modestly mimic media. Under proper race control, the is_shell coefficient is −0.015 (p<0.05) at S0: meaningful but modest. The "professional evasion" framing from AN-103/AN-104 was over-quantified; shells achieve partial structural blend-in, not absolute invisibility.

  3. The within-poll consensus-deviation features are the cleanest slant signatures. signed_spike, poll_std_dev, mean_abs_dev, max_signed_dev all have R² < 0.25 in residualization — meaning >75% within-poll variance. These are the features the §Policy story should highlight as public, blindly-detectable signs of slant.

Honest §Policy framing — final version

| Detection regime | Method | AUC | Cleanest interpretation |
|---|---|---|---|
| Theoretical-feature-only | Hand-curated slant signals | 0.61 | Pure interpretable slant, small |
| Public-data blind | Within-poll + race-controlled | 0.69–0.72 | What public registry can do |
| Firm-augmented | + firm-level aggregates | 0.91 | Requires firm pooling |
| CNPJ-side audit | Capital + CNAE + web presence | qualitative | High-precision shell ID |

The defensible "blind detection from public data" range is 0.69–0.72. That's the public-policy-relevant number. AUC > 0.72 requires firm-level pooling; AUC < 0.65 is what theory-only achieves. Both the AN-094 CNPJ-side audit and the AN-100/101/106 within-poll signals are needed to cover both axes of evasion.

Caveats

Follow-ups

  1. Train DML on 2020 + 2024 jointly with cycle as additional confounder. Tests cross-cycle generalization.
  2. Add firm-level confounders to stage 1 for a "race AND firm controlled" detector — closer to the firm-leak adjusted AN-103 regime.
  3. SHAP attribution on the DML-residualized classifier to identify the within-poll features that actually drive the 0.72 ceiling.
  4. Cross-fitted nested CV for fully-orthogonal DML inference. More compute-intensive but methodologically sharper.

Artifacts