AN-013 v2: forensics targeting AN-013's blind spots

Three forensic tests aimed at the AN-013 blind spots (design-respecting fabrication) return **two clean nulls and one ambiguous-cause positive**.

(T1) Standardised-error variance test: var(z | sponsored, demeaned) = 79.7 vs var(z | indep, demeaned) = 315.3. The F-test ratio is 0.253 (p < 0.0001), but Levene's median-centred test gives p = 0.122, so the F-test result is outlier-driven. Sponsored polls are NOT 'too clean': a variance of about 80 σ² means SD ≈ 9 σ around the bias mean, which is plenty of spread.

(T2) Bias-concentration test: 11.5 % of sponsored polls fall within ±2 pp of their own group mean vs 19.0 % for indep (z = −4.75, p < 0.0001 in the *anti*-fabrication direction: sponsored polls are MORE spread, not less).

(T3) Within-firm rounding TVD on the tenths digit of poll_percent_raw: 12 firms qualify with ≥10 sponsored and ≥10 indep rows each; mean TVD = 0.39 (vs ~0.15-0.20 expected under H0); 3 of 12 firms are significant at chi-square p < 0.05 (vs 0.6 expected).

Read: T1 and T2 argue *against* simple sample-design-consistent fabrication as the headline mechanism, because the sponsored-error distribution is wider, not tighter, than the indep distribution. T3 picks up real within-firm processing differences (could be differential subcontracting, customer-specific reporting templates, or fabrication; the cause is not separable from this test). Combined with AN-013 v1 (no per-row digit-tampering signature), the cumulative weight of evidence is that **the +7 pp is not concentrated in a single big fabrication lever**. The residual likely lives in a constellation of small effects across sample-frame contamination (1-4 pp prior), interviewer scripting (0-2 pp), and strategic timing (1-2 pp), each individually modest.

Hypothesis: H1: Self-sponsored polls overstate the sponsoring candidate
Confidence: green
Type: robustness

Design

Sample: 31,186 candidate-poll rows from build/assemble/cand_poll.parquet; 641 sponsored_by==1, 21,209 poll_is_independent==1. T3 restricted to 12 firms with ≥10 sponsored AND ≥10 indep rows.
Specification: T1 standardised error: z_i = error_i / SE_expected_i, where SE_expected = 100*sqrt(p(1-p)/n) under SRS at declared sample size with p = final_share. Group means subtracted, then F-test and Levene's on demeaned z. T2 concentration: fraction of polls within ±2 pp of own-group mean error, two-sample binomial-proportion z-test. T3 within-firm rounding: tenths-digit of poll_percent_raw; per-firm chi-square of sponsored vs indep distribution; aggregate mean TVD.
Comparator: indep polls (poll_is_independent==1) for T1, T2; within-firm indep polls for T3
Cluster: firm (T3); none (T1, T2)

Script: source/analysis/an-013v2-fabrication-forensics.py
Target: build/table/an-013v2-fabrication-forensics.csv
Status: interpreted · 2026-06-14
Created: 2026-06-14

Question

AN-013 ruled out crude per-row tampering via digit-frequency tests (uniform last-digit, Benford leading-digit, round-number frequency within sponsored polls). AN-013 explicitly listed three blind spots: sophisticated manipulation preserving digit distributions, proportional within-poll rescaling, and pre-publication data work that leaves no digit signature.

The residual decomposition added to docs/thinking.md on 2026-06-14 flagged "sample-design-consistent fabrication" (prior magnitude 2-5 pp) as the largest remaining untested mechanism after AN-059 zeroed out firm-level selection. AN-013 v2 designs three tests sensitive to design-respecting fabrication, which AN-013 v1 could not detect.

Tests

source/analysis/an-013v2-fabrication-forensics.py:

T1 — Standardised-error variance ("too clean" test). Under honest sampling at declared n with true share p, SE(error in pp) = 100×√(p(1-p)/n). Standardise: z = error / SE_expected. Under H0 (honest sampling + systematic bias mean) the demeaned z should have variance ≈ 1. Under design-respecting fabrication the manipulator chooses biases without adding sampling noise, so var(z | sponsored, demeaned) should fall below var(z | indep, demeaned). Tests: F-test of variance ratio (sensitive to outliers) and Levene's median-centred test (robust).
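The T1 computation can be sketched in plain Python as follows. This is a minimal illustration, not the contents of the analysis script: the function names are hypothetical, final_share is assumed to be stored in percent, and the toy rows are invented. Only the test statistics are computed here; p-values would need an F distribution, for example via `scipy.stats.levene(..., center="median")`.

```python
import math
from statistics import mean, median, variance


def expected_se_pp(final_share_pct, n):
    """SRS standard error of a poll share, in percentage points."""
    p = final_share_pct / 100.0  # assumption: final_share is held in percent
    return 100.0 * math.sqrt(p * (1.0 - p) / n)


def standardised_errors(rows):
    """rows: (error_pp, final_share_pct, declared_n) tuples -> list of z."""
    # A share of exactly 0 or 100 gives SE = 0 and must be filtered upstream.
    return [err / expected_se_pp(share, n) for err, share, n in rows]


def variance_ratio(z_a, z_b):
    """Ratio of sample variances; variance() already centres on each group mean."""
    return variance(z_a) / variance(z_b)


def brown_forsythe_w(*groups):
    """Levene's statistic with median centring (Brown-Forsythe variant)."""
    devs = []
    for g in groups:
        centre = median(g)
        devs.append([abs(v - centre) for v in g])
    n_total = sum(len(d) for d in devs)
    k = len(devs)
    grand = sum(sum(d) for d in devs) / n_total
    between = sum(len(d) * (mean(d) - grand) ** 2 for d in devs)
    within = sum((v - mean(d)) ** 2 for d in devs for v in d)
    return (n_total - k) / (k - 1) * between / within


# Invented toy rows, for illustration only: (error_pp, final_share_pct, n).
sponsored_rows = [(9.0, 40.0, 500), (5.0, 35.0, 800), (12.0, 42.0, 600), (2.0, 30.0, 1000)]
indep_rows = [(1.0, 40.0, 500), (-2.0, 35.0, 800), (3.0, 1.0, 400), (0.5, 30.0, 1000)]

z_s = standardised_errors(sponsored_rows)
z_i = standardised_errors(indep_rows)
print("variance ratio:", variance_ratio(z_s, z_i))
print("Brown-Forsythe W:", brown_forsythe_w(z_s, z_i))
```

The third toy indep row (1 % final share) shows how a small-share candidate inflates z through a small expected SE, which is the outlier mechanism that makes the plain variance ratio fragile and the median-centred statistic the safer read.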

T2 — Bias concentration test. Fraction of polls within ±2 pp of own-group mean error. Under honest slant + sampling noise the sponsored-error distribution should be wide around the bias mean. Under fabrication the sponsored errors cluster tightly around the chosen bias. Two-sample binomial-proportion z-test on the difference.
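A minimal sketch of the T2 arithmetic, using only the standard library. The helper names are hypothetical, and the counts in the usage lines are reconstructed by rounding the reported percentages back to whole polls (11.5 % of 641 and 19.0 % of 21,209), so the resulting z will only approximate the reported value.

```python
import math


def frac_within(errors, band_pp=2.0):
    """Share of polls whose error lies within +/- band_pp of the group mean."""
    m = sum(errors) / len(errors)
    return sum(abs(e - m) <= band_pp for e in errors) / len(errors)


def two_proportion_z(x1, n1, x2, n2):
    """Pooled two-sample z-test for a difference in proportions."""
    p1, p2 = x1 / n1, x2 / n2
    pooled = (x1 + x2) / (n1 + n2)
    se = math.sqrt(pooled * (1.0 - pooled) * (1.0 / n1 + 1.0 / n2))
    z = (p1 - p2) / se
    p_two_sided = math.erfc(abs(z) / math.sqrt(2.0))  # normal tail, both sides
    return z, p_two_sided


# Assumed counts: rounded back from the reported percentages, not from the data.
z, p = two_proportion_z(74, 641, 4030, 21209)
print(f"z = {z:.2f}, two-sided p = {p:.2g}")
```

A negative z here means the sponsored group is less concentrated around its own mean than the indep group, which is the opposite of the direction fabrication would predict.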

T3 — Within-firm rounding-pattern shift. Tenths-digit of poll_percent_raw modulo 10. For each firm with ≥10 sponsored AND ≥10 indep rows, compare the digit distribution across the two sponsorship types. Per-firm chi-square test; aggregate via mean and median Total Variation Distance (TVD).
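The per-firm comparison might look like the sketch below. It assumes that "tenths digit" means the first decimal place of poll_percent_raw and that values are recorded to at most one decimal; all function names are hypothetical. The chi-square p-value is left out because it needs a chi-square CDF (for instance `scipy.stats.chi2_contingency` on the 2 × 10 count table).

```python
from collections import Counter


def tenths_digit(value):
    """First decimal place, assuming at most one decimal is recorded."""
    return round(abs(value) * 10) % 10


def digit_shares(values):
    """Relative frequency of each tenths digit 0..9."""
    counts = Counter(tenths_digit(v) for v in values)
    return [counts[d] / len(values) for d in range(10)]


def tvd(values_a, values_b):
    """Total Variation Distance between two tenths-digit distributions."""
    pa, pb = digit_shares(values_a), digit_shares(values_b)
    return 0.5 * sum(abs(a - b) for a, b in zip(pa, pb))


def chi_square_stat(values_a, values_b):
    """Pearson chi-square statistic and degrees of freedom for the 2 x k table."""
    ca = Counter(tenths_digit(v) for v in values_a)
    cb = Counter(tenths_digit(v) for v in values_b)
    na, nb = len(values_a), len(values_b)
    stat, used = 0.0, 0
    for d in range(10):
        col = ca[d] + cb[d]
        if col == 0:
            continue  # digit never observed in this firm: drop the column
        used += 1
        for obs, row_n in ((ca[d], na), (cb[d], nb)):
            expected = row_n * col / (na + nb)
            stat += (obs - expected) ** 2 / expected
    return stat, used - 1


# Invented example: one firm's sponsored vs indep raw percentages.
sponsored = [41.0, 38.0, 45.0, 22.0, 30.0, 12.5]
indep = [40.3, 37.8, 44.1, 21.6, 30.0, 12.5]
print("TVD:", tvd(sponsored, indep))
print("chi-square, dof:", chi_square_stat(sponsored, indep))
```

With only ≥10 rows per group, many expected cell counts fall below the usual chi-square comfort zone, so the per-firm p-values are best read alongside the TVD rather than on their own.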

Results

Test	Direction predicted under fabrication	Observed	Reading
T1 — F-test on demeaned z variances	ratio < 1	ratio = 0.253, F p < 0.0001	Driven by indep outliers; Levene p = 0.12 not sig
T1 — Levene's median-centred	indep − sponsored > 0	p = 0.122	Variances not significantly different
T2 — frac within ±2pp of own mean	sponsored > indep	sponsored 11.5 % < indep 19.0 %	z = −4.75, p < 0.0001 anti-fabrication
T3 — within-firm rounding TVD	TVD elevated	mean TVD = 0.39 (vs ~0.15-0.20 baseline); 3/12 firms chi-square sig	Real within-firm processing differential; cause ambiguous

T1 detail — sponsored polls are not "too clean"

Sponsored z mean = +5.87 (roughly +7 pp bias ÷ ~1.2 pp expected SE; at p = 0.4 an SE of 1.2 pp corresponds to n ≈ 1,700, whereas n = 500 would give ≈ 2.2 pp), so the bias is large relative to sampling noise. But var(z | sponsored, demeaned) = 79.68, i.e. SD ≈ 8.9 σ around the bias mean. The sponsored-error distribution is wide, not tight. A classical fabrication that targets a fixed +7 pp would produce var(z) ≈ 1 (just sampling noise on top of a fixed shift). The observed 80 rules that out.

The F-test's significant ratio (0.253) is misleading: indep has var(z) = 315 — dominated by outliers from small candidates with near-zero true share and small expected SE. Levene's median-centred test (robust to outliers) shows no significant variance difference.

T2 detail — sponsored polls spread more, not less

Sponsored mean error = +12.12 pp (note: this is the raw cross-sectional mean, much larger than the within-candidate fixed-effects estimate of +7.85 pp from the headline, because sponsored polls cluster on leading candidates who are naturally overstated). 11.5 % of sponsored polls fall within ±2 pp of this mean. In the indep distribution, 19.0 % fall within ±2 pp of its own mean of +2.86 pp.

Sponsored polls are less concentrated around their group mean than indep polls. Mechanism-agnostic reading: the sponsor effect is heterogeneous (some sponsored polls slant heavily, others barely), which is what we would expect under design-driven slant calibrated per race, not under uniform fabrication.

T3 detail — within-firm rounding shows real signal

Of 12 firms with ≥10 sponsored and ≥10 indep rows: 11/12 show TVD > 0.20 between sponsored and indep tenths-digit distributions (vs ~0.15-0.20 baseline under sampling noise at this n). 3 of 12 firms reach chi-square p<0.05 vs 0.6 expected under H0 — a 5× elevation.
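One way to make the H0 baseline firm-specific, rather than relying on a single rule-of-thumb range, is a within-firm permutation of the sponsorship labels. The sketch below is a suggestion under stated assumptions, not something the current script is known to do: the function names, the default of 2,000 permutations, and the example digits are all hypothetical.

```python
import random
from collections import Counter


def tvd_from_digits(digits_a, digits_b):
    """TVD between two lists of tenths digits (integers 0..9)."""
    ca, cb = Counter(digits_a), Counter(digits_b)
    return 0.5 * sum(
        abs(ca[d] / len(digits_a) - cb[d] / len(digits_b)) for d in range(10)
    )


def permutation_null_tvd(digits_sponsored, digits_indep, n_perm=2000, seed=0):
    """TVD draws under H0: sponsorship labels shuffled within one firm."""
    rng = random.Random(seed)
    pooled = list(digits_sponsored) + list(digits_indep)
    k = len(digits_sponsored)
    draws = []
    for _ in range(n_perm):
        rng.shuffle(pooled)
        draws.append(tvd_from_digits(pooled[:k], pooled[k:]))
    return draws


def permutation_p_value(observed, null_draws):
    """Share of null draws at least as large as the observed TVD (+1 correction)."""
    exceed = sum(d >= observed for d in null_draws)
    return (exceed + 1) / (len(null_draws) + 1)


# Invented digits for one firm: sponsored reports lean on round numbers.
sponsored_digits = [0, 0, 0, 5, 0, 0, 5, 0, 0, 0, 5, 0]
indep_digits = [3, 7, 1, 0, 4, 9, 2, 6, 8, 5, 1, 3, 7, 0, 2]

observed = tvd_from_digits(sponsored_digits, indep_digits)
null = permutation_null_tvd(sponsored_digits, indep_digits, n_perm=500, seed=1)
print("observed TVD:", observed)
print("mean null TVD:", sum(null) / len(null))
print("permutation p:", permutation_p_value(observed, null))
```

Because the null is built from each firm's own pooled digits and group sizes, it adapts automatically to firms that sit near the 10-row threshold, where sampling noise alone can produce a large TVD.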

The signal is real but the cause is not separable from this test alone. Candidate explanations:

Fabrication of sponsored-poll data with a different rounding habit.
Subcontracting: the firm farms sponsored work to a partner with different tabulation software.
Customer-specific reporting: candidate customers may demand integer-percentage reports while media customers accept tenths.
Different question-batteries: sponsored polls may report a different scenario than indep polls and the tenths-distribution reflects scenario differences, not tampering.

The third and fourth explanations (customer-specific reporting and different question batteries) likely account for most of the T3 signal. Discriminating between them requires a within-firm × scenario test that isn't built here.

Interpretation

Net result: simple fabrication is unlikely

T1 and T2 together argue against a single-lever fabrication story: sponsored polls spread MORE than indep around their own bias mean, not less. A fabricator targeting +7 pp would leave a tighter distribution; we see a wider one. AN-013 v1's null on per-row digit tampering + AN-013 v2's null on distribution-shape "too clean" tests shrink the plausible magnitude of the fabrication category from the prior 2-5 pp toward 0-2 pp.

T3's within-firm rounding signal is real and warrants documentation, but mundane explanations (customer-specific reporting formats, subcontracting, scenario differences) are at least as plausible as fabrication.

Where the residual now lives

The docs/thinking.md residual decomposition (2026-06-14, updated 2026-06-14 with AN-059 + AN-013 v2):

Category	Prior magnitude	Post-update (this AN)
Sample-design-consistent fabrication	2-5 pp	0-2 pp (T1+T2 nulls)
Firm-level slant-for-hire selection	2-4 pp	0 pp (AN-059)
Wave selection	1-3 pp	0 pp (AN-003 placebo)
Sample frame contamination	1-4 pp	unchanged (untestable from registered data)
Interviewer scripting	0-2 pp	unchanged (untestable from registered data)
Strategic timing × news events	1-2 pp	unchanged (needs event db)

The unexplained 2-6 pp is now concentrated in the three "effectively untestable from registered data" categories. The mechanism story for the paper increasingly looks like a constellation:

~1-2 pp from under-documented scenario rotation (AN-051)
~0-2 pp from selective disclosure of weighting (ponderação) (AN-057)
~0-1 pp from population-reference mismatch (AN-056)
A residual 2-6 pp from sample-frame contamination, interviewer scripting, and strategic timing — small individually, additive collectively, individually untestable from public data.

Follow-ups

T3 disambiguation (low-priority, modest paper value). Within-firm × scenario test of the tenths-digit distribution would separate "different scenarios reported" from genuine processing differential. Cheap (no new data) but the signal is already documented; pursuing it further is a refinement, not a discovery.
Event-database build for the strategic-timing test, #6 in the residual decomposition (highest remaining test value). The pipelines/justica EJ pipeline already has event-side data (campaign-event filings, lawsuit cycles); a join to poll field_period_week would test whether sponsored polls cluster on post-event windows. Estimated lift: ~1-2 days for the join + analysis.
Update paper's Channel A vs B narrative. The accumulating null/wrong-signed pattern across structural levers + the fabrication-unlikely finding here together argue the +7 pp is a constellation, not a single lever. The paper currently leaves the mechanism unresolved; this set of analyses turns "unresolved" into "concentrated in the data-inaccessible categories" — a sharper claim worth making explicitly.