Benjamini-Hochberg (BH) FDR correction applied to the 10 within-pair Channel-A directional tests. 6 of 10 survive at q < 0.05 and 7 of 10 at q < 0.10. The only 'small positive, marginal' lever (population-frame mismatch, p = 0.12) does not survive correction at either threshold (q_BH = 0.15), which sharpens the 'no single lever carries +7 pp' conclusion.

Confidence
green
Type
robustness
Design
Sample
10 within-pair tests displayed in Table 3 of paper/paper.tex (paper Channel-A design-inventory).
Specification
BH q_i = p_i * m / rank_i with enforced monotonicity. m = 10.
Notes
p-values hardcoded from the published table to avoid recomputing across 10 different analysis scripts; each row in the script cites the AN page or analysis that produced its p.
Script
source/analysis/an-066-fdr-bh.py
Target
build/table/an-066-fdr-bh.csv
Status
interpreted · 2026-06-14
Created
2026-06-14

Question

GPT-5-pro's 2026-06-14 pre-submission review flagged that the Channel-A design-inventory in Table 3 of paper/paper.tex runs 10 within-pair directional tests on different design levers and invites a multiple-testing critique. The standard remedy is to BH-correct the displayed p-values and report q-values alongside.

Design

10 within-pair tests, one per lever, ordered as in Table 3:

| Rank by p | Lever | p (displayed) | Source AN |
|---|---|---|---|
| 1 | Scenario-rotation documentation | ~4e-8 | AN-051 |
| 2 | Sample-design-consistent fabrication | <10⁻⁴ | AN-013v2 |
| 3 | Phone-mode substitution | 0.0003 | AN-041 |
| 4 | Partisan stronghold over-sampling | <0.001 | bairro-string oversample |
| 5 | Interviewer-training omission | 0.002 | AN-042 |
| 6 | Coverage deferral at registration | 0.02 | AN-024 |
| 7 | Ponderação specificity | 0.04 | AN-057 |
| 8 | Population-frame mismatch (mixed) | 0.12 | AN-020 + frame |
| 9 | Methodology completeness gap | 0.22 | AN-022 |
| 10 | Audit-rate floor | 1.00 | AN-021 |

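BH q-value: q_i = p_i * m / rank_i, where rank_i is the ascending rank of p_i and m = 10. Monotonicity is enforced with a running minimum taken from rank m down to rank 1, so that q never decreases as rank increases.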
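The formula above can be sketched in a few lines of pure Python. This is a minimal illustration, not the contents of source/analysis/an-066-fdr-bh.py; the function name `bh_qvalues` and the four-value example are hypothetical and were chosen only because they make the monotonicity step visible.

```python
def bh_qvalues(pvals):
    """Benjamini-Hochberg q-values, returned in the input order.

    Hypothetical helper for illustration; assumes every p is in [0, 1].
    """
    m = len(pvals)
    # Indices of the p-values in ascending order (rank 1 = smallest p).
    order = sorted(range(m), key=lambda i: pvals[i])
    q = [0.0] * m
    running_min = 1.0  # starting at 1.0 also caps every q at 1
    # Walk from the largest rank down to rank 1, carrying the running minimum.
    for rank in range(m, 0, -1):
        i = order[rank - 1]
        running_min = min(running_min, pvals[i] * m / rank)
        q[i] = running_min
    return q


# Toy input: the raw value at rank 2 (0.03 * 4 / 2 = 0.06) exceeds the raw
# value at rank 3 (0.04 * 4 / 3 = 0.0533), so the running minimum pulls
# rank 2 down to 0.0533.
example_q = bh_qvalues([0.01, 0.04, 0.03, 0.50])
print([round(x, 4) for x in example_q])  # [0.04, 0.0533, 0.0533, 0.5]
```

For the 10 p-values in this note the raw values p_i * m / rank_i are already non-decreasing in rank, so the running minimum changes nothing; it matters only when neighbouring p-values are close together, as in the toy input.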

Results

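| Rank | Lever | p | q (BH) | Survives q<0.05? | Survives q<0.10? |
|---|---|---|---|---|---|
| 1 | Scenario-rotation documentation | 4e-8 | 4e-7 | yes | yes |
| 2 | Sample-design-consistent fabrication | 1e-6 | 5e-6 | yes | yes |
| 3 | Phone-mode substitution | 0.0003 | 0.001 | yes | yes |
| 4 | Partisan stronghold over-sampling | 0.001 | 0.0025 | yes | yes |
| 5 | Interviewer-training omission | 0.002 | 0.004 | yes | yes |
| 6 | Coverage deferral at registration | 0.02 | 0.033 | yes | yes |
| 7 | Ponderação specificity | 0.04 | 0.057 | no | yes |
| 8 | Population-frame mismatch | 0.12 | 0.15 | no | no |
| 9 | Methodology completeness gap | 0.22 | 0.244 | no | no |
| 10 | Audit-rate floor | 1.00 | 1.00 | no | no |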

6 of 10 survive at q<0.05; 7 of 10 at q<0.10.
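The survivor counts can be re-derived from the p column of the Results table with a short check. This is a sketch under stated assumptions: the list below copies the p-values as they appear in that table (including the point values entered for the rows displayed as inequalities), and the variable names are illustrative rather than taken from the analysis script.

```python
from itertools import accumulate

# p-values copied from the Results table, already sorted in ascending order.
P_VALUES = [4e-8, 1e-6, 0.0003, 0.001, 0.002, 0.02, 0.04, 0.12, 0.22, 1.00]

m = len(P_VALUES)
# Raw BH values p_i * m / rank_i, with rank starting at 1.
raw = [p * m / rank for rank, p in enumerate(P_VALUES, start=1)]
# Running minimum from the largest rank down, then restore ascending order.
q_values = list(accumulate(reversed(raw), min))[::-1]

survive_05 = sum(q < 0.05 for q in q_values)
survive_10 = sum(q < 0.10 for q in q_values)
print(survive_05, survive_10)  # 6 7
```

The lever that separates the two thresholds is rank 7 (Ponderação specificity): its q of about 0.057 clears 0.10 but not 0.05.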

Interpretation

Caveats

Follow-ups