Methodology completeness is higher for candidate-touched polls (mean 0.43) than for independent polls (0.39), the opposite of Channel A's "candidates hide methodology" prediction. The gap is not significant: t = +1.25, p = 0.22 (wrong-signed and underpowered). This suggests that Channel A acting through disclosure quantity is not the operative mechanism on this subset.

Confidence
yellow
Type
descriptive
Design
Sample
200-poll methodology LLM subset
Specification
per-poll completeness index = share of K operations-block fields that are NOT 'not_specified' / null / False. Histogram + 2-sample tests by sponsor_type.
Notes
D4 of six descriptives. Completeness operationalizes 'how disclosive is this pollster about how they ran the poll'. Channel A prediction: candidate-paying firms minimize disclosure (lower completeness) to keep methodology choices opaque.
Script
source/analysis/an-022-methodology-completeness-index.py
Target
build/figure/an-022-methodology-completeness-index.pdf
Status
interpreted · 2026-06-02
Created
2026-06-02

Question

Even without singling out one methodology lever, the share of the methodology schema actually filled in is itself a Channel-A signal: a pollster who minimizes disclosure has more room to slant without it showing up in any of the four cross-tabs we're running on specific levers. AN-019 showed that candidate-touched polls aren't dramatically more selective in coverage_class; AN-021 showed audit_pct distributions overlap heavily. The completeness index asks the shape-of-disclosure question directly.

Design

Per-poll completeness index = share of the K operations-block boolean / categorical fields whose value is not not_specified, null, or False. K is the count of substantive disclosure fields in the schema; the 12 listed here are mode, collection_device, audit_method, interviewer_training_described, data_consistency_checks, re_contact_verification, supervisor_role_described, funding_source_mentioned, question_order_described, geolocated, scenarios_described, and name_rotation. A poll that is "not_specified" or False on every field scores 0; one that fills every field scores 1.
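The index definition above can be sketched as follows. This is a minimal illustration, not the contents of an-022-methodology-completeness-index.py: the field names are copied from the Design paragraph, while the dict-per-poll record layout, the helper names is_disclosed and completeness, and the example values are assumptions made for illustration.

```python
from typing import Any, Mapping

# Field names copied from the Design paragraph; the record layout is assumed.
DISCLOSURE_FIELDS = (
    "mode", "collection_device", "audit_method",
    "interviewer_training_described", "data_consistency_checks",
    "re_contact_verification", "supervisor_role_described",
    "funding_source_mentioned", "question_order_described",
    "geolocated", "scenarios_described", "name_rotation",
)


def is_disclosed(value: Any) -> bool:
    """True unless the value is None, False, or the string 'not_specified'."""
    if value is None or value is False:
        return False
    return value != "not_specified"


def completeness(poll: Mapping[str, Any]) -> float:
    """Share of disclosure fields with a substantive value.

    A key that is absent from the record is treated like null.
    """
    filled = sum(is_disclosed(poll.get(name)) for name in DISCLOSURE_FIELDS)
    return filled / len(DISCLOSURE_FIELDS)


# Hypothetical record: 5 of the 12 fields carry a substantive value.
example_poll = {
    "mode": "face_to_face",
    "collection_device": "tablet",
    "audit_method": "not_specified",
    "interviewer_training_described": True,
    "data_consistency_checks": False,
    "geolocated": True,
    "name_rotation": True,
}
print(round(completeness(example_poll), 3))  # 5 / 12 -> 0.417
```

Treating False as "not disclosed" follows the Design wording; a field that is explicitly answered "no" therefore scores the same as one left blank, which is worth keeping in mind when reading the index.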

Histogram of the index by sponsor_type (independent / candidate_touched / other). Two-sample t-test + KS test for the candidate-vs-independent gap.
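For reference, the two-sample KS statistic D is the largest vertical gap between the two groups' empirical CDFs. The sketch below computes D with the standard library only; the function name ks_statistic and the sample values are hypothetical, and the analysis script may well rely on a statistics library instead.

```python
from bisect import bisect_right
from typing import Sequence


def ks_statistic(a: Sequence[float], b: Sequence[float]) -> float:
    """Two-sample KS statistic: largest gap between the two empirical CDFs."""
    if not a or not b:
        raise ValueError("both samples must be non-empty")
    sa, sb = sorted(a), sorted(b)
    gap = 0.0
    # The empirical CDFs only change at observed values, so checking
    # every distinct pooled value is sufficient.
    for x in set(sa) | set(sb):
        cdf_a = bisect_right(sa, x) / len(sa)
        cdf_b = bisect_right(sb, x) / len(sb)
        gap = max(gap, abs(cdf_a - cdf_b))
    return gap


# Hypothetical completeness scores for two small groups.
group_a = [0.25, 0.417, 0.5]
group_b = [0.417, 0.5, 0.583, 0.667]
print(ks_statistic(group_a, group_b))  # 0.5
```

Because the index takes only a handful of distinct values (multiples of 1/K), the two samples contain many ties, so the KS p-value should be read as approximate.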

Results

Methodology completeness histogram

bucket             n    mean   median  sd     range
independent        141  0.394  0.417   0.126  [0.00, 0.58]
candidate_touched  25   0.427  0.417   0.119  [0.08, 0.67]
other              34   0.480  0.500   0.125  [0.17, 0.67]

t-test (cand vs indep): t = +1.25, p = 0.220. KS test (cand vs indep): D = 0.228, p = 0.187.
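As a rough sanity check, the t statistic can be approximately reproduced from the rounded summary rows alone. The sketch assumes the unequal-variance (Welch) form of the two-sample t-test, which this card does not specify; the helper name welch_from_summary is hypothetical.

```python
import math


def welch_from_summary(m1, s1, n1, m2, s2, n2):
    """Welch t statistic and Welch-Satterthwaite df from means, SDs, and ns."""
    v1, v2 = s1**2 / n1, s2**2 / n2  # squared standard errors of each mean
    t = (m1 - m2) / math.sqrt(v1 + v2)
    df = (v1 + v2) ** 2 / (v1**2 / (n1 - 1) + v2**2 / (n2 - 1))
    return t, df


# Inputs are the rounded candidate_touched and independent rows of the table.
t, df = welch_from_summary(0.427, 0.119, 25, 0.394, 0.126, 141)
print(f"t = {t:.2f}, df = {df:.1f}")
```

With these rounded inputs the result is roughly t = 1.27 on about 34 degrees of freedom, close to the reported +1.25; the small gap could come from rounding in the table or from a different test variant.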

Interpretation

The direction is the opposite of the Channel A prediction. If candidate-touched polls were minimizing disclosure to hide methodology choices, we'd expect lower mean completeness. Instead, the candidate-touched mean is higher (0.43 vs 0.39). Neither test rejects the same-distribution null at this sample size, so this is a directional null, not a refutation.

Two reads:

  1. Candidate-touched pollsters disclose at least as much as independent ones. If true, Channel A on "disclosure quantity" is not the operative mechanism. Slant would have to come from methodology choices within the disclosed schema (specific neighborhoods, low audit) rather than from non-disclosure. AN-019, AN-020, AN-021 show those specific-choice signals are weak too. Cumulatively, the Channel A signal on this n=200 subset is muted across all four cuts.
  2. The "other" bucket is the highest-completeness group (0.48), but it is also the residual category with the loosest interpretation. It is worth re-classifying these polls before drawing conclusions from that number.

If the universe-scale rerun confirms candidate-touched completeness ≥ independent completeness, it would meaningfully change the mechanism story: Channel B (residual / fabrication) would carry the load, and the Spec 3 regression's β shrinkage when methodology features are added should be small.

Follow-ups

  1. Universe-scale rerun (extension): same script, much bigger n_candidate. If the t-test stays wrong-signed at n_cand ≈ 800, the "candidates minimize disclosure" Channel A subprediction is empirically dead.
  2. Per-pollster completeness fingerprint (extension): aggregate completeness to pollster level. Is it a firm-fixed style? If so, the candidate-touched ≥ independent finding might reflect that candidate-touched polls go to firms whose methodology templates are already more complete (Quaest, Datafolha), not that any firm becomes more disclosive when paid by a candidate. AN-024 (D5) addresses part of this.
  3. Split completeness by field type (extension): maybe candidate-touched polls fill more boilerplate fields (supervisor_role_described, interviewer_training_described) but less of the substantive ones (re_contact_verification, funding_source_mentioned). A by-field disclosure-rate table would expose that.
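The by-field table proposed in follow-up 3 could be built along these lines. This is a sketch under assumptions: each poll is taken to be a dict carrying a sponsor_type key plus the disclosure fields, the function name disclosure_rates is hypothetical, and the four example records are invented purely to show the output shape.

```python
from collections import defaultdict
from typing import Any, Dict, Iterable, Mapping, Sequence


def disclosure_rates(
    polls: Iterable[Mapping[str, Any]], fields: Sequence[str]
) -> Dict[str, Dict[str, float]]:
    """Return {field: {sponsor_type: share of polls with a substantive value}}."""
    filled = defaultdict(lambda: defaultdict(int))
    totals = defaultdict(int)
    for poll in polls:
        group = poll["sponsor_type"]
        totals[group] += 1
        for name in fields:
            value = poll.get(name)
            # Same rule as the index: null, False, and 'not_specified' do not count.
            if value is not None and value is not False and value != "not_specified":
                filled[name][group] += 1
    return {
        name: {group: filled[name][group] / n for group, n in totals.items()}
        for name in fields
    }


# Invented records, for illustration only.
polls = [
    {"sponsor_type": "candidate_touched", "supervisor_role_described": True,
     "funding_source_mentioned": False},
    {"sponsor_type": "candidate_touched", "supervisor_role_described": True,
     "funding_source_mentioned": "not_specified"},
    {"sponsor_type": "independent", "supervisor_role_described": False,
     "funding_source_mentioned": True},
    {"sponsor_type": "independent", "supervisor_role_described": True,
     "funding_source_mentioned": True},
]
rates = disclosure_rates(
    polls, ["supervisor_role_described", "funding_source_mentioned"]
)
for field, by_group in rates.items():
    print(field, by_group)
```

In this toy input the candidate_touched group fills the boilerplate field more often and the substantive field less often, which is the pattern the follow-up is designed to detect; the real data may of course show nothing of the kind.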