AN-041: Does mode of data collection differ between sponsored and matched independent polls?

Sponsored polls never use phone (0/244) and use in-person 95% of the time; matched independent polls use phone 10% of the time. The χ² test on the joint mode table gives p = 0.0003, but the dependence runs in the *opposite* direction of the cheap-mode-slant prior. The mode-substitution lever is refuted on this sample.

Hypothesis: H4: Channel A contribution is larger where methodology flexibility is greater
Confidence: green
Type: descriptive

Design

Sample: 244 sponsored × independent curated pairs (same muni, same candidate, ±14 d)
Specification: within-pair contingency on (s_mode, i_mode); χ² on the joint mode table; sign test on the bias contrast among differing-mode pairs
Comparator: independent poll within the matched pair
Notes: tests probe item 5 in source-of-bias.md — "Mode × sponsor quantitative contrast"

Script: source/analysis/an-041-mode-by-sponsor.py
Target: build/table/an-041-mode-by-sponsor.csv
Status: done · 2026-06-02
Created: 2026-06-02

Question

Quick-win probe from the source-of-bias agenda: does the mode of data collection (in-person / phone / online / mixed) vary systematically between sponsored and matched independent polls? A sponsor that substitutes cheaper modes (phone-only, online) for in-person fieldwork could mechanically tilt the realized sample without violating any disclosure requirement, which would make mode a concrete Channel-A design lever.

Design

Source data: the 244 curated sponsored × independent pairs in build/llm/curated_pairs/pairs_with_extractions.parquet. Each pair matches on municipality and candidate, with the two polls within ±14 days of each other. The s_operations__mode and i_operations__mode fields were already extracted by the LLM methodology pass as a controlled vocabulary {in_person, phone, online, mixed, not_specified}.

Tests:

Marginals. Sponsored-side vs independent-side mode marginals across the 244 pairs.
Contingency. 5×5 joint distribution of (s_mode, i_mode) with a chi-squared test on the full table.
Sign test on differing pairs. Among pairs where the two sides disagree on mode, is the bias contrast sponsored_error - indep_error systematically signed? Tests whether the mode disagreement actually travels with the bias.
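All three tests start from the same tabulation. A minimal sketch, assuming the two mode columns have already been read into (s_mode, i_mode) tuples (for example via pandas.read_parquet on pairs_with_extractions.parquet); tabulate_modes is a hypothetical helper, not the project script.

```python
from collections import Counter

# Controlled vocabulary of the *_operations__mode fields, as listed in the note.
MODES = ("in_person", "phone", "online", "mixed", "not_specified")


def tabulate_modes(pairs):
    """Marginals, full 5x5 joint table and differing-pair indices.

    `pairs` holds one (s_mode, i_mode) tuple per curated pair.
    """
    pairs = list(pairs)
    for s_mode, i_mode in pairs:
        # Fail loudly if an extraction falls outside the controlled vocabulary.
        if s_mode not in MODES or i_mode not in MODES:
            raise ValueError(f"unexpected mode value: {(s_mode, i_mode)!r}")

    s_marginal = Counter(s for s, _ in pairs)
    i_marginal = Counter(i for _, i in pairs)
    joint = Counter(pairs)
    # Rows = sponsored mode, columns = independent mode; empty cells stay as 0.
    table = [[joint[(s, i)] for i in MODES] for s in MODES]
    # Indices of pairs whose two sides disagree on mode (input to the sign test).
    differing = [k for k, (s, i) in enumerate(pairs) if s != i]
    return s_marginal, i_marginal, table, differing
```

Keeping all five levels in the joint table, even the empty ones, makes any later reduction to non-zero rows and columns an explicit step rather than a side effect of the tabulation.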

Power note: this is a thin contrast (the marginals below show ~95% of sponsored polls use in_person mode), so the regression error ~ sponsored × mode + race × week FE from the original brief is not run here: the categorical contrast has near-zero variance on the sponsored side. The note reports a margin comparison plus the within-pair sign test; the full-power regression is deferred until the full-universe LLM methodology extractor lands (>200 protocols with a mode other than in_person).
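The near-zero-variance point can be made concrete with a support check. In a long layout with one row per poll, the sponsored × mode interaction for a given level is estimable only if at least one sponsored poll uses that level. A minimal sketch, assuming in_person as the reference level; unidentified_interactions is a hypothetical helper that checks identifiability only and does not fit the deferred regression or its race × week fixed effects.

```python
from collections import Counter

MODES = ("in_person", "phone", "online", "mixed", "not_specified")
REFERENCE = "in_person"  # assumed baseline level for the mode dummies


def unidentified_interactions(rows):
    """Mode levels whose sponsored x mode dummy has no sponsored observations.

    `rows` holds one (is_sponsored, mode) tuple per poll (two per pair).
    A level with zero sponsored support gives an all-zero interaction column,
    so its coefficient cannot be estimated.
    """
    sponsored_counts = Counter(mode for is_sp, mode in rows if is_sp)
    return [m for m in MODES if m != REFERENCE and sponsored_counts[m] == 0]


# Long format rebuilt from the marginals table: one row per poll.
rows = (
    [(True, "in_person")] * 232 + [(True, "not_specified")] * 12
    + [(False, "in_person")] * 216 + [(False, "phone")] * 24
    + [(False, "mixed")] * 2 + [(False, "not_specified")] * 2
)
print(unidentified_interactions(rows))  # → ['phone', 'online', 'mixed']
```

On these marginals only not_specified, with 12 sponsored polls, supplies any sponsored-side variation away from in_person.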

Results

Mode marginals (sponsored vs independent) + bias contrast on differing-mode pairs

Marginals (n=244 pairs):

Mode	Sponsored	Independent
In-person	232 (95.1%)	216 (88.5%)
Phone	0 (0.0%)	24 (9.8%)
Online	0 (0.0%)	0 (0.0%)
Mixed	0 (0.0%)	2 (0.8%)
Not specified	12 (4.9%)	2 (0.8%)

Two facts jump out:

No sponsored poll in this 244-pair sample uses phone mode: zero out of 244. Independent polls use phone in 24 pairs (9.8%). This runs opposite to the cheap-mode-substitution prior; sponsors are not picking phone polls to slant the sample via reach.
Sponsored polls are more "in-person" but also more "not specified". They exceed independent polls by 6.6 pp in in_person share (95.1% vs 88.5%) and by 4.1 pp in not_specified share (4.9% vs 0.8%); together these offset the 10.7 pp (26 of 244 pairs) that independent polls devote to phone and mixed modes. Sponsored polls report the gold-standard mode when they report a mode at all, and otherwise leave it undocumented.
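Both facts are arithmetic on the marginals table. The sketch below recomputes the per-mode share gaps and adds one quantity the table does not show: an exact one-sided 95% upper bound on the sponsored phone share given 0 of 244. That bound assumes the 244 sponsored polls behave like independent draws from the wider sponsored universe, which the curated matching does not guarantee; the helper names are illustrative.

```python
# Mode counts from the marginals table (n = 244 pairs per side).
N = 244
SPONSORED = {"in_person": 232, "phone": 0, "online": 0, "mixed": 0, "not_specified": 12}
INDEPENDENT = {"in_person": 216, "phone": 24, "online": 0, "mixed": 2, "not_specified": 2}


def share_gaps_pp(sponsored, independent, n=N):
    """Sponsored-minus-independent share gap per mode, in percentage points."""
    return {m: 100 * (sponsored[m] - independent[m]) / n for m in sponsored}


def zero_count_upper_bound(n, alpha=0.05):
    """Exact one-sided (1 - alpha) upper bound on a rate when 0 of n are observed.

    Clopper-Pearson with zero successes reduces to solving (1 - p) ** n = alpha.
    """
    return 1 - alpha ** (1 / n)


gaps = share_gaps_pp(SPONSORED, INDEPENDENT)
for mode, gap in gaps.items():
    print(f"{mode:>14}: {gap:+.1f} pp")

# Both sides sum to 244, so the gaps sum to zero: the sponsored surplus in
# in_person and not_specified offsets the independent phone + mixed share.
offset = -(gaps["phone"] + gaps["mixed"])
print(f"independent phone + mixed share: {offset:.1f} pp")

ub = zero_count_upper_bound(N)
print(f"sponsored phone share 0/{N}: one-sided 95% upper bound {ub:.2%}")
```

A zero count is compatible with a small nonzero rate; the bound (about 1.2%, close to the rule-of-three value 3/244) gives a full-universe rerun a concrete threshold to compare against.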

Joint contingency (5×5, reduced to 2×4 after dropping all-zero rows and columns):

	i:in_person	i:phone	i:mixed	i:not_spec
s:in_person	208	22	1	1
s:not_spec	8	2	1	1

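χ² = 18.66 (dof = 3), p = 0.0003. Modes are non-independent across the pair, but about 16.5 of the 18.66 comes from the two s:not_spec cells against i:mixed and i:not_spec, whose expected counts are about 0.1 each, so the asymptotic p-value is fragile. The structure of the dependence is "the sponsored side reports in_person or stays silent", not a substitution toward cheaper modes.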
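The χ² statistic can be recomputed from the 2×4 table together with its expected counts and per-cell contributions. A stdlib-only sketch; the closed-form tail probability it uses is valid only for 3 degrees of freedom. scipy.stats.chi2_contingency on the same table should return the same statistic, since it applies Yates' continuity correction only when dof = 1.

```python
import math

ROWS = ("s:in_person", "s:not_spec")
COLS = ("i:in_person", "i:phone", "i:mixed", "i:not_spec")
# Observed 2x4 joint table (rows = sponsored mode, columns = independent mode).
OBSERVED = [
    [208, 22, 1, 1],
    [8, 2, 1, 1],
]


def pearson_chi2(table):
    """Pearson chi-squared statistic, dof, expected counts and per-cell contributions."""
    row_tot = [sum(r) for r in table]
    col_tot = [sum(c) for c in zip(*table)]
    n = sum(row_tot)
    expected = [[r * c / n for c in col_tot] for r in row_tot]
    contrib = [
        [(o - e) ** 2 / e for o, e in zip(obs_row, exp_row)]
        for obs_row, exp_row in zip(table, expected)
    ]
    stat = sum(sum(row) for row in contrib)
    dof = (len(table) - 1) * (len(table[0]) - 1)
    return stat, dof, expected, contrib


def chi2_sf_3dof(x):
    """Upper-tail probability of chi-squared with 3 dof (closed form)."""
    return math.erfc(math.sqrt(x / 2)) + math.sqrt(2 * x / math.pi) * math.exp(-x / 2)


stat, dof, expected, contrib = pearson_chi2(OBSERVED)
assert dof == 3  # the closed-form tail above covers only this case
p = chi2_sf_3dof(stat)
print(f"chi2 = {stat:.2f}, dof = {dof}, p = {p:.4f}")
for i, row_name in enumerate(ROWS):
    for j, col_name in enumerate(COLS):
        print(f"{row_name} x {col_name}: expected {expected[i][j]:.2f}, "
              f"contribution {contrib[i][j]:.2f}")
```

Printing the per-cell breakdown shows which cells carry the statistic and how far their expected counts fall below the usual rule of thumb of about 5.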

Bias contrast on differing-mode pairs (n=35):

22 pairs sponsored more biased, 13 pairs independent more biased
Sign test p = 0.18 (two-sided); Wilcoxon signed-rank p = 0.16
Mean contrast = +2.23 pp (sponsored higher) — consistent with the overall headline direction but underpowered

Among the 35 differing-mode pairs, the bias contrast does not significantly differ from zero. There is no evidence that mode disagreement carries the within-pair bias.
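The sign-test p-value follows from the Binomial(35, 0.5) distribution alone. A stdlib sketch of the exact two-sided test; scipy.stats.binomtest(22, 35, 0.5).pvalue is the library equivalent. The Wilcoxon signed-rank test needs the 35 per-pair contrasts themselves, which are not reproduced in this note.

```python
from math import comb


def sign_test_two_sided(k, n):
    """Exact two-sided sign test for k positive signs out of n non-tied pairs."""
    tail = min(k, n - k)
    # Binomial(n, 0.5) is symmetric, so the two-sided p is twice the smaller tail
    # (capped at 1 when k sits at the centre).
    p_one = sum(comb(n, j) for j in range(tail + 1)) / 2 ** n
    return min(1.0, 2 * p_one)


# 22 of the 35 differing-mode pairs have the sponsored side more biased, 13 the reverse.
p = sign_test_two_sided(22, 35)
print(f"sign test p = {p:.2f}")
```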

Interpretation

The mode-substitution channel for sponsor bias is refuted on this sample. Sponsored polls do not lean on cheaper modes (phone, online, mixed) to mechanically tilt the sample; if anything, they overrepresent the gold-standard in-person mode while sometimes hiding the mode entirely.

Mode is not a Channel-A lever carrying the +7 pp slant. The χ² significance reflects opacity (the sponsored not_specified rate is 6× the independent rate, 12 vs 2 pairs), not design substitution.

This strengthens the opacity-as-default reading in source-of-bias.md § Opacity differences and rules out one of the six concrete-design candidates listed in that doc's probe agenda (item 5).

Follow-ups

Why is phone mode entirely absent on the sponsored side? (puzzle): zero phone-mode polls among 244 sponsored polls is a sharp selection signal. Phone mode may be cheaper but also stigmatized as low-quality, so candidate sponsors may avoid it for reputational reasons. Worth checking on the full-universe extractor (when it lands) whether the 0% holds at scale or shifts. Suggested script: mode-by-sponsor-universe.py after the methodology extractor completes.
Pollster fixed effects on the 35 differing-mode pairs. (extension): are the 24 differing pairs with phone on the independent side (22 against a sponsored in_person poll, 2 against a sponsored not_specified poll) concentrated in a few specific pollster firms (e.g., AtlasIntel phone-IVR polls)? If so, the bias contrast may reflect a firm-tier story, not a sponsor-side mode choice. Suggested script: tabulate pollster_cnpj × differing-pair flag on the AN-041 detail CSV.
Item 6: document the AN-041 mode-by-sponsor finding in source-of-bias.md (extension): move "Mode × sponsor" from the open-questions table to the ruled-out table; flag mode-substitution as a refuted concrete-design mechanism. Source-of-bias edit only, no new script.
Run AN-042 (interviewer training × sponsor — quick win 2/3) (extension): the natural next probe in the source-of-bias agenda; uses already-extracted s_operations__interviewer_training_described and supervisor_role_described fields on the same 244-pair sample.
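The pollster-fixed-effects follow-up starts from a plain tabulation of pollster_cnpj against the differing-pair flag. A minimal sketch, assuming the AN-041 detail CSV has one row per pair with hypothetical columns s_mode and i_mode alongside pollster_cnpj, and that pollster_cnpj identifies the independent side's firm; rows could come from csv.DictReader on that file. Both functions are illustrative, not the project script.

```python
from collections import Counter


def differing_flag(row):
    """True when the two sides of the pair report different modes."""
    return row["s_mode"] != row["i_mode"]


def firm_crosstab(rows):
    """Counts of (pollster_cnpj, differing-mode flag) across all pairs."""
    return Counter((r["pollster_cnpj"], differing_flag(r)) for r in rows)


def phone_concentration(rows, top_k=3):
    """Share of differing pairs with phone on the independent side held by the top_k firms."""
    firms = Counter(
        r["pollster_cnpj"]
        for r in rows
        if differing_flag(r) and r["i_mode"] == "phone"
    )
    total = sum(firms.values())
    if total == 0:
        return 0.0, []
    top = firms.most_common(top_k)
    return sum(n for _, n in top) / total, top
```

A high top-k share would point toward the firm-tier reading; a flat distribution across firms would keep the sponsor-side question open.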