Sponsored polls never use phone (0/244) and use in-person at 95%; independent polls use phone at 10%. The χ² test on the joint mode table gives p = 0.0003, but the dependence runs in the *opposite* direction of the cheap-mode-slant hypothesis. The mode-substitution lever is refuted on this sample.

Confidence: green
Type: descriptive
Design
  Sample: 244 sponsored × independent curated pairs (same muni, same candidate, ±14 d)
  Specification: within-pair contingency on (s_mode, i_mode); χ² on disagreement table; sign test on bias contrast among differing pairs
  Comparator: independent poll within the matched pair
  Notes: tests probe item 5 in source-of-bias.md ("Mode × sponsor quantitative contrast")
Script: source/analysis/an-041-mode-by-sponsor.py
Target: build/table/an-041-mode-by-sponsor.csv
Status: done · 2026-06-02
Created: 2026-06-02

Question

Quick-win probe from the source-of-bias agenda: does mode of data collection (in-person / phone / online / mixed) vary systematically between sponsored and matched independent polls? A sponsor that substitutes cheaper modes (phone-only, online) for in-person fieldwork could mechanically tilt the realized sample without violating any disclosure requirement — making mode a concrete Channel-A design lever.

Design

Source data: the 244 curated sponsored × independent pairs in build/llm/curated_pairs/pairs_with_extractions.parquet. Each pair is same muni × same candidate × ±14 days. The s_operations__mode and i_operations__mode fields are already extracted as a controlled vocabulary {in_person, phone, online, mixed, not_specified} from the LLM methodology pass.

Tests:

  1. Marginals. Sponsored-side mode marginal vs independent-side mode marginal across the 244 pairs.
  2. Contingency. 5×5 joint distribution of (s_mode, i_mode) with a chi-squared test on the full table.
  3. Sign test on differing pairs. Among pairs where the two sides disagree on mode, is the bias contrast sponsored_error - indep_error systematically signed? Tests whether the mode disagreement actually travels with the bias.
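
The three tests can be sketched with the standard library alone. This is a minimal illustration under stated assumptions, not the contents of an-041-mode-by-sponsor.py: the toy records, the short keys `s_mode` / `i_mode`, the error fields, and the helper `sign_test_p` are all hypothetical names chosen for the example.

```python
from collections import Counter
from math import comb

# Hypothetical toy pairs (NOT real data); the real columns are
# s_operations__mode / i_operations__mode in the curated-pairs parquet.
pairs = [
    {"s_mode": "in_person", "i_mode": "in_person", "sponsored_error": 6.0, "indep_error": 1.0},
    {"s_mode": "in_person", "i_mode": "phone", "sponsored_error": 4.0, "indep_error": 5.5},
    {"s_mode": "in_person", "i_mode": "phone", "sponsored_error": 8.0, "indep_error": 2.0},
    {"s_mode": "not_specified", "i_mode": "in_person", "sponsored_error": 3.0, "indep_error": 3.5},
    {"s_mode": "not_specified", "i_mode": "mixed", "sponsored_error": 7.0, "indep_error": 1.0},
]

# Test 1: one mode marginal per side.
s_marginal = Counter(p["s_mode"] for p in pairs)
i_marginal = Counter(p["i_mode"] for p in pairs)

# Test 2: joint (s_mode, i_mode) table; absent cells count as zero.
joint = Counter((p["s_mode"], p["i_mode"]) for p in pairs)


def sign_test_p(contrasts):
    """Two-sided exact sign test of H0: P(contrast > 0) = 0.5 (zeros dropped)."""
    pos = sum(c > 0 for c in contrasts)
    neg = sum(c < 0 for c in contrasts)
    n = pos + neg
    if n == 0:
        return 1.0
    k = min(pos, neg)
    # Binomial(n, 0.5) probability of a split at least as lopsided as observed.
    tail = sum(comb(n, j) for j in range(k + 1)) / 2 ** n
    return min(1.0, 2 * tail)


# Test 3: restrict to pairs whose two sides disagree on mode.
differing = [p for p in pairs if p["s_mode"] != p["i_mode"]]
contrasts = [p["sponsored_error"] - p["indep_error"] for p in differing]
p_value = sign_test_p(contrasts)

print(dict(s_marginal), dict(i_marginal))
print(dict(joint))
print(len(differing), p_value)
```

The sign test uses only the direction of each contrast, which suits a small differing-pair subsample where the magnitudes may be noisy.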

Power note: this is a thin contrast (the marginals below show ~95% of sponsored polls use in_person mode), so the regression error ~ sponsored × mode + race × week FE from the original brief is not run here: the categorical contrast has near-zero variance on the sponsored side. The probe is reported instead as a margin comparison plus the within-pair sign test. The full-power regression is deferred until the full-universe LLM methodology extractor lands (>200 protocols off in_person).

Results

Mode marginals (sponsored vs independent) + bias contrast on differing-mode pairs

Marginals (n=244 pairs):

Mode            Sponsored      Independent
In-person       232 (95.1%)    216 (88.5%)
Phone             0 (0.0%)      24 (9.8%)
Online            0 (0.0%)       0 (0.0%)
Mixed             0 (0.0%)       2 (0.8%)
Not specified    12 (4.9%)       2 (0.8%)

Two facts jump out:

  1. No sponsored poll in this 244-pair sample uses phone mode. Zero out of 244. Independent polls use phone in 10% of the sample. This is the opposite direction of the cheap-mode-substitution prior — sponsors are not picking phone polls to slant via reach.
  2. Sponsored polls are more "in-person" but also more "not specified". The 6.6 pp difference in in_person share (95% vs 89%) is mirrored by a 4.1 pp difference in not_specified share (5% vs 1%) — sponsored polls advertise the gold-standard mode when they advertise at all, and decline to document otherwise.
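
The shares and percentage-point gaps quoted above follow directly from the marginal counts. A small sketch of that arithmetic, with illustrative variable names that are not taken from the analysis script:

```python
# Marginal counts from the table above (n = 244 pairs, one poll per side).
n_pairs = 244
sponsored = {"in_person": 232, "phone": 0, "online": 0, "mixed": 0, "not_specified": 12}
independent = {"in_person": 216, "phone": 24, "online": 0, "mixed": 2, "not_specified": 2}

# Each side must account for every pair exactly once.
assert sum(sponsored.values()) == n_pairs
assert sum(independent.values()) == n_pairs


def share(counts, mode):
    """Percentage of pairs whose poll on this side reports the given mode."""
    return 100 * counts[mode] / n_pairs


# Sponsored-minus-independent gap in percentage points, per mode.
gap_pp = {m: round(share(sponsored, m) - share(independent, m), 1) for m in sponsored}

for mode, gap in gap_pp.items():
    print(f"{mode:<14} {share(sponsored, mode):5.1f}% {share(independent, mode):5.1f}% {gap:+.1f} pp")
```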

Joint contingency (5×5, sparsified to 2×4 non-zero after dropping all-zero rows/cols):

               i:in_person  i:phone  i:mixed  i:not_spec
s:in_person            208       22        1           1
s:not_spec               8        2        1           1

χ² = 18.66 (dof=3), p = 0.0003. Modes are strongly non-independent across the pair, but the structure of the dependence is "sponsored side forces in_person or stays silent" — not a substitution toward cheaper modes.
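
The statistic can be re-derived from the 2×4 table by hand. The sketch below is an independent recomputation in plain Python, not the code in the analysis script; the p-value helper `chi2_sf_dof3` is a hypothetical name and uses a closed form that holds only for 3 degrees of freedom.

```python
from math import erfc, exp, pi, sqrt

# Observed counts: rows = sponsored mode, columns = independent mode.
cols = ["in_person", "phone", "mixed", "not_specified"]
observed = {
    "in_person": [208, 22, 1, 1],
    "not_specified": [8, 2, 1, 1],
}

row_tot = {r: sum(v) for r, v in observed.items()}
col_tot = [sum(observed[r][j] for r in observed) for j in range(len(cols))]
n = sum(row_tot.values())

# Expected count under independence: row total * column total / n.
expected = {r: [row_tot[r] * c / n for c in col_tot] for r in observed}

chi2 = sum(
    (o - e) ** 2 / e
    for r in observed
    for o, e in zip(observed[r], expected[r])
)
dof = (len(observed) - 1) * (len(cols) - 1)


def chi2_sf_dof3(x):
    """Upper-tail probability of a chi-squared variable with exactly 3 dof."""
    return erfc(sqrt(x / 2)) + sqrt(2 * x / pi) * exp(-x / 2)


assert dof == 3  # the closed form above is only valid for 3 dof
p_value = chi2_sf_dof3(chi2)
min_expected = min(e for r in expected for e in expected[r])

print(f"chi2={chi2:.2f} dof={dof} p={p_value:.4f} min expected={min_expected:.2f}")
```

On these counts the sketch reproduces χ² ≈ 18.66 and p ≈ 0.0003. The smallest expected count is about 0.1 (the s:not_spec cells under i:mixed and i:not_spec), and those two cells contribute most of the statistic, so the asymptotic p-value is best read as approximate.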

Bias contrast on differing-mode pairs (n=35):

Among the 35 differing-mode pairs, the sign test finds no significant departure of the bias contrast (sponsored_error - indep_error) from zero. There is no evidence that mode disagreement carries the within-pair bias.

Interpretation

The mode-substitution channel for sponsor bias is refuted on this sample. Sponsored polls do not lean on cheaper modes (phone, online, mixed) to mechanically tilt the sample; if anything, they overrepresent the gold-standard in-person mode while sometimes hiding the mode entirely.

Mode is not a Channel-A lever carrying the +7 pp slant. The χ² significance reflects opacity (the sponsored "not_specified" rate is about 6× the independent rate, 4.9% vs 0.8%), not design substitution.

This strengthens the opacity-as-default reading in source-of-bias.md § Opacity differences and rules out one of the six concrete-design candidates listed in that doc's probe agenda (item 5).

Follow-ups

  1. Why is phone mode entirely absent on the sponsored side? (puzzle): zero phone-mode polls across 244 sponsored is a sharp selection signal — phone-mode might be cheaper but also stigmatized as low-quality, so candidate-sponsors avoid it for reputational reasons. Worth checking on the full-universe extractor (when it lands) whether 0% holds at scale or shifts. Suggested script: mode-by-sponsor-universe.py after the methodology extractor completes.

  2. Pollster fixed effects on the 35 differing-mode pairs. (extension): are the 24 phone-on-the-independent-side pairs (22 against sponsored in_person, 2 against sponsored not_specified) concentrated in a few specific pollster firms (e.g., AtlasIntel phone-IVR polls)? If so, the bias contrast may reflect a firm-tier story, not a sponsor-side mode choice. Suggested script: tabulate pollster_cnpj × differing-pair flag on the AN-041 detail CSV.

  3. Item 6: document the AN-041 mode-substitution finding in source-of-bias.md (extension): move "Mode × sponsor" from the open-questions table to the ruled-out table; flag mode-substitution as a refuted concrete-design mechanism. Source-of-bias edit only, no new script.

  4. Run AN-042 (interviewer training × sponsor — quick win 2/3) (extension): the natural next probe in the source-of-bias agenda; uses already-extracted s_operations__interviewer_training_described and supervisor_role_described fields on the same 244-pair sample.