AN-019: Does coverage_class cluster differently across sponsor types?

On the 200-poll methodology subset, slant-permissive coverage classes (specific_neighborhoods + urban_only) appear in 12% of candidate-touched polls vs 10% of independent polls. The direction matches Channel A, but n_candidate=25 makes the 2 pp gap noisy. Opaque coverage (deferred_complement + not_specified) is 72% vs 80%, weakly contradicting the "candidates hide scope" prediction.

Hypothesis: H4: Channel A contribution is larger where methodology flexibility is greater
Confidence: yellow
Type: descriptive

Design

Sample: 200-poll methodology LLM subset (poll_methodology_2024__subset_n200.parquet) joined to sponsor parquet
Specification: cross-tab of coverage_class × sponsor_type (candidate_touched / independent / other), with row and column shares
Notes: First of six descriptives (D1-D6) on the methodology LLM subset before the universe extraction lands. coverage_class is the load-bearing Channel A variable; sponsor_type comes from poll_sponsor_2024_candidate's route classification.

Script: source/analysis/an-019-coverage-class-by-sponsor-type.py
Target: build/table/an-019-coverage-class-by-sponsor-type.csv
Status: interpreted · 2026-06-02
Created: 2026-06-02

Question

Of the 200 polls in the methodology LLM subset, which coverage_class values (specific_neighborhoods, urban_only, deferred_complement, etc.) appear disproportionately in candidate-touched sponsorships (any sponsor with route ∈ {cpf, committee, party, party_name}) vs independent sponsorships (all sponsors are media or pollster-self)? Coverage class is the load-bearing Channel A methodology lever: a poll that defers coverage to a complement document or restricts to specific neighborhoods has the most slant room without violating disclosure rules. If the slant mechanism is Channel A, we expect the worse coverage classes (deferred_complement, specific_neighborhoods) to cluster in the candidate-touched cell.

Design

Per-protocol classification:

candidate_touched: protocol has ≥1 sponsor row whose sponsor_route is one of {cpf, committee, party, party_name}.
independent: protocol's sponsor types are a subset of {media, pollster_self} and at least one such type is present (i.e., the poll_is_independent definition used elsewhere).
other: residual — protocols with other_firm or mixed sponsors.
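
The three rules above can be sketched as a small function. This is a minimal illustration, not the code in the AN script: the dict keys sponsor_route and sponsor_type, and the choice to check the candidate rule first, are assumptions made for the example.

```python
# Route and type sets are taken from the classification rules above.
CANDIDATE_ROUTES = {"cpf", "committee", "party", "party_name"}
INDEPENDENT_TYPES = {"media", "pollster_self"}


def classify_protocol(sponsors):
    """Classify one protocol from its sponsor rows.

    `sponsors` is a list of dicts with the (assumed) keys
    "sponsor_route" and "sponsor_type".
    """
    # candidate_touched: at least one sponsor row with a candidate route.
    if any(s.get("sponsor_route") in CANDIDATE_ROUTES for s in sponsors):
        return "candidate_touched"
    # independent: a non-empty set of types drawn only from media / pollster_self.
    types = {s.get("sponsor_type") for s in sponsors}
    if types and types <= INDEPENDENT_TYPES:
        return "independent"
    # other: residual (other_firm, mixed, or no sponsor rows at all).
    return "other"
```

Checking candidate_touched first means a protocol with one media sponsor and one cpf sponsor lands in candidate_touched, which matches the "≥1 sponsor row" wording.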

Coverage class taxonomy from the LLM extraction (coverage__coverage_class field): full_municipality, urban_plus_selected_rural, urban_only, specific_neighborhoods, deferred_complement, not_specified.

Cross-tab + row shares + column shares + a stacked bar chart.
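
As a rough sketch of what the cross-tab with row and column shares computes, the following uses only the standard library. The function name crosstab_shares and the toy input are hypothetical; the real script presumably works on the joined parquet instead.

```python
from collections import Counter


def crosstab_shares(pairs):
    """Return (counts, row_share, col_share) for (coverage_class, sponsor_type) pairs.

    Cells with zero observations are simply absent from the returned dicts.
    """
    counts = Counter(pairs)
    row_totals = Counter()  # total per coverage_class
    col_totals = Counter()  # total per sponsor_type
    for (coverage, sponsor), n in counts.items():
        row_totals[coverage] += n
        col_totals[sponsor] += n
    # Row share: of all polls in a coverage class, what fraction has this sponsor type.
    row_share = {k: n / row_totals[k[0]] for k, n in counts.items()}
    # Column share: of all polls of a sponsor type, what fraction has this coverage class.
    col_share = {k: n / col_totals[k[1]] for k, n in counts.items()}
    return counts, row_share, col_share


# Toy input for illustration only; these are not the subset's data.
toy_pairs = (
    [("deferred_complement", "independent")] * 3
    + [("specific_neighborhoods", "independent")]
    + [("specific_neighborhoods", "candidate_touched")]
    + [("deferred_complement", "candidate_touched")]
)
counts, row_share, col_share = crosstab_shares(toy_pairs)
```

Column shares answer "what does each sponsor type's coverage mix look like", while row shares answer "who sponsors the polls in a given coverage class".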

Results

Coverage class by sponsor type

Cross-tab on the n=200 methodology subset, shown as column shares (each sponsor-type column sums to 100% up to rounding; n_independent = 141, n_candidate_touched = 25, n_other = 34):

coverage_class	independent	candidate_touched	other
full_municipality	2.8%	4.0%	17.6%
urban_plus_selected_rural	7.1%	12.0%	5.9%
urban_only	2.8%	0.0%	0.0%
specific_neighborhoods	7.1%	12.0%	14.7%
deferred_complement	55.3%	56.0%	38.2%
not_specified	24.8%	16.0%	23.5%

Slant-permissive (specific_neighborhoods + urban_only): 12% vs 10% (candidate vs independent). Direction matches Channel A, but the gap is 2 pp on an n=25 candidate cell, so it is not decisive.

Opaque (deferred_complement + not_specified): 72% vs 80% (candidate vs independent). Candidate-touched polls are slightly less opaque, weakly contradicting the simple "candidates hide scope" prediction.
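
The two grouped shares can be re-derived from cell counts. The counts below are not taken from the output CSV: they are back-calculated from the rounded percentages and the column totals in the table, so treat them as an assumption. The names COUNTS and group_share are hypothetical.

```python
# Cell counts back-calculated from the table's percentages and column n's (assumption).
COUNTS = {
    "independent": {
        "full_municipality": 4, "urban_plus_selected_rural": 10, "urban_only": 4,
        "specific_neighborhoods": 10, "deferred_complement": 78, "not_specified": 35,
    },
    "candidate_touched": {
        "full_municipality": 1, "urban_plus_selected_rural": 3, "urban_only": 0,
        "specific_neighborhoods": 3, "deferred_complement": 14, "not_specified": 4,
    },
    "other": {
        "full_municipality": 6, "urban_plus_selected_rural": 2, "urban_only": 0,
        "specific_neighborhoods": 5, "deferred_complement": 13, "not_specified": 8,
    },
}

SLANT_PERMISSIVE = ("specific_neighborhoods", "urban_only")
OPAQUE = ("deferred_complement", "not_specified")


def group_share(sponsor_type, classes):
    """Share of a sponsor type's polls that fall in the given coverage classes."""
    cell = COUNTS[sponsor_type]
    return sum(cell[c] for c in classes) / sum(cell.values())


for sponsor_type in ("candidate_touched", "independent"):
    print(
        sponsor_type,
        f"slant-permissive={group_share(sponsor_type, SLANT_PERMISSIVE):.1%}",
        f"opaque={group_share(sponsor_type, OPAQUE):.1%}",
    )
# candidate_touched slant-permissive=12.0% opaque=72.0%
# independent slant-permissive=9.9% opaque=80.1%
```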

Interpretation

The simple Channel A reading — "candidate-touched polls disproportionately use selective-coverage classes" — gets weak support. The specific_neighborhoods cell does have candidate-touched at 12% vs independent at 7%, but on n=25 candidate-touched the absolute count is 3 protocols (out of 18 total specific_neighborhoods polls). That's suggestive, not decisive.
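
To put a rough number on "suggestive, not decisive", here is a minimal sketch of a Wald (normal-approximation) interval for the slant-permissive gap. It assumes counts of 3 of 25 candidate-touched and 14 of 141 independent polls, back-calculated from the table's percentages. The Wald interval is known to behave poorly at counts this small, so it is only an order-of-magnitude check, not the analysis the script performs.

```python
import math


def diff_in_proportions(x1, n1, x2, n2, z=1.96):
    """Difference p1 - p2 with its Wald standard error and z-based interval."""
    p1, p2 = x1 / n1, x2 / n2
    diff = p1 - p2
    se = math.sqrt(p1 * (1 - p1) / n1 + p2 * (1 - p2) / n2)
    return diff, se, (diff - z * se, diff + z * se)


# Slant-permissive counts: candidate_touched 3/25, independent 14/141 (assumed, see lead-in).
diff, se, (lo, hi) = diff_in_proportions(3, 25, 14, 141)
print(f"diff={diff:+.1%}  se={se:.1%}  95% interval=({lo:+.1%}, {hi:+.1%})")
```

Under these assumptions the gap is about 2 pp with a standard error of about 7 pp, so the interval comfortably spans zero in both directions.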

The deferred_complement rate (≈55% in both groups) suggests deferral itself is industry-wide boilerplate rather than a candidate-specific hiding tactic. AN-024 (D6) will probe deferral specifically.

The "other" residual (n=34) sits oddly: 18% full_municipality, much higher than either main group (3% independent, 4% candidate-touched). These look like ad-hoc or one-off pollsters who don't follow the deferral convention. Worth flagging for the LLM sponsor-classifier refinement.

The Channel A signal — if there is one — is probably not concentrated in coverage_class on this thin subset. Audit_pct, quota distributions, or operations levers (D3, D4) may carry more.

Follow-ups

Refit on the universe extraction (extension): when the LLM methodology pass runs on all 14,876 protocols, re-run this cross-tab. With ~800 candidate-touched polls (Routes A+B+C+D), the 12% vs 10% gap should either sharpen into a clear difference or wash out entirely. Suggested script: same as this AN, with the input parquet swapped.
Specific-neighborhoods deep-dive (puzzle): the 3 candidate-touched protocols in this cell are the most direct Channel A evidence on the subset. Pull the actual coverage__coverage_class_evidence and coverage__excluded_areas_listed text to see whether the excluded neighborhoods correlate with the sponsoring candidate's expected weakness (working-class districts for an incumbent, etc.).
Reclassify the "other" bucket (extension): 34 protocols land in "other" because their sponsor types don't cleanly fit the candidate-touched/independent split (mostly other_firm or mixed sponsors). The LLM sponsor-classifier refinement queued in docs/todo.md would split this cell — many "other_firm" sponsors are probably political consultancies / shells (would shift to candidate-touched).