Sponsored polls flex the population frame and the bairro list — Channel A signatures from a 130-pair eyeball

🟡 Single-source qualitative finding from a curated bias-contrast pair analysis (244 sponsored × independent pairs of the same muni × same candidate × within ±14 days; 77 classical-bias pairs after filtering). It suggests three Channel A levers are the dominant design differences: (a) a flexible population reference frame ("mixed" rather than cleanly TSE-eligible), (b) census-setor usage as the cluster frame (differs about 13pp more often in biased pairs than in unbiased ones), and (c) geographic coverage restriction (urban-only or specific-neighborhoods on the sponsored side).

This is a hypothesis-generation finding, not a quantitative effect estimate. Statistical power for the regression test (preview below) remains limited; the value is the directional signal for the next round of testing.

Method

Built a curated sample from analysis_table.parquet:

Pair definition: same muni × same sponsor-candidate × field_end within ±14 days. Sponsored = poll_has_candidate_sponsor=1 with sponsor's candidate identified via routes A+B+C+D. Independent = poll_is_independent=1 (news / pollster-self / public).
Contrast metric: error_sponsored − error_independent for the sponsor's candidate, computed only on pairs where the candidate appears in both polls (indep_poll_pct ≥ 5%, which drops "indep poll didn't include this candidate" artifacts).
Strata: 244 pairs total → 77 "biased" (sponsored over-states by ≥5pp AND larger absolute error than indep), 39 "well-behaved" (|contrast| < 1.5pp), 128 "other" (mostly extreme cases where the indep poll was the defective one).
Extraction: ran poll_sampling, poll_coverage, poll_operations, poll_bairro_detail on all 338 unique protocols. Source: pipelines/politica/source/llm/. Cache: pipelines/politica/build/llm/poll_*/. 322 of 338 protocols had a bairro PDF; the remaining 16 are mayor polls whose protocol number doesn't appear in the bairro_municipio_2024.zip (likely never-filed complements).
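The pair definition and contrast metric above can be sketched as follows. This is a minimal illustration, not the code in curated_pairs_find.py: the `Poll` class, its field names, the protocol ids, and the field dates are all hypothetical, and only the shares are taken from the Porteirinha/MG worked case in this note.

```python
from dataclasses import dataclass
from datetime import date


@dataclass(frozen=True)
class Poll:
    protocol: str          # hypothetical identifier
    muni: str
    candidate: str         # the sponsor's candidate being compared
    field_end: date
    kind: str              # "sponsored" or "independent"
    predicted_pct: float
    final_pct: float

    @property
    def error(self) -> float:
        # Signed error: positive means the poll over-states the candidate.
        return self.predicted_pct - self.final_pct


def find_pairs(polls, window_days=14, min_indep_pct=5.0):
    """Match sponsored and independent polls on muni x candidate within the window."""
    sponsored = [p for p in polls if p.kind == "sponsored"]
    independent = [p for p in polls if p.kind == "independent"]
    pairs = []
    for s in sponsored:
        for i in independent:
            if (s.muni, s.candidate) != (i.muni, i.candidate):
                continue
            if abs((s.field_end - i.field_end).days) > window_days:
                continue
            if i.predicted_pct < min_indep_pct:
                continue  # candidate effectively absent from the independent poll
            pairs.append({
                "sponsored": s.protocol,
                "independent": i.protocol,
                "contrast": round(s.error - i.error, 1),
            })
    return pairs


# Shares from the Porteirinha/MG case; dates and protocol ids are made up.
polls = [
    Poll("S-1", "Porteirinha/MG", "JURACI", date(2024, 9, 29), "sponsored", 85.5, 43.8),
    Poll("I-1", "Porteirinha/MG", "JURACI", date(2024, 9, 20), "independent", 38.3, 43.8),
    Poll("I-2", "Porteirinha/MG", "JURACI", date(2024, 8, 1), "independent", 40.0, 43.8),
]
pairs = find_pairs(polls)
```

Only the S-1 × I-1 pair survives here, because I-2 falls outside the ±14-day window.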

Diff-rate gaps (biased − unbiased, key fields)

n_biased = 77, n_unbiased = 39.

Field	Biased differ	Unbiased differ	Gap
`sampling.census_sectors_used`	39%	26%	+13pp
`sampling.population_reference`	26%	13%	+13pp
`operations.audit_method`	65%	54%	+11pp
`operations.interviewer_training_described`	39%	31%	+8pp
`bairro_detail.rural_included`	29%	21%	+8pp
`operations.supervisor_role_described`	18%	13%	+5pp
`operations.audit_pct`	48%	44%	+4pp
`coverage.coverage_class`	39%	38%	+1pp

bairro_detail.n_bairros_total differs in 99% of biased pairs and 82% of unbiased — large in absolute level but the gap (+17pp) is hard to interpret without normalizing by the underlying scale of bairro counts.

The high-noise fields (cluster_unit and most coverage_* fields) differ in roughly 70-100% of pairs in every stratum, so their diff rates are large in level but do not discriminate between strata.
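For reference, a diff-rate gap of the kind tabulated above can be computed as below. This is a sketch under assumptions: each pair is reduced to a (sponsored value, independent value) tuple for one field, and the toy data is built to match the population_reference row. The 20 differing biased pairs are stated in this note; the 5 differing unbiased pairs are back-computed from the rounded 13% of 39.

```python
def diff_rate(pairs):
    """Share of pairs whose sponsored and independent values differ."""
    return sum(s != i for s, i in pairs) / len(pairs)


def gap_pp(biased, unbiased):
    """Diff-rate gap (biased minus unbiased) in whole percentage points."""
    return round(100 * (diff_rate(biased) - diff_rate(unbiased)))


# Toy reconstruction of the population_reference row (values are illustrative).
differ = ("mixed", "tse_eligible")
same = ("tse_eligible", "tse_eligible")
biased = [differ] * 20 + [same] * 57      # n_biased = 77
unbiased = [differ] * 5 + [same] * 34     # n_unbiased = 39; the 5 is back-computed

biased_rate = round(100 * diff_rate(biased))
unbiased_rate = round(100 * diff_rate(unbiased))
gap = gap_pp(biased, unbiased)
```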

Directional patterns

The most informative cuts are not the diff rates but the direction of disagreement among the pairs that do differ.

`population_reference` (20 biased pairs differ; 0 go "the other way")

Sponsored uses	Independent uses	Count
`mixed`	`tse_eligible`	7
`mixed`	`census_2022_residents`	2
`not_specified`	`tse_eligible`	1
`tse_eligible`	`census_2022_residents`	1
`tse_eligible`	`mixed`	0

Sponsored biased polls flex the population frame. Blending census + TSE + other lets pollsters implicitly reweight cells: no single source's discipline is violated, but the cell-level weights become a free parameter. Zero of the 20 differing pairs has the independent flexing while the sponsored stays clean. The asymmetry is unambiguous.
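The directional tally can be reproduced from the rows of the table above. A minimal sketch, assuming "flexing" means the value `mixed` on one side and any other value on the other; the function and key names are illustrative, and only the rows shown in the table are used.

```python
from collections import Counter

# (sponsored value, independent value, count) rows as listed in the table.
rows = [
    ("mixed", "tse_eligible", 7),
    ("mixed", "census_2022_residents", 2),
    ("not_specified", "tse_eligible", 1),
    ("tse_eligible", "census_2022_residents", 1),
    ("tse_eligible", "mixed", 0),
]


def direction_counts(rows, flex_value="mixed"):
    """Count pairs by which side, if any, uses the flexible frame."""
    counts = Counter()
    for sponsored, independent, n in rows:
        if sponsored == flex_value and independent != flex_value:
            counts["sponsored_flexes"] += n
        elif independent == flex_value and sponsored != flex_value:
            counts["independent_flexes"] += n
        else:
            counts["neither"] += n
    return counts


counts = direction_counts(rows)
```

Under that definition every listed pair involving `mixed` has it on the sponsored side, and none on the independent side.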

`bairro_detail.coverage_class_resolved` (49 biased pairs differ)

Sponsored	Independent	Count
`urban_only`	`full_municipality`	10
`full_municipality`	`urban_plus_selected_rural`	6
`specific_neighborhoods`	`full_municipality`	5
`not_realized`	`full_municipality`	2
`urban_only`	`urban_plus_selected_rural`	3
(other directions)		~23

Two distinct biased-side patterns:

Coverage restriction (15 pairs of 49 = 31%): sponsored = urban_only (10 pairs) or specific_neighborhoods (5 pairs), indep more inclusive. This is the textbook Bayesian-cluster-selection lever for municipalities with a rural-base opposition or a neighborhood-specific support pattern.
Coverage over-claim (6 pairs of 49): sponsored = full_municipality while indep = urban_plus_selected_rural. The sponsored side claims wider coverage, but neither side covers every bairro. This could indicate a declared-vs-actual discrepancy (Channel B fabrication signature).

In 2 further pairs the sponsored side is not_realized: the sponsored poll was cancelled or never published yet still entered the registry. Worth cross-tabulating cancellation rates by sponsor type in the full extraction.

Three worked cases

High contrast, classical bias: Porteirinha/MG, JURACI FREIRE MARTINS (PSD)

Final share: 43.8%
Sponsored poll (EXITO CONSULTORIA, n=537): 85.5% predicted, error +41.7pp
Independent (INSTITUTO GERAIS, n=590, 9 days apart): 38.3% predicted, error −5.5pp
Contrast: +47.2pp
Design diffs: sponsored coverage not_specified + non-deferred; indep coverage urban_plus_selected_rural with rural=True. Sponsored bairro_detail says full_municipality (50 bairros) vs indep urban_plus_selected_rural (30 bairros).
Reading: sponsored claims wider coverage in registry text and PDF, but the 41.7pp over-statement is too large to be explained by coverage alone — likely Channel A + Channel B mix.

Lower contrast, mixed frame: representative biased pair (one of 5)

Pattern: sponsored uses population_reference=mixed while same-week indep uses tse_eligible. The sponsored poll's quota distribution can be drawn from any combination of census-resident shares (favors non-voters) and TSE-eligible shares (favors registered voters) without violating either standard. If the sponsor's candidate has stronger support among registered/voting older voters, blending the frames understates the youth share and over-states the candidate.

High contrast NOT classical bias: Caldas Novas/GO, KLEBER LUIZ MARRA (MDB)

Sponsored: SMS-DIRECT, error +5.7pp (essentially accurate vs 60.3% final share)
Independent: OPCAO PESQUISAS, error −60.3pp (massively misses the landslide)
Contrast: +66.0pp

This is the largest contrast in the sample but NOT a bias finding: the independent poll is the defective one, plausibly because the candidate wasn't in their scenario list. It demonstrates why the naive contrast (sponsored − indep) mixes "sponsored over-states" with "indep was bad." The "biased" stratum defined under Method guards against this by requiring the sponsored poll both to over-state its candidate and to have the larger absolute error of the two.
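The difference between the two high-contrast cases is easy to see when the stratum rule is written out. The sketch below follows the strata as described under Method (over-statement of at least 5pp plus larger absolute error for "biased", |contrast| < 1.5pp for "well-behaved"). The function name and the order in which the rules are checked are assumptions, not the pipeline's actual code.

```python
def classify(sponsored_err, indep_err, over_pp=5.0, well_behaved_pp=1.5):
    """Assign a pair to a stratum from the two signed errors (in pp)."""
    contrast = sponsored_err - indep_err
    # Assumption: the "biased" rule is checked before "well-behaved".
    if sponsored_err >= over_pp and abs(sponsored_err) > abs(indep_err):
        return "biased"
    if abs(contrast) < well_behaved_pp:
        return "well-behaved"
    return "other"


# Signed errors taken from the worked cases in this note.
porteirinha = classify(sponsored_err=41.7, indep_err=-5.5)
caldas_novas = classify(sponsored_err=5.7, indep_err=-60.3)
```

Porteirinha lands in "biased", while Caldas Novas lands in "other" despite its larger contrast, because the independent poll carries the larger absolute error.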

H1 deterministic preview — directional but underpowered (2026-06-02)

Tested error ~ sponsored × mixed_population_reference on the 354 protocols with both extractions cached so far (out of 8,917 in the analysis table — sample is small):

Spec	sponsored	mixed	sp × mixed	n
1 (descriptive cell means: sponsored mean 8.33 / 7.75)	—	—	~−0.6pp	1,255
2 (OLS pooled)	+7.41 (p<.001)	−0.30 (n.s.)	−0.29 (n.s.)	1,255
3 (+ pollster + race FE)	+7.15 (p<.001)	−1.57 (n.s.)	+2.68 (n.s.)	1,255
4 (within-candidate FE)	+6.71 (p<.001)	+1.52 (n.s.)	+4.63 (p=0.21)	1,255

The sponsored coefficient cleanly replicates the +7pp headline across specs. The H1 interaction term, however, is only directionally supportive after candidate FE (Spec 4), and its SE of 3.66pp gives it nowhere near conventional significance. The descriptive cell means (Spec 1) show essentially zero interaction.
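As a consistency check on Spec 4, the reported coefficient and SE can be turned into a test statistic and an approximate interval. This sketch uses a normal approximation, which is an assumption: the reported p-value presumably comes from the regression's own reference distribution, so small differences are expected.

```python
import math


def two_sided_p_normal(coef, se):
    """Two-sided p-value for coef / se under a standard normal approximation."""
    z = coef / se
    return math.erfc(abs(z) / math.sqrt(2))


def ci95(coef, se):
    """Approximate 95% confidence interval using the normal critical value."""
    half_width = 1.96 * se
    return coef - half_width, coef + half_width


coef, se = 4.63, 3.66          # Spec 4 interaction term and its SE
z = coef / se
p = two_sided_p_normal(coef, se)
low, high = ci95(coef, se)
```

The ratio is roughly 1.27 standard errors, in line with the reported p=0.21, and the approximate interval spans zero by a wide margin.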

Two readings:

a. H1 is a within-race story — the regression effect appears only after demeaning by (muni × candidate), meaning that for the same candidate in the same muni, polls with mixed frame have ~4.6pp higher bias than polls with clean frame. Sample is too thin (14 sponsored × mixed observations) to test cleanly. Full-universe extraction would give power.

b. H1 is a selection artifact — the curated pairs were chosen precisely where design choices differ. In the full universe most polls use tse_eligible regardless of sponsorship, so the eyeball pattern doesn't generalize. Full-universe data adjudicates.

Both readings require more data. The full Batch-API extraction (see docs/todo.md § "Run extract_methodology_batch.py at full 14k scale") should yield on the order of thousands of sponsored × frame combinations, enough to adjudicate between them.
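Reading (a) turns on what demeaning by (muni × candidate) does to a coefficient. The toy example below, with entirely made-up numbers and a single regressor instead of the full interaction spec, shows how a within-group effect can be nearly invisible in a pooled slope when the use of the mixed frame is correlated with the group's baseline error level. It is an illustration of the mechanism, not a reconstruction of h1_population_reference_test.py.

```python
from collections import defaultdict


def slope(xs, ys):
    """OLS slope of y on x with an intercept (single regressor)."""
    mx = sum(xs) / len(xs)
    my = sum(ys) / len(ys)
    num = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    den = sum((x - mx) ** 2 for x in xs)
    return num / den


def demean_within(groups, values):
    """Subtract each group's mean from its members' values."""
    totals = defaultdict(lambda: [0.0, 0])
    for g, v in zip(groups, values):
        totals[g][0] += v
        totals[g][1] += 1
    return [v - totals[g][0] / totals[g][1] for g, v in zip(groups, values)]


# Made-up data: group = muni x candidate, x = 1 if mixed frame, y = error in pp.
# Mixed adds 4pp within each group, but is more common in the low-baseline group.
group = ["A", "A", "A", "B", "B", "B"]
mixed = [0, 0, 1, 1, 1, 0]
error = [10.0, 10.0, 14.0, 4.0, 4.0, 0.0]

pooled = slope(mixed, error)
within = slope(demean_within(group, mixed), demean_within(group, error))
```

In this toy data the within-group slope recovers the 4pp effect while the pooled slope is well under 1pp, which is the qualitative pattern reading (a) attributes to Specs 2 through 4.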

Output: build/llm/h1_test/specs.json

Three hypotheses for the next round of testing

H	Claim	Test
H1	`population_reference=mixed` × sponsored predicts bias	Pooled OLS on extracted set; full-universe regression after the batch extract; within-candidate FE analog
H2	Coverage restriction (`urban_only` / `specific_neighborhoods`) × sponsored predicts bias, more visible in `bairro_detail` PDF than registration text	Same FE specs; condition on rural-base candidate via prior election seção shares
H3	Quality-control divergence (audit method, consistency checks) is correlated noise, not causal	Robustness: should NOT survive interaction with sponsorship after controlling for H1 + H2

H1 is the cleanest test because population_reference is a single categorical field with a clear null (tse_eligible as the canonical clean choice). H2 is sharper after seção-level vote data is joined — see todo.md § "Bairro/setor oversampling test".

Sources

Pipeline: projects/poll-sponsor-bias/source/llm/curated_pairs_find.py, curated_pairs_extract.py, curated_pairs_assemble.py, h1_population_reference_test.py
Outputs: build/llm/curated_pairs/pairs.parquet (130 pairs), build/llm/curated_pairs/pairs_with_extractions.parquet (130 × 260 cols), build/llm/h1_test/specs.json (when H1 deterministic test runs)
Cross-refs: docs/design_levers.md — the menu of design choices this finding tests against; docs/theory.md § Polls as Bayesian persuasion — the formal model these mechanisms instantiate; docs/thinking.md § Coverage deferral is a feature — related Channel A finding on deferred coverage; docs/todo.md § Bairro/setor oversampling test — the deterministic scale-up that needs seção data
