AN-125: Stronghold Cluster and Candidate Prior

**AN-032's headline is a clustering artifact; the corrected geographic-margin test is null at both the party and candidate-own level.** Part A: AN-032 reported Δ = −0.0029 (−0.29 pp; t = −5.3, p < 10⁻³, 20/22 pairs negative) on the muni-rich subset (n_LV_bairros ≥ 50), but those 22 pairs come from only **2 unique sponsored polls** (HELIO LOPES PSDB-Anápolis, ROGERIO CORREIA PT-BH), each re-paired against 11 independents. With a cluster-bootstrap CI by sponsored_protocol, the result is null at every threshold where the bootstrap is meaningful: thr=0 (42 pairs, 21 clusters) Δ=−0.12 pp, 95% CI [−0.46, +0.34]; thr=11 (32 pairs, 12 clusters) Δ=−0.19 pp, [−0.68, +0.49]; thr=20 (25 pairs, 5 clusters) Δ=−0.15 pp, [−0.45, +0.90]. Part B: replacing the sponsor's party's 2020 share with the sponsoring candidate's **own** 2020 mayoral share (eligible: the 2024 sponsoring candidate also ran for mayor in 2020 in the same muni; n=26 candidates → 34 usable pairs from 18 unique sponsored polls) gives Δ = +0.15 pp (p = 0.69, sign 13/34, cluster-BB 95% CI [−0.49, +1.22]), with a mean candidate share of ~33%: well-powered, and also null. The geographic-design margin returns no detectable within-pair effect in either direction, at either the party level or the sharper candidate-own level.

Hypothesis: bayesian-cluster-selection
Confidence: green
Type: descriptive

Design

Sample: Part A re-uses AN-032's 42-pair usable sample (curated_pairs with bairro extractions on both sides and a sponsoring-candidate party assigned) and clusters at sponsored_protocol. Part B's eligibility requires the 2024 sponsoring candidate to also have run for mayor in 2020 in the same municipality, resolved via pipelines/politica/build/clean/politico.csv (name → politico_id) ∩ votacao_secao_2020.parquet (office=PREFEITO, politico_id present).
Specification: Part A: cluster-bootstrap (5000 reps) by sponsored_protocol on AN-032's contrast = sp_oversample_index − ind_oversample_index, at four LV-richness thresholds (≥ 0, 11, 20, 50). Part B: for each surviving pair, compute the candidate's bairro-weighted 2020 mayoral vote share separately for the sponsored and matched independent poll, using the same bairro-string normalization and matching logic as AN-032 (replacing party_share with cand_share = candidate_votes / total_seção_votes). Inference uses a paired t-test and a cluster-BB CI by sponsored_protocol.
Comparator: AN-032 with naive paired-t SE.

Script: source/analysis/an-125-stronghold-cluster-and-cand-prior.py
Target: build/table/stronghold_cluster_and_cand_prior.csv
Status: interpreted · 2026-06-23
Created: 2026-06-23

What this overturns and what it leaves intact

[[an-032]] reported a "reverse-sign result" on the muni-rich subset that fed directly into the paper's §5.1 (paper/paper.tex:549–588). The reported magnitude (Δ = −0.0029, 20/22 pairs negative) was correct as a sample mean, but the inference was wrong: the 22 pairs were two re-paired sponsored polls, so the effective n was 2, not 22. The naive SE was about 5× too small, and once the SE is set right, the headline collapses.

The substantive verdict on the geographic-design margin therefore moves from "reverse-sign result on the cleanest subset" to "no detectable within-pair effect, either direction". This is more consistent with the §5 lede ("the registration system makes two production stages visible … neither accounts for the slant") than the old reading was — the geographic margin joins scenario-rotation, mode substitution, audit-rate, and the other registered design choices as a null.

Method

For each matched race-week pair (sponsored_protocol ⨯ indep_protocol, ±14 days, same muni × same sponsored candidate), take the two polls' declared bairro lists from the registration PDFs.
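The pairing rule can be sketched as follows. This is a minimal illustration, not the project's pipeline: the record fields (`protocol`, `muni`, `candidate`, `sponsored`, `fielded`) and the function `match_pairs` are hypothetical names, and the sample records are invented.

```python
from datetime import date
from itertools import product

# Hypothetical poll records; field names are illustrative, not the project's schema.
polls = [
    {"protocol": "SP-1", "muni": "A", "candidate": "X", "sponsored": True,  "fielded": date(2024, 9, 10)},
    {"protocol": "IN-1", "muni": "A", "candidate": "X", "sponsored": False, "fielded": date(2024, 9, 18)},
    {"protocol": "IN-2", "muni": "A", "candidate": "X", "sponsored": False, "fielded": date(2024, 10, 1)},
    {"protocol": "IN-3", "muni": "B", "candidate": "X", "sponsored": False, "fielded": date(2024, 9, 12)},
]

def match_pairs(polls, window_days=14):
    """Pair each sponsored poll with every independent poll in the same
    muni, covering the same candidate, fielded within +/- window_days."""
    sponsored = [p for p in polls if p["sponsored"]]
    independent = [p for p in polls if not p["sponsored"]]
    pairs = []
    for sp, ind in product(sponsored, independent):
        same_race = sp["muni"] == ind["muni"] and sp["candidate"] == ind["candidate"]
        close = abs((sp["fielded"] - ind["fielded"]).days) <= window_days
        if same_race and close:
            pairs.append((sp["protocol"], ind["protocol"]))
    return pairs

pairs = match_pairs(polls)
print(pairs)
```

Because the rule is many-to-one, a single sponsored poll can appear in several pairs. That is the source of the clustering by sponsored_protocol that this note corrects for.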

Part A (re-use AN-032's measure). For each poll, compute a weighted share of the sponsor's-party 2020 mayoral vote across the bairros it sampled, then subtract the muni-wide weighted average to form an oversample index. The within-pair contrast is sponsored index minus independent index. Cluster-bootstrap by sponsored_protocol (the unit at which the bairro list is shared across pairs).
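A small sketch of the oversample index and the within-pair contrast. The note does not state the weights, so this assumes each bairro is weighted by its total 2020 votes, which makes the weighted share a pooled share. The function names and the vote counts are hypothetical.

```python
def pooled_share(bairros, party_votes, total_votes):
    """Vote-weighted party share across the given bairros
    (assumption: weights are each bairro's total 2020 votes)."""
    return sum(party_votes[b] for b in bairros) / sum(total_votes[b] for b in bairros)

def oversample_index(sampled, party_votes, total_votes):
    """Party share in the sampled bairros minus the muni-wide share."""
    muni_wide = pooled_share(list(total_votes), party_votes, total_votes)
    return pooled_share(sampled, party_votes, total_votes) - muni_wide

# Invented 2020 counts for a three-bairro municipality.
party_votes = {"centro": 300, "norte": 100, "sul": 200}
total_votes = {"centro": 1000, "norte": 1000, "sul": 1000}

sp_index = oversample_index(["centro", "sul"], party_votes, total_votes)   # sponsored poll
ind_index = oversample_index(["norte", "sul"], party_votes, total_votes)   # independent poll
contrast = sp_index - ind_index  # positive means the sponsored poll leans toward party strongholds
print(sp_index, ind_index, contrast)
```

In this toy case the muni-wide share is 0.20, the sponsored poll samples bairros with a pooled share of 0.25, and the independent poll samples bairros with 0.15, so the contrast is about +0.10.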

Part B (candidate-own variant). Restrict to pairs where the sponsoring candidate also ran for mayor in 2020 in the same muni (politico_id resolution via the politico.csv registry + 2020 mayoral seção-vote table). For each poll, compute the weighted share of that candidate's own 2020 vote across the bairros it sampled. Paired-t on the within-pair candidate-share difference, with cluster-BB CI by sponsored_protocol.
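The clustered interval can be illustrated with a plain percentile cluster bootstrap. This is a sketch under stated assumptions: the note's "cluster-BB" may denote a different bootstrap variant, and the function name, the seed, and the toy contrasts below are all hypothetical.

```python
import random
from collections import defaultdict
from statistics import mean

def cluster_bootstrap_ci(contrasts, clusters, reps=5000, alpha=0.05, seed=0):
    """Percentile CI for the mean contrast, resampling whole clusters
    (here: sponsored protocols) with replacement."""
    by_cluster = defaultdict(list)
    for value, cluster in zip(contrasts, clusters):
        by_cluster[cluster].append(value)
    groups = list(by_cluster.values())
    rng = random.Random(seed)
    draws = []
    for _ in range(reps):
        resampled = rng.choices(groups, k=len(groups))  # draw clusters, not pairs
        draws.append(mean(v for g in resampled for v in g))
    draws.sort()
    lo = draws[int((alpha / 2) * reps)]
    hi = draws[int((1 - alpha / 2) * reps) - 1]
    return lo, hi

# Invented within-pair contrasts (pp); protocol "p1" is re-paired three times.
contrasts = [-0.3, -0.2, -0.4, 0.5, 0.1, -0.1]
clusters = ["p1", "p1", "p1", "p2", "p3", "p4"]
lo, hi = cluster_bootstrap_ci(contrasts, clusters)
print(lo, hi)
```

Resampling at the protocol level keeps all re-pairings of one sponsored poll together, so a poll that enters many pairs counts as one independent unit rather than many.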

Results

Part A — cluster-corrected party-level test

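| Threshold (n_LV_bairros ≥) | n_pairs | n_clusters | Δ (pp) | Naive SE (pp) | Cluster-BB 95% CI (pp) |
|---|---|---|---|---|---|
| 0 | 42 | 21 | −0.12 | 0.14 | [−0.46, +0.34] |
| 11 | 32 | 12 | −0.19 | 0.18 | [−0.68, +0.49] |
| 20 | 25 | 5 | −0.15 | 0.09 | [−0.45, +0.90] |
| 50 | 22 | 2 | −0.29 | 0.06 | [−0.52, −0.02] |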

Zero is inside the cluster-corrected CI at every threshold where the bootstrap is meaningful (n_clusters ≥ 5). The thr=50 row should be disregarded: with only 2 clusters, the bootstrap can only recombine those 2 bairro lists, so its resampling distribution takes at most three distinct values (both draws from one poll, both from the other, or one of each), and the resulting CI is artificially narrow and artificially distant from zero. The substantive read is: the original "p < 10⁻³, 20/22 negative" headline is entirely the product of those 2 bairro lists each entering the paired-t 11 times.
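The degeneracy of a two-cluster bootstrap can be checked by enumerating every possible resample. The contrast values below are invented for illustration and are not AN-032's data. Only the structure, 2 sponsored polls with 11 pairs each, follows the note.

```python
from itertools import product
from statistics import mean

# Hypothetical cluster-level contrasts (pp) for two sponsored polls,
# each contributing 11 pairs; values are illustrative only.
cluster_pairs = {
    "poll_A": [-0.40] * 11,
    "poll_B": [-0.10] * 11,
}

names = list(cluster_pairs)
resample_means = set()
for draw in product(names, repeat=len(names)):  # every ordered resample of 2 clusters
    values = [v for name in draw for v in cluster_pairs[name]]
    resample_means.add(round(mean(values), 6))

print(sorted(resample_means))
```

The enumeration yields only three distinct resample means, so any percentile interval built from them reflects the two observed polls rather than sampling uncertainty across sponsored polls.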

Part B — candidate-own test

26 of 95 unique 2024 sponsoring candidates in the full pairs sample also ran for mayor in 2020 in the same muni (politico_id matched).
34 pairs have usable bairro matches on both sides, drawn from 18 unique sponsored polls (cluster issue largely solved by the eligibility filter).
Mean candidate-own 2020 vote share in sampled bairros: 33.15% sponsored vs 33.00% matched independent. Δ = +0.15 pp (paired t = +0.40, p = 0.69; sign test 13/34 positive, binom p = 0.23; cluster-BB 95% CI [−0.49, +1.22] pp, n_clusters = 18).
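The sign-test figure can be reproduced with an exact binomial calculation. This is a minimal sketch; the function name is hypothetical, and the two-sided p-value is taken as twice the smaller tail, which is exact for a null probability of 0.5.

```python
from math import comb

def sign_test_two_sided(k, n):
    """Exact two-sided binomial sign test against p = 0.5."""
    tail = min(k, n - k)
    one_sided = sum(comb(n, i) for i in range(tail + 1)) / 2 ** n
    return min(1.0, 2 * one_sided)

p = sign_test_two_sided(13, 34)  # 13 of 34 pairs positive
print(round(p, 2))  # 0.23
```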

The candidate-own test is well-powered (mean share at 33% gives real spread, and 18 clusters is enough for the bootstrap to be informative), and it is also a clean null. The point estimate even runs slightly positive. There is no within-pair signal that sponsored polls field in neighborhoods where the sponsoring candidate herself was stronger in 2020.

Implications

The §5.1 stronghold-oversampling result has to be retracted; the geographic margin joins the inventory of nulls.
The paper's broader §5 thesis ("neither the declared design nor the licensed statistician accounts for the slant; the slant is produced at the unobserved fielding / post-collection stages") is strengthened, not weakened, by removing the one design choice that previously appeared to return a non-null effect.
AN-032's CSV and figure remain on disk and can still be useful as the measurement of how each pair distributes across bairros — but the headline number should not be cited.

Provenance

This re-analysis was triggered by a paper-validation pass on 2026-06-23. The trigger was a question about the sign of the §5.1 "Δ = −0.0029" claim — investigating the sign led to noticing that the 22-pair subset was 2 sponsored polls in 11 re-pairings each.