Repeat (firm × candidate) dyads are heavily over-concentrated relative to chance on every route, and the gap is largest at CPF. There, the observed repeat-pair share is 56.0 % vs a null mean of 4.9 % (p < 0.0005; n = 25 protocols, 14 of them in 7 repeat dyads). The other routes: party-name 46.3 % vs 12.0 %; party 32.8 % vs 1.5 %; committee 17.5 % vs 0.4 %. But the firms behind the CPF repeat dyads are NOT in the AN-006 high-β tail. Three of the 7 dyads are at AR7 (β −2.4), one is at CENSUS (β −8.2), and the remaining three are at firms with no AN-016 β. The link between repeat-dyad structure and AN-006's CPF +19 pp slant is therefore not direct: the +19 most likely comes from singleton CPF transactions rather than from the repeat dyads (untested; see follow-up 1). M1-individual relational structure exists at CPF but does not obviously carry the slant, so the M4 / single-shot pricing reading of the AN-006 CPF tail survives. The time-gap diagnostic for CPF is uninformative. Only 1 of the 7 dyads has both date_registered values, and its 78-day gap is consistent with separate decisions, but n = 1.

Confidence
yellow
Type
descriptive
Design
Sample
candidate-sponsored mayoral protocols in 2024 (sponsor_candidate_party non-null on poll_sponsor_2024.parquet), deduplicated to (protocol, institute, sponsor_candidate_politico_id), stratified by sponsor_route
Specification
per-route repeat-pair share = share of polls in (firm × candidate) pairs with multiplicity >= 2; pair-HHI on (firm × candidate) shares; permutation null shuffles the firm column holding firm and candidate marginals fixed (2,000 reps)
Comparator
permutation-randomization of firm-to-candidate pairing within route
Notes
Discriminator for AN-006's CPF +19 pp finding: if CPF-route (firm × candidate) pairs cluster as repeats more than chance, M1-individual relational story is supported; if no clustering above chance, M4 / strategic-stake / single-shot premium is the more parsimonious reading. Candidate identity = sponsor_candidate_politico_id (route-agnostic). Pooled ALL-routes comparator also computed.
Script
source/analysis/an-074-cpf-repeat-dyad.py
Target
build/table/an-074-cpf-repeat-dyad.csv
Status
interpreted · 2026-06-16
Created
2026-06-16

Question

AN-006 found a within-firm β = +19.12 pp at the CPF route, where the candidate pays with their own CPF. This is the closest the data gets to a direct candidate ↔ pollster individual dyad. The committee / party / party-name routes show +6 to +9 pp. Pollster FE are in the spec, so this is not a firm-composition effect. After AN-073 ruled out the partisan version of the relational story, the CPF +19 pp is the strongest within-cycle survivor of the M1 / M3 relational hypothesis. It admits two readings. Under M1-individual, the slant rides on a durable candidate ↔ pollster tie. Under M4, it is a strategic-stake / single-shot premium the candidate pays for on a one-off contract.

The two predict opposite repeat-dyad structures: M1-individual predicts CPF (firm × candidate) pairs cluster as repeats more than chance; M4 predicts they don't.
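For reference, the Specification's two concentration metrics and the one-sided permutation p-value can be written as below. The notation is introduced here for illustration and is not taken from the script: $N$ is the number of deduplicated polls on a route, $n_{fc}$ is the number of those polls in the pair (firm $f$ × candidate $c$), $T$ is any of the statistics, and $T^{(r)}$ is its value in permutation replicate $r$. The note does not say whether a +1 correction is applied to the p-value, so the plain form is shown as an assumption.

```latex
\text{repeat share} = \frac{1}{N} \sum_{(f,c)\,:\,n_{fc} \ge 2} n_{fc},
\qquad
\text{pair-HHI} = \sum_{(f,c)} \left( \frac{n_{fc}}{N} \right)^{2},
\qquad
\hat{p} = \frac{1}{R} \sum_{r=1}^{R} \mathbf{1}\!\left[ T^{(r)} \ge T^{\mathrm{obs}} \right], \quad R = 2000
```

For n_distinct_pairs the inequality flips to $T^{(r)} \le T^{\mathrm{obs}}$, because fewer distinct pairs means more concentration.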

Design

source/analysis/an-074-cpf-repeat-dyad.py:

  1. Load candidate-sponsored rows (1,908), deduplicate to (protocol × institute × candidate-politico-id) trios.
  2. Stratify by sponsor_route (cpf, committee, party, party_name).
  3. Per route, compute: n_protocols, n_distinct firms, n_distinct candidates, n_distinct (firm × candidate) pairs, max pair count, repeat-pair share, and the pair-HHI.
  4. Permutation null: shuffle the firm column, recompute the metrics; 2,000 reps. One-sided p on each metric: P(null ≥ observed) for repeat-share / pair-HHI, P(null ≤ observed) for n_distinct_pairs (fewer pairs = more concentration).
  5. Cross-cut: per route, distribution of firm-total-volume from poll_response_2024.parquet — does the CPF route over-represent small-volume firms? (Pollster FE in AN-006 absorbs static firm quality, but the firm-size composition matters for interpreting the CPF tail.)
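Steps 3 and 4 can be sketched with the standard library alone. This is a minimal illustration, not the code in source/analysis/an-074-cpf-repeat-dyad.py. It assumes each route's deduplicated trios have already been reduced to two parallel lists (firm, candidate id), and the function names are hypothetical.

```python
import random
from collections import Counter


def pair_metrics(firms, cands):
    """Return (repeat-pair share, pair-HHI, n distinct pairs) for one route.

    firms[i] and cands[i] are the firm and candidate id of deduplicated poll i.
    """
    counts = Counter(zip(firms, cands)).values()
    n = len(firms)
    repeat_share = sum(k for k in counts if k >= 2) / n
    # Sum integer squares first so identical count profiles give identical floats.
    hhi = sum(k * k for k in counts) / n ** 2
    return repeat_share, hhi, len(counts)


def permutation_null(firms, cands, reps=2000, seed=0):
    """Shuffle the firm column (firm and candidate marginals stay fixed) and
    return observed metrics, the null-mean share and one-sided p-values."""
    rng = random.Random(seed)
    obs_share, obs_hhi, obs_pairs = pair_metrics(firms, cands)
    shuffled = list(firms)
    hits = {"share": 0, "hhi": 0, "pairs": 0}
    null_share_total = 0.0
    for _ in range(reps):
        rng.shuffle(shuffled)
        share, hhi, pairs = pair_metrics(shuffled, cands)
        null_share_total += share
        hits["share"] += share >= obs_share  # P(null >= observed)
        hits["hhi"] += hhi >= obs_hhi  # P(null >= observed)
        hits["pairs"] += pairs <= obs_pairs  # fewer pairs = more concentration
    return {
        "repeat_share": obs_share,
        "pair_hhi": obs_hhi,
        "n_pairs": obs_pairs,
        "null_mean_share": null_share_total / reps,
        **{f"p_{k}": v / reps for k, v in hits.items()},
    }
```

Shuffling only the firm column is what keeps both marginals fixed. Every firm keeps its poll count and every candidate keeps theirs, so only the pairing between them is randomized.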

Results

Repeat-pair share by sponsor route

Per-route concentration

Route         n_polls  n_firms  n_cands  n_pairs  repeat-pair share  null-mean share  p (one-sided)
CPF                25       14       17       18             56.0 %            4.9 %        <0.0005
Party-name        149       49       97      107             46.3 %           12.0 %        <0.0005
Party              58       36       43       48             32.8 %            1.5 %        <0.0005
Committee         561      166      498      509             17.5 %            0.4 %        <0.0005
ALL pooled        793      204      640      674             26.5 %            0.8 %        <0.0005

Every route is far above chance. CPF has both the highest raw share (56 %) and the largest absolute gap above the null (56.0 − 4.9 = 51.1 pp; the next largest is party-name at 34.3 pp). The maximum (firm × candidate) pair count is 2 at CPF (no dyad recurs three times), 3 at party, 4 at committee and 5 at party-name. The deepest within-cycle clustering of a single dyad is therefore on the higher-volume party-name and committee routes.
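The CPF row's counts pin down its repeat structure. Because no CPF pair occurs more than twice, each repeat dyad adds exactly one poll beyond the 18 distinct pairs. Here is a small check under that assumption (the helper name is hypothetical):

```python
def repeat_accounting_max2(n_polls, n_pairs):
    """Repeat dyads, polls in repeat dyads and repeat-pair share for a route
    whose maximum (firm x candidate) pair count is 2.

    Each repeat dyad holds 2 polls and each singleton 1, so
    n_polls = n_pairs + n_repeat_dyads.
    """
    repeat_dyads = n_polls - n_pairs
    polls_in_repeats = 2 * repeat_dyads
    return repeat_dyads, polls_in_repeats, polls_in_repeats / n_polls


dyads, polls, share = repeat_accounting_max2(n_polls=25, n_pairs=18)  # CPF row
print(dyads, polls, f"{share:.1%}")  # 7 14 56.0%
```

The result, 7 dyads covering 14 of the 25 polls, matches the 7 CPF repeat dyads counted by firm and the 56.0 % share. The identity does not hold on routes where a pair can recur three or more times.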

Firm-size composition

Median firm-total polls, counting all 2024 polls each firm produced, is 26 for CPF-route polls and 14 for committee-route polls. The CPF route is therefore not dominated by the smallest firms. Committee polls come from smaller firms at the median, so AN-006's within-firm CPF +19 vs committee +9 is not driven by "CPF candidates hire the smallest, highest-β firms."
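The firm-size cross-cut in Design step 5 can be sketched as below. The note does not state whether the reported median is taken over a route's polls (each poll scored by its firm's total volume) or over its distinct firms, so the function exposes both. Names and inputs are hypothetical.

```python
from collections import Counter
from statistics import median


def median_firm_volume(route_firms, all_poll_firms, per_poll=True):
    """Median firm-total 2024 volume for one route.

    route_firms: firm of each deduplicated poll on the route.
    all_poll_firms: firm of every 2024 poll (e.g. read from poll_response_2024.parquet).
    per_poll=True weights each firm by how many route polls it ran (an assumption);
    per_poll=False counts each distinct firm once.
    """
    volume = Counter(all_poll_firms)
    firms = route_firms if per_poll else set(route_firms)
    return median(volume[f] for f in firms)
```

The two variants can diverge when one large firm runs many polls on a route: poll-weighting pulls the median toward that firm's volume.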

Which CPF dyads repeat — and at which firms?

Firm (within-firm β from AN-016, where available)  CPF repeat dyads
AR7 PESQUISAS (β = −2.36)                          3
CENSUS INSTITUTO DE PESQUISAS (β = −8.19)          1
EQUACAO PESQUISAS (no β)                           1
MARIO ELISIO DE MAGALHAES (no β)                   1
PAULISTA JUNIOR PROJETOS & CONSULTORIA (no β)      1
Total                                              7

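Both firms in the repeat-dyad set that have an AN-016 β have a negative β, and the other three firms have no estimate. The CPF repeat-dyad structure therefore does not sit where the AN-006 CPF +19 pp slant appears to live. This suggests the slanted CPF protocols are predominantly the singleton ones, an inference that has not yet been tested directly.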
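The tally behind this paragraph can be sketched as follows, using the firm counts and β values from the table above. The candidate ids are placeholders because the actual CPF candidates are not listed here, and the function name is hypothetical.

```python
from collections import Counter


def repeat_dyads_by_beta_sign(repeat_dyads, beta):
    """Count repeat dyads per firm and by the sign of the firm's AN-016 beta.

    repeat_dyads: (firm, candidate) pairs with multiplicity >= 2 on the route.
    beta: firm -> within-firm beta; firms missing from it have no estimate.
    """
    per_firm = Counter(firm for firm, _ in repeat_dyads)
    by_sign = Counter()
    for firm, k in per_firm.items():
        b = beta.get(firm)
        if b is None:
            by_sign["no beta"] += k
        elif b < 0:
            by_sign["negative"] += k
        else:
            by_sign["non-negative"] += k
    return per_firm, by_sign


# Firm counts and betas from the table; candidate ids c1..c7 are placeholders.
dyads = [("AR7 PESQUISAS", f"c{i}") for i in (1, 2, 3)] + [
    ("CENSUS INSTITUTO DE PESQUISAS", "c4"),
    ("EQUACAO PESQUISAS", "c5"),
    ("MARIO ELISIO DE MAGALHAES", "c6"),
    ("PAULISTA JUNIOR PROJETOS & CONSULTORIA", "c7"),
]
beta = {"AR7 PESQUISAS": -2.36, "CENSUS INSTITUTO DE PESQUISAS": -8.19}
per_firm, by_sign = repeat_dyads_by_beta_sign(dyads, beta)
print(dict(by_sign))  # {'negative': 4, 'no beta': 3}
```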

Time-gap diagnostic — undercut by data coverage

Of the 7 CPF repeat dyads, only 1 has both date_registered values populated in the underlying poll registry. Its gap is 78 days, consistent with separate decisions. The remaining 6 have at least one missing date. The time-gap test would distinguish single-contract tracking polls (short gaps) from M1-individual re-hires (long gaps), but it cannot be run at CPF with current data. The comparator routes are diagnostic and clear:

Interpretation

Repeat-pair concentration is sharp on every route and largest at CPF. This is the first within-cycle quantitative evidence that relational structure exists in the candidate-sponsored segment beyond the partisan unit AN-073 ruled out. But the mapping to AN-006's CPF +19 pp slant is not direct. The CPF repeat-dyad firms that have an AN-016 β are AR7 (β = −2.4) and CENSUS (β = −8.2), both at or below the AN-016 mean. The +19 within-firm CPF effect from AN-006 must therefore sit elsewhere in the CPF cell. The most plausible location is the singleton-dyad CPF protocols, where candidates pay personally for a one-shot slant from firms with no durable individual tie.

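This is closer to M4 single-shot pricing for the slant itself, with M1-individual structure observable in the repeat-dyad layer but at firms that don't slant. The same nominal route thus carries two structurally different transactions. Repeat dyads sit at low- or negative-β firms (M1-individual), while singleton dyads plausibly carry the slant (M4).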

If true, this would meaningfully reshape the puzzle: the relational-contracting infrastructure exists and the slant exists, but they don't co-locate. M1-individual ties discipline the slant (or at least fail to produce it), while M4 carries it.

Caveats:

Follow-ups

  1. Decompose AN-006's CPF +19 by repeat-vs-singleton subset (puzzle / extension): the load-bearing next test. Refit the AN-006 spec restricted to (a) singleton CPF dyads only, (b) repeat-dyad CPF protocols only. If singleton β ≫ repeat β, the M4 reading of the +19 is supported and the M1-individual structure is decoupled from the slant. If repeat β ≥ singleton β, the repeat-dyads carry the slant and M1-individual is supported. Cheap re-run of the AN-006 spec; suggested script: source/analysis/an-NNN-cpf-beta-by-dyad-multiplicity.py.

  2. Date-field coverage diagnostic for CPF protocols (puzzle): 6 of 7 CPF repeat dyads have a missing date_registered on at least one side. Check whether date_registered and date_end are systematically more often missing on CPF-route protocols across the full sponsor parquet. If so, the missingness is a route-specific data-quality signal worth understanding (possibly a registration-form variant). Cheap data check; ~30 min.

  3. Same diagnostic on AN-016's β-tail firms (extension): for the high-β small-firm tail (METHODUS, CAMARGO E MEDINA, BRASLOPES, etc. — AN-073's "low-HHI, high-β" cell), tabulate their CPF-route repeat-dyad structure specifically. If those firms have zero CPF repeat dyads, the M4 reading of the +19 gains weight by direct evidence. ~30 min.

  4. 2020 cross-cycle dyad recurrence (blind spot): AN-074 is single-cycle. The gold-standard M1-individual test is whether (firm × candidate) dyads recur across 2020 → 2024. Within-cycle repeats can be either separate decisions or a single tracking contract; cross-cycle repeats are unambiguously separate decisions. Parked pending harmonized 2020 sponsor data.

  5. CENSUS / EVA FRANCIELI / AR7 selection diagnostic (puzzle): AN-073, AN-074, and AN-071 all flag the same small group of firms with anomalous β. AR7, with 3 CPF repeat dyads but β = −2.36, is another data point for the suspected-selection story. This is already on todo.md, and AN-074 raises its priority further.
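Follow-up 1 first needs each CPF protocol tagged by the multiplicity of its dyad. The AN-006 refit on each subset is not shown. Below is a minimal sketch of the tagging step, assuming the input is the deduplicated (protocol, firm, candidate) trios from step 1 of the Design. Both function names are hypothetical.

```python
from collections import Counter


def label_dyad_multiplicity(rows):
    """Tag each (protocol, firm, candidate) trio on a route as 'repeat' or
    'singleton' according to how often its (firm, candidate) pair occurs."""
    counts = Counter((firm, cand) for _, firm, cand in rows)
    return [
        (proto, firm, cand, "repeat" if counts[(firm, cand)] >= 2 else "singleton")
        for proto, firm, cand in rows
    ]


def split_by_multiplicity(rows):
    """Return (singleton_protocols, repeat_protocols) as sorted lists."""
    labelled = label_dyad_multiplicity(rows)
    singles = sorted({p for p, _, _, tag in labelled if tag == "singleton"})
    repeats = sorted({p for p, _, _, tag in labelled if tag == "repeat"})
    return singles, repeats
```

A protocol with several trios (e.g. more than one institute) could land in both lists. How to assign such protocols for the refit is a design choice the sketch leaves open.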