AN-074: CPF cell repeat-dyad test — M1-individual vs M4 single-shot

Repeat (firm × candidate) dyads are concentrated far above chance at every route, with the largest gap at CPF: observed repeat-pair share = 56.0 % vs null mean 4.9 % (p < 0.0005; n = 25 protocols, 14 of them in 7 repeat dyads); party-name 46.3 % vs 12.0 %; party 32.8 % vs 1.5 %; committee 17.5 % vs 0.4 %. But the firms behind the CPF repeat dyads are NOT in the AN-006 high-β tail: three of the 7 dyads are at AR7 (β −2.4) and one is at CENSUS (β −8.2); the remaining three are at firms with no β estimate. The connection between repeat-dyad structure and AN-006's CPF +19 pp slant is therefore not direct: the +19 likely comes from singleton CPF transactions, not from the repeat dyads. M1-individual relational structure exists at CPF but does not obviously carry the slant; the M4 / single-shot pricing reading of the AN-006 CPF tail survives. The time-gap diagnostic for CPF is uninformative: only 1 of 7 dyads has both date_registered values, and that one shows a 78-day gap, consistent with separate decisions, but n = 1.

Confidence: yellow
Type: descriptive

Design

Sample: candidate-sponsored mayoral protocols in 2024 (sponsor_candidate_party non-null on poll_sponsor_2024.parquet), deduplicated to (protocol, institute, sponsor_candidate_politico_id), stratified by sponsor_route
Specification: per-route repeat-pair share = share of polls in (firm × candidate) pairs with multiplicity >= 2; pair-HHI on (firm × candidate) shares; permutation null shuffles the firm column holding firm and candidate marginals fixed (2,000 reps)
Comparator: permutation-randomization of firm-to-candidate pairing within route
Notes: Discriminator for AN-006's CPF +19 pp finding: if CPF-route (firm × candidate) pairs cluster as repeats more than chance, M1-individual relational story is supported; if no clustering above chance, M4 / strategic-stake / single-shot premium is the more parsimonious reading. Candidate identity = sponsor_candidate_politico_id (route-agnostic). Pooled ALL-routes comparator also computed.

Script: source/analysis/an-074-cpf-repeat-dyad.py
Target: build/table/an-074-cpf-repeat-dyad.csv
Status: interpreted · 2026-06-16
Created: 2026-06-16

Question

AN-006 found within-firm β = +19.12 pp at the CPF route (the candidate paying with their own CPF, which is the closest the data gets to a direct candidate ↔ pollster individual dyad) vs +6 to +9 pp at the committee / party / party-name routes (pollster FE in the spec, so this is not a firm-composition effect). After AN-073 ruled out the partisan version of the relational story, the CPF +19 pp is the strongest within-cycle survivor of the M1 / M3 relational hypothesis, but it admits two readings:

M1-individual. Durable (firm × candidate-CPF) dyads sustain a credible high-slant relationship. The candidate has a relational asset with this firm specifically, and the dyad is a repeat-game fixture.
M4 / strategic-stake / single-shot. Candidates who pay personally have the strongest stake but no durable relational tie. The pollster charges a premium for the slant and there's no repeat-game structure; the deal is one-shot.

The two predict opposite repeat-dyad structures: M1-individual predicts CPF (firm × candidate) pairs cluster as repeats more than chance; M4 predicts they don't.

Design

source/analysis/an-074-cpf-repeat-dyad.py:

Load candidate-sponsored rows (1,908), deduplicate to (protocol × institute × candidate-politico-id) trios.
Stratify by sponsor_route (cpf, committee, party, party_name).
Per route, compute: n_protocols, n_distinct firms, n_distinct candidates, n_distinct (firm × candidate) pairs, max pair count, repeat-pair share, and the pair-HHI.
Permutation null: shuffle the firm column, recompute the metrics; 2,000 reps. One-sided p on each metric: P(null ≥ observed) for repeat-share / pair-HHI, P(null ≤ observed) for n_distinct_pairs (fewer pairs = more concentration).
Cross-cut: per route, distribution of firm-total-volume from poll_response_2024.parquet — does the CPF route over-represent small-volume firms? (Pollster FE in AN-006 absorbs static firm quality, but the firm-size composition matters for interpreting the CPF tail.)
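
The per-route metrics and the permutation null described in the steps above can be sketched as follows. This is a minimal illustration under stated assumptions, not the contents of an-074-cpf-repeat-dyad.py: the function names, the toy data and the plain-proportion p-value estimator are all hypothetical, and only the standard library is used.

```python
import random
from collections import Counter


def repeat_pair_share(firms, cands):
    """Share of polls sitting in a (firm, candidate) pair with multiplicity >= 2."""
    counts = Counter(zip(firms, cands))
    in_repeats = sum(c for c in counts.values() if c >= 2)
    return in_repeats / len(firms)


def pair_hhi(firms, cands):
    """Herfindahl index over (firm, candidate) pair shares of polls."""
    counts = Counter(zip(firms, cands))
    n = len(firms)
    return sum((c / n) ** 2 for c in counts.values())


def permutation_null(firms, cands, metric, reps=2000, seed=0):
    """Shuffle the firm column (firm and candidate marginals stay fixed)."""
    rng = random.Random(seed)
    shuffled = list(firms)
    draws = []
    for _ in range(reps):
        rng.shuffle(shuffled)
        draws.append(metric(shuffled, cands))
    return draws


def one_sided_p(observed, draws):
    """P(null >= observed): the direction used for repeat-share and pair-HHI."""
    return sum(d >= observed for d in draws) / len(draws)


# Toy route (hypothetical data): 6 polls, two pairs repeat once each.
firms = ["A", "A", "B", "B", "C", "D"]
cands = ["x", "x", "y", "y", "z", "w"]

obs_share = repeat_pair_share(firms, cands)
null_draws = permutation_null(firms, cands, repeat_pair_share)
p_value = one_sided_p(obs_share, null_draws)
print(obs_share, sum(null_draws) / len(null_draws), p_value)
```

If the script uses a plain proportion like `one_sided_p`, the smallest nonzero value 2,000 reps can return is 1/2000 = 0.0005, so a reported p < 0.0005 would correspond to no null draw reaching the observed statistic. For n_distinct_pairs the comparison flips to P(null ≤ observed).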

Results

Repeat-pair share by sponsor route

Per-route concentration

Route	n_polls	n_firms	n_cands	n_pairs	repeat-pair share	null mean	p
CPF	25	14	17	18	56.0 %	4.9 %	<0.0005
Party-name	149	49	97	107	46.3 %	12.0 %	<0.0005
Party	58	36	43	48	32.8 %	1.5 %	<0.0005
Committee	561	166	498	509	17.5 %	0.4 %	<0.0005
ALL pooled	793	204	640	674	26.5 %	0.8 %	<0.0005

Every route is dramatically above chance. CPF has both the highest raw share (56 %) and the largest absolute gap above the null (51.1 pp). The maximum (firm × candidate) pair count is 2 at CPF (no triple-dyad), 3 at party, 4 at committee, 5 at party-name — the heaviest within-cycle clustering is on the high-volume routes.
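
The CPF row can be reconciled with the seven repeat dyads this note reports using only the table's own counts. Because the maximum pair count at CPF is 2, every repeat pair holds exactly two polls, which pins down the split between repeat and singleton pairs. The variable names below are illustrative.

```python
# CPF row of the per-route table.
n_polls, n_pairs, max_pair_count = 25, 18, 2

# With every repeat pair at multiplicity exactly 2:
#   repeats + singletons   = n_pairs
#   2*repeats + singletons = n_polls
assert max_pair_count == 2
repeat_pairs = n_polls - n_pairs
singleton_pairs = n_pairs - repeat_pairs
polls_in_repeats = 2 * repeat_pairs
share = polls_in_repeats / n_polls
print(repeat_pairs, singleton_pairs, polls_in_repeats, f"{share:.1%}")
# prints: 7 11 14 56.0%
```

So the 56.0 % repeat-pair share is 14 of 25 protocols, spread over 7 repeat dyads, with the remaining 11 protocols in singleton dyads.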

Firm-size composition

Median firm-total polls (across all 2024 polls each firm produced):

Committee 14, CPF 26, Party 27.5, Party-name 100.

The CPF route is not dominated by the smallest firms. Committee polls come from smaller firms on median (median 14 vs CPF 26), so the AN-006 within-firm CPF +19 vs committee +9 is not driven by "CPF candidates hire the smallest, highest-β firms."

Which CPF dyads repeat — and at which firms?

Firm (within-firm β from AN-016, where available)	CPF repeat dyads
AR7 PESQUISAS (β = −2.36)	3
CENSUS INSTITUTO DE PESQUISAS (β = −8.19)	1
EQUACAO PESQUISAS (no β)	1
MARIO ELISIO DE MAGALHAES (no β)	1
PAULISTA JUNIOR PROJETOS & CONSULTORIA (no β)	1

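The two firms in the repeat-dyad set that have an AN-016 β both have negative β, and together they account for 4 of the 7 repeat dyads; the other three dyads are at firms with no β estimate. On this evidence the CPF repeat-dyad structure does not sit where the AN-006 CPF +19 pp slant lives, which implies (though it has not been tested directly) that the slanted CPF protocols are predominantly the singleton ones.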

Time-gap diagnostic — undercut by data coverage

Of the 7 CPF repeat dyads, only 1 has both date_registered values populated in the underlying poll registry (78-day gap, consistent with separate decisions). The remaining 6 have at least one missing date. The time-gap test that would distinguish single-contract tracking polls (short gaps) from M1-individual re-hires (long gaps) therefore cannot be run at CPF with current data. The comparator routes, by contrast, are diagnostic and clear:

Committee route: median gap 11 d, only 12 % > 30 d → typical pattern is single-contract multi-wave tracking polls.
Party-name route: median gap 102 d, 87 % > 30 d → repeats are time-spread separate decisions.
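
A minimal sketch of the time-gap diagnostic, assuming each poll row carries a firm, a candidate and a possibly missing date_registered. The function name, the row layout and the toy dates are hypothetical; the 30-day threshold is the one used in the comparator lines above.

```python
from datetime import date
from statistics import median


def dyad_gaps(rows):
    """Days between first and last date_registered for each repeat dyad.

    rows: iterable of (firm, candidate, date_registered or None).
    Repeat dyads with any missing date are returned separately.
    """
    by_dyad = {}
    for firm, cand, registered in rows:
        by_dyad.setdefault((firm, cand), []).append(registered)
    gaps, incomplete = {}, []
    for dyad, dates in by_dyad.items():
        if len(dates) < 2:
            continue  # singleton, not a repeat dyad
        if any(d is None for d in dates):
            incomplete.append(dyad)  # cannot compute a gap
        else:
            gaps[dyad] = (max(dates) - min(dates)).days
    return gaps, incomplete


# Toy rows (hypothetical data, not taken from the registry).
rows = [
    ("A", "x", date(2024, 8, 1)), ("A", "x", date(2024, 8, 11)),
    ("B", "y", date(2024, 4, 1)), ("B", "y", date(2024, 6, 30)),
    ("C", "z", date(2024, 8, 5)), ("C", "z", None),
    ("D", "w", date(2024, 9, 1)),
]
gaps, incomplete = dyad_gaps(rows)
median_gap = median(gaps.values())
share_over_30 = sum(g > 30 for g in gaps.values()) / len(gaps)
print(gaps, incomplete, median_gap, share_over_30)
```

Reporting the incomplete dyads alongside the gaps keeps the coverage problem visible: at CPF, 6 of the 7 repeat dyads would land in `incomplete`.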

Interpretation

The repeat-pair concentration is sharp at every route and largest at CPF, which is the first within-cycle quantitative evidence that relational structure exists in the candidate-sponsored segment beyond the partisan unit AN-073 ruled out. But the mapping to AN-006's CPF +19 pp slant is not direct: of the five firms producing CPF repeat dyads, the two with an AN-016 β sit at −2.4 (AR7) and −8.2 (CENSUS), at or below the AN-016 mean, and the other three have no β estimate. The +19 within-firm CPF effect from AN-006 therefore most plausibly lives elsewhere in the CPF cell: most naturally in the singleton-dyad CPF protocols, where candidates pay personally for a one-shot slant from firms with no durable individual tie.

This is closer to M4 single-shot pricing for the slant itself, with M1-individual structure observable in the repeat-dyad layer but at firms that don't slant. Two structurally different transactions on the same nominal route:

M1-individual repeats at low-β firms (durable trust → honest delivery, or at minimum no slant — possibly even reverse-slant, per CENSUS / AR7's negative β).
Singleton CPF transactions at higher-β firms (one-shot premium for the slant).

If true, this would meaningfully reshape the puzzle: the relational-contracting infrastructure exists and the slant exists, but they don't co-locate. M1-individual disciplines (or fails to slant); M4 carries the slant.

Caveats:

n = 25 CPF protocols, 7 repeat dyads — thin.
The within-firm CPF +19 from AN-006 has not been decomposed by repeat-vs-singleton subset; the M1-vs-M4 split above is the most natural reading of the joint pattern but is one regression short of a clean test.
6 of 7 CPF dyads have incomplete date coverage in the poll registry, so the date-gap M1-vs-tracking-contract diagnostic is uninformative at CPF. This may indicate that the registry date fields are systematically missing for CPF-route protocols (a data-quality issue, not a model artifact).

Follow-ups

Decompose AN-006's CPF +19 by repeat-vs-singleton subset (puzzle / extension): the load-bearing next test. Refit the AN-006 spec restricted to (a) singleton CPF dyads only, (b) repeat-dyad CPF protocols only. If singleton β ≫ repeat β, the M4 reading of the +19 is supported and the M1-individual structure is decoupled from the slant. If repeat β ≥ singleton β, the repeat-dyads carry the slant and M1-individual is supported. Cheap re-run of the AN-006 spec; suggested script: source/analysis/an-NNN-cpf-beta-by-dyad-multiplicity.py.
Date-field coverage diagnostic for CPF protocols (puzzle): 6 of 7 CPF repeat dyads have missing date_registered on at least one side. Check whether date_registered and date_end are systematically more missing on CPF-route protocols across the full sponsor parquet; if so, the missingness is a route-specific data-quality signal worth understanding (possibly a registration-form variant). Cheap data check; ~30 min.
Same diagnostic on AN-016's β-tail firms (extension): for the high-β small-firm tail (METHODUS, CAMARGO E MEDINA, BRASLOPES, etc. — AN-073's "low-HHI, high-β" cell), tabulate their CPF-route repeat-dyad structure specifically. If those firms have zero CPF repeat dyads, the M4 reading of the +19 gains weight by direct evidence. ~30 min.
2020 cross-cycle dyad recurrence (blind spot): AN-074 is single-cycle. The gold-standard M1-individual test is whether (firm × candidate) dyads recur across 2020 → 2024. Within-cycle repeats can be either separate decisions or a single tracking contract; cross-cycle repeats are unambiguously separate decisions. Parked pending harmonized 2020 sponsor data.
CENSUS / EVA FRANCIELI / AR7 selection diagnostic (puzzle): AN-073, AN-074, and AN-071 all flag the same small group of firms with anomalous β. AR7 with 3 CPF repeat dyads but β = −2.36 is another data point for the suspected-selection story. Already on todo.md; AN-074 raises its priority further.
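
For the first follow-up above (decomposing the CPF +19 by repeat vs singleton subset), the protocols first need a dyad-multiplicity flag so that the AN-006 spec can be refit on each subset. The sketch below shows only that tagging step, under assumed names: the dict keys, the function name and the toy rows are hypothetical, and the refit itself is not shown.

```python
from collections import Counter


def tag_dyad_multiplicity(polls):
    """Attach the (firm, candidate) pair count and a repeat flag to each poll.

    polls: list of dicts with hypothetical keys 'protocol', 'firm', 'candidate'.
    """
    counts = Counter((p["firm"], p["candidate"]) for p in polls)
    tagged = []
    for p in polls:
        m = counts[(p["firm"], p["candidate"])]
        tagged.append({**p, "dyad_multiplicity": m, "is_repeat_dyad": m >= 2})
    return tagged


# Toy CPF-route rows (hypothetical data).
polls = [
    {"protocol": "P1", "firm": "A", "candidate": "x"},
    {"protocol": "P2", "firm": "A", "candidate": "x"},
    {"protocol": "P3", "firm": "B", "candidate": "y"},
]
tagged = tag_dyad_multiplicity(polls)
repeat_subset = [p["protocol"] for p in tagged if p["is_repeat_dyad"]]
singleton_subset = [p["protocol"] for p in tagged if not p["is_repeat_dyad"]]
print(repeat_subset, singleton_subset)
# prints: ['P1', 'P2'] ['P3']
```

At CPF this split would yield 14 repeat-dyad protocols and 11 singletons, so both subset fits would be thin.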