Repeat (firm × candidate) dyads are far more concentrated than chance at every route, with the largest gap at CPF: observed repeat-pair share = 56.0 % vs null mean 4.9 % (p < 0.0005; n = 25 protocols, 14 of them in the 7 repeat dyads); party-name 46.3 % vs 12.0 %; party 32.8 % vs 1.5 %; committee 17.5 % vs 0.4 %. But the firms behind the CPF repeat dyads are NOT in the AN-006 high-β tail: three of the 7 dyads are at AR7 (β −2.4) and one is at CENSUS (β −8.2). The connection between repeat-dyad structure and AN-006's CPF +19 pp slant is therefore not direct: the +19 likely comes from singleton CPF transactions, not from the repeat dyads. M1-individual relational structure exists at CPF but does not obviously carry the slant; the M4 / single-shot pricing reading of the AN-006 CPF tail survives. The time-gap diagnostic for CPF is uninformative (only 1 of 7 dyads has both date_registered values; that one shows a 78-day gap, consistent with separate decisions, but n = 1).
Question
AN-006 found within-firm β = +19.12 pp at the CPF route (the candidate paying with their own CPF — the closest the data gets to a direct candidate ↔ pollster individual dyad) vs +6-9 pp at the committee / party / party-name routes (pollster FE in the spec, so this is not a firm-composition effect). After AN-073 ruled out the partisan version of the relational story, the CPF +19 pp is the strongest within-cycle survivor of the M1 / M3 relational hypothesis — but it admits two readings:
- M1-individual. Durable (firm × candidate-CPF) dyads sustain a credible high-slant relationship. The candidate has a relational asset with this firm specifically, and the dyad is a repeat-game fixture.
- M4 / strategic-stake / single-shot. Candidates who pay personally have the strongest stake but no durable relational tie. The pollster charges a premium for the slant and there's no repeat-game structure; the deal is one-shot.
The two predict opposite repeat-dyad structures: M1-individual predicts CPF (firm × candidate) pairs cluster as repeats more than chance; M4 predicts they don't.
Design
source/analysis/an-074-cpf-repeat-dyad.py:
- Load candidate-sponsored rows (1,908), deduplicate to (protocol × institute × candidate-politico-id) trios.
- Stratify by sponsor_route (cpf, committee, party, party_name).
- Per route, compute: n_protocols, n_distinct firms, n_distinct candidates, n_distinct (firm × candidate) pairs, max pair count, repeat-pair share, and the pair-HHI.
- Permutation null: shuffle the firm column, recompute the metrics; 2,000 reps. One-sided p on each metric: P(null ≥ observed) for repeat-share / pair-HHI, P(null ≤ observed) for n_distinct_pairs (fewer pairs = more concentration).
- Cross-cut: per route, distribution of firm-total-volume from poll_response_2024.parquet: does the CPF route over-represent small-volume firms? (Pollster FE in AN-006 absorbs static firm quality, but the firm-size composition matters for interpreting the CPF tail.)
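The permutation design above can be sketched in a few lines. This is a minimal illustration, not the contents of an-074-cpf-repeat-dyad.py: the function names, the toy data, the sum-of-squared-shares definition of pair-HHI, and the count / reps convention for the p-value are all assumptions.

```python
import random
from collections import Counter


def repeat_pair_share(firms, cands):
    """Share of protocols whose (firm, candidate) pair occurs more than once."""
    counts = Counter(zip(firms, cands))
    return sum(c for c in counts.values() if c > 1) / len(firms)


def pair_hhi(firms, cands):
    """Herfindahl index over pairs (assumed definition: sum of squared shares)."""
    counts = Counter(zip(firms, cands))
    n = len(firms)
    return sum((c / n) ** 2 for c in counts.values())


def permutation_test(firms, cands, reps=2000, seed=0):
    """Shuffle the firm column; return (observed, null mean, one-sided p)."""
    rng = random.Random(seed)
    observed = repeat_pair_share(firms, cands)
    shuffled = list(firms)
    null = []
    for _ in range(reps):
        rng.shuffle(shuffled)
        null.append(repeat_pair_share(shuffled, cands))
    p = sum(s >= observed for s in null) / reps  # P(null >= observed)
    return observed, sum(null) / reps, p


# Toy data, not the AN-074 sample: two repeat dyads among eight protocols.
firms = ["A", "A", "B", "B", "C", "D", "E", "F"]
cands = [1, 1, 2, 2, 3, 4, 5, 6]
observed, null_mean, p = permutation_test(firms, cands)
print(observed, pair_hhi(firms, cands), null_mean, p)
```

Under this count / reps convention, a reported p < 0.0005 at 2,000 reps would mean that no shuffled draw reached the observed value.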
Results

Per-route concentration
| Route | n_polls | n_firms | n_cands | n_pairs | repeat-pair share | null mean | p |
|---|---|---|---|---|---|---|---|
| CPF | 25 | 14 | 17 | 18 | 56.0 % | 4.9 % | <0.0005 |
| Party-name | 149 | 49 | 97 | 107 | 46.3 % | 12.0 % | <0.0005 |
| Party | 58 | 36 | 43 | 48 | 32.8 % | 1.5 % | <0.0005 |
| Committee | 561 | 166 | 498 | 509 | 17.5 % | 0.4 % | <0.0005 |
| ALL pooled | 793 | 204 | 640 | 674 | 26.5 % | 0.8 % | <0.0005 |
Every route is dramatically above chance. CPF has both the highest raw share (56 %) and the largest absolute gap above the null (51.1 pp). The maximum (firm × candidate) pair count is 2 at CPF (no triple-dyad), 3 at party, 4 at committee, 5 at party-name — the heaviest within-cycle clustering is on the high-volume routes.
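The CPF row can be cross-checked by hand from the table. The sketch below assumes that repeat-pair share is the share of protocols sitting in a pair that occurs more than once; the variable names are illustrative. Because the maximum pair count at CPF is 2, each repeat dyad absorbs exactly one extra protocol.

```python
# Figures taken from the CPF row of the table above.
n_protocols, n_pairs, max_pair_count = 25, 18, 2
null_mean_pct = 4.9

# With max pair count 2, every protocol beyond the distinct-pair count
# belongs to exactly one repeat dyad.
n_repeat_dyads = n_protocols - n_pairs
protocols_in_repeats = max_pair_count * n_repeat_dyads
share_pct = 100 * protocols_in_repeats / n_protocols
gap_pp = share_pct - null_mean_pct

print(n_repeat_dyads, protocols_in_repeats, round(share_pct, 1), round(gap_pp, 1))
```

This recovers the 7 repeat dyads, the 56.0 % share, and the 51.1 pp gap. The shortcut does not apply to the other routes, where the maximum pair count exceeds 2.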
Firm-size composition
Median firm-total polls (across all 2024 polls each firm produced):
- Committee 14, CPF 26, Party 27.5, Party-name 100.
The CPF route is not dominated by the smallest firms. Committee polls come from smaller firms on median (median 14 vs CPF 26), so the AN-006 within-firm CPF +19 vs committee +9 is not driven by "CPF candidates hire the smallest, highest-β firms."
Which CPF dyads repeat — and at which firms?
| Firm (within-firm β from AN-016 if available) | # CPF dyads as repeats |
|---|---|
| AR7 PESQUISAS (β = −2.36) | 3 |
| CENSUS INSTITUTO DE PESQUISAS (β = −8.19) | 1 |
| EQUACAO PESQUISAS (no β) | 1 |
| MARIO ELISIO DE MAGALHAES (no β) | 1 |
| PAULISTA JUNIOR PROJETOS & CONSULTORIA (no β) | 1 |
The two firms in the repeat-dyad set that have an AN-016 β both have negative β. The CPF repeat-dyad structure does not sit where the AN-006 CPF +19 pp slant lives — the slanted CPF protocols are predominantly the singleton ones.
Time-gap diagnostic — undercut by data coverage
Of the 7 CPF repeat dyads, only 1 has both date_registered values populated in the underlying poll registry (78-day gap, consistent with separate decisions). The remaining 6 have at least one missing date. The time-gap test that would distinguish single-contract tracking polls (short gaps) from M1-individual re-hires (long gaps) cannot be run at CPF with current data. The comparator routes are diagnostic and clear:
- Committee route: median gap 11 d, only 12 % > 30 d → typical pattern is single-contract multi-wave tracking polls.
- Party-name route: median gap 102 d, 87 % > 30 d → repeats are time-spread separate decisions.
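A time-gap diagnostic of this kind could look like the sketch below. It is not the script's implementation: the row layout, the function name, and the choice to measure each gap from the first to the last registration within a dyad are assumptions, and the dates are toy values.

```python
from collections import defaultdict
from datetime import date
from statistics import median


def dyad_gaps(rows):
    """rows: (firm, candidate, date_registered or None) per protocol.

    Returns (gaps, n_incomplete): days between first and last registration
    for each repeat dyad with complete dates, plus the number of repeat
    dyads skipped because at least one date is missing.
    """
    by_pair = defaultdict(list)
    for firm, cand, registered in rows:
        by_pair[(firm, cand)].append(registered)
    gaps, n_incomplete = {}, 0
    for pair, dates in by_pair.items():
        if len(dates) < 2:
            continue  # singleton dyad: no gap to measure
        if any(d is None for d in dates):
            n_incomplete += 1
            continue
        gaps[pair] = (max(dates) - min(dates)).days
    return gaps, n_incomplete


# Toy rows, not registry data.
rows = [
    ("F1", "c1", date(2024, 7, 1)), ("F1", "c1", date(2024, 7, 21)),
    ("F2", "c2", date(2024, 5, 1)), ("F2", "c2", date(2024, 8, 9)),
    ("F3", "c3", date(2024, 6, 1)), ("F3", "c3", None),
    ("F4", "c4", date(2024, 6, 15)),
]
gaps, n_incomplete = dyad_gaps(rows)
median_gap = median(gaps.values())
share_over_30 = sum(g > 30 for g in gaps.values()) / len(gaps)
print(gaps, n_incomplete, median_gap, share_over_30)
```

Reporting the number of skipped dyads alongside the median makes a coverage problem like the one at CPF visible instead of silently shrinking the sample.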
Interpretation
The repeat-pair concentration is sharp at every route and largest at CPF, which is the first within-cycle quantitative evidence that relational structure exists in the candidate-sponsored segment beyond the partisan unit AN-073 ruled out. But the mapping to AN-006's CPF +19 pp slant is not direct: the firms producing CPF repeat dyads have AN-016 β = −2.4 (AR7) and −8.2 (CENSUS), at or below the AN-016 mean. The +19 within-firm CPF effect from AN-006 lives somewhere else in the CPF cell — most naturally, in the singleton-dyad CPF protocols, where candidates pay personally for a one-shot slant from firms with no durable individual tie.
This is closer to M4 single-shot pricing for the slant itself, with M1-individual structure observable in the repeat-dyad layer but at firms that don't slant. Two structurally different transactions on the same nominal route:
- M1-individual repeats at low-β firms (durable trust → honest delivery, or at minimum no slant — possibly even reverse-slant, per CENSUS / AR7's negative β).
- Singleton CPF transactions at higher-β firms (one-shot premium for the slant).
If true, this would meaningfully reshape the puzzle: the relational-contracting infrastructure exists and the slant exists, but they don't co-locate. M1-individual disciplines (or fails to slant); M4 carries the slant.
Caveats:
- n = 25 CPF protocols, 7 repeat dyads — thin.
- The within-firm CPF +19 from AN-006 has not been decomposed by repeat-vs-singleton subset; the M1-vs-M4 split above is the most natural reading of the joint pattern but is one regression short of a clean test.
- 6 of 7 CPF repeat dyads have incomplete date coverage in the poll registry, so the date-gap M1-vs-tracking-contract diagnostic is uninformative at CPF. This may indicate that the registry date fields are systematically missing for CPF-route protocols (data quality, not a model artifact).
Follow-ups
- Decompose AN-006's CPF +19 by repeat-vs-singleton subset (puzzle / extension): the load-bearing next test. Refit the AN-006 spec restricted to (a) singleton CPF dyads only, (b) repeat-dyad CPF protocols only. If singleton β ≫ repeat β, the M4 reading of the +19 is supported and the M1-individual structure is decoupled from the slant. If repeat β ≥ singleton β, the repeat-dyads carry the slant and M1-individual is supported. Cheap re-run of the AN-006 spec; suggested script: source/analysis/an-NNN-cpf-beta-by-dyad-multiplicity.py.
- Date-field coverage diagnostic for CPF protocols (puzzle): 6 of 7 CPF repeat dyads have missing date_registered on at least one side. Check whether date_registered and date_end are systematically more missing on CPF-route protocols across the full sponsor parquet; if so, the missingness is a route-specific data-quality signal worth understanding (possibly a registration-form variant). Cheap data check; ~30 min.
- Same diagnostic on AN-016's β-tail firms (extension): for the high-β small-firm tail (METHODUS, CAMARGO E MEDINA, BRASLOPES, etc.; AN-073's "low-HHI, high-β" cell), tabulate their CPF-route repeat-dyad structure specifically. If those firms have zero CPF repeat dyads, the M4 reading of the +19 gains weight by direct evidence. ~30 min.
- 2020 cross-cycle dyad recurrence (blind spot): AN-074 is single-cycle. The gold-standard M1-individual test is whether (firm × candidate) dyads recur across 2020 → 2024. Within-cycle repeats can be either separate decisions or a single tracking contract; cross-cycle repeats are unambiguously separate decisions. Parked pending harmonized 2020 sponsor data.
- CENSUS / EVA FRANCIELI / AR7 selection diagnostic (puzzle): AN-073, AN-074, and AN-071 all flag the same small group of firms with anomalous β. AR7 with 3 CPF repeat dyads but β = −2.36 is another data point for the suspected-selection story. Already on todo.md; AN-074 raises its priority further.
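The data preparation behind the first two follow-ups is small enough to sketch. This is a hedged illustration only: the dict keys (firm, candidate, sponsor_route, date_registered), the function names, and the toy rows are assumptions, and the AN-006 refit itself is not shown because its spec is not reproduced in this note.

```python
from collections import Counter


def split_by_multiplicity(rows):
    """Split protocols into those in singleton dyads and those in repeat dyads."""
    counts = Counter((r["firm"], r["candidate"]) for r in rows)
    singletons = [r for r in rows if counts[(r["firm"], r["candidate"])] == 1]
    repeats = [r for r in rows if counts[(r["firm"], r["candidate"])] > 1]
    return singletons, repeats


def missing_share_by_route(rows, field="date_registered"):
    """Share of protocols per sponsor route where `field` is missing (None)."""
    total, missing = Counter(), Counter()
    for r in rows:
        total[r["sponsor_route"]] += 1
        if r.get(field) is None:
            missing[r["sponsor_route"]] += 1
    return {route: missing[route] / n for route, n in total.items()}


# Toy rows, not the sponsor parquet.
rows = [
    {"firm": "A", "candidate": 1, "sponsor_route": "cpf", "date_registered": None},
    {"firm": "A", "candidate": 1, "sponsor_route": "cpf", "date_registered": "2024-07-01"},
    {"firm": "B", "candidate": 2, "sponsor_route": "cpf", "date_registered": "2024-07-02"},
    {"firm": "C", "candidate": 3, "sponsor_route": "committee", "date_registered": "2024-07-03"},
]
singletons, repeats = split_by_multiplicity(rows)
missing = missing_share_by_route(rows)
print(len(singletons), len(repeats), missing)
```

The two subsets returned by split_by_multiplicity would be the inputs to refits (a) and (b); comparing missing shares across routes is the coverage check described in the second follow-up.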