Thinking and Open Ideas

Current open questions

Slant via design vs slant via fabrication (Channel A vs B). The headline β ≈ +7 pp is the total sponsor bias. We can't currently decompose it into design-driven slant (Channel A: methodology choices declared in the registration, e.g., urban-only coverage, quota variables that favor the candidate's base) vs residual / fabrication (Channel B). The decomposition needs the LLM methodology extractor (queued). What's the expected split? Priors point to mostly Channel A given Brazil's high regulatory disclosure burden, but the +6.7 pp jump on a 4–14 day window for the same pollster's poll of the same candidate doesn't mechanically require design changes, so Channel B might be larger than expected.
Why is the +6 to +7 pp magnitude so stable across specs? Within-candidate FE, race × week FE, and the descriptive within-candidate jump all give the same number. This is reassuring but also surprising, because each spec relies on different identifying variation. Possibly a sign the effect is genuinely scalar (a few-pp boost applied wholesale), not heterogeneous.
Is Route C undercounting party-sponsored polls? Despesa_partidaria only catches party-directorate CNPJs that filed expenditure forms in 2024. National / state-level party organs that didn't may be missing. Route D (name-pattern parsing) partially compensates but not fully.
Race × day FE feasibility. Spec 3c uses (race × week) cells. A tighter version would use (race × day) cells, but with field periods typically lasting 3-7 days, "day" isn't well-defined for polls. Could use start-date or end-date. Field-period overlap-based matching is more principled but harder to implement cleanly.
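
A minimal sketch of the field-period overlap idea mentioned above, assuming each poll carries an inclusive start and end date. The function names, the "share of the shorter period" normalisation, and the example dates are all hypothetical choices for illustration, not something the current pipeline implements.

```python
from datetime import date

def overlap_days(start_a, end_a, start_b, end_b):
    """Number of calendar days shared by two inclusive field periods."""
    latest_start = max(start_a, start_b)
    earliest_end = min(end_a, end_b)
    return max(0, (earliest_end - latest_start).days + 1)

def overlap_share(start_a, end_a, start_b, end_b):
    """Shared days divided by the length of the shorter field period."""
    shared = overlap_days(start_a, end_a, start_b, end_b)
    len_a = (end_a - start_a).days + 1
    len_b = (end_b - start_b).days + 1
    return shared / min(len_a, len_b)

# Hypothetical polls of the same race: a 5-day and a 4-day field period.
sponsored = (date(2024, 9, 10), date(2024, 9, 14))
independent = (date(2024, 9, 13), date(2024, 9, 16))

shared = overlap_days(*sponsored, *independent)
share = overlap_share(*sponsored, *independent)
print(f"shared days: {shared}, share of shorter period: {share:.2f}")
```

A matching rule could then keep sponsored/independent pairs whose overlap share exceeds some cutoff, which avoids picking start-date or end-date as the single "day" of a poll.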

Possible directions

Heterogeneity by sponsor type: Route A (candidate's own CPF) vs Route B (committee) vs Route C/D (party). Do candidate-CPF-paid polls show a larger β than party-paid? Identifies who's slanting: the candidate's inner circle vs the party apparatus. Sample is thin for A (~18 matches) but the comparison is sharp if it exists.
Heterogeneity by funding source (DS_ORIGEM_RECURSO): Doações Eleitorais vs Recursos Próprios vs Fundo Partidário. If the slant comes through the candidate's strategic incentives, it should be largest when the candidate's own money is at stake (Recursos Próprios) and smallest when public funds (Fundo Partidário) pay for the poll, where the principal-agent structure makes individual candidate control weakest.
Heterogeneity by race competitiveness: tight races (where the poll matters most for momentum / fundraising) should show larger β than landslides. Final share gap or polling lead at field period as the competitiveness measure.
Heterogeneity by pollster reputation: do reputable national pollsters (Datafolha, Quaest, Ipec) show smaller β than small regional firms? Reputational discipline as a moderator.
Repeat-game version: pollsters that work for the same candidate multiple times — does β grow with repetition? Lock-in vs honest partnership.
Test of the coordination-devices theory (docs/theory.md). The cleanest in-paper test: heterogeneity of β by candidate's final-rank position. Coordination theory predicts β peaks for candidates near the viability cutoff — 2nd-place threshold in small munis (plurality), 3rd-place threshold in runoff-eligible munis (≥200k voters). Three structures to look at:
- Q1: Within each muni-size cell (small vs runoff-eligible), is β larger for borderline candidates than for front-runners or bottom-of-field?
- Q2: Does the rank position of peak β shift at the 200k-voter institutional threshold, in the direction the theory predicts? A clean cross-rank-quintile × runoff-eligibility 2×K plot answers this descriptively before any RD machinery.
- Q3: Is the runoff-eligibility threshold sharp enough for a fuzzy-RD on β-by-rank-position interaction? Need a count of munis near the 200k threshold first; if ~25+ on each side it's worth pursuing.
Caveat the user flagged: every candidate has an incentive to commission slanted polls regardless of position, so the sponsor-side slant decision doesn't necessarily concentrate at the threshold even if the voter-side return does. The heterogeneity test on β is therefore a test of the joint prediction "sponsors commission AND voters respond to coordination signals" — a null result is consistent with either link failing. Sharper tests would need a behavioral measure (e.g., share of voters who switch from below-cutoff to above-cutoff candidates after a poll release) that we don't currently have in the data.
Theories of how polls matter — additional candidates beyond the three in theory.md. Brainstormed 2026-06-02 as motivation + potential robustness tests. Most aren't testable with current data; collected here for future expansion.

Voter-side:
- Mobilization (turnout) effect — your candidate's poll standing affects whether you bother voting. Different empirical object (turnout, not vote share). Needs muni-level turnout × poll timing data. Refs: Sudman 1986, Frey & Schmid 1995.
Supply-side:
- Pollster reputation / career concerns — repeated-game between pollster firm and the market: firm trades long-run accuracy reputation against short-run sponsor satisfaction. Predicts β heterogeneous by pollster size/reputation (big national firms smaller β; small regional firms larger β). Testable with the 448 pollsters in the all-Brazil sample if we group them by annual volume. Refs: Holmström 1999 career concerns.
- Verifiable disclosure / future cost — slant has a future cost because the election eventually verifies. Predicts β should decrease as the election approaches. Directly testable via days_to_election × sponsored_by interaction. Refs: Crawford-Sobel 1982, Milgrom-Roberts 1986.
Mechanism of impact (what polls shape beyond voters):
- Fundraising cascade / donor signaling — donors observe polls before giving; inflated poll → inflated donations → real campaign-resource advantage. The causal target of slant might be donors rather than voters. Needs campaign-finance disbursement data joined to poll timing. Refs: Mutz 1995, Brown & Ramsden 1996.
- Media-coverage allocation — newsrooms allocate attention by poll standing. The 12.5% of EJ injunctions that are poll-related is circumstantial evidence political actors believe this channel matters. Refs: Iyengar & Kinder 1987, Druckman 2005.
- Strategic candidacy / entry-exit — hopeless candidates drop out earlier when polls confirm low standing; viable ones double down. Slant could be the difference between a candidate exiting vs staying in. Heterogeneity by candidate experience.
Brazilian-context-specific:
- Clientelism / vote-buying machinery — Lloyd et al. (2016) directly links poll error to vote-buying mechanisms; polls signal which candidates have machine resources to deliver favors. Reframes the +7 pp from "voter belief manipulation" to "machine resource signaling." Specific to Brazil.
- Coligações / coalition negotiations — in Brazilian multi-party context, pre-election coalitions for vereador slates depend on prefeito poll standing. Poll-induced coalition formation is a separate channel from voter-side effects. Hard to test cleanly with current data.
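
For Q3 of the coordination-devices test above, the first step is just a count of municipalities on each side of the 200k-voter cutoff. A minimal sketch, assuming a plain list of municipal electorate sizes; the electorate values and bandwidths below are made up, and the `>=` at the cutoff follows the "≥200k" wording in these notes (adjust it if the legal rule is a strict inequality).

```python
THRESHOLD = 200_000   # runoff-eligibility cutoff in voters, as written in the notes
MIN_PER_SIDE = 25     # rule of thumb from Q3

def count_near_threshold(electorates, bandwidth):
    """Count municipalities within `bandwidth` voters below and above the cutoff."""
    below = sum(1 for v in electorates if THRESHOLD - bandwidth <= v < THRESHOLD)
    above = sum(1 for v in electorates if THRESHOLD <= v <= THRESHOLD + bandwidth)
    return below, above

def worth_pursuing(electorates, bandwidth):
    """True when both sides of the cutoff reach the Q3 rule-of-thumb count."""
    below, above = count_near_threshold(electorates, bandwidth)
    return min(below, above) >= MIN_PER_SIDE

# Hypothetical electorate sizes, for illustration only.
electorates = [150_000, 180_000, 195_000, 199_999, 200_000, 210_000, 260_000, 400_000]
for bandwidth in (25_000, 50_000, 100_000):
    print(bandwidth, count_near_threshold(electorates, bandwidth))
```

Running the count over a few bandwidths shows how quickly the sample near the cutoff thins out before any RD machinery is set up.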

Connections to literature

Leeper (2019) "Sponsorship effects in polls" — closest direct predecessor. US data, smaller bias magnitudes. The TSE registration regime here gives us the unpublished-poll sample they couldn't observe.
De Stefano et al. (2022) "Pre-electoral polls in multi-party systems" — Italian Bayesian house effects with explicit client-bias parameter. The hierarchical model they specify is the natural Bayesian counterpart to our frequentist FE design — useful as a robustness check / extension.
Lloyd, Turgeon & Gramacho (2016) "Vote buying, undecided voters, and polling error in Brazil" — Brazilian setting, documents non-random poll error with a vote-buying mechanism. Closest Brazilian precedent for the substantive claim "polls aren't randomly off in Brazil."
Batista Pereira et al. (2024) "Pesquisas eleitorais": uses Quaest data on the same TSE-registered universe. Argues the 2022 polls-vs-results gap is late voter-side change, not polling error. This is an important alternative explanation that our sponsor-bias story needs to be distinguished from. The pre-poll trajectory placebo is a partial response: a 4-14-day within-candidate jump of +6.7 pp can't be late voter-side change at that time scale.
Bayesian persuasion (Kamenica & Gentzkow 2011) — the natural theoretical hook for Channel A. Sponsor chooses poll design (a signal structure) to maximize posterior belief about their candidate, subject to the registration regime's methodology disclosure constraint. The Bayesian persuasion frame is in summary.md § Mechanism — we should sharpen the formal mapping once Channel A vs B numbers are in.

Methodological sketches

Falsification: media-sponsored polls vs media-sponsored polls of the same race. Within race × week, all pairs of independent polls should show β = 0 by construction (no candidate-sponsor link). Useful as a "is the FE structure picking up anything when there shouldn't be" sanity check.
Spec checking via placebo bins. Randomly reassign sponsored_by to non-treated polls within race × week; β should center on 0 under the null.
Plotting: kernel-density of error by sponsored_by status. The β coefficient is a mean shift, but a full density picture (or CDF) shows whether the shift is concentrated in the upper tail (a few highly-slanted polls) or distributed (every self-sponsored poll shifts by ~7 pp).
Spec 3c sensitivity to cell selection. With 60 cells, leverage is concentrated. Leave-one-cell-out estimates would show whether the +6.95 is driven by a handful of municipalities.
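
Two of the checks above (placebo reassignment within race × week, and leave-one-cell-out) can be sketched on synthetic data. This is an illustration under loud assumptions: the estimator here is an equal-weight mean of within-cell gaps rather than the actual FE regression, the tuple layout `(cell, flagged, error)` is hypothetical, and the data are simulated with a known 7 pp shift.

```python
import random
from statistics import mean

def cell_gaps(polls):
    """Per-cell gap: mean error of flagged polls minus mean error of the rest."""
    cells = {}
    for cell, flagged, error in polls:
        cells.setdefault(cell, {True: [], False: []})[flagged].append(error)
    return {c: mean(g[True]) - mean(g[False])
            for c, g in cells.items() if g[True] and g[False]}

def placebo_draws(polls, n_draws, rng):
    """Drop truly sponsored polls, then flag one random independent poll per cell."""
    independents = {}
    for cell, flagged, error in polls:
        if not flagged:
            independents.setdefault(cell, []).append(error)
    draws = []
    for _ in range(n_draws):
        fake = []
        for cell, errors in independents.items():
            if len(errors) < 2:
                continue  # need one pseudo-treated poll and at least one control
            pick = rng.randrange(len(errors))
            fake += [(cell, i == pick, e) for i, e in enumerate(errors)]
        draws.append(mean(list(cell_gaps(fake).values())))
    return draws

def leave_one_cell_out(gaps):
    """Equal-weight mean of cell gaps, recomputed with each cell dropped in turn."""
    total, n = sum(gaps.values()), len(gaps)
    return {cell: (total - gap) / (n - 1) for cell, gap in gaps.items()}

# Synthetic data only: 60 cells, one sponsored poll (true shift 7 pp) and
# three independent polls per cell, Gaussian noise with sd 3 pp.
rng = random.Random(0)
polls = []
for cell in range(60):
    polls.append((cell, True, 7.0 + rng.gauss(0, 3)))
    polls += [(cell, False, rng.gauss(0, 3)) for _ in range(3)]

gaps = cell_gaps(polls)
estimate = mean(list(gaps.values()))
placebo = placebo_draws(polls, 500, rng)
loo = leave_one_cell_out(gaps)
print(f"estimate {estimate:.2f}, placebo mean {mean(placebo):.2f}")
print(f"leave-one-cell-out range {min(loo.values()):.2f} to {max(loo.values()):.2f}")
```

On real data, a placebo distribution that does not center on 0, or a leave-one-cell-out range that is wide relative to the estimate, would be the warning sign.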

Ideas to explore later

Time trends in sponsor bias: does the gap shrink as the campaign progresses? The campaign's final week should see less slant because the bias becomes verifiable against the looming actual result.
Spillover via media coverage: do self-sponsored polls covered in friendly media (mapped via media-outlet ownership) get bigger amplification? This would link sponsor bias to information-manipulation costs.
Second-round simulation scenarios: the segundo_turno_simulacao scenarios in the LLM extract show hypothetical head-to-heads. Do sponsors slant differently in those? They're less directly consequential (different denominator) so the strategic incentives may differ.
2020 mayoral cycle as a pre-registration baseline: same PesqEle regime, different cycle. Replicating the design on 2020 data validates that the +7 pp isn't a 2024-specific phenomenon.

Coverage deferral is a feature, not a bug (2026-06-02)

Empirical finding from the 14,876-poll bulk scan + smoke test: 36.9% of registered polls (5,393) defer geographic coverage at registration time, citing Res 23.600/2019 §7°. Combined with another 19.9% very-short / empty DS_DADO_MUNICIPIO, only 43.2% (6,422 polls) have substantive coverage content.

The bulk export is post-window. DT_GERACAO=17/05/2026, ~18 months after the 2024 elections. Every complementation window has closed. No duplicate rows per protocol (14,876 protocols, 14,876 rows). So the deferral text we see IS the final registry state — either the pollster never complemented, or TSE didn't update the field after complement. The Res 23.600 §7° non-complement penalty ("não registrada") is rarely enforced in practice; the lawsuit pilot saw "não foi complementada com a lista dos bairros" as a frequent petition argument, but most cases were dismissed by perda do objeto.

Pollster-level bimodality. Top-15 firms by volume split sharply: INSTITUTO VERITA (393 polls), MOREIRA & NOLETO (230) are 100% substantive; QUAEST (157 polls), SETA (176), SECULUS (190), INSTITUTO PARANA (251) are ≥90% deferred. QUAEST at 100% deferred is particularly notable given its academic ties (Felipe Nunes, co-founder) — coverage deferral is not a small-firm hygiene issue, it's how a tier-1 brand handles registration.
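
The bimodality check is a per-pollster share of deferred registrations. A minimal sketch, assuming rows of `(pollster, coverage_class)`; the firm names and rows below are placeholders, and the 90% / 10% cutoffs for labelling a firm are an assumption chosen to mirror the "≥90% deferred" wording above.

```python
from collections import defaultdict

def deferral_share_by_pollster(polls):
    """Share of each pollster's polls classified as deferred_complement."""
    counts = defaultdict(lambda: [0, 0])  # pollster -> [deferred, total]
    for pollster, coverage_class in polls:
        counts[pollster][1] += 1
        if coverage_class == "deferred_complement":
            counts[pollster][0] += 1
    return {p: deferred / total for p, (deferred, total) in counts.items()}

def label(share, high=0.9, low=0.1):
    """Coarse label for a pollster's deferral share (cutoffs are assumptions)."""
    if share >= high:
        return "mostly_deferred"
    if share <= low:
        return "mostly_substantive"
    return "mixed"

# Placeholder rows, not real registry data.
polls = (
    [("FIRM_A", "substantive")] * 4
    + [("FIRM_B", "deferred_complement")] * 9 + [("FIRM_B", "substantive")]
    + [("FIRM_C", "deferred_complement")] * 2 + [("FIRM_C", "very_short")] * 2
)
shares = deferral_share_by_pollster(polls)
labels = {p: label(s) for p, s in shares.items()}
print(labels)
```

If most high-volume firms land in the two extreme labels and few in "mixed", that supports reading deferral as a firm-level habit.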

Recovery options ranked by feasibility:

Cluster grain from DS_PLANO_AMOSTRAL: 44% of deferred polls (2,372) mention geographic vocabulary ("setores censitários", "bairros", "zona urbana") somewhere in the sampling-plan text — typically describing the grain of the cluster sample without naming specific units. Recovers "polled at setor-censitário level" but not which setores. Already in scope of poll_sampling.
TSE individual-protocol pages: sig.tse.jus.br/ords/dwapr/r/seai/sig-pesquisas/ may carry complement metadata not in the bulk export. Unknown payoff until a 10-page spot check confirms. ~5.4k page fetches if signal exists.
Relatório PDFs: poll_relatorio.py already extracts these for per-candidate vote intentions; might also carry coverage narrative. Probably moot — pollsters who deferred at registration likely published with similarly vague language.

Update 2026-06-02 — better recovery path found. Spot-check of TSE endpoints revealed that the CKAN package pesquisas-eleitorais-2024 on dadosabertos.tse.jus.br has 6 resources, not the 3 we use. The missing ones include bairro_municipio_2024.zip — explicitly described in the package metadata as "Detalhamento de bairro/município" — which IS the complement that the deferral language pointed to. Plus questionario_pesquisa_2024.zip (PDF questionnaires) and nota_fiscal_2024.zip (invoices). All three at https://cdn.tse.jus.br/estatistica/sead/odsele/pesquisa_eleitoral/.

CDN is HTTP-403 from sandbox (confirms TSE-CDN-blocked memory note), so the mirror has to happen from a host environment. Once mirrored, the deferred-coverage problem becomes a 5,393-PDF extraction task (see todo.md § "Extract bairro/município PDFs for deferred polls"). Likely upgrades the recommended-paper framing in this section: deferral is a Channel A covariate AND we can resolve the actual coverage in most cases. Two informative variables, not one.

The questionnaire PDFs also rescue the question_wording_order lever from "essentially unobservable" (7.7% mention in registration text) to "structurally extractable" — see todo.md § "Extract question_wording_order from questionario_pesquisa PDFs". This was the single weakest field in the smoke test; now it has a path.

Update 2026-06-02 — zip mirrored, recovery confirmed empirically. After bi-dropbox mirror, cross-tab of mayor universe × bairro PDF presence shows:

| Bucket | n | has PDF | recovery |
| --- | --- | --- | --- |
| deferred_complement | 5,489 | 5,180 | 94.4% |
| very_short | 2,964 | 2,839 | 95.8% |
| empty | 6 | 6 | 100% |
| substantive | 6,417 | 5,173 | 80.6% |
| all mayor | 14,876 | 13,198 | 88.7% |

Only 309 deferred polls (5.6%) are truly lost (deferred AND no PDF). The 19.4% of substantive polls without a PDF (1,244) probably had complete coverage info in DS_DADO_MUNICIPIO, so no complement was filed.
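
As a quick arithmetic check, the recovery rates and the "truly lost" count can be recomputed from the bucket counts in the cross-tab above. The dictionary layout is just an illustration; the counts are the ones reported in the table.

```python
# (n, has_pdf) per bucket, copied from the cross-tab.
buckets = {
    "deferred_complement": (5489, 5180),
    "very_short": (2964, 2839),
    "empty": (6, 6),
    "substantive": (6417, 5173),
}

total_n = sum(n for n, _ in buckets.values())
total_pdf = sum(pdf for _, pdf in buckets.values())
recovery = {name: round(100 * pdf / n, 1) for name, (n, pdf) in buckets.items()}
overall = round(100 * total_pdf / total_n, 1)

deferred_n, deferred_pdf = buckets["deferred_complement"]
lost_deferred = deferred_n - deferred_pdf
lost_share = round(100 * lost_deferred / deferred_n, 1)

print(total_n, total_pdf, overall)
print(recovery)
print(lost_deferred, lost_share)
```

The bucket rows sum to the "all mayor" row (14,876 polls, 13,198 with a PDF), so the table is internally consistent.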

Bairro extractor pilot (20 PDFs, 100% valid post-fix) at projects/poll-sponsor-bias/build/llm/bairro_detail_pilot/. Findings:

16/20 full_municipality, 1 urban_plus_selected_rural, 1 specific_neighborhoods, 1 not_realized (the "PESQUISA NÃO REALIZADA" stamp the schema flags as its own bucket), 1 different pollster format (per-interview microdata).
Of the 15 polls where QT and the PDF total_interviews are comparable, 13 match exactly: strong agreement between registry sample size and PDF total. The 2 that differ (SP073312024: 600 vs 480; AC055932024: 904 vs 800) suggest the PDF reports a subset, possibly the urban part of a mixed urban+rural sample. Worth investigating after the full extract; it could be a sponsor-bias signal in itself (declared n vs actually-distributed n).
One pollster format edge case (PE032172024, per-questionnaire setor microdata, 1,204 lines) initially overflowed the 4000-token output cap; fixed via 16k max_tokens + 50-bairro hard cap + microdata-format guidance in the prompt.
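
The registry-vs-PDF sample-size comparison in the findings above can be sketched as a simple split into matches and mismatches. The two mismatching protocols and their counts come from the pilot; the third record and the function name are hypothetical.

```python
def compare_sample_sizes(records):
    """Split protocols into exact matches and mismatches (with PDF/registry ratio)."""
    matches, mismatches = [], []
    for protocol, registry_n, pdf_n in records:
        if registry_n == pdf_n:
            matches.append(protocol)
        else:
            ratio = round(pdf_n / registry_n, 3)
            mismatches.append((protocol, registry_n, pdf_n, ratio))
    return matches, mismatches

records = [
    ("SP073312024", 600, 480),        # from the pilot
    ("AC055932024", 904, 800),        # from the pilot
    ("HYPOTHETICAL_MATCH", 400, 400), # placeholder for an exact match
]
matches, mismatches = compare_sample_sizes(records)
for row in mismatches:
    print(row)
```

The ratio column is the useful part at scale: a cluster of ratios around a round share (the SP case is exactly 0.8) would fit the "PDF reports a subset" reading better than scattered values would.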

Recommended paper framing. Treat coverage_class = deferred_complement as a Channel A covariate, not a missingness problem. Two regressions become interesting:

Does sponsorship predict deferral? Within-candidate FE: do self-sponsored polls disproportionately defer vs independent-media polls of the same candidate in the same race × week?
Does deferral predict bias magnitude? Among polls of the same candidate same week, do deferred-coverage ones show a larger sponsor-favoring tilt than substantive-coverage ones?

Both treat deferral itself as the design choice. The 36.9% prevalence makes the test well-powered. If sponsors disproportionately choose pollsters who default to the boilerplate, that's already informative: a deferred coverage decision is strategic in expectation, even when no specific bairros are listed.

A subtler interpretation worth keeping in mind: the bimodality suggests deferral is a pollster-level technology choice, not a per-poll strategic one. If true, the within-pollster FE will wipe out the deferral × bias correlation. The interesting variation may be across pollsters — sponsors selecting deferred-coverage pollsters — rather than within-pollster choice to defer. Check by comparing β across pollster FE vs no-pollster FE specifications.

Methodology dedupe gain is smaller than expected (2026-06-02)

The poll-methodology extract pipeline assumed pollsters reuse boilerplate across polls — initial todo entry estimated "a few hundred unique pollster × template combinations" across 14k polls. The 200-poll dry-run shows otherwise:

poll_sampling: 100% unique texts (200/200).
poll_operations: 95% unique texts (190/200).
poll_coverage (substantive subset): 95% unique (74/78).

Reason: pollsters parameterize quota numbers and population references by município. The "fixed template" surface is small; the per-município content is large enough to dominate the hash. Implication: dedupe gain is marginal. Full-universe LLM cost at gpt-4o-mini is ~$25–30, not the $5 the original todo estimated. Still cheap enough that this isn't a blocker.
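
A small sketch of why exact-text hashing finds so few duplicates, assuming dedupe is done on a hash of whitespace-normalised text. The `template_hash` variant, which masks digit runs before hashing, is a hypothetical alternative and not what the pipeline does; it would not collapse município names or other non-numeric population references. The sample texts are invented.

```python
import hashlib
import re

def text_hash(text):
    """SHA-256 of the lower-cased, whitespace-normalised text."""
    normalized = " ".join(text.lower().split())
    return hashlib.sha256(normalized.encode("utf-8")).hexdigest()

def template_hash(text):
    """Hash after masking digit runs, so per-município quota numbers collapse."""
    masked = re.sub(r"\d+(?:[.,]\d+)*", "<num>", text)
    return text_hash(masked)

def unique_share(texts, hasher):
    """Share of texts that are unique under the given hash function."""
    return len({hasher(t) for t in texts}) / len(texts)

# Invented sampling-plan snippets, for illustration only.
texts = [
    "Amostra de 400 entrevistas, quotas por sexo (52% feminino).",
    "Amostra de 600 entrevistas, quotas por sexo (51% feminino).",
    "Amostra de 400  entrevistas, quotas por sexo (52% feminino).",
    "Sorteio de 30 setores censitários com 10 entrevistas cada.",
]
exact_unique = unique_share(texts, text_hash)
template_unique = unique_share(texts, template_hash)
print(exact_unique, template_unique)
```

Whether a masked hash is safe for caching LLM output is a separate question, since the masked numbers are exactly what the extractor needs to read.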

Miscellaneous notes

Need to keep an eye on the politica nome_urna rebuild — the patch is in place but the cleaned candidato.csv on educloud may need a re-run to populate the column for the next round of analysis.
The educloud sandbox quirks (read-only venv, TSE CDN 403, dropbox as data conduit) are recorded in my memory file reference_educloud_sandbox_env.md. Worth folding into docs/notes/ if other sessions hit them.
The educloud_next_steps.md playbook in docs/notes/ was the original handoff document from the laptop session. It's now mostly superseded by docs/done.md but kept for the audit trail.

Residual decomposition of the +7 pp (2026-06-14)

By this point the structural Channel A search has tested most of the levers visible in the TSE registration text (AN-019, AN-021, AN-022, AN-024, AN-032, AN-041, AN-042, AN-043, AN-051, AN-055, AN-056, AN-057, AN-058). The honest accounting suggests we may be looking in the wrong place for the bulk of the +7 pp — and that scaling the LLM extraction to the universe is a documentation upgrade, not a discovery path. This section lays out the residual decomposition explicitly so the next iteration can target the unexplained part, not re-pound the explained part.

What the structural search has plausibly attributed

Generous upper bounds (i.e. assuming the small positive signals at p < 0.10 are real and operate at their estimated effect sizes):

| Lever (AN) | Plausible contribution to +7 pp | Why bounded |
| --- | --- | --- |
| Scenario rotation under-doc (AN-051) | 1-2 pp | Order/rotation effects in survey-methods literature top out near this magnitude even when actively manipulated |
| Population reference mismatch (AN-056) | 0-1 pp | Frame differences (TSE-elig vs IBGE) shift composition by a few percent, not tens |
| Ponderação description + post-strat (AN-057, AN-058) | 0-2 pp | Selective-disclosure direction; AN-058's income subset hints at claimed correction being effective on income, capping the surviving magnitude |
| Subtotal | 1-5 pp | |
| Unexplained residual | 2-6 pp | What scaling LLM extraction will likely not surface |
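
The subtotal and residual rows follow from simple interval arithmetic on the lever bounds, taking the headline as a round 7 pp and assuming the lever contributions add up without interaction (an assumption the table makes implicitly).

```python
HEADLINE = 7  # pp, rounded headline sponsor bias

# (lower, upper) plausible contribution in pp, from the table.
levers = {
    "AN-051 scenario rotation": (1, 2),
    "AN-056 population reference": (0, 1),
    "AN-057/AN-058 ponderação": (0, 2),
}

subtotal = (
    sum(lo for lo, _ in levers.values()),
    sum(hi for _, hi in levers.values()),
)
# The residual is largest when the levers explain the least, and vice versa.
residual = (HEADLINE - subtotal[1], HEADLINE - subtotal[0])

print(f"subtotal {subtotal[0]}-{subtotal[1]} pp, residual {residual[0]}-{residual[1]} pp")
```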

Untested hypotheses (where the residual likely lives)

Six categories of mechanism that the structural extractors cannot see by design — registration text doesn't describe them:

1. Tabulation-stage manipulation / sophisticated fabrication. AN-013 ruled out crude post-fielding digit-frequency signatures. AN-013 v2 (2026-06-14) then ran the three blind-spot tests: (T1) standardised-error variance ("too clean" test) returned null (Levene p = 0.12; sponsored polls have plenty of spread); (T2) bias concentration returned the anti-fabrication direction (sponsored 11.5 % within ±2 pp vs indep 19.0 %, z = −4.75, p < 0.0001); (T3) within-firm tenths-digit TVD was elevated (mean 0.39) but is most plausibly explained by customer-specific reporting formats / scenario differences rather than tampering.
- Plausible magnitude (revised down): 0-2 pp. The simple "+7 pp fabricated" story is structurally inconsistent with T1 + T2.
- Testability: weak; AN-013 v2's portfolio is the best available with registered data.
2. ~~Firm-level slant-for-hire selection.~~ RULED OUT by AN-059 (2026-06-14). Variance decomposition of headline +7 pp: within-firm = +7.98 pp (102 % of headline); between-firm selection = −0.13 pp (~0). The slant happens WITHIN firm, not via sponsors hiring high-baseline-error firms. AN-016's within-firm β dispersion (sd 10.3) and AN-018's size-discipline pattern still hold across firms; what AN-059 rules out is sponsors concentrating on the high-slant firms in a way that would let firm selection account for any meaningful share of the +7 pp. The 2-4 pp prior magnitude estimate is now 0. The residual reallocation must come from categories #1, #3-#6.
3. ~~Wave selection within a firm × race.~~ Already addressed by AN-003 (pre-poll trajectory placebo). The placebo restricts the comparator to the most recent prior poll of the same candidate in the same race, with comparator pool = independent media OR the pollster itself. Same-firm pollster-self polls are therefore in the comparator pool. The placebo magnitudes survive tightening to ≤14 days and ≤7 days windows ($+\DescPlaceboShortJump$ pp / $+\DescPlaceboSevenJump$ pp). Wave-selection within firm × race would require sponsors to systematically pick favourable moments within a window short enough that genuine campaign drift is small, but the placebo shows the magnitude is essentially the same at 7 days as it is at 14 or median 10. Wave selection cannot explain a stable +7 pp that holds across these windows.
4. Sample frame contamination. Recruitment from party-affiliated lists, donor databases, attendees at rallies, geocoded campaign reach. The TSE registration would just say "amostragem por quotas" without revealing the underlying sample frame.
- Plausible magnitude: 1-4 pp. A 5-10 % over-representation of party-affiliated respondents in the frame would produce this size effect.
- Testability: weak from registered data alone. Anecdotal evidence + the EJ poll-lawsuit corpus would be the place to look (some lawsuits allege exactly this).
5. Interviewer scripting / pacing. Subtle cues during the interview that bias responses toward the sponsor. Not in registration text.
- Plausible magnitude: 0-2 pp. Survey-methods literature puts interviewer effects at 1-3 pp in well-studied contexts.
- Testability: very weak from registered data; would need interviewer-level identifiers + cross-firm interviewer mobility.
6. Strategic timing relative to news events. "Data de campo" recurred 5× in the blinded LLM-judge brief. Sponsors may time fieldwork to catch favourable news cycles (post-rally, post-debate, post-attack-on-opponent).
- Substantially constrained by Spec 3c (race × week FE): β = +7.94 pp on 448 race-week cells where both a sponsored and an independent comparator poll co-exist. A common event bump would lift both sides and cancel; that the coefficient survives at headline magnitude on this tight temporal control argues that event-timing within ±7 days is not the mechanism. The AN-003 placebo (prior-poll only) does not on its own bracket post-event timing, but Spec 3c does: both sides can field on either side of the sponsored poll within the week. (Noted 2026-06-14.)
- Remaining loophole: race-week cells where sponsors actively avoid co-presence with an indep comparator are not in Spec 3c's identifying sample, so event-timing in those cells is untested. Plausible but requires sponsor coordination across firms.
- Plausible magnitude (revised down): 0-1 pp.
- Testability of the loophole: needs event-database join + a "first-poll-in-race" subsample where Spec 3c can't bracket.
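
The bias-concentration comparison under the fabrication category (share of polls within ±2 pp, sponsored vs independent) is a two-proportion z-test. A minimal sketch using the pooled-variance form; the notes do not report the group sizes, so the counts below (115 of 1,000 vs 190 of 1,000) are hypothetical and the resulting z will not reproduce the reported −4.75.

```python
import math

def two_proportion_z(hits_a, n_a, hits_b, n_b):
    """Pooled two-proportion z statistic and two-sided normal p-value."""
    p_a, p_b = hits_a / n_a, hits_b / n_b
    pooled = (hits_a + hits_b) / (n_a + n_b)
    se = math.sqrt(pooled * (1 - pooled) * (1 / n_a + 1 / n_b))
    z = (p_a - p_b) / se
    p_value = math.erfc(abs(z) / math.sqrt(2))
    return z, p_value

# Hypothetical counts chosen only to match the reported shares (11.5 % vs 19.0 %).
z, p_value = two_proportion_z(115, 1000, 190, 1000)
print(f"z = {z:.2f}, p = {p_value:.2g}")
```

A negative z here means sponsored polls are less concentrated near zero error than independent ones, which is the anti-fabrication direction described above.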

Three-way categorisation by testability

Testable with existing data: firm-level selection (#2), wave selection (#3), some fabrication forensics (#1).
Testable with modest new data: strategic timing (#6) if we build an event database from EJ + a news source.
Effectively untestable with registered TSE data: sample frame contamination (#4), interviewer scripting (#5).

Implication for universe-scale LLM extraction

The cancelled poll_sampling + poll_operations batches now have a substantive justification: they unlock AN-056's population-reference test at universe scale (p=0.12 → likely clean p<0.001) and AN-057's ponderação test (p=0.04 → tighter CIs). But neither finds the residual 2-6 pp. The right time to scale is after the firm-selection decomposition (which we can do now with no new extraction) — that decomposition will tell us how much of the +7 pp the LLM extractions are even eligible to explain.

Concrete next tests, in priority order

1. ~~Firm-level decomposition of +7 pp.~~ DONE, AN-059 (2026-06-14). The between-firm component is essentially zero (−0.13 pp); the within-firm component (+7.98 pp) accounts for the entire +7.85 pp headline. Sponsor selection of firms is not the mechanism. The residual must live in categories #1, #3-#6 above.
2. ~~Wave-selection test.~~ Already covered by AN-003 (pre-poll trajectory placebo). Same-firm pollster-self polls are in the comparator pool, and the magnitude survives tightening the gap to ≤7 days. Wave selection cannot account for a +7 pp that holds at that window.
3. Stronger fabrication forensics on candidate vote shares (highest-priority remaining test). AN-013 v2 with the test family expanded to Benford on totals, posterior vs Dirichlet under declared sample design, within-firm rounding-pattern consistency, etc. Low power per test but cheap to run; a portfolio could rule in or rule out sample-design-consistent fabrication.
4. Universe-scale LLM extraction is now even less attractive. With firm selection zeroed out by AN-059, the residual 2-6 pp is structurally inaccessible to extraction-from-registration-text. Scaling poll_weighting tightens CIs on the AN-057 signal (p = 0.04) but doesn't address the residual. Defer indefinitely unless we get a positive signal from (2) or (3) that the extractor would sharpen.
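
The AN-059 figures quoted in the first item can be reconciled with a two-line check, assuming the decomposition is additive (headline = within-firm + between-firm), which is how the notes report it.

```python
within_firm = 7.98    # pp, from AN-059
between_firm = -0.13  # pp, from AN-059

headline = round(within_firm + between_firm, 2)
within_share = round(100 * within_firm / headline)

print(f"headline {headline} pp, within-firm share {within_share} %")
```

This is why the within-firm share reads as slightly above 100 %: the small negative between-firm term pulls the total just below the within-firm component.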
