Thinking and Open Ideas

Current open questions

Possible directions

Connections to literature

Methodological sketches

Ideas to explore later

Coverage deferral is a feature, not a bug (2026-06-02)

Empirical finding from the 14,876-poll bulk scan + smoke test: 36.9% of registered polls (5,393) defer geographic coverage at registration time, citing Res 23.600/2019 §7°. Combined with another 19.9% very-short / empty DS_DADO_MUNICIPIO, only 43.2% (6,422 polls) have substantive coverage content.

The bulk export is post-window. DT_GERACAO=17/05/2026, ~18 months after the 2024 elections. Every complementation window has closed. No duplicate rows per protocol (14,876 protocols, 14,876 rows). So the deferral text we see IS the final registry state — either the pollster never complemented, or TSE didn't update the field after complement. The Res 23.600 §7° non-complement penalty ("não registrada") is rarely enforced in practice; the lawsuit pilot saw "não foi complementada com a lista dos bairros" as a frequent petition argument, but most cases were dismissed by perda do objeto.

Pollster-level bimodality. Top-15 firms by volume split sharply: INSTITUTO VERITA (393 polls), MOREIRA & NOLETO (230) are 100% substantive; QUAEST (157 polls), SETA (176), SECULUS (190), INSTITUTO PARANA (251) are ≥90% deferred. QUAEST at 100% deferred is particularly notable given its academic ties (Felipe Nunes, co-founder) — coverage deferral is not a small-firm hygiene issue, it's how a tier-1 brand handles registration.
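A minimal sketch of how the per-pollster split could be tabulated, assuming each poll is reduced to a (pollster, coverage_class) pair. The function names, the 10% lower cutoff, and the toy firm names are illustrative assumptions; only the ≥90% cutoff and the deferred_complement label come from these notes.

```python
from collections import defaultdict

def deferral_share_by_pollster(polls):
    """Map pollster -> (n_polls, share of polls with deferred coverage)."""
    counts = defaultdict(lambda: [0, 0])  # pollster -> [n_polls, n_deferred]
    for pollster, coverage_class in polls:
        counts[pollster][0] += 1
        counts[pollster][1] += coverage_class == "deferred_complement"
    return {p: (n, d / n) for p, (n, d) in counts.items()}

def bucket_pollsters(shares, low=0.10, high=0.90):
    """Split pollsters into mostly-substantive, mixed, and mostly-deferred."""
    buckets = {"substantive": [], "mixed": [], "deferred": []}
    for pollster, (_, share) in sorted(shares.items()):
        if share >= high:
            buckets["deferred"].append(pollster)
        elif share <= low:
            buckets["substantive"].append(pollster)
        else:
            buckets["mixed"].append(pollster)
    return buckets

# Toy rows with made-up firm names, not registry data.
toy = (
    [("FIRM_A", "substantive")] * 10
    + [("FIRM_B", "deferred_complement")] * 9 + [("FIRM_B", "very_short")]
    + [("FIRM_C", "deferred_complement")] * 4 + [("FIRM_C", "substantive")] * 4
)
shares = deferral_share_by_pollster(toy)
buckets = bucket_pollsters(shares)
print(buckets)
```

A thin "mixed" bucket among high-volume firms is what the bimodality claim predicts; a fat one would undercut it.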

Recovery options ranked by feasibility:

  1. Cluster grain from DS_PLANO_AMOSTRAL: 44% of deferred polls (2,372) mention geographic vocabulary ("setores censitários", "bairros", "zona urbana") somewhere in the sampling-plan text — typically describing the grain of the cluster sample without naming specific units. Recovers "polled at setor-censitário level" but not which setores. Already in scope of poll_sampling.

  2. TSE individual-protocol pages: sig.tse.jus.br/ords/dwapr/r/seai/sig-pesquisas/ may carry complement metadata not in the bulk export. Unknown payoff until a 10-page spot check confirms. ~5.4k page fetches if signal exists.

  3. Relatório PDFs: poll_relatorio.py already extracts these for per-candidate vote intentions; might also carry coverage narrative. Probably moot — pollsters who deferred at registration likely published with similarly vague language.
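For option 1 above, a rough sketch of the vocabulary check on DS_PLANO_AMOSTRAL. The three terms are the ones quoted in the note; the accent-stripping normalisation and the function names are assumptions for illustration, and the real term list is presumably longer.

```python
import re
import unicodedata

# Terms quoted in the note, stored accent-free so they match normalised text.
GEO_TERMS = ("setores censitarios", "bairros", "zona urbana")

def normalize(text):
    """Lowercase, strip accents, and collapse whitespace."""
    decomposed = unicodedata.normalize("NFKD", text)
    stripped = "".join(ch for ch in decomposed if not unicodedata.combining(ch))
    return re.sub(r"\s+", " ", stripped).strip().lower()

def geo_terms_found(plano_amostral):
    """Return the geographic terms that appear in a sampling-plan text."""
    text = normalize(plano_amostral)
    return [term for term in GEO_TERMS if term in text]

example = "Amostra por conglomerados, com sorteio de SETORES  CENSITÁRIOS na zona urbana."
print(geo_terms_found(example))
```

A hit only says the plan names a cluster grain; it does not identify which units were sampled.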

Update 2026-06-02 — better recovery path found. Spot-check of TSE endpoints revealed that the CKAN package pesquisas-eleitorais-2024 on dadosabertos.tse.jus.br has 6 resources, not the 3 we use. The missing ones include bairro_municipio_2024.zip — explicitly described in the package metadata as "Detalhamento de bairro/município" — which IS the complement that the deferral language pointed to. Plus questionario_pesquisa_2024.zip (PDF questionnaires) and nota_fiscal_2024.zip (invoices). All three at https://cdn.tse.jus.br/estatistica/sead/odsele/pesquisa_eleitoral/.
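A sketch of the resource check, assuming the package metadata is read through the standard CKAN action API (`/api/3/action/package_show?id=<package>`), whose JSON carries `success` and `result.resources`. The fetch itself is left out; the sample payload, its example.invalid URLs, and the `already_used_placeholder.zip` name are made up for illustration.

```python
import json

def list_resources(package_show_json):
    """Return (name, url) pairs from a CKAN package_show response body."""
    payload = json.loads(package_show_json)
    if not payload.get("success"):
        raise ValueError("CKAN reported an unsuccessful request")
    return [(r.get("name"), r.get("url")) for r in payload["result"]["resources"]]

def unused_resources(resources, already_used_files):
    """Keep resources whose file name is not among the ones already mirrored."""
    return [
        (name, url)
        for name, url in resources
        if url.rsplit("/", 1)[-1] not in already_used_files
    ]

# Hypothetical payload shaped like a package_show response.
sample = json.dumps({
    "success": True,
    "result": {"resources": [
        {"name": "placeholder", "url": "https://example.invalid/already_used_placeholder.zip"},
        {"name": "bairro", "url": "https://example.invalid/bairro_municipio_2024.zip"},
        {"name": "questionario", "url": "https://example.invalid/questionario_pesquisa_2024.zip"},
        {"name": "nota fiscal", "url": "https://example.invalid/nota_fiscal_2024.zip"},
    ]},
})
missing = unused_resources(list_resources(sample), {"already_used_placeholder.zip"})
print([url.rsplit("/", 1)[-1] for _, url in missing])
```

Diffing against the files already in use is the step that would have caught the three missing resources earlier.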

CDN is HTTP-403 from sandbox (confirms TSE-CDN-blocked memory note), so the mirror has to happen from a host environment. Once mirrored, the deferred-coverage problem becomes a 5,393-PDF extraction task (see todo.md § "Extract bairro/município PDFs for deferred polls"). Likely upgrades the recommended-paper framing in this section: deferral is a Channel A covariate AND we can resolve the actual coverage in most cases. Two informative variables, not one.

The questionnaire PDFs also rescue the question_wording_order lever from "essentially unobservable" (7.7% mention in registration text) to "structurally extractable" — see todo.md § "Extract question_wording_order from questionario_pesquisa PDFs". This was the single weakest field in the smoke test; now it has a path.

Update 2026-06-02 — zip mirrored, recovery confirmed empirically. After bi-dropbox mirror, cross-tab of mayor universe × bairro PDF presence shows:

| Bucket | n | Has PDF | Recovery |
|---|---|---|---|
| deferred_complement | 5,489 | 5,180 | 94.4% |
| very_short | 2,964 | 2,839 | 95.8% |
| empty | 6 | 6 | 100% |
| substantive | 6,417 | 5,173 | 80.6% |
| all mayor | 14,876 | 13,198 | 88.7% |

Only 309 deferred polls (5.6%) are truly lost (deferred AND no PDF). The 19.4% of substantive polls without a PDF (1,244) probably had complete coverage info in DS_DADO_MUNICIPIO, so no complement was filed.
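The cross-tab arithmetic can be re-derived from the bucket counts alone. The sketch below uses only the counts reported above; the variable and function names are illustrative.

```python
# (n, has_pdf) per coverage bucket, as reported in the cross-tab.
BUCKETS = {
    "deferred_complement": (5489, 5180),
    "very_short": (2964, 2839),
    "empty": (6, 6),
    "substantive": (6417, 5173),
}

def recovery_table(buckets):
    """Add a recovery rate per bucket plus an 'all' row with the totals."""
    rows = {name: (n, pdf, pdf / n) for name, (n, pdf) in buckets.items()}
    total_n = sum(n for n, _ in buckets.values())
    total_pdf = sum(pdf for _, pdf in buckets.values())
    rows["all"] = (total_n, total_pdf, total_pdf / total_n)
    return rows

table = recovery_table(BUCKETS)
n_deferred, pdf_deferred = BUCKETS["deferred_complement"]
lost_deferred = n_deferred - pdf_deferred  # deferred AND no PDF

for name, (n, pdf, rate) in table.items():
    print(f"{name:20s} {n:6d} {pdf:6d} {rate:6.1%}")
print("lost deferred:", lost_deferred, f"({lost_deferred / n_deferred:.1%})")
```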

Bairro extractor pilot (20 PDFs, 100% valid post-fix) at projects/poll-sponsor-bias/build/llm/bairro_detail_pilot/. Findings:

Recommended paper framing. Treat coverage_class = deferred_complement as a Channel A covariate, not a missingness problem. Two regressions become interesting:

Both use deferral itself as the design choice, which is what it is. The 36.9% prevalence makes the test well-powered. If sponsors disproportionately choose pollsters who default to the boilerplate, that's already informative — a deferred coverage decision is strategic in expectation, even when no specific bairros are listed.

A subtler interpretation worth keeping in mind: the bimodality suggests deferral is a pollster-level technology choice, not a per-poll strategic one. If true, the within-pollster FE will wipe out the deferral × bias correlation. The interesting variation may be across pollsters — sponsors selecting deferred-coverage pollsters — rather than within-pollster choice to defer. Check by comparing β across pollster FE vs no-pollster FE specifications.
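A toy illustration of the check, under the assumption that the comparison is between a pooled bivariate slope and the same slope after demeaning within pollster (the fixed-effects estimator via Frisch-Waugh). All data and names are made up; a real specification would carry the full control set.

```python
from collections import defaultdict
from statistics import mean

def ols_slope(x, y):
    """Bivariate OLS slope of y on x; None when x has no variance."""
    mx, my = mean(x), mean(y)
    sxx = sum((xi - mx) ** 2 for xi in x)
    if sxx == 0:
        return None
    return sum((xi - mx) * (yi - my) for xi, yi in zip(x, y)) / sxx

def demean_within(groups, values):
    """Subtract each group's mean from its members (the FE 'within' transform)."""
    by_group = defaultdict(list)
    for g, v in zip(groups, values):
        by_group[g].append(v)
    means = {g: mean(vs) for g, vs in by_group.items()}
    return [v - means[g] for g, v in zip(groups, values)]

def pooled_and_within_beta(pollster, deferred, bias):
    """Return (beta without pollster FE, beta with pollster FE)."""
    pooled = ols_slope(deferred, bias)
    within = ols_slope(demean_within(pollster, deferred),
                       demean_within(pollster, bias))
    return pooled, within

# Toy case: deferral is purely a pollster-level choice.
pollster = ["A", "A", "A", "B", "B", "B"]
deferred = [1, 1, 1, 0, 0, 0]
bias = [9, 10, 11, 2, 3, 4]
pooled, within = pooled_and_within_beta(pollster, deferred, bias)
print(pooled, within)
```

In the toy case the pooled slope is positive while the within-pollster slope is not identified at all, because demeaning removes every bit of variation in the deferral indicator. That is the extreme version of the "FE wipes out the correlation" scenario.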

Methodology dedupe gain is smaller than expected (2026-06-02)

The poll-methodology extract pipeline assumed pollsters reuse boilerplate across polls — initial todo entry estimated "a few hundred unique pollster × template combinations" across 14k polls. The 200-poll dry-run shows otherwise:

Reason: pollsters parameterize quota numbers and population references by município. The "fixed template" surface is small; the per-município content is large enough to dominate the hash. Implication: dedupe gain is marginal. Full-universe LLM cost at gpt-4o-mini is ~$25–30, not the $5 the original todo estimated. Still cheap enough that this isn't a blocker.
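A small sketch of why exact-text hashing finds few duplicates when quota numbers and município names vary, with a masked variant as a hypothetical probe of how much fixed template surface remains. The masking rules and function names are assumptions, not the pipeline's actual dedupe key, and the dry-run does not establish that masking would recover meaningful dedupe.

```python
import hashlib
import re

def _digest(text):
    """SHA-256 of whitespace-normalised, lowercased text."""
    cleaned = re.sub(r"\s+", " ", text).strip().lower()
    return hashlib.sha256(cleaned.encode("utf-8")).hexdigest()

def raw_key(text):
    """Dedupe key on the text as written."""
    return _digest(text)

def template_key(text, municipio):
    """Dedupe key after masking the município name and all numbers."""
    masked = re.sub(re.escape(municipio), "<MUNICIPIO>", text, flags=re.IGNORECASE)
    masked = re.sub(r"\d+(?:[.,]\d+)*", "<NUM>", masked)
    return _digest(masked)

# Made-up methodology snippets that differ only in parameters.
a = "Amostra de 400 entrevistas em Sobral, com quotas de 52% mulheres."
b = "Amostra de 600 entrevistas em Crato, com quotas de 51% mulheres."
print(raw_key(a) == raw_key(b), template_key(a, "Sobral") == template_key(b, "Crato"))
```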

Miscellaneous notes

Residual decomposition of the +7 pp (2026-06-14)

By this point the structural Channel A search has tested most of the levers visible in the TSE registration text (AN-019, AN-021, AN-022, AN-024, AN-032, AN-041, AN-042, AN-043, AN-051, AN-055, AN-056, AN-057, AN-058). The honest accounting suggests we may be looking in the wrong place for the bulk of the +7 pp — and that scaling the LLM extraction to the universe is a documentation upgrade, not a discovery path. This section lays out the residual decomposition explicitly so the next iteration can target the unexplained part, not re-pound the explained part.

What the structural search has plausibly attributed

Generous upper bounds (i.e. assuming the small positive signals at p < 0.10 are real and operate at their estimated effect sizes):

| Lever (AN) | Plausible contribution to +7 pp | Why bounded |
|---|---|---|
| Scenario rotation under-doc (AN-051) | 1-2 pp | Order/rotation effects in survey-methods literature top out near this magnitude even when actively manipulated |
| Population reference mismatch (AN-056) | 0-1 pp | Frame differences (TSE-elig vs IBGE) shift composition by a few percent, not tens |
| Ponderação description + post-strat (AN-057, AN-058) | 0-2 pp | Selective-disclosure direction; AN-058's income subset hints at claimed correction being effective on income, capping the surviving magnitude |
| Subtotal | 1-5 pp | |
| Unexplained residual | 2-6 pp | What scaling LLM extraction will likely not surface |
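The subtotal and residual rows follow mechanically from the per-lever bounds. A short check, treating the headline as a round +7 pp as the table does (names are illustrative):

```python
HEADLINE_PP = 7

# (lower, upper) plausible contribution in pp, per lever.
LEVERS = {
    "AN-051 scenario rotation": (1, 2),
    "AN-056 population reference": (0, 1),
    "AN-057/AN-058 ponderação": (0, 2),
}

subtotal = (sum(lo for lo, _ in LEVERS.values()),
            sum(hi for _, hi in LEVERS.values()))
# The residual is smallest when the levers explain the most, and vice versa.
residual = (HEADLINE_PP - subtotal[1], HEADLINE_PP - subtotal[0])
print(subtotal, residual)
```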

Untested hypotheses (where the residual likely lives)

Six categories of mechanism that the structural extractors cannot see by design — registration text doesn't describe them:

  1. Tabulation-stage manipulation / sophisticated fabrication. AN-013 ruled out crude post-fielding digit-frequency signatures. AN-013 v2 (2026-06-14) then ran three blind-spot tests. T1, standardised-error variance (the "too clean" test), returned null (Levene p = 0.12; sponsored polls have plenty of spread). T2, bias concentration, returned the anti-fabrication direction (sponsored 11.5% within ±2 pp vs indep 19.0%, z = −4.75, p < 0.0001). T3, within-firm tenths-digit TVD, was elevated (mean 0.39) but is most plausibly explained by customer-specific reporting formats / scenario differences rather than tampering.

    • Plausible magnitude (revised down): 0-2 pp. The simple "+7 pp fabricated" story is structurally inconsistent with T1 + T2.
    • Testability: weak; AN-013 v2's portfolio is the best available with registered data.
  2. Firm-level slant-for-hire selection. RULED OUT by AN-059 (2026-06-14). Variance decomposition of headline +7 pp: within-firm = +7.98 pp (102 % of headline); between-firm selection = −0.13 pp (~0). The slant happens WITHIN firm, not via sponsors hiring high-baseline-error firms. AN-016's within-firm β dispersion (sd 10.3) and AN-018's size-discipline pattern still hold across firms; what AN-059 rules out is sponsors concentrating on the high-slant firms in a way that would let firm selection account for any meaningful share of the +7 pp. The 2-4 pp prior magnitude estimate is now 0. The residual reallocation must come from categories #1, #3-#6.

  3. Wave selection within a firm × race. Already addressed by AN-003 (pre-poll trajectory placebo). The placebo restricts the comparator to the most recent prior poll of the same candidate in the same race, with comparator pool = independent media OR the pollster itself. Same-firm pollster-self polls are therefore in the comparator pool. The placebo magnitudes survive tightening to ≤14 days and ≤7 days windows ($+\DescPlaceboShortJump$ pp / $+\DescPlaceboSevenJump$ pp). Wave selection within firm × race would require sponsors to systematically pick favourable moments within a window short enough that genuine campaign drift is small, but the placebo shows the magnitude is essentially the same at 7 days as at 14 days or at the median gap of 10 days. Wave selection cannot explain a stable +7 pp that holds across these windows.

  4. Sample frame contamination. Recruitment from party-affiliated lists, donor databases, attendees at rallies, geocoded campaign reach. The TSE registration would just say "amostragem por quotas" without revealing the underlying sample frame.

    • Plausible magnitude: 1-4 pp. A 5-10 % over-representation of party-affiliated respondents in the frame would produce this size effect.
    • Testability: weak from registered data alone. Anecdotal evidence + the EJ poll-lawsuit corpus would be the place to look (some lawsuits allege exactly this).
  5. Interviewer scripting / pacing. Subtle cues during the interview that bias responses toward the sponsor. Not in registration text.

    • Plausible magnitude: 0-2 pp. Survey-methods literature puts interviewer effects at 1-3 pp in well-studied contexts.
    • Testability: very weak from registered data; would need interviewer-level identifiers + cross-firm interviewer mobility.
  6. Strategic timing relative to news events. "Data de campo" recurred 5× in the blinded LLM-judge brief. Sponsors may time fieldwork to catch favourable news cycles (post-rally, post-debate, post-attack-on-opponent).

    • Substantially constrained by Spec 3c (race × week FE): β = +7.94 pp on 448 race-week cells where both a sponsored and an independent comparator poll co-exist. A common event bump would lift both sides and cancel; that the coefficient survives at headline magnitude on this tight temporal control argues that event-timing within ±7 days is not the mechanism. The AN-003 placebo (prior-poll only) does not on its own bracket post-event timing, but Spec 3c does — both sides can field on either side of the sponsored poll within the week. (Noted 2026-06-14.)
    • Remaining loophole: race-week cells where sponsors actively avoid co-presence with an indep comparator are not in Spec 3c's identifying sample, so event-timing in those cells is untested. Plausible but requires sponsor coordination across firms.
    • Plausible magnitude (revised down): 0-1 pp.
    • Testability of the loophole: needs event-database join + a "first-poll-in-race" subsample where Spec 3c can't bracket.
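For category #2, one way to write a within/between split of the sponsored-minus-independent gap. This is a sketch of a standard shift-share identity, not necessarily the exact AN-059 procedure: with firm shares s_f among sponsored polls and i_f among independent polls, gap = Σ s_f (m^S_f − m^I_f) + Σ (s_f − i_f) m^I_f. It assumes every firm has both sponsored and independent polls; the data and names are made up.

```python
from collections import defaultdict
from statistics import mean

def within_between(polls):
    """Split the sponsored-minus-independent mean gap into within- and between-firm parts.

    polls: iterable of (firm, sponsored: bool, error_pp).
    """
    by_cell = defaultdict(list)
    for firm, sponsored, err in polls:
        by_cell[(firm, bool(sponsored))].append(err)
    n_s = sum(len(v) for (_, s), v in by_cell.items() if s)
    n_i = sum(len(v) for (_, s), v in by_cell.items() if not s)
    within = between = 0.0
    for firm in {f for f, _ in by_cell}:
        spons = by_cell.get((firm, True), [])
        indep = by_cell.get((firm, False), [])
        if not spons or not indep:
            raise ValueError(f"firm {firm!r} lacks sponsored or independent polls")
        s_share, i_share = len(spons) / n_s, len(indep) / n_i
        within += s_share * (mean(spons) - mean(indep))   # slant inside the firm
        between += (s_share - i_share) * mean(indep)      # selection on baseline error
    return within, between

toy = [("A", True, 10), ("A", True, 12), ("A", False, 2),
       ("B", True, 6), ("B", False, 0), ("B", False, 2), ("B", False, 1)]
within, between = within_between(toy)
gap = mean(e for _, s, e in toy if s) - mean(e for _, s, e in toy if not s)
print(round(within, 3), round(between, 3), round(gap, 3))
```

The two parts sum to the raw gap by construction, which matches the reported figures: +7.98 pp within plus −0.13 pp between gives +7.85 pp.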

Three-way categorisation by testability

Implication for universe-scale LLM extraction

The cancelled poll_sampling + poll_operations batches now have a substantive justification: they unlock AN-056's population-reference test at universe scale (p=0.12 → likely clean p<0.001) and AN-057's ponderação test (p=0.04 → tighter CIs). But neither finds the residual 2-6 pp. The right time to scale is after the firm-selection decomposition (which we can do now with no new extraction) — that decomposition will tell us how much of the +7 pp the LLM extractions are even eligible to explain.

Concrete next tests, in priority order

  1. Firm-level decomposition of +7 pp. DONE: AN-059 (2026-06-14). Between-firm component is essentially zero (−0.13 pp); the within-firm component is +7.98 pp, slightly more than the entire +7.85 pp headline. Sponsor selection of firms is not the mechanism. The residual must live in categories #1, #3-#6 above.
  2. Wave-selection test. Already covered by AN-003 (pre-poll trajectory placebo). Same-firm pollster-self polls are in the comparator pool, and the magnitude survives tightening the gap to ≤7 days. Wave selection cannot account for a +7 pp that holds at that window.
  3. Stronger fabrication forensics on candidate vote shares (highest-priority remaining test). AN-013 v2 with the test family expanded to Benford on totals, posterior vs Dirichlet under declared sample design, within-firm rounding-pattern consistency, etc. Low power per test but cheap to run; a portfolio could rule in or rule out sample-design-consistent fabrication.
  4. Universe-scale LLM extraction is now even less attractive. With firm selection zeroed out by AN-059, the residual 2-6 pp is structurally inaccessible to extraction-from-registration-text. Scaling poll_weighting tightens CIs on the AN-057 (p = 0.04) signal but doesn't address the residual. Defer indefinitely unless we get a positive signal from (2) or (3) that the extractor would sharpen.