TODOs
Priority overview (2026-06-21 audit)
Triaged inventory of the 115 open - [ ] items below. Read this
before picking work; many AN-NNN lead bullets are low-priority
puzzles or completeness items, while a handful of paper-facing
edits dominate the marginal paper value.
Tier A — high paper value, near-term actionable (~10 items)
Each can land without waiting on external infra. Rank order is roughly by paper impact.
- G1. LLM extraction gold standard (§ GPT-5-pro). ~3 days hand-coding of 300–500 polls stratified by sponsor × firm size. Closes the extraction-error-as-confound referee question.
- #27. Redesign Table 2 (design-inventory): done 2026-06-21.
- #18. Transpose Table 1: done 2026-06-21.
- #22 / #24. Scatter figures: done 2026-06-21 (used existing AN-017 + AN-018 figures).
- #26. Reframe upstream media filter: done 2026-06-21 (new script AN-118).
- AN-130 + hand-validate stratified LLM vote-intention sample (§ Data-quality validation). Three steps: (i) write `AN-130` sampling script, stratified 50 sponsored + 50 independent, outputting a 100-row CSV for hand coding; (ii) Henrik hand-codes (~3 hr); (iii) thin re-fit script reads corrected numbers back and re-runs Spec 2 + Spec 3c to publish a validated-subset coefficient row in the appendix. Submission-blocking: the §3.1 LLM-pipeline robustness footnote and an honest answer to the "is the +7 pp an extraction artefact?" referee question both rest on this.
- AN-072: Visibility-weighted accuracy vs bias (§ Leads from an-071). ~1 hr. Sharpens the AN-071 null on accuracy-vs-bias.
- Negative-β firm diagnostic (CENSUS, EVA FRANCIELI): done AN-119 2026-06-21. Their AN-016 β values are identified off 1-2 candidates each; AN-071 strict-cut and AN-073 negative HHI×β are both within-firm-spec-thinness artefacts.
- AN-NNN: Decompose AN-006's CPF +19 by repeat-vs-singleton subset (§ Leads from an-074). ~30 min, reuses heterogeneity.py. M1-individual vs M4 mechanism discriminator.
- Paper §sec:caveats: empirical-noise-floor caveat paragraph (§ Leads from an-110). Pen-and-paper, ~30 min. Pre-empts referee question on DEFF*=12.59 vs binomial-loud framing.
- AN-121. Universe-level sponsor-type breakdown with iceberg quantification: done AN-121 2026-06-21. Headline: 12.4% of 2024 mayoral polls routed through cover vehicles (shell + MEI) vs 3.8% in 2020, a 3.3× increase; uncoded residual (low-volume sponsors) shrank 6.4% → 2.8%. 89 shell CNPJs / 1,149 polls; IPOP pattern at scale = 15 pollsters / 508 polls switched self → shell between cycles. Paper-load-bearing for intro ¶3 and §2; appendix table at `build/table/iceberg_universe.tex`. See follow-ups below.
Tier A.5 — R&R-deferred, referee-anticipated (emerged 2026-06-21)
Surfaced by the substantive referee-review pass (external agent, 25 findings). Triaged as defensible to defer until R&R; not submission-blocking. Listed in priority order within tier.
- AN-125: safe-race rule-outs — formal robustness against the "safe-race leaders just like polling for donor signaling" alternative. Beyond AN-006/007 rank-at-commissioning, run party-FE × race-margin interactions and tabulate β within safe-race subsamples for a published-back appendix row.
- AN-126: rank-at-commissioning robustness grid — vary the rank classifier (final-share quartile vs latest-prior-poll vs campaign-finance quartile) and report β within each. Closes the "rank coding chosen for a particular β" referee variant.
- AN-127: sponsor-agnostic floor ROC — pair AN-106 DML detection (AUC 0.72) with a per-poll outlier-score ROC on the three floor signals (consensus deviation, within-firm same-cand consistency, digit-forensic markers) and report the trade-off curve. Supports the §7.2 third-best mechanism quantitatively.
- AN-128: within-week event coincidence — placebo refinement on the within-candidate trajectory test: re-fit excluding candidate-week cells where another non-poll campaign event (registered ad spend spike, news event) coincides.
- AN-131: matched-share threshold table — sensitivity grid over the matched_share=1 restriction (0.95, 0.90, 0.80, 0.50, all) reporting β at each threshold. Pre-empts the "restriction hand-picked for β" question. (Slug bumped from AN-129 on 2026-06-24; AN-129 is now the per-poll-bias-on-firm-chars regression in §7.2 / Table 4.)
Tier B — medium value, mostly extensions (~30 items)
Useful refinements or new analyses; reuse existing scaffolds. Many are 1-4 hr. Examples:
- #5/#14/#15/#16 punch-list items — remaining annotation refinements (drop-or-keep judgment calls).
- Drop `X_p` from main spec: done 2026-06-26 (self-review #15). Table 1 now 3 cols (naive, cand FE, race × week FE). Within-candidate SE / WCR / multiway / drop-largest / match-score / permutation rewired onto `spec0_cand` (cand FE only). Headline coefficient +6.9 pp unchanged at 3 decimals.
- Robustness §8 prose → tables/figures: done 2026-06-26 (self-review pass 9 #21). Two new tables (`tab:robust-sensitivities`, `tab:robust-inference`) consolidate drop-largest + match-score + inference sensitivities; `fig:jackknife` shows leave-one-pollster + leave-one-UF; party-heterogeneity stays as one paragraph. AN-011 refit on `spec0_cand` for consistency with the new headline.
- Re-source `log donations` from receita with SQ_CANDIDATO: already done in `an-127-selection-by-candidate.py` (receita_2024.csv aggregated by SQ_CANDIDATO, bridged via candidato.csv). Paper note updated 2026-06-26 to state the source.
- Pollster × sponsor interaction Wald test: reuses AN-009 scaffold.
- TWFE wild-cluster bootstrap on proper spec 3c — SE coverage completeness; ~50s compute.
- Sponsor-type heterogeneity by `DS_ORIGEM_RECURSO`: done AN-120 2026-06-21. Original hypothesis refuted (Fundo Partidário n=239 β≈baseline); new finding: #NULO# undeclared cells β=+10-14, parallels selective-disclosure axis.
- Two-axis specialization map (party-HHI × state-HHI): AN-073 visual companion.
- Selective-disclosure systematic check: McNemar over all binary `_described` fields; AN-042 extension.
- AN-026/027/029/030 selection refinements (rank-at-commission + money, registration-date alignment).
- AN-074/075/078/079 cost + dyad mechanism refinements.
- AN-108/109/110/111 noise-floor follow-ups (decomposition, slice-conditional DEFF, sponsored-protocol catch profile).
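Sketch for the selective-disclosure check above: an exact McNemar test on one binary `_described` field, assuming polls have already been paired (for example a sponsored and an independent poll from the same race). The pairing rule and the function names are illustrative assumptions, not the AN-042 implementation.

```python
from math import comb


def discordant_counts(pairs):
    """Count discordant pairs in a list of (first_described, second_described) booleans."""
    b = sum(1 for first, second in pairs if first and not second)
    c = sum(1 for first, second in pairs if second and not first)
    return b, c


def mcnemar_exact(b, c):
    """Two-sided exact McNemar p-value from the discordant-pair counts.

    Under H0 the discordant pairs split as Binomial(b + c, 0.5), so the
    p-value is twice the smaller binomial tail, capped at 1.
    """
    n = b + c
    if n == 0:
        return 1.0  # no discordant pairs: no evidence either way
    k = min(b, c)
    tail = sum(comb(n, i) for i in range(k + 1)) / 2 ** n
    return min(1.0, 2 * tail)
```

Looping this over every `_described` field gives one p-value per field; with many fields a multiple-testing correction would be needed before reading any single one as significant.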
Tier C — low priority (puzzles, low-leverage, ~30 items)
Don't pick up unless directly motivated. Includes:
- Single-firm/single-candidate descriptives ("CENSUS muni 12017" case description; Piauí LOO investigation).
- Thin-cell triage on small-n rank-3+ subsets.
- Lit downloads / OA PDF syncs.
- Hypothesis-page write-ups deferred until companion-paper draft.
- `op_x_def` selection puzzle.
Tier D — blocked (~12 items)
Depend on external infrastructure. Don't queue.
- Main forward task — full-universe LLM methodology batch (24 h SLA, submitted 2026-06-02). Unblocks G4 + 4 deferral / population / coverage AN regressions. Status check first before queueing more work here.
- Mirror missing TSE PesqEle resources — host-shell only (CDN 403 from sandbox).
- Bairro/município PDFs at scale — depends on the mirror above.
- Tier 2 IBGE-setor profile — needs geospatial stack outside sandbox + IBGE data ingestion; only worth starting if cheap Tier 2 fires a signal worth sharpening.
- Per-TRE PJe audit-document scrape — host-only TJMG-style scrape; feasibility-probe-first.
- 2022 cycle extension — lower priority than Main forward task.
Tier E — writing / framing items (~10 items)
Paper-prose work, no analysis needed.
- Industry-insider framing — Felipe Nunes / Quaest — intro footnote; ~30 min.
- Argument: reputation, not law, disciplines main pollsters — ~1 paragraph + institutions.md update; ~1-2 hr.
- §sec:policy update — M5 / publication option — paragraph.
- §sec:policy — calibrated blind audit policy proposal — AN-109 v2-derived paragraph.
- Add "selective disclosure" subsection to source-of-bias.md — ~10 lines.
- Hypothesis-page rewrites (coordination-peak, supply-side bandwagon, viability-grab) — defer until companion-paper draft is settled.
Self-review punch list from local-site annotations — 2026-06-21
All 31 items from the 2026-06-21 hypothes.is self-review pass on
build/site/paper/index.html have been resolved. Full annotation
log at docs/responses_hsigstad_2026-06-21.md; per-item
disposition migrated to docs/done.md under the "Self-review
punch list (2026-06-21 annotations)" subsection.
GPT-5-pro review actionables — top-poli-sci submission (2026-06-14)
GPT-5-pro pre-submission review at docs/reviews/gpt5-2026-06-14-paper-note-review.md
flagged 17 actionable items. Completed entries (G3, G5–G9, G11–G17) moved to
docs/done.md. Remaining items below.
Skipped (data-constrained): 2020-cycle replication of Spec 2 for external validity. We don't have the 2020 mayoral-poll register cleaned to the same standard as 2024.
G1. LLM extraction gold standard. GPT: "300–500 polls, stratified by sponsor type and firm size. Report extraction precision/recall and re-estimate headline and 3c on the gold set. Show no correlation between extraction error and sponsor or firm size."
- Have: §within-firm dispersion argument (46-pp cross-firm spread + within-firm constant extraction template) is GPT's "no uniform extraction artifact" defense. Doesn't substitute for a gold standard.
- Next: Sample 400 polls stratified by (sponsor type × firm-size tertile × randomized cell). Hand-code vote-intention numbers for the named candidate(s). Diff vs LLM output. Report precision / recall / MAE. Re-estimate Spec 2 and Spec 3c on the verified subset. Regression of |extraction error| on sponsor indicator + firm-size tertile + cell FE.
- Cost: ~3 days of careful hand-coding (assuming 1 min/poll). Or contract out.
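A minimal sketch of the G1 "Next" step, assuming each poll is a dict carrying its stratum fields and that hand-coded and LLM values are keyed by `(poll_id, candidate)`. The field names, the equal-allocation rule, and the 0.05 pp match tolerance are illustrative assumptions, not decisions already made for the gold standard.

```python
import random
from collections import defaultdict


def stratified_sample(polls, strata_keys, n_total, seed=0):
    """Draw about n_total polls, allocated equally across strata.

    Each stratum contributes n_total // n_strata polls, capped at its size.
    """
    rng = random.Random(seed)
    strata = defaultdict(list)
    for poll in polls:
        strata[tuple(poll[k] for k in strata_keys)].append(poll)
    per_cell = max(1, n_total // len(strata))
    sample = []
    for key in sorted(strata):
        cell = strata[key]
        sample.extend(rng.sample(cell, min(per_cell, len(cell))))
    return sample


def extraction_metrics(gold, llm, tol=0.05):
    """Compare LLM vote shares against hand-coded ones.

    gold, llm: dicts mapping (poll_id, candidate) -> vote share in pp.
    precision = correct LLM values / all LLM values
    recall    = correct LLM values / all hand-coded values
    mae       = mean absolute error over keys present in both
    """
    matched = [k for k in llm if k in gold]
    correct = [k for k in matched if abs(llm[k] - gold[k]) <= tol]
    nan = float("nan")
    precision = len(correct) / len(llm) if llm else nan
    recall = len(correct) / len(gold) if gold else nan
    mae = sum(abs(llm[k] - gold[k]) for k in matched) / len(matched) if matched else nan
    return {"precision": precision, "recall": recall, "mae": mae}
```

The per-key absolute errors behind `mae` are also the dependent variable for the |extraction error| regression on sponsor indicator and firm-size tertile.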
[partial] G2. Publication-selection check by media coverage. (2026-06-14)
- Data limit: Brazilian poll registration does not record publication status, and the AN-025 Google News scrape keys on `firm-name × race-name`, not protocol. A clean per-poll publication test is not possible without a per-protocol scrape (separate follow-up).
- Proxy run (AN-065): Sponsored polls in firm-race cells with no media coverage show MORE bias (+15.2 pp, n=10) than in covered cells (+7.2 pp, n=28), opposite the publication-selection prediction. n=10 is thin; the proxy doesn't support publication selection driving the headline. The new §sec:caveats paragraph "Publication selection not directly testable" reports the proxy + caveats; the intro's "plausibly includes" hedge stays.
- Follow-up open: Per-protocol Google News scrape with poll-date queries would enable the clean test. Cost: re-scrape + reanalyze (~3 days).
G4. Mechanism — at least one universe-scale Channel A regression. GPT: "Add at least one design-grade regression at universe scale where feasible (e.g., population-frame mismatch; coverage class; mode) with sponsor×design interactions in a within-candidate FE. Even nulls with tight CIs help bound Channel A."
- Have: Pair-level analyses (AN-019, 020, 021, 022) on n=200 LLM subset. Universe LLM extraction running (per todo top section).
- Next: Once universe LLM batch lands (Main forward task above), run within-candidate FE Spec 2 with `sponsored_by × coverage_class` (and one of mode, population-frame, quota structure) interactions. Report and bound.
- Cost: Blocked on Main forward task. Light analysis when unblocked.
[partial] G10. Scenario-choice robustness — espontânea placebo deferred. (2026-06-14)
- Done (G10 main): AN-067 structural bound. 93.2% of mayoral estimulado protocols have a single scenario, so scenario-pick is degenerate for them. On the 6.8% multi-scenario subset, median within-candidate vote-percent spread is 3.15 pp; aggregate scenario-pick effect on β bounded at <0.2 pp. (See `done.md` G10.)
- Open: (c) espontânea-only as a placebo. Requires rebuilding cand_poll from poll_response_2024.parquet with the espontânea filter (~1 day pipeline work). Genuine placebo: open-ended recall doesn't have the sponsor's candidate on a card the respondent sees. Deferred.
Main forward task (held pending pilot iteration)
- [~] Full-universe methodology batch — HOLD RELEASED, BUILD + SUBMIT IN FLIGHT (2026-06-02)
- Status (2026-06-02 ~15:30): Build complete (34,935 LLM requests across sampling 14,338 + coverage 6,251 + operations 14,346, after coverage short-circuit eats 8,073 polls). Submit running in background; batch_state.json being written. 24h SLA. Expected cost ~$15-20 at gpt-4o-mini Batch rates.
- Hold-release trigger: AN-041/042/043/051/052/053/054 quick-win battery surfaced no prompt-iteration leads on the methodology schemas (sampling / coverage / operations). The questionnaire layer (AN-051+) used a separate schema and the rotation finding was absorbed into firm-tier discipline via AN-054, not into a methodology-prompt change. Gate lifted.
- Once `cmd_status` shows all batches `completed`: run `cmd_harvest` to populate `pipelines/politica/build/llm/poll_{sampling,coverage,operations}/` caches. Then re-run `extract_methodology.py --all` to assemble the universe-wide methodology parquet. Then run the queued universe-scale ANs (population frame / coverage class / setor usage) on that.
- created: 2026-06-02
Mechanism decomposition (Channel A vs B)
Extract structured methodology features from registration free-text (`DS_METODOLOGIA_PESQUISA`, `DS_PLANO_AMOSTRAL`, `DS_SISTEMA_CONTROLE`, `DS_DADO_MUNICIPIO`)
- STATUS 2026-06-02: three-task split implemented at `pipelines/politica/source/llm/poll_{sampling,coverage,operations}.py` (schemas in same dir, prompts under `prompts/`); bulk extractor at `extract_methodology.py`. Smoke tested on 4 polls (all valid). Coverage short-circuits 56.8% of polls (36.9% deferred + 19.9% short/empty) without LLM. Full run pending; see "Run extract_methodology.py at scale" below.
- context: TSE registration requires every poll to file a full sampling plan + control system narrative before publication (median 150–1,640 chars per field). These free-text blocks are the key input to the mechanism decomposition (see `summary.md` § "Mechanism: design-driven slant vs residual / fraud"); they let us split the +6.7 pp sponsor effect into Channel A (Bayesian-persuasion-via-design) and Channel B (residual / fabrication).
- the natural llmkit task lives at `pipelines/politica/source/llm/poll_methodology.py` with the SO response_format. Schema fields to extract per poll:
  - `coverage_class` (FLAGGED: most consequential single field; primarily from `DS_DADO_MUNICIPIO`): full-municipality / urban-only / urban-plus-selected-rural / specific-neighborhoods / other.
  - `mode`: in-person / phone / online / mixed
  - `sample_design_class`: simple-random / stratified / quota / multi-stage / PPT / mixed
  - `is_quota_sample`: bool
  - `n_stages`: 1 / 2 / 3+
  - `quota_variables`: list of {sex, age, education, income, religion, occupation, region, race}
  - `population_reference`: census_2022_residents / TSE_eligible / turnout_weighted / other / not_specified
  - `census_sectors_used`: bool
  - `population_source`: TSE / IBGE / both / other
  - `collection_device`: tablet / paper / mixed
  - `audit_mechanism`: bool + brief description
  - `nonresponse_handling`: notes
  - `question_order_described`: bool
  - `notes`: free-form
- cost: revised upward to ~$25-30 at gpt-4o-mini for the full 14k
after 200-poll dry-run showed dedupe gain is only 0-5% per task
(pollsters parameterize quota numbers and population references
by município — the per-município content dominates the hash, so
the "few hundred unique templates" claim was wrong; see
`thinking.md` § "Methodology dedupe gain is smaller than expected"). Still cheap enough that this isn't a blocker.
- blocks: Spec 3 of the regressions (the channel A vs B decomposition); currently we can only run Specs 1/2 with structured methodology controls.
- created: 2026-06-01
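For reference, the field list above can be written down as a typed record. This is a sketch using stdlib dataclasses; the real llmkit schema may be defined differently (for example as a structured-output model), and the class name `PollMethodology`, the split of `audit_mechanism` into a flag plus a description, and the `invalid_fields` helper are assumptions made for illustration.

```python
from dataclasses import dataclass
from typing import List, Literal, get_args, get_origin, get_type_hints

QUOTA_VARIABLES = {"sex", "age", "education", "income",
                   "religion", "occupation", "region", "race"}


@dataclass
class PollMethodology:
    coverage_class: Literal["full-municipality", "urban-only",
                            "urban-plus-selected-rural",
                            "specific-neighborhoods", "other"]
    mode: Literal["in-person", "phone", "online", "mixed"]
    sample_design_class: Literal["simple-random", "stratified", "quota",
                                 "multi-stage", "PPT", "mixed"]
    is_quota_sample: bool
    n_stages: Literal["1", "2", "3+"]
    quota_variables: List[str]
    population_reference: Literal["census_2022_residents", "TSE_eligible",
                                  "turnout_weighted", "other", "not_specified"]
    census_sectors_used: bool
    population_source: Literal["TSE", "IBGE", "both", "other"]
    collection_device: Literal["tablet", "paper", "mixed"]
    audit_mechanism: bool
    question_order_described: bool
    audit_mechanism_description: str = ""  # assumed companion to the bool
    nonresponse_handling: str = ""
    notes: str = ""


def invalid_fields(record: PollMethodology) -> List[str]:
    """Return names of fields whose value falls outside the allowed set."""
    bad = []
    for name, hint in get_type_hints(PollMethodology).items():
        # Literal hints are not enforced at runtime, so check them by hand.
        if get_origin(hint) is Literal and getattr(record, name) not in get_args(hint):
            bad.append(name)
    if not set(record.quota_variables) <= QUOTA_VARIABLES:
        bad.append("quota_variables")
    return bad
```

A check like `invalid_fields` is one way to compute the "schema validity" share reported for smoke tests, under the assumption that validity means every categorical field takes an allowed value.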
Run `extract_methodology_batch.py` at full 14k scale (Batch API)
- replaces the sync `extract_methodology.py` for full-universe runs. Sync version stays as the dev/smoke path. Both share the schemas, prompts, wrappers, and llmkit cache; batch writes cache files in the same format so the sync assemble step works unchanged.
- command sequence (sandbox, no host needed):

      PYTHONPATH=/workspace/packages/llmkit:/workspace/pipelines/politica/source/llm \
      BASE_DIR=/workspace/pipelines/politica \
      python3 pipelines/politica/source/llm/extract_methodology_batch.py build --all
      # confirm sizes look right (auto-chunks at 180 MB); then submit
      python3 pipelines/politica/source/llm/extract_methodology_batch.py submit
      # wait up to 24h; check progress:
      python3 pipelines/politica/source/llm/extract_methodology_batch.py status
      # when status shows all completed:
      python3 pipelines/politica/source/llm/extract_methodology_batch.py harvest
      # finally assemble the wide parquet from the populated cache:
      python3 pipelines/politica/source/llm/extract_methodology.py --all

- estimated cost: ~$15 at gpt-4o-mini Batch (50% discount on sync pricing); SLA 24h.
- expected chunks at the 180 MB default: 2 for sampling, 1 each for coverage & operations → 4 batches total.
- output: `pipelines/politica/build/llm/poll_methodology_2024.parquet` (wide; 72 extraction columns + protocol metadata; verified on the 200-poll subset 2026-06-02, 100% schema validity).
- created: 2026-06-02, batch path implemented 2026-06-02
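The batch script's own chunking logic is not reproduced in these notes. As a rough illustration of how a size cap like the 180 MB default could be enforced, the sketch below greedily packs JSONL request lines into chunks; the function name and the greedy rule are assumptions.

```python
def chunk_by_bytes(lines, max_bytes):
    """Greedily pack JSONL lines into chunks of at most max_bytes each.

    Sizes are measured in UTF-8 bytes plus one newline per line. A single
    line larger than max_bytes still gets a chunk of its own.
    """
    chunks, current, size = [], [], 0
    for line in lines:
        n = len(line.encode("utf-8")) + 1  # +1 for the trailing newline
        if current and size + n > max_bytes:
            chunks.append(current)
            current, size = [], 0
        current.append(line)
        size += n
    if current:
        chunks.append(current)
    return chunks
```

With a cap of `180 * 1024 * 1024` bytes this would yield one batch file per chunk, which is the kind of count to compare against the expected 2 + 1 + 1 split before submitting.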
Mirror missing TSE PesqEle resources to bi-dropbox (host only)
- cdn.tse.jus.br is HTTP-403 from sandbox; download from host and
upload via rclone. Three resources to grab:

      cd /tmp && for f in bairro_municipio_2024.zip \
                          questionario_pesquisa_2024.zip \
                          nota_fiscal_2024.zip; do
        wget -c "https://cdn.tse.jus.br/estatistica/sead/odsele/pesquisa_eleitoral/$f"
      done
      rclone copy . bi-dropbox:data/TSE/2024/pesquisa_eleitoral/registration/ \
        --include "{bairro_municipio,questionario_pesquisa,nota_fiscal}_2024.zip"

- sizes unknown (CKAN reports "None bytes"); expect bairro_municipio to be the largest (per-poll PDFs × ~14k polls).
- after mirror: sandbox sees the files at `bi-dropbox:data/TSE/2024/pesquisa_eleitoral/registration/`.
- created: 2026-06-02
Extract bairro/município PDFs at scale (via Batch API)
- STATUS 2026-06-02: extractor implemented at `pipelines/politica/source/llm/poll_bairro_detail.py` (schema in `schemas.py:PollBairroDetail`, prompts under `prompts/`). 20-PDF pilot ran at 100% valid after a max_tokens bump (16k) + 50-bairro hard cap. Pilot output: `projects/poll-sponsor-bias/build/llm/bairro_detail_pilot/`.
- Empirical recovery rate (cross-tab against the bairro
zip's 13,208 protocols, mayor universe):
- deferred → 94.4% have PDF (5,180 of 5,489)
- very_short → 95.8% (2,839 of 2,964)
- substantive → 80.6% (5,173 of 6,417) — the 20% without PDF probably already had complete info in DS_DADO_MUNICIPIO.
- Total recoverable: 13,198 / 14,876 = 88.7% of mayor polls.
- Scale options:
- Deferred-only run: 5,180 PDFs, ~$5-8 at Batch. Priority case.
- All mayor polls with PDF: 13,198, ~$15-25 at Batch. Gives cross-validation on substantive polls (does the PDF's coverage_class_resolved agree with our DS_DADO_MUNICIPIO classification?).
- Two interesting findings flagged for follow-up (see `thinking.md`):
  - 13/15 polls with comparable QT match exactly between registry and PDF total_interviews_distributed. 2 differ. Could indicate partial / urban-only field execution despite full-coverage declared. Potential Channel B signature.
  - "PESQUISA NÃO REALIZADA" is its own bucket (1 in our 20-PDF pilot). Cancelled polls still generated a stamp PDF. Cross-tab with the methodology extraction will show whether these cluster in any pollster.
- Batch path: extend `extract_methodology_batch.py` with a 4th task `bairro_detail` (or create a parallel batch script).
- downstream: `coverage_class_resolved` replaces `coverage_class=deferred_complement` where the PDF resolves it; keep `coverage_to_be_complemented` as a separate covariate (still a Channel A signal: pollster declined to commit at registration).
- created: 2026-06-02, pilot done 2026-06-02
Extract question_wording_order from questionario_pesquisa PDFs
- currently the `scenarios_described`, `question_order_described`, `name_rotation` fields of poll_operations are near-zero (only 7.7% of polls mention "estimulad" in the methodology text; 0% mention most other scenario terms). This is because scenario / order info lives in the questionnaire PDF, not the registration narrative.
- new llmkit task `poll_questionnaire` after the mirror lands. Schema mirrors the question-block of poll_operations but with actual content: `scenarios_listed`, `scenario_order`, `name_order_pattern`, `has_name_rotation`, `approval_before_vote_intention: bool`, `nonresponse_options`.
- cost: ~$15-25 at gpt-4o-mini depending on PDF length distribution.
- rescues the question_wording_order lever from "unobservable" to "well-measured" — relevant to design_levers.md § lever #5.
- created: 2026-06-02
Cluster-grain extraction from DS_PLANO_AMOSTRAL for deferred polls
- 44% of deferred polls (2,372) mention geographic vocabulary
(setores censitários / bairros / zona urbana) in the sampling
plan. Even without the specific list, we recover the grain of
the cluster sample. Already in scope of
`poll_sampling.cluster_unit`; just need to verify the LLM extraction picks it up correctly on deferred polls (where DS_DADO_MUNICIPIO doesn't hint at grain).
- non-blocking; spot-check after the full 14k run.
- created: 2026-06-02
Two deferral-as-treatment regressions (after extract is done)
- Spec A: does sponsorship predict coverage deferral? `Pr(deferred_complement | sponsor, candidate FE, race × week FE)`. Tests whether self-sponsored polls disproportionately defer.
- Spec B: does deferral predict bias magnitude? `error ~ β·sponsored + γ·sponsored × deferred + candidate FE + race × week FE`. γ ≠ 0 would show the sponsor effect concentrates in deferred-coverage polls, which would be strong Channel A evidence.
- Caveat in `thinking.md`: pollster bimodality means within-pollster FE could wipe γ out (deferral is a pollster-level technology choice, not a per-poll strategic one). Run with and without pollster FE; the gap is informative.
- created: 2026-06-02
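A simplified sketch of Spec B: it absorbs candidate FE by demeaning within candidate and solves the two-regressor OLS by hand. It deliberately omits the race × week FE, pollster FE, and clustered standard errors that the real run needs, and the function names are illustrative assumptions.

```python
from collections import defaultdict


def demean_by_group(values, groups):
    """Subtract each group's mean from its members (the within transformation)."""
    sums, counts = defaultdict(float), defaultdict(int)
    for v, g in zip(values, groups):
        sums[g] += v
        counts[g] += 1
    return [v - sums[g] / counts[g] for v, g in zip(values, groups)]


def spec_b_within(error, sponsored, deferred, candidate):
    """Estimate (beta, gamma) in error ~ beta*sponsored + gamma*sponsored*deferred
    with candidate fixed effects absorbed by demeaning."""
    x1 = [float(s) for s in sponsored]
    x2 = [float(s * d) for s, d in zip(sponsored, deferred)]
    y, x1, x2 = (demean_by_group(v, candidate) for v in (error, x1, x2))
    s11 = sum(a * a for a in x1)
    s22 = sum(b * b for b in x2)
    s12 = sum(a * b for a, b in zip(x1, x2))
    s1y = sum(a * c for a, c in zip(x1, y))
    s2y = sum(b * c for b, c in zip(x2, y))
    det = s11 * s22 - s12 * s12
    if abs(det) < 1e-12:
        raise ValueError("regressors are collinear within candidate")
    beta = (s22 * s1y - s12 * s2y) / det
    gamma = (s11 * s2y - s12 * s1y) / det
    return beta, gamma
```

The `ValueError` branch matters in practice: if no candidate has both deferred and non-deferred sponsored polls, γ is not identified within candidate, which is the same thinness concern the pollster-FE caveat raises.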
Bairro/setor oversampling test — Channel A "Bayesian cluster selection" (machinery validated 2026-06-02; substantive test blocked on bairro batch extraction at scale)
- STATUS 2026-06-02: `source/analysis/_secao_oversampling_scope.py` validates the test machinery on the 20-poll bairro pilot. Key findings:
  - Sponsored coverage in pilot is too thin: only 1 of 20 polls is candidate-sponsored (MG067202024, RILDO COSTA / SOLIDARIEDADE in Prados-MG). Pilot was methodology-driven, not sponsorship-stratified; the substantive test cannot run on it.
  - Bairro-name matching to TSE local_votacao works: mean match rate 32 % across 15 pilot polls in analysis_table; 3/15 at ≥50 % (Maceió 89 %, Fortaleza 76 %, Belo Horizonte 66 %). Small munis match poorly because LV has 1-2 coarse bairros per muni while polls list 10-30 granular bairros. Fuzzy matching + geocoding (TODO item: "Optional IBGE setor↔seção crosswalk") would improve this materially.
  - Within-muni seção party-share variance is substantial: sd ≈ 0.075 across seções for each muni's dominant 2020 party (11 of 15 pilot munis have sd > 0.05). Test has real leverage.
  - Placebo passes cleanly: on the 3 well-matched independent polls (BH, Maceió, Fortaleza), `oversample_index` for the muni's top-3 2020 parties is ≈ 0 across the board (max |index| = 0.003). The machinery is methodology-validated; ready to flip to sponsored polls.
- Blocked on: bairro PDF batch extraction at scale (the "Extract bairro/município PDFs at scale (via Batch API)" TODO). With 5,180 deferred polls × ~5 % sponsored = ~250 sponsored polls to run through this test → enough power.
- context: the bairro PDF extraction gives us the EXACT list of
bairros (sometimes setores censitários) sampled per poll. A
direct test of whether pollsters tilt the cluster sample toward
favorable geographies is to compute:
  - `weighted_party_share` = sum over poll's bairros of (party's prior-election vote share in bairro × bairro's share of poll's interviews)
  - `muni_party_share` = party's prior-election vote share muni-wide
  - `oversample_index` = weighted_party_share − muni_party_share

  For sponsored polls the test is `oversample_index >> 0` (sponsor's party's friendly bairros over-represented vs the muni-wide baseline). For indep polls of the same muni × week, the index should be ≈0 if the comparison is clean.
- data needed (host can fetch from cdn.tse.jus.br; sandbox CDN-403):
- TSE seção-level voting: `votacao_secao_2020.zip` (prior cycle for 2024-prefeito context) and `votacao_secao_2022.zip` (most recent deputado/governador → party-level baselines). Both at https://cdn.tse.jus.br/estatistica/sead/odsele/votacao_secao/
- Seção ↔ address mapping: `eleitorado_local_votacao_2024.zip` (gives each seção a logradouro + bairro for matching).
- Optional IBGE setor ↔ seção crosswalk: not officially published; geocoding addresses against IBGE setor polygons reconstructs it. For QUAEST-style PDFs where setor codes ARE listed, this gives a direct join.
- matching strategy:
- First-best: setor IBGE code (when listed in PDF) → seção via geocoding crosswalk → seção votes.
- Second-best: bairro name → match TSE local-votação bairros (string-match within muni) → aggregate seções → votes.
- Worst-case: bairro→muni votes (no geographic differentiation — falls back to no-test).
- downstream tests:
- Within-pair (curated sample): does sponsored poll's `oversample_index` exceed indep poll's in the same muni × week?
- Universe regression: `oversample_index ~ sponsor_indicator + race × week FE + pollster FE`; tests if oversampling is a sponsor-driven choice or a pollster-level habit.
- Interaction with bias: `error ~ sponsored × oversample_index`; does oversampling DRIVE the bias?
- timing: defer until bairro extraction has run on the relevant poll set (curated pairs at minimum, full universe ideal).
- created: 2026-06-02
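The `oversample_index` defined in the context bullet can be sketched as below. Because bairro-name matching is partial, this version computes the weighted share over matched bairros only and also returns the share of interviews that matched; that renormalisation and the argument names are assumptions, not necessarily what `_secao_oversampling_scope.py` does.

```python
def oversample_index(interviews_by_bairro, party_share_by_bairro, muni_party_share):
    """Return (oversample_index, matched_interview_share) for one poll and party.

    interviews_by_bairro: bairro -> number of interviews the poll allocated there
    party_share_by_bairro: bairro -> party's prior-election vote share (0..1)
    muni_party_share: party's prior-election vote share muni-wide (0..1)

    Bairros without a vote-share match are dropped and the interview weights
    are renormalised over the matched ones. Returns (None, 0.0) if nothing matched.
    """
    matched = {b: n for b, n in interviews_by_bairro.items()
               if b in party_share_by_bairro}
    total_matched = sum(matched.values())
    if total_matched == 0:
        return None, 0.0
    total = sum(interviews_by_bairro.values())
    weighted_party_share = sum(
        party_share_by_bairro[b] * n for b, n in matched.items()
    ) / total_matched
    return weighted_party_share - muni_party_share, total_matched / total
```

Reporting the matched share alongside the index makes it easy to restrict the within-pair and universe tests to polls above a match-rate cutoff, such as the ≥50 % polls used for the placebo.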
Tier 2 — full IBGE-setor candidate supporter profile (sharpening on cheap Tier 2)
- context: cheap Tier 2 (separate todo, candidate-base from 2020
seção votes only) gives a coarse urban/rural proxy via
`base_lv_size_weighted` (vote-weighted avg of seções per local_votacao). Full Tier 2 adds the IBGE Censo setor-level socioeconomic layer (income decile / education / age band / formally-coded urban-rural) via spatial join of `local_votacao` lat/lon ∈ setor polygon. Lets us write candidate-base profiles like "55 % of base in lowest two income deciles; 70 % in rural setores", not just "low LV-size weighted concentration".
- trigger condition (don't start before this fires): cheap Tier 2's triple-interaction test (`error ~ sponsored × coverage_class × candidate_base_skew`) finds a significant signal whose interpretation would meaningfully sharpen with the socioeconomic layer, i.e. the cheap proxy is the binding limitation, not the coverage_class extraction itself. If cheap Tier 2 finds nothing, Tier 2 sharpening adds no information; skip.
- realistic cost (audited 2026-06-14): the analytical work is
1.5-2 days. The friction is infrastructure, summarised:
| step | days | binding cost |
|---|---:|---|
| geospatial Python stack | 0.5-2 | `geopandas` / `shapely` / `geobr` not installed in sandbox; `/opt/venv` is read-only (verified); needs user-local venv + GDAL/PROJ system libs (may need IT) or laptop detour. AN-032's setor addendum hit exactly this wall ("blocked on geobr / shapefiles not in sandbox"). |
| IBGE data acquisition | 1 | `pipelines/brazil/build/` is empty: no setor data cleaned anywhere in the workspace. Need Censo 2010 setor agregados (~200 MB compressed for income/education vars) + setor polygon shapefiles (~1-3 GB). Sandbox internet access is uneven for new domains. |
| seção → setor crosswalk | 1 | No public bijection; standard route is spatial join via `local_votacao` lat/lon ∈ setor polygon. Voters don't necessarily live in the host setor; defensible weighting + muni-total validation required. |
| candidate-level aggregation | 0.5 | Easy once 1-3 are in place. Same shape as cheap Tier 2's Step A. |
| pipeline-grade integration | 2-3 | Optional. If we want `pipelines/brazil/` to own this so saude / deterrence / etc can reuse, it adds docs / codebook / validation per doc-contract. |

- elapsed-time scenarios:
- Sandbox-only quick-and-dirty: 3-5 days IF step 1 resolves in sandbox. Often it doesn't (AN-032's experience).
- Sandbox blocked → laptop detour: add a few days of environment-switching overhead and broken workspace continuity (Overleaf, git etc).
- Pipeline-grade: 1.5-2 weeks elapsed.
- infrastructure amortization: if any other project (saude,
deterrence, electoral-justice) ends up needing a setor-level
socioeconomic layer, the step-1+2+3 cost amortizes across them.
Worth raising in
`meta/priorities.md` before spending the week if that's not on the radar.
- what this sharpens vs the cheap version: the cheap urbanicity proxy can't distinguish "rural" from "low-income peri-urban". Full Tier 2 separates these cleanly via income deciles. Matters if the structural Channel A story turns out to be about income composition rather than geographic concentration.
- created: 2026-06-14
ML / variable-importance analysis of what separates biased from unbiased polls
- context: with the within-candidate FE design we can label each poll with its residualized error (post race × week FE). The question for the paper's mechanism story is which observable features predict that residual. State-of-the-art for interpretable feature ranking in this kind of econometric setting is the open lever — needs a literature pass before picking a method.
- sub-tasks:
- Literature scan of state-of-the-art. Candidates to compare: (i) double/debiased ML with variable importance (Chernozhukov et al. 2018 + follow-ups), (ii) generic ML (Chernozhukov-Demirer-Duflo-Fernandez-Val 2018, 2023) for best-linear-projection of CATE on covariates, (iii) causal forests + variable importance (Athey-Wager), (iv) SHAP / permutation importance over a gradient-boosted residual model (descriptive, not causal), (v) post-LASSO / BCCH 2012 on the interaction set. Decide what "state of the art" means for the question of describing what distinguishes biased from unbiased units, which is closer to heterogeneous-effects characterization than to prediction.
- Feature set. Pull together every observable: methodology extractor outputs (sampling / coverage / operations schemas once the universe-scale batch lands), pollster identity + tier, sponsor type, race characteristics (UF, muni size, competitiveness, incumbency), customer mix, registration timing, fieldwork length, sample size, mode, SE-quality flags. Use only features observable before the result would be known.
- Run the chosen method and produce a ranked description of what differs between biased and unbiased polls. The deliverable is a paper-facing figure/table — not a black-box model.
- Cross-check against the Channel A vs B decomposition. The ML description should be consistent with whichever channel the structural decomposition assigns most of the +6.7 pp to. If ML picks up features the structural decomp missed (or vice versa), that is itself a result.
- relation to existing work: complements the methodology-extractor Channel A vs B split above. That decomposition is theory-led (design choices the model predicts should matter). This task is data-led (let the algorithm tell us which observables actually separate the two groups). The two should triangulate.
- created: 2026-06-12
Source-of-bias probe agenda — sharpen the "what is the mechanism" answer
Tracks the open agenda from
docs/source-of-bias.md. The current evidence
splits into (a) concrete design-choice differences too small on their
own to explain +7 pp, and (b) opacity differences that are
statistically loud but not themselves a mechanism. Each task below
either tightens a concrete design lever or rules one out, narrowing
the space toward (or formally settling on) the "opacity is genuinely
the answer" reading.
Relatório-PDF methodology extractor (new — escalation from AN-043)
- context: AN-043 confirmed the registration-PDF methodology extraction cannot reach nonresponse handling (post-fielding analytical choice). The relatório (publication report) PDF is the natural home for that field plus weighting / post-stratification rules. The PDFs are on disk: `pipelines/politica/build/scrape/tse_relatorio/2024/` contains 1,608 relatório PDFs (the intersection with the 9,509-protocol universe is partial; not every registration has a published relatório). What's missing is the methodology extractor over those PDFs; the existing `pipelines/politica/source/llm/poll_relatorio.py` pulls candidate-percent rows only.
- approach: write a new `poll_relatorio_methodology.py` (or extend `poll_relatorio.py` with a methodology schema) covering: nonresponse handling rule (redistribute-to-leaders / proportional / exclude / cristalizar), response rate, refusal rate, weighting variables actually applied, post-stratification description, question wording for the headline scenario. Pilot on the curated_pairs subset intersected with the 1,608-PDF relatório universe; then run on full panel.
- cost: ~$5-30 for the pilot pass; ~2 h of code. Cheaper than questionnaire extraction because PDFs are smaller.
- blocks/unblocks: closes source-of-bias.md probe item 1 (nonresponse handling) and probe item 4 (weighting / post-stratification structured extraction).
- created: 2026-06-02
Weighting / post-stratification structured extraction (new LLM task)
- context: `DS_PLANO_AMOSTRAL` describes the sampling structure but the weighting / post-stratification scheme (how the raw sample is re-balanced against the target population) isn't a structured field in the current poll_sampling schema. Different weighting schemes (proportional to muni demographics, rake to TSE-eligible, cell-weighted vs uniform) can move the realized poll-percent by several pp.
- approach: design a new schema field set `weighting: {method ∈ {none, rake_demographic, post_strat_cells, IPW, other}, target_population, variables_used, text_evidence}` in `pipelines/politica/source/llm/schemas.py`, add prompts, run a 200-poll pilot, then full-universe.
- cost: ~$3-5 at gpt-4o-mini for 14k; ~half day of schema design + prompt work.
- blocks/unblocks: directly tests whether sponsored polls apply less aggressive weighting (so non-representative samples don't get corrected back to population norms). If positive, this is a primary mechanism.
- created: 2026-06-02
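
A minimal sketch of the proposed `weighting` field set, written with stdlib dataclasses. Whatever model class `schemas.py` actually uses is not shown in these notes, so the `Weighting` and `WeightingMethod` classes and the `parse_weighting` helper are hypothetical; only the field names and the method vocabulary come from the approach bullet above.

```python
from dataclasses import dataclass, field
from enum import Enum
from typing import List, Optional


class WeightingMethod(str, Enum):
    # Vocabulary copied from the proposed schema field set.
    NONE = "none"
    RAKE_DEMOGRAPHIC = "rake_demographic"
    POST_STRAT_CELLS = "post_strat_cells"
    IPW = "IPW"
    OTHER = "other"


@dataclass
class Weighting:
    method: WeightingMethod
    target_population: Optional[str] = None
    variables_used: List[str] = field(default_factory=list)
    text_evidence: Optional[str] = None


def parse_weighting(raw: dict) -> Weighting:
    """Validate one raw extractor output; unknown methods fall back to OTHER."""
    try:
        method = WeightingMethod(raw.get("method", "other"))
    except ValueError:
        method = WeightingMethod.OTHER
    return Weighting(
        method=method,
        target_population=raw.get("target_population"),
        variables_used=list(raw.get("variables_used") or []),
        text_evidence=raw.get("text_evidence"),
    )
```

Falling back to `other` rather than raising keeps a 200-poll pilot running when the model invents a label; the share of rows landing in `other` is then a cheap signal that the enum needs another category.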
[partial] Sampling-error envelope on biased polls — is +7 pp detectable in a single poll, and is n kept small to hide it?
- Sub-question 1 (detectability): done AN-108, 2026-06-19. Median binomial SE on sponsored polls = 2.49 pp (P25–P75: 2.11–2.77); 99.3 % of sponsored polls have SE < 3.5 pp. A +7 pp shift is z > 2 on a single sponsored poll, and still z=2.0 under DEFF=2 inflation. The bias does not hide in sampling noise per poll, which is a Channel-B-leaning input. A coherent Channel A story now requires per-poll design tilts averaging ≈ 7 pp, not a many-small-tilts story with noise-deniability for each one.
- Sub-question 2 (strategic small-n): OPEN, partially visible. AN-108 surfaced the n gap as a side observation: sponsored polls have median n=360 vs 600 on the within-candidate independent comparator. Lower-n doesn't rescue noise-deniability (SE still small relative to 7 pp at n=360), but the gap itself is the sub-question 2 finding. Open work: within-pair (race × week × candidate) n contrast + Wilcoxon, plus a bunching check just below thresholds (300 / 400 / 600). Suggested script: `an-109-sponsored-n-within-pair.py`.
- blocks/unblocks: detectability call constrains Channel A interpretation (size-mismatch §5 paper paragraph). Sub-question 2 closes when the within-pair n contrast runs.
- created: 2026-06-19; sub-1 closed 2026-06-19; sub-2 open
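
The detectability arithmetic in sub-question 1 can be checked with a few lines. This is a sketch, not the AN-108 script: the helper names are hypothetical, and the `p = 0.5` default is a worst-case assumption rather than the per-candidate shares AN-108 used.

```python
import math


def binomial_se_pp(n: int, p: float = 0.5) -> float:
    """Binomial standard error of a poll share, in percentage points."""
    return 100.0 * math.sqrt(p * (1.0 - p) / n)


def z_for_shift(shift_pp: float, se_pp: float, deff: float = 1.0) -> float:
    """z-score of a shift when the SE is inflated by sqrt(DEFF)."""
    return shift_pp / (se_pp * math.sqrt(deff))


# Median sponsored-poll SE reported by AN-108 (2.49 pp).
print(round(z_for_shift(7.0, 2.49), 2))
print(round(z_for_shift(7.0, 2.49, deff=2.0), 2))
# Worst-case SE at the sponsored median n = 360.
print(round(binomial_se_pp(360), 2))
```

With the reported median SE the +7 pp shift sits near z ≈ 2.8, and near z ≈ 2.0 once the SE is inflated by √2 for DEFF=2, which matches the figures quoted above. The worst-case SE at n=360 comes out slightly above the reported median because most candidate shares are away from 50 %.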
Sponsor identification
- LLM-refine the sponsor-type classifier on the residual "other" pool
- context: regex + routes A+B+C+D get 84% of contratante rows into a known bucket. The residual 2,425 mayoral protocols (16.3%) with only other/unknown sponsors likely contain more political ones: MEI-format CNPJs (`CNPJ {CPF} {NAME}` pattern), sectoral business associations, political consultancies, "FONTE 83"-type news fronts, political blogs.
- the natural llmkit task lives at `pipelines/politica/source/llm/sponsor_classify.py`. Schema: `sponsor_type ∈ {candidate, candidate_committee, party, coalition, committee, sectoral_assoc, political_consultancy, media_political, media_neutral, pollster_self, other_business, government, other}` + confidence. Run only on contratante names in the residual pool. ~3k LLM calls ≈ $0.50.
- non-blocking; refines the treated-set count before final estimation but the headline +6.7 pp is already a usable lower bound.
- created: 2026-06-01
Complementary data
Layer 2 — Per-TRE PJe audit-document scrape (queued 2026-06-16)
- context: AN-081 (`docs/analyses/an-081-audit-outcome-distribution.md`) measured the 1,175 LE.34 §1 audit-request universe from DataJud + TREdiarios decision text. Granted audits → 51.6 % fraud-follow-on within 90 days vs 25 % for denied audits, which suggests audits, when conducted, find something actionable. But the actual audit findings (planilhas individuais, audit-report PDFs, §3 retificação records) live in PJe document attachments, not in the diário.
- the test this enables. AN-080 left a mechanism question: (1) operational deviation from declared plano amostral, or (2) Channel B fabrication. The conditional distribution of audit findings (fielding deviated from declared design vs planilhas back the published numbers) directly discriminates between them. AN-081 is the indirect handle via subsequent litigation; Layer 2 is the direct handle via audit content.
- the constraint. TSE Consulta Unificada PJE (`consultaunificadapje.tse.jus.br`) is F5-CAPTCHA-gated, a documented dead end in `projects/electoral-justice/docs/notes/poll_data_expansion.md`. Per-TRE PJe portals (27 separate systems) are the alternative; the TJMG PJe consulta-scrape-v3 pattern (memory: `project_pje_consulta_scrape.md`) is proven on TJs at 1st-instance. Whether each TRE 1st-instance PJe shares that pattern is unverified.
- first step (cheap): 10-NPU probe on TRE-RN, TRE-PR, TRE-SP (top 3 audit-case UFs) using the TJMG scrape pattern. Report whether case-public lookups succeed and what document attachments are accessible. Decide go/no-go for full ~1,175 case scrape based on probe.
- dependency: none. Build is independent of the AN-081/headline path.
- non-blocking; supplements AN-081 with audit-content evidence.
- created: 2026-06-16
EJ poll-lawsuits as complementary data
- context: `projects/electoral-justice/source/clean/proc_2024.py` (lines 53–58) already classifies injunction cases by `assuntos` keyword `PESQUISA ELEITORAL`. 2020 dataset: 2,376 poll-related injunctions (~12.5% of 19,032 EJ injunctions). Same classifier runs on 2024.
- status update 2026-06-14: 50-case pilot on 2020 cases (`source/llm/pilot_poll_lawsuit.py`, `build/llm/lawsuit_pilot/summary.json`) shows the suit content is overwhelmingly formal-compliance, not methodology: divulgation_violation 38%, registration_missing 38%, enquete_not_pesquisa 10%, sponsor_concealment 1, design_lever allegations 0/50. Paper §6 now cites this pilot as corroborating the synthesis ("Legal challenges target form, not methodology"). Implication for the sued-rate test: the direct "perceived bias" framing for use #1 below has to be rewritten, because the sued set is mostly about whether the publication followed registration rules, not whether the underlying poll was tilted. The cleanest sub-test is whether sponsored polls are over-represented in the fraud-flavoured subtype (sponsor_concealment + enquete_not_pesquisa with fraud allegation), which is rare (~10% of sued set).
- three uses (per `summary.md` "Complementary data"):
  1. Perceived-bias validation: polls in the high-`SponsoredBy_c` tail should be over-represented in the sued set. REVISED: restricted to fraud-flavoured subtypes only, since formal-compliance violations dominate the suit base rate (76%) and don't track methodological bias. Status 2026-06-16 (v1 + v2):
     - AN-072 (candidate-level): within race, cands with ≥1 self-sponsored poll are +3.15 pp more likely to be litigants in a fraud-flavored PESQUISA case (race-FE OLS, p=0.011, base 7.9 %).
     - AN-072v2 (poll-level): regex-extracted protocol→case map from TREdiarios mov.text + registry intersection (~9.1% of 5,276 fraud cases yield a registry-matchable protocol). Within race × week, candidate-sponsored polls are −3.9 pp less likely to be sued for fraud (p=0.010, base 4.3 %). The perceived-bias prediction fails at the poll unit; the positive candidate-level signal is selection into contentious races where lawsuits target the other polls (independent/media). Coverage caveat: unmatched fraud cases coded sued=0 biases the estimate toward zero, so magnitude is likely understated.
     - Implication for the paper: rewrite the §6 sued-rate framing. Use 1 no longer validates perceived bias; it actually contradicts it at the right unit of analysis. Sponsored polls draw less fraud litigation than peer independents, conditional on race × week.
     - Open follow-ups: LLM extractor on the 91 % regex-missed fraud cases could lift coverage (assunto FRAUDULENTA cases where text is present but regex missed the protocol); use 3 (CNPJ join via pollster) is now the natural complement.
  2. Mechanism evidence: small LLM pass over the poll-lawsuit petitions to surface which dimension (methodology / sample / sponsor identity / timing) is alleged to be biased. Done on 50-case pilot; the answer is "almost always disclosure compliance, almost never methodology".
  3. Sponsor-side robustness: defendants in poll-lawsuits are often named with CNPJ; join to `poll_sponsor_2024` for a candidate-level "was sued for sponsored poll" indicator.
- join: `projects/electoral-justice/build/merge/proc.csv` (`assuntos` contains `PESQUISA`) → defendants/plaintiffs → CNPJ → `pipelines/politica/build/clean/poll_sponsor_2024.parquet`.
- dependency for 2024 sued-rate test: ~resolved 2026-06-16~. TREdiarios parse stage now covers all 27 UFs for 2024+2025. cand_proc_2024 (built 2026-06-12 off the diários DB) supplies the CPF↔case bridge. AN-072 used assunto_desc alone for the fraud cut (no LLM needed for v1) since the 2024 DataJud taxonomy is granular: FRAUDULENTA assuntos = 5,049 + IRREGULARIDADES PUBLICADOS = 172 = ~5,200 fraud cases of 12,451 PESQUISA cases.
- non-blocking; only after the headline estimate lands.
- created: 2026-06-01
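
The poll-level protocol→case map described in the AN-072v2 status note (regex extraction from movement text, then intersection with the registry) could look roughly like the sketch below. The protocol shape (two-letter UF or `BR`, five digits, four-digit year) is an assumption about how registration numbers appear in the text, and `extract_protocols` / `map_cases_to_registry` are hypothetical helpers, not the functions used in AN-072v2.

```python
import re

# Assumed protocol shape: UF (or BR), five digits, four-digit year,
# tolerating stray spaces around the hyphen and slash.
PROTOCOL_RE = re.compile(r"\b([A-Z]{2})\s*-\s*(\d{5})\s*/\s*(\d{4})\b")


def extract_protocols(text):
    """Return sorted, normalized protocol ids found in one text blob."""
    found = PROTOCOL_RE.findall(text.upper())
    return sorted({f"{uf}-{num}/{year}" for uf, num, year in found})


def map_cases_to_registry(case_texts, registry):
    """Map case_id -> registry-matchable protocols.

    case_texts: dict of case_id -> movement text.
    registry: set of known protocol ids from the poll registry.
    """
    matched = {}
    for case_id, text in case_texts.items():
        hits = [p for p in extract_protocols(text) if p in registry]
        if hits:
            matched[case_id] = hits
    return matched


cases = {
    "c1": "Pesquisa registrada sob o nº SP-01234/2024.",
    "c2": "Representação sem número de registro.",
}
matched = map_cases_to_registry(cases, {"SP-01234/2024"})
print(len(matched) / len(cases))  # share of cases with a matchable protocol
```

Cases that never make it into `matched` are exactly the ones the coverage caveat is about: coding them as sued=0 treats a regex miss as a non-event.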
Near-term: 2020-only descriptives on poll lawsuits (timeboxed)
- electoral-justice clean data currently only covers 2020, so the 2024 sponsor-join (use #3 above) is blocked. But we can already do framing descriptives off the 2,376 2020 poll-related injunctions: who sues (campaign / party / coalition / MP), who is sued (pollster firm / sponsor / media outlet), base rates by UF and by stage of the cycle, modal legal claim (`assuntos` sub-categories + petition snippet), outcome distribution (procedente / improcedente / extinto), median time-to-decision.
- useful for the paper's institutional-setup / "regulatory bite" paragraph even without the 2024 sample link.
- timebox: descriptive pass only; if it starts to look like a second empirical strand, stop and defer until EJ cleaning reaches 2024.
- created: 2026-06-02
50-case llmkit pilot — DONE 2026-06-02
- extractor: `pipelines/politica/source/llm/poll_lawsuit.py` (schemas at `pipelines/politica/source/llm/schemas.py`, prompts under `pipelines/politica/source/llm/prompts/`).
- pilot script: `projects/poll-sponsor-bias/source/llm/pilot_poll_lawsuit.py`, output: `projects/poll-sponsor-bias/build/llm/lawsuit_pilot/`.
- run: 50 cases, year=2020, seed=42, model=gpt-4o-mini, all valid.
- findings folded into `docs/design_levers.md` § "Dimensions surfaced in the 2020 poll-lawsuit pilot". Headline: lawsuit universe is registration-driven (66% registration_missing, 0 methodology_bias primary), which reframes use #1 of this entry.
Schema v3 — typology-framing for alleged_levers
- the v2 pilot showed the LLM reserves the `DesignLever` enum for cases where bias is the theory of harm, and pushes design-dimension allegations inside registration cases into `lever_other_text`. Rich free-text content, no countable incidence per lever.
- v3 should split the field: keep `alleged_levers` as "design bias as theory of harm" (sharp count of methodology-framed cases) AND add `dimensions_touched: list[DesignLever]` meaning "any allegation touches this design dimension, regardless of legal framing". The latter is what we want for the universe count.
- non-blocking; only worth the second iteration after we decide to scale beyond 50.
- created: 2026-06-02
Scale to full ~1.9k 2020 cases (decision point)
- after schema v3 lands. Cost ≈ $5 at gpt-4o-mini, $50 at gpt-4.1. Worth doing once we have countable per-lever incidence to put in the paper's framing section.
- created: 2026-06-02
Literature
Sync OA PDFs from sandbox to ~/Dropbox/literature/ (host shell)
- 19 PDFs in `docs/literature_search/_pdfs/` (if still present) need to move to the shared pool so subsequent `/literature` runs pick them up.
- command (from the host, not the sandbox): `rsync -av --ignore-existing ~/research/projects/poll-sponsor-bias/_pdfs/ ~/Dropbox/literature/`
- 4 highest-value reads: `meireles2022pesquisas`, `shiranimehr2018disentangling`, `gramacho2013margem`, `batistapereira2024pesquisas`.
- created: 2026-06-01
Download paywalled Brazilian papers on campus (Chrome AlwaysOpenPdfExternally, see literature SKILL.md §8c)
- Ranked by importance:
  - `lloyd2016vote`: vote buying & poll error in Brazil. Closest substantive predecessor for "polls aren't randomly off in Brazil".
  - `gramacho2015preelection`: 2014 cycle, regulatory-debate context (DOI 10.3232/reb.2015.v2.n2.09).
  - `bertholini2022against`: Brazilian presidential forecast baseline (DOI 10.14201/rlop.25882).
  - `mancuso2023money`: Speck-authored, campaign finance (DOI 10.3390/socsci12120656).
  - `hunter2019bolsonaro`: political context (DOI 10.1353/jod.2019.0005).
- non-Brazil priorities (methods sections): `destefano2022preelectoral` (Italian Bayesian house effects), `selb2023bias` (multiparty bias-variance, German polls), `leeper2019sponsorship` (closest sponsor-bias predecessor), `castrocornejo2024partisan` (LatAm partisan-perception analog), `schmittbeck2008polls` (canonical polls-affect-votes German paper).
- created: 2026-06-01
Web-search aftermath PDFs
- `batistapereira2024pesquisas`: DONE (SciELO OA). High-priority read: same TSE-registered data, co-authored by Felipe Nunes (Quaest founder). Argues the 2022 polls-vs-results gap is late voter-side change, not polling error. This is an important alternative explanation that our sponsor-bias story needs to be distinguished from.
- `cantu2016utility`: IJPOR paywall (DOI 10.1093/ijpor/edv004). Most direct LatAm methodological predecessor: firm-level bias estimation in Mexican multiparty polls via Kalman filter. Grab on campus.
- created: 2026-06-01
Data-quality validation
- Hand-validate a stratified sample of LLM vote-intention extractions
- context: the headline +7 pp identifies off `error = poll_percent - 100 * final_share`, where `poll_percent` is LLM-extracted from the relatório PDF. Sponsored-poll PDFs are professionally laid out; independent-media PDFs are often noisier scans. Differential extraction error correlated with sponsor type would manufacture β without any real slant. The bulk_extraction_audit only reported overall extraction success rates, not by sponsor type.
- protocol: pull 50 sponsored + 50 independent extractions stratified by pollster size and scenario_type. For each, compare LLM output against the candidate's reported percent on the PDF page (open the PDF, eyeball the row). Record: extraction correct / mis-read / wrong-scenario / OCR-truncated; magnitude and sign of any error. Tabulate error rate × direction × sponsor type.
- bar: if the differential error rate between sponsored and independent is < 1 pp and sign-balanced, the headline is safe. If independent polls (noisier PDFs) are extracted with larger errors than sponsored polls (cleaner PDFs) and those errors push the comparator's measured error downward, β is overstated.
- cost: ~3 hours manual at a steady pace; cheap and high-value.
- blocks: a clean "we have ruled out extraction-error as a confound" sentence in the paper's robustness section.
- created: 2026-06-02
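
A minimal sketch of the 50 + 50 stratified draw in the protocol above, using only the standard library. The row keys (`group`, `pollster_size`, `scenario_type`) and the round-robin allocation across strata are assumptions made for illustration; the real sampling script may allocate proportionally instead.

```python
import random
from collections import defaultdict


def stratified_sample(rows, per_group=50, seed=42):
    """Draw up to `per_group` rows per sponsor group, spread across strata.

    rows: list of dicts with hypothetical keys 'group'
    ('sponsored' / 'independent'), 'pollster_size', 'scenario_type'.
    """
    rng = random.Random(seed)
    by_group = defaultdict(lambda: defaultdict(list))
    for row in rows:
        stratum = (row["pollster_size"], row["scenario_type"])
        by_group[row["group"]][stratum].append(row)

    sample = []
    for group in sorted(by_group):
        strata = by_group[group]
        for key in sorted(strata):
            rng.shuffle(strata[key])
        picked = []
        # Round-robin over strata so thin strata are not crowded out.
        while len(picked) < per_group and any(strata.values()):
            for key in sorted(strata):
                if strata[key] and len(picked) < per_group:
                    picked.append(strata[key].pop())
        sample.extend(picked)
    return sample
```

Fixing the seed makes the 100-row hand-coding sheet reproducible, so corrected values can be joined back to the same rows when the validated-subset specification is re-run.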
Framing
Industry-insider framing — Felipe Nunes (Quaest founder)
- context: Nunes co-founded Quaest, one of the major Brazilian polling firms, while continuing as an academic (UFMG, now FGV-EESP). His `batistapereira2024pesquisas` paper uses Quaest data on the same TSE registered-poll universe. Two implications: (a) the most academically engaged Brazilian pollster is not analyzing sponsor effects; (b) the paper intro needs a sentence acknowledging the polling industry has academic voices and we are not claiming malfeasance, only documenting an average bias.
- todo: bake the framing into the eventual paper intro; mention in a footnote where Quaest polls appear in the analysis.
- created: 2026-06-01
Argument: reputation, not law, disciplines the main pollsters
- claim to develop for the paper: pesquisa-eleitoral cases in the Justiça Eleitoral effectively carry no legal punishment (fines are token, no firm has ever been shut down or barred from polling for biased filings, criminal exposure is essentially nil). And yet the major established pollsters in the data are not the ones generating the +6.7 pp self-sponsored slant — the bias is concentrated in less reputationally exposed firms (consistent with the firm-tier discipline pattern documented in AN-054 and the within-firm β results in AN-016). The implication is that the binding constraint on the major pollsters' honesty is reputational, not legal. If law were doing the work, no firm would deviate; if reputation is doing the work, only firms with reputational capital at risk would be disciplined — which is what we observe.
- what to develop:
- Document the no-legal-punishment premise. Pull the Justiça Eleitoral case record on Lei 9.504/97 Art. 33 (pesquisa registration) and Art. 35 (penalties): how many cases, what fines actually levied, any firm-level sanctions ever imposed. The institutional-facts subsection (`docs/institutions.md`) is the place; if the answer is essentially "fines are nominal and rarely upheld," that is the claim to cite.
- Cross-reference with the firm-tier finding. Tie the reputational reading to the AN-054 / AN-016 / AN-018 result that the +6.7 pp is not a firm-fixed-effect feature of the Datafolha/Quaest/Ipec tier. The paper-side framing is: the fact that some firms behave despite costless law is the evidence that reputation is the operative constraint.
- Connect to Felipe Nunes / Quaest framing already queued above. A major pollster founder being an active academic publishing on this same data is itself a reputational commitment device. Bake that into the same passage.
- Counter-arguments to address. (a) Selection: maybe major pollsters just have better clients who don't ask for slant. The customer-mix evidence (`docs/briefs/pollster_customer_mix.md`) partially speaks to this. (b) Civil/contractual reputational loss to clients vs reputational loss to the public: worth distinguishing; the latter is what the argument turns on.
- deliverable: a paragraph (eventually a subsection) in the paper's mechanism / discussion, plus the supporting institutional facts in `docs/institutions.md`.
- created: 2026-06-12
Cycle extension (low priority)
- Extend the analysis to the 2022 federal/state cycle
- context: the all-Brazil 2024 mayoral sample has 568 self-sponsored candidate-poll rows across 33 institutes with ≥5 self-sponsored polls each. The pollster-customer-mix test (`docs/briefs/pollster_customer_mix.md`) is underpowered at n=11 institutes after the SE-quality filter. Adding 2022 roughly doubles the pollster panel and, crucially, gives within-pollster-over-time variation that cleanly identifies whether a firm's β changes with customer-composition shifts (the causal counterpart to the cross-sectional sorting test).
- priority: lower than Channel A vs B (the methodology-extractor decomposition is the project's main unfinished mechanism lever and is sample-size-independent). Do 2022 second.
- scope: focus on single-winner races where the within-candidate FE transfers cleanly:
  - Governor (27 races, state-level, runoff threshold 50%): direct parallel to 2024 mayoral; the workhorse.
  - President (1 race, national): separate analysis, very poll-heavy cycle, the highest-stakes test of the design.
  - Skip PR races (federal/state deputy, senator multi-seat): different strategic structure, design doesn't transfer.
- pipeline steps:
  - Pull 2022 registry CSVs + contratante/pagante zips from `bi-dropbox:data/TSE/2022/` (assume same schema as 2024; verify).
  - Pull relatório PDFs (probably already in bi-dropbox under `pesquisa_eleitoral/relatorios/2022/`).
  - Refactor (done 2026-06-17): `pipelines/politica/source/clean/poll_response_2024.py` and `poll_sponsor.py` to take year as a parameter (or create `_2022` siblings). `poll_sponsor.py` is now year-parameterized (`YEARS=[2020, 2024]`); add `2022` to the list. `poll_response_2024.py` stays 2024-only (LLM-extraction source is unique); for 2022 either add a `poll_response_2022.py` sibling or generalize later. New upstream `poll.py` emits `build/clean/poll_{year}.parquet` (TSE registry per protocol).
  - Bulk LLM extraction with the existing llmkit infrastructure (gpt-4o-mini): ~$15, ~5h at 8 workers if poll count is similar order of magnitude.
  - Update `source/analysis/analysis_table.py` to handle multi-year panel: race_id = (year, uf, office) for governor/president; replace muni_id-based within-race FE with state-level within-race FE.
  - Re-run `source/analysis/heterogeneity.py` and `source/analysis/pollster_customer_mix.py` on the pooled panel; add a within-pollster-over-time spec to the customer-mix test.
- expected payoff:
- ~doubles n_self → tightens the SE on β by a factor of √2 ≈ 1.4 across all specs
- within-pollster panel → can identify the customer-mix slope causally (β_pollster_2024 − β_pollster_2022 vs Δ candidate_share)
- replication-across-cycles paragraph for the paper
- non-blocking; do after Channel A vs B lands and the headline paper structure is settled.
- created: 2026-06-02
Leads from an-009-party-interaction.py — 2026-06-02
- Pollster × sponsor interaction Wald test (extension of AN-009). Same test structure with `party_group` → `pollster_group` (top-5 pollsters by sponsored-row count + OTHER). The substantive hypothesis: if a pollster-specific bias machine exists, the pollster-level joint test should reject where the party-level test failed. Suggested script: `source/analysis/an-NNN-pollster-interaction.py` (next available AN id). Low effort; reuses the AN-009 scaffold.
Leads from robustness_redteam.py — 2026-06-02
- Disclose drop_absorbed in the paper note. 8,205 / 8,431 candidates (97.3%) have no within-variation. The within-candidate FE identifies on 226 candidates / 1,311 rows. AN-010 K4 shows the race-FE-only refit recovers β = +8.00, so the FE selection isn't generating the result, but the disclosure belongs in the paper's specification section with a footnote citing AN-010.
- Compute media-only mean error baseline. AN-010 K1 leaves the point estimate intact but does not separately quantify pollster_self bias. Compute mean(`error`) for pollster_self-only polls vs media-only polls, both on the all-Brazil sample. If pollster_self polls are biased (either direction), the +0.93 pp independent-baseline figure in `briefs/all_brazil_analysis.md` is misleading. Suggested: 10-line addition.
Leads from an-011-permutation-jackknife.py — 2026-06-02
- TWFE permutation null on spec-3c headline. AN-011's permutation uses race-week FE only (β_obs = +4.68 in that spec); the full headline +7.94 is candidate FE + race-week FE. Re-running the permutation through `linearmodels.PanelOLS` with both FE dimensions would give a directly-comparable p-value to the headline. Cost ~25-30 s extra (500 fits × ~50 ms). Already-established conclusion (rejection extends a fortiori); completeness item.
- Piauí mini-investigation (low priority). PI is the only state whose leave-one-out moves β by more than 0.5 pp (β=+7.42 vs baseline +7.98). Is PI's average sponsored-poll error genuinely larger, or is one PI candidate or pollster driving it? One bar plot of per-UF sponsor-poll error would settle it. Curiosity, not load-bearing.
Leads from an-012-spec-se-robustness.py — 2026-06-02
- TWFE wild-cluster bootstrap on the proper spec 3c (candidate FE + race-week FE). AN-012's bootstrap uses the race-week-FE-only spec (β=+4.68, manual within-cell partialled-out OLS for speed). The headline +7.94 has candidate FE on top. Implementing with `linearmodels.PanelOLS` in the inner loop: ~50 ms per fit × 1000 reps ≈ 50 s. Returns a directly-comparable WCR p-value to the headline. Combined with the parked TWFE-permutation completion (AN-011 lead 1), gives full SE coverage at the paper's specification-3c level.
Leads from an-013-digit-frequency.py — 2026-06-02
- Channel-A regression decomposition once `extract_methodology.py` lands. AN-013 rules out crude per-candidate post-fielding tampering as a digit-detectable failure mode, narrowing (but not pinning down) the mechanism toward whole-poll sample-design slant. Once the LLM methodology extractor's bulk run finishes (status: smoke-tested, queued in todo.md), re-estimate spec 3c with the extracted controls (coverage_class, quota_variables, `population_reference`, mode) layered on top of spec 2's methodology controls. The shrinkage of β across (Spec 2 → Spec 2 + Channel-A LLM features) is the load-bearing direct test of the design-driven channel. Suggested script: `source/analysis/an-NNN-channel-a-controls.py`.
- Within-pollster digit comparison (extension, low priority). AN-013's A-vs-B test pools across pollster firms. Sharper version: condition on `institute`, compute A-vs-B z-test per firm, then aggregate (random-effects meta-analysis or Stouffer's combined p). Catches the case where a small subset of firms fabricates while most don't. Per-firm n is thin (top firm has ~50–70 A rows) but tractable for the top-20 firms. Suggested script: add a function to `an-013-digit-frequency.py`.
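
The per-firm aggregation in the within-pollster digit lead could be sketched as follows. The exact statistic AN-013 uses is not spelled out in these notes, so the pooled two-proportion z below (share of A rows vs B rows ending in some flagged digit) is an assumption, and both function names are hypothetical. Stouffer's method itself is standard: sum the (optionally weighted) z-scores and divide by the root of the summed squared weights.

```python
from math import sqrt
from statistics import NormalDist


def two_proportion_z(hits_a, n_a, hits_b, n_b):
    """Pooled two-proportion z-statistic for group A vs group B."""
    pooled = (hits_a + hits_b) / (n_a + n_b)
    se = sqrt(pooled * (1 - pooled) * (1 / n_a + 1 / n_b))
    return 0.0 if se == 0 else (hits_a / n_a - hits_b / n_b) / se


def stouffer(z_scores, weights=None):
    """Combine per-firm z-scores; returns (combined z, two-sided p)."""
    if weights is None:
        weights = [1.0] * len(z_scores)
    num = sum(w * z for w, z in zip(weights, z_scores))
    den = sqrt(sum(w * w for w in weights))
    z = num / den
    p = 2 * (1 - NormalDist().cdf(abs(z)))
    return z, p


# Illustrative counts only: (hits_a, n_a, hits_b, n_b) per firm.
firms = [(12, 60, 30, 300), (9, 50, 25, 250), (5, 40, 22, 200)]
zs = [two_proportion_z(*f) for f in firms]
weights = [sqrt(f[1]) for f in firms]  # weight by sqrt of A-row count
print(stouffer(zs, weights))
```

Weighting by the square root of each firm's A-row count is one common choice when per-firm n is thin; equal weights answer a slightly different question, namely whether the typical firm shows the pattern.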
Leads from an-014-denominator-audit.py — 2026-06-02
- Update `design_levers.md` § "Levers tested and ruled out". AN-014 falsifies "sponsors list fewer candidates → smaller renormalization denominator → mechanical share inflation" as a Channel-A lever. Add a one-line entry citing AN-014 with the decomposition: 98.6 % of the renormalized-vs-raw gap is mechanical multiplicative scaling, only 1.4 % could be denominator shift.
- Forestall the LLM-extraction-precision confound footnote. AN-014 cell-level finds sponsored cells have larger mean denominators than independent cells. Plausible alternative explanation: sponsored-poll PDFs are professionally laid out and the LLM extracts a slightly larger candidate set from them. The within-candidate audit (which compares each candidate's own polls) is immune to this and gives the median-+0.15 null. Add a one-line footnote next to the cell-level table in the paper note acknowledging the audit-immune within-candidate test is the cleanest measurement.
Leads from an-015-data-quality.py — 2026-06-02
- Within-firm β on AN-007's customer-mix-sorting set (highest paper-value extension). AN-015's within-firm test was underpowered because top-5 pollsters by row count are large media firms with ≤ 3 sponsored polls each. Refit spec 2 on each of the 11 institutes with ≥ 5 self-sponsored polls (the AN-007 set). PDF style is held strictly fixed within firm. If β survives in most firms (consistent with the +0.07 sd pollster-jackknife result from AN-011), data-quality / PDF-style differences across firms are explicitly ruled out. Suggested script: `source/analysis/an-NNN-within-firm-beta.py`. Low effort: AN-015's `within_firm_beta` function with a different firm list.
- Diagnose the n_named control attenuation in spec 2 (puzzle). Adding n_named (number of matched candidates per cell) to spec 2 drops β by 1.5 pp (from +8.00 to +6.47). Hypotheses: (a) n_named correlates with cycle stage (early polls list many candidates → low race competitiveness signal not absorbed by candidate FE), (b) n_named correlates with race-size / N-viable-candidates (and AN-004 found β concentrated near the Cox cutoff), (c) extraction-quality artifact. Test: split β by n_named tertile, also report n_named correlation with race_margin and competitiveness measures from AN-004. Suggested: 20 lines added to AN-015 or a small AN-NNN.
Leads from an-016-within-firm-beta.py — 2026-06-02
- Prioritize hand-validation toward high-β firms (refinement of the existing hand-validation TODO). The hand-validation TODO should sample more aggressively from the AN-016 high-β firms (METHODUS, CAMARGO, INTENÇÃO, RADAR, DATA SC). If those firms' PDFs do look qualitatively different, that's interesting — conversely, sampling CENSUS / IIP / Verita where β ≈ 0 would confirm extraction is working as expected on the big firms.
Leads from an-018-firm-size-discipline.py — 2026-06-02
Industry-segmentation descriptive table for the paper introduction (extension). Three cuts the AN-018 result makes paper-relevant: (a) share of all sponsored polls produced by each tertile; (b) share of all races where the lone sponsored poll comes from a small-tertile firm; (c) candidate-side concentration: which candidates / parties preferentially commission from small-tertile firms? Together these give a sharp "where is the bias concentrated" table for the paper's institutional-setup section.
Media filtering as the amplifier of the reputation-by-volume mechanism (extension, high paper-value). AN-018 shows big firms self-discipline; the natural next question is why the reputation incentive is so sharp. Hypothesis: media outlets preferentially cover polls from trusted, high-volume pollsters, so the polls that actually reach voters are filtered by pollster reputation. Two implications: (a) the filter amplifies the reputation incentive, since a big firm's slant has higher visibility cost because its polls are more likely to be reported on; (b) sponsors who want visibility face an incentive-aligned trade-off: pay for a credible (big) pollster to get media play, but accept that the pollster won't slant heavily. Test pathways: (i) cheap in-data proxy: for each pollster, compute the share_media_only (already in customer_mix_refresh.csv) and cross with firm size; do big firms specialize in media work, and how strongly does media share predict reputation discipline above and beyond size? (ii) external: scrape news mentions of pollster names (Folha, Estadão, Globo, UOL search APIs or a Google News history crawl) over the 2024 cycle; regress per-pollster media-mention count on firm volume. (iii) downstream: for the subset of polls with known media coverage, refit β separately on "media-amplified" vs "media-ignored" polls. Effect on voter behavior is plausibly larger for the former; if so, β-on-final-share becomes a downward-biased measure of the electorally consequential slant.
Connects to: theory.md § "Why bias survives voter discounting (information frictions)" (media as voter-side filter); theory.md § "Pollster reputation: volume vs customer mix" (supply-side reputation-by-volume); AN-018 (the headline size-discipline result). Suggested first script: a 50-line cross-tabulation using the data already in hand; the external news-scrape is the longer follow-up.
Sharpened within-race design (2026-06-02). The ideal test of pathway (iii) is a within-race contrast: in races where we have ≥1 candidate-sponsored poll from a low-tier (large, low-β) firm AND ≥1 candidate-sponsored poll from a high-tier (small, high-β) firm, compare media coverage of the two polls. The same-race comparison holds race salience and candidate prominence fixed, so any coverage gap is attributable to pollster reputation filtering, not the underlying race interest. AN-018's tertile classification supplies the tier definition (large: β≈0; small: β≈+12). Workflow:
- Scoping: find races with both-tier candidate-sponsored polls. Output: race-list + per-pair pollster + protocol IDs + candidate name + field dates.
- Coverage proxy: per poll, query Google News (or a search API with a free tier) for "<pollster>" "<candidate>" pesquisa <muni> constrained to the cycle window. Record hit count + top URLs.
- Within-pair comparison: paired t-test on log(1 + hits) across pairs. Sign should be positive (low-tier polls get more coverage).
Implementation note: Google's free Custom Search Engine allows
100 queries/day, enough to cover a 30-50-race comparison set in
a week of incremental queries. Brave Search API has a free tier
too. DuckDuckGo can be scraped for prototype. Don't commit to
the AN script yet — start with the scoping query and inspect the
candidate-race set to see if the comparison has enough power.
Suggested scoping script:
source/analysis/_media_filter_scoping.py (underscore-prefixed, exploratory).
Pilot 2026-06-02 (scoping + Google News scrape) found the strict large-vs-small comparison set is empirically too thin. 6 races / 7 pairs (strict tier contrast). Per-poll Google News RSS queries on (pollster + candidate-surname + muni) yielded 0-1 hits in all but one case; the pattern across non-tie pairs runs opposite the predicted direction (3 of 4 non-ties have the small-firm poll more covered than the large-firm poll). Three diagnoses: (i) small munis have near-zero media coverage at the per-poll level; (ii) pollster-name distinguishability varies drastically (IIP=3-letter initialism → 1 broad hit; 3S Consultoria → 25-hit cap); (iii) small-firm coverage is plausibly controversy-driven (Procuradoria-investigation-of-Verita articles surfaced in the diagnostic), not credibility-driven. Per-poll Google-News proxy is not the right operationalization at this sample size.
Next-pass options (ordered by leverage):
- (a) Bigger comparison set: expand AN-018's tier classification to include untiered firms (firms with <5 self-sponsored polls, 182 sponsored protocols) by analogy with the small tertile; then look for races with a large-tier firm + ANY non-large firm sponsoring. Should multiply the sample 5-10×.
- (b) Bigger munis: AN-007's customer-mix set has firms doing polls in capital-city and major-metro races. Find candidate-sponsored polls in those races (Belo Horizonte, Curitiba, Porto Alegre, Recife, etc.) with tier contrast; coverage is thicker, so the proxy is more discriminating.
- (c) Different proxy: per-pollster total media salience over the cycle (not per-poll). Regress per-pollster total Google News mentions on log(firm volume) and on within-firm β. This decouples the test from per-poll-pair power. A pollster's reputation FILTER is a firm-level property; testing it at firm level may be the natural unit.
- (d) Wikipedia / structured media outlet lists: Wikipedia mentions of pollster names + appearance in the pollster directory at PolitOlhar / O Antagonista / Veja political pages. More signal for the high-volume firms.
- (e) Cross-cycle (2022 federal): federal-level polls are more heavily reported; tier contrast in presidential / gubernatorial polling has more media-coverage variance to work with.
Pilot artifacts:
source/analysis/_media_filter_scoping.py, source/analysis/_media_filter_scrape.py, build/table/_media_filter_scoping.csv (13 polls), build/table/_media_filter_hits.csv (13 polls × 3 queries each with hit counts + sample URLs).
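For any expanded comparison set from the next-pass options, the within-pair step of the workflow (paired t-test on log(1 + hits)) can be sketched as below. This is an illustrative sketch, not the pilot scripts: `build_query` and `paired_t` are hypothetical helper names, and the hit counts in the usage note are made up.

```python
import math
from statistics import mean, stdev


def build_query(pollster, candidate, muni):
    """Compose the search string in the form given in the workflow."""
    return f'"{pollster}" "{candidate}" pesquisa {muni}'


def paired_t(low_tier_hits, high_tier_hits):
    """Paired t-statistic on log(1 + hits), low-tier minus high-tier.

    Returns (t, degrees_of_freedom). A positive t matches the predicted
    sign (low-tier, i.e. large-firm, polls get more coverage).
    """
    if len(low_tier_hits) != len(high_tier_hits):
        raise ValueError("pairs must align one-to-one")
    diffs = [math.log1p(a) - math.log1p(b)
             for a, b in zip(low_tier_hits, high_tier_hits)]
    n = len(diffs)
    if n < 2:
        raise ValueError("need at least two pairs")
    sd = stdev(diffs)
    if sd == 0:
        raise ValueError("all within-pair differences are identical")
    return mean(diffs) / (sd / math.sqrt(n)), n - 1
```

With made-up counts, `paired_t([3, 0, 7, 1], [1, 0, 1, 2])` gives a positive statistic on 3 degrees of freedom. Converting t to a p-value needs a t-distribution CDF; `scipy.stats.ttest_rel` on the logged counts does both steps if SciPy is available.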
Leads from an-026-rank-selection-and-bias.py — 2026-06-02
Rank-at-commissioning-time (highest paper value). AN-026's selection result uses final_rank rather than rank at the time the candidate's self-sponsored poll was commissioned. A candidate who was leading early and finished second is labeled rank-2. Reconstruct contemporaneous rank from the most-recent prior independent poll (same logic as the placebo) and re-run both halves. This sharpens the selection test directly: the prediction is naturally about rank at commissioning, not at outcome. Suggested script: source/analysis/an-NNN-rank-at-commission.py.
Resourcing as the selection driver. The "winners are well-resourced campaigns" interpretation can be tested: control for valor_recebido (campaign-finance revenue) in the selection logit and see whether rank-1 over-commissioning survives. If it shrinks, the rank pattern is really about money rather than rank per se.
2022 cycle pooled extension for AN-026. Competitiveness stratification in AN-026 has thin cells (190-row tight tertile); the sharper "leaders in close races still slant" test is uninformative at current power. Adding 2022 polls (presidential + state-level) approximately doubles n.
Safe-race puzzle for the longer paper. AN-026 finds rank-1's self-share is HIGHER in safe races (+41 pp) than in close races (+21 pp) — opposite of the resource-constrained strategic logic. Candidate stylized fact for the longer paper: in safe races the leader's polls serve donor/GOTV functions with no strategic-voting purpose, so all sponsorship flows from resource-richness rather than competitive need.
Leads from an-027-rank-at-commission.py — 2026-06-02
Registration-date alignment (extension of AN-027). Use the TSE registration date (DT_REGISTRO from the pesquisa_eleitoral_2024 metadata) as the literal commissioning moment instead of the prior-neutral-poll proxy. Should tighten the rank-at-commissioning measurement (the placebo gap is median 10 days; the registration date is the actual moment of commissioning). Possibly widens coverage too if the registration date precedes the field date enough to include polls without a prior neutral poll in the race. Suggested script: source/analysis/an-NNN-registration-date-rank.py.
Money-controlled selection (AN-027 two-mechanism test). Regress the self-sponsoring indicator on final_rank × margin_tertile plus campaign-finance revenue (valor_recebido). If the safe-race rank-1 effect shrinks under money controls but the tight-race rank-2 effect does not, the two-mechanism interpretation (coordination demand in close races + resourcing in safe races) is directly supported. Needs the campaign-finance linkage from pipelines/politica/build/clean/.
Refine coordination-peak hypothesis statement. AN-027 shows the prediction holds in tight races but not in safe races. The hypothesis page should be updated to make the race-competitiveness conditioning explicit ("coordination demand drives runner-up over-commissioning in tight races", not as a pooled cross-rank statement).
Leads from an-029-money-controlled-selection.py — 2026-06-02
Rank-at-commission + money control (extension of AN-029, highest paper value). Re-run AN-029 using rank-at-commission (from AN-028's date_start anchor) instead of final_rank. The tight-race rank-2 over-commissioning that AN-027 documented under rank-at-commission should survive money control if the coordination story holds; AN-029 hasn't tested that directly yet. Suggested script: source/analysis/an-NNN-rank-at-commission-money.py.
Threshold-based money control (extension). Replace log(receita) with an indicator for "revenue above the polling-affordability threshold" (~R$50k or ~R$100k). If polls have fixed costs, the relevant question is whether the candidate crossed the threshold, not the continuous log-revenue slope. Could change the AN-029 null on the resourcing channel.
Alternative resourcing proxies (blind spot). Try paid-staff count, advertising spend, in-kind contributions, and party-transfer share as alternative "resourcing" measures. If any attenuates the rank gradient where log(receita) doesn't, the AN-027 resourcing interpretation can be saved with the correct operationalization.
Propose decisions.md entry (framing update). AN-029 is a null on the literal money-resourcing interpretation. If the companion paper's selection-mechanism section becomes load-bearing on this, propose a decisions.md entry to drop "resourcing" as a stand-alone label and replace with "safe-race rank-1 over-commissioning is an unexplained selection fact that future work needs to mechanize." Defer until the companion-paper draft.
Leads from an-030-rank-at-commission-money.py — 2026-06-02
Propose decisions.md entry: drop 'resourcing' label (companion-paper framing). AN-029 (final-rank) + AN-030 (rank-at-commission) jointly refute the literal money-resourcing interpretation of safe-race rank-1 over-commissioning. The companion paper's selection-mechanism section should re-frame: tight races = coordination demand (supported); safe races = unexplained-but-real selection fact, NOT "resourcing." Defer the actual decisions.md write until the companion-paper section is being drafted.
2022-cycle pooling for the coordination signal (highest paper power gain). AN-030's tight-race rank-2 coefficient is +1.81 pp at p=0.10; direction is right but n_self=32 is the binding constraint. Pooling 2022 presidential + gubernatorial polls would tighten the inference and possibly cross conventional significance. Suggested script: source/analysis/an-NNN-2022-pooled-coordination.py.
Within-muni vs cross-muni money decomposition (puzzle). AN-030's no-FE spec hands log(receita) +0.27 pp / log-unit (p=0.001); race-FE shrinks it to +0.08 (p=0.35). Some money signal exists across munis that disappears within muni. Worth understanding: are higher-revenue munis more poll-active, or do higher-revenue candidates concentrate in poll-active munis? A muni-level aggregate with muni-revenue and muni-poll-count would clarify.
Refine coordination-peak hypothesis statement in hypotheses.md. AN-030 settles the prediction: coordination operates only in competitive races, under rank-at-commission framing, independent of campaign-finance revenue. The hypothesis page should reflect this triplet condition. Defer until the companion paper's draft is settled.
Leads from an-033-deferral-bias-interaction.py — 2026-06-02
AN-034: sp_x_def × final_rank heterogeneity (extension, highest paper power gain). AN-033's null γ is on a pooled average. Leader polls have more room for coverage-style suppression than trailing-candidate polls; deferred-and-sponsored polls might amplify bias selectively when the sponsor's candidate is rank-1 in the final outcome (or rank-at-commission). Test: error ~ sponsored × deferred × I(final_rank=1) | candidate FE. If positive in rank-1, the lever pulls only conditionally. Suggested script: source/analysis/an-034-deferral-rank-het.py.
Re-use the AN-033 PanelOLS rig for the other two surviving levers (extension). Replace deferred with mixed_population (needs universe-scale sampling extraction, held under the main forward task) and urban_only_resolved (needs bairro-detail batch). Same Spec 1-4 ladder, same cluster-robust SE pattern. These are the next two probes into the residual concrete-lever space.
op_x_def selection puzzle (puzzle, low priority). Pooled OLS gives opponent_sponsored × deferred = +3.61pp at p<0.001 (Spec 1) but the coefficient vanishes under candidate FE (Spec 2: −0.08, n.s.; Spec 3: +1.37, n.s.). The cross-candidate signal points to a covariation between opponent sponsorship and deferral selection. Diagnostic only: not paper-load-bearing on the headline but worth characterizing if the customer-mix line becomes load-bearing.
Leads from an-040-deferral-rank-heterogeneity.py — 2026-06-02
AN-NNN: deferral-amplification heterogeneity by surviving levers (extension, held under main forward task). Re-run AN-040's 3-way structure on mixed_population and urban_only_resolved when those become universe-scale available from the held methodology batch.
AN-040 rank-3+ small-cell puzzle (puzzle, low priority). Split-sample sp × def for rank ≥3 is +11.4pp (SE 7.4, p=0.12). Large but noisy. Worth checking whether a small handful of high-leverage rank3+ sponsored polls drives this; could be a data-quality flag or a tiny real effect.
Leads from an-041-mode-by-sponsor.py — 2026-06-02
Pollster-tier check on the 35 differing-mode pairs (extension, low cost). AN-041 surfaced 0/244 sponsored phone-mode vs 24/244 independent, but those 24 phone-mode independents may cluster in a few pollster firms (AtlasIntel IVR style). If so the whole mode contrast is a firm-tier story, not a sponsor-choice story. Tabulate pollster_cnpj × differing-pair flag on build/table/an-041-mode-by-sponsor__differing_pairs.csv. ~20 lines.
Mode at full-universe scale (puzzle/extension, blocked on main forward task). The 0% phone-mode-on-sponsored signal is striking enough that we want to confirm it holds at the ~14k-poll universe scale, not just the 244-pair curated subset. Falls out of the methodology batch run when the hold releases.
Leads from an-042-interviewer-training-by-sponsor.py — 2026-06-02
Training-details depth × sponsor (extension). AN-042's binary _described field flips opposite to opacity (sponsored 84% vs ind 73%). The free-text _details field shows sponsored polls cluster heavily on a few boilerplate strings ("treinamento em pesquisas de opinião pública": 44 sponsored vs 19 independent on the most common string). A length / specificity contrast on the details text could sharpen the strategic-disclosure story: do sponsored polls write longer or more specific training descriptions, or are they repeating boilerplate? Suggested script: source/analysis/_an-042-training-details-depth.py.
Selective-disclosure systematic check (extension). AN-042 surfaces a sign-flipping pattern: sponsored polls under-document coverage (AN-024) and audit % (AN-021), but over-document interviewer training (AN-042). Is this visible-rigor-vs-sample-shape split systematic across all the other binary _described fields in the operations / sampling schemas (sampling cluster usage, weighting described, audit procedure described)? Run McNemar on each binary field on the 244-pair sample. Suggested script: source/analysis/an-NNN-operations-fields-paired-marginals.py.
Add a "selective disclosure" subsection to source-of-bias.md (writeup, no script). The doc's TL;DR currently treats opacity as a uniform sponsored-side deficit. After AN-042 the directional table is updated, but the prose framing in §"Opacity differences" hypotheses-(1)/(2) split could be sharpened to make the selective-disclosure refinement explicit. ~10 lines of writeup.
Leads from an-045-sponsor-bias-by-rank-margin.py — 2026-06-02
- AN-NNN — supply-side bandwagon hypothesis page
(extension). If rank-1-in-tight survives the rank-at-commission
swap, write a supply-side hypothesis page parallel to
hypotheses/coordination-peak.md. Current bandwagon-peak.md is voter-side; the supply-side parallel ("leaders in tight races over-state to manufacture inevitability") is distinct from the rank-2 coordination story and merits its own ledger entry.
Leads from an-051-questionnaire-rotation-by-sponsor.py — 2026-06-02
Universe-scale questionnaire extraction (extension, needs API budget). Confirm the 5.4 % vs 26.1 % gap at full universe. The extractor is already in production-shape; running on the ~14k mayoral poll universe is ~$50-100 at gpt-4o-mini (or ~$25 with Batch API). Would give a regression-grade carrier test with statistical power on a ~10× larger sample.
Promote PollQuestionario into the production extraction orchestrator (infra). The schema lives in pipelines/politica/source/llm/schemas.py but is not yet wired into extract_methodology.py / extract_methodology_batch.py. When the universe-scale run is approved, add it there alongside the existing methodology / coverage / operations / sampling tasks.
AN-045 rank-3+ thin-cell skepticism (puzzle, low priority). Sponsor effect at rank-3+ non-tight (+10.91, n=14) and tight (+15.56, n=51) is large but small-n. Check whether 1-2 high-leverage candidate-polls drive these; could be a data quality flag or a tiny real effect.
Leads from an-043-nonresponse-handling-by-sponsor.py — 2026-06-02
Spot-read the 5 Joint A∧B sponsored polls (diagnostic puzzle, low cost). McNemar p = 0.06 with b=5 / c=0 is the one non-trivial sponsor-side direction in AN-043. Are these 5 genuine mentions of undecided-redistribution rules, or false-positive keyword collisions (e.g. recusa in a different context + proporcional from PPS sampling)? Open the s_sampling__sample_design_evidence / s_operations__* text fields for the 5 protocol ids and check by hand. ~10 min, no script needed; just a python one-liner.
Question-order / name-rotation regex on the 244 pairs (extension, cheap preview before the questionario-mirror probe). AN-043's free-text grep approach scales naturally to a regex on ordem, rotacion, aleatoriz, randomiz. That gives a preliminary sponsor-vs-independent contrast on question-order/name-rotation vocabulary before the proper questionario_pesquisa PDF extraction unblocks. Same script template as AN-043.
Relatório-PDF LLM extraction pipeline (queued) (big work, source-of-bias probe items 1 & 4). Already added to the main todo body above as a separate item; the relatório PDFs are the natural home for both nonresponse_handling (item 1) and weighting / post-stratification (item 4) since both are post-fielding analytical choices.
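The question-order / name-rotation regex preview described above could look roughly like the following. The four stems come from the item itself; the accent-stripping step and the helper names are assumptions added here, not part of the AN-043 template.

```python
import re
import unicodedata

# Stems listed in the TODO item; matched case-insensitively.
ROTATION_RE = re.compile(r"ordem|rotacion|aleatoriz|randomiz", re.IGNORECASE)


def strip_accents(text):
    """Drop combining marks so 'aleatorização' matches the 'aleatoriz' stem."""
    decomposed = unicodedata.normalize("NFKD", text)
    return "".join(ch for ch in decomposed if not unicodedata.combining(ch))


def mentions_rotation(text):
    """True if a free-text field uses order/rotation vocabulary."""
    return bool(ROTATION_RE.search(strip_accents(text or "")))


def mention_rate(texts):
    """Share of texts that mention rotation vocabulary (None if empty)."""
    texts = list(texts)
    if not texts:
        return None
    return sum(mentions_rotation(t) for t in texts) / len(texts)
```

The bare stem `ordem` will also fire on unrelated phrases, so a hand spot-read of the matches, like the one proposed for the 5 Joint A∧B polls, is probably needed before reading the contrast.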
Leads from an-050-bias-by-rank-at-commission-margin.py — 2026-06-02
AN-NNN: rac-3+ viability-grab thin-cell triage (puzzle, highest follow-up value). AN-050's biggest specific finding (+18.64pp sponsor effect at rac3+ tight, n=14 sp) needs replication before it can be paper-load-bearing. Steps: (a) tabulate the 14 sponsored polls' candidates and verify the effect is not driven by 1-2 high-leverage outliers; (b) repeat with stricter rac-3+ definition (rac ∈ {3, 4} only, excluding the long tail); (c) rac3+ × non-tight (+2.59pp, n=12) as a placebo; absence there strengthens the tight-race viability-grab reading. Suggested script: source/analysis/an-NNN-rac3plus-viability-grab.py.
AN-NNN: AN-050 anchor sensitivity (extension, robustness). AN-028 showed the rank-at-commission signal is robust across date_start / date_registered / date_end. AN-050 should be similarly anchor-robust. Suggested script: source/analysis/an-NNN-rac-margin-anchor-sensitivity.py.
AN-NNN: viability-grab supply-side hypothesis page (extension, IF rac-3+ survives triage). Currently the supply-side mechanism inventory has bandwagon-peak.md (leader amplification). A complementary viability-grab page would document the trailing-candidate-into-top-2 pattern. Parallel to coordination-peak.md (voter-side) but on the supply side.
Leads from an-071-accuracy-vs-bias-by-firm.py — 2026-06-15
- AN-072: Visibility-weighted accuracy vs bias. AN-071 found no correlation between per-firm accuracy MAE (on all non-self-sponsored polls) and within-firm β. But the reputational signal that disciplines bias is the accuracy that's actually observed, i.e., the firm's accuracy on its media-amplified work. Re-run AN-071 restricting the accuracy denominator to the firm's polls in above-median-media-coverage races (per AN-025's race-media index). If accuracy still doesn't correlate with β under this more visibility-weighted denominator, the per-firm-scorecard intervention in §sec:policy is well-motivated prospectively (it creates a signal that doesn't exist now); the null is informative either way. ~1 hour. (lead from AN-071)
Leads from an-073-firm-party-specialization.py — 2026-06-16
AN-NNN: Extend within-firm β estimation to the n_sponsored ≥ 5 sample. AN-016 used n_sponsored ≥ 10, giving 22 firms with usable β. AN-073's HHI sample is 38 firms (n ≥ 5). Estimating β on the lower cut would push the AN-073 HHI × β cross-cut out of the underpowered range (n=22 → n≈38) and is needed before treating the AN-073 null on M1/M3 as settled. Direct re-run of AN-016's per-firm within-firm estimator with the lower threshold. ~30 min. (lead from AN-073)
AN-NNN: Two-axis specialization map (party-HHI × state-HHI). State-HHI is universal in AN-073 (37/38 firms); party-HHI is uncommon (10/38). The unified picture of "regional only", "regional + partisan", "national" is more informative than either axis alone. Scatter with β as color. Visual companion to the AN-073 hhi_distributions figure. ~1 hour. (lead from AN-073)
AN-NNN: Post-election municipal-contract follow-on (M4 direct test). AN-073's null on the M1/M3 partisan-dyad prediction strengthens the M4 / quid-pro-quo case by elimination. Match the 2024 pollster CNPJ list against municipal contracts for 2025–26. Two cuts: (i) do pollsters whose candidate-clients won get more 2025–26 municipal contracting revenue than those whose clients lost? (ii) is that concentration in the same munis they polled? A positive sign on either is direct evidence the pollster was a stakeholder in the candidate's win, not just a contracted producer. Data note: the projects/procure/ repo has São Paulo municipal procurement data; a SP-only first cut is feasible without new ingestion if we mount that data on this machine (it is not currently here). Full-Brazil version would need Portal da Transparência scraping. Defer to a separate session that has procure/ data available, or run the SP slice as a feasibility pilot. (lead from AN-073 + enforcement-puzzle doc)
Cross-cycle individual-candidate dyad analysis. AN-073 tests firm × party dyads and finds null. Firm × individual candidate (or firm × candidate's marqueteiro) dyads remain consistent with M1/M3; testing them needs 2020 sponsor data joined to 2024. Higher data lift than the within-cycle test but it's the sharper relational discriminator. Park as the natural extension once 2020 sponsor data lands. (lead from AN-073)
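For the two-axis specialization map item above (party-HHI × state-HHI), a minimal sketch of the per-firm computation follows. It assumes HHI is the sum of squared shares on a 0-1 scale (AN-073 may use a different scaling) and that each sponsored poll is a dict with hypothetical keys `firm`, `party`, and `state`.

```python
from collections import Counter, defaultdict


def hhi(labels):
    """Herfindahl-Hirschman index: sum of squared category shares (0-1 scale)."""
    counts = Counter(labels)
    total = sum(counts.values())
    if total == 0:
        raise ValueError("need at least one label")
    return sum((n / total) ** 2 for n in counts.values())


def specialization_axes(polls):
    """Map each firm to (party_hhi, state_hhi) over its sponsored polls."""
    parties = defaultdict(list)
    states = defaultdict(list)
    for p in polls:
        parties[p["firm"]].append(p["party"])
        states[p["firm"]].append(p["state"])
    return {firm: (hhi(parties[firm]), hhi(states[firm])) for firm in parties}
```

Each firm's pair is one point in the proposed scatter, with within-firm β as the color channel; the cut-offs separating "regional only", "regional + partisan", and "national" are left open here.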
Leads from an-074-cpf-repeat-dyad.py — 2026-06-16
AN-NNN: Decompose AN-006's CPF +19 by repeat-vs-singleton subset. Load-bearing: distinguishes M1-individual (durable dyads carry the slant) from M4 (singletons carry the slant). Refit the AN-006 spec (error ~ sponsored × route_dummies | candidate FE + pollster FE) restricted to (a) singleton CPF dyads only, (b) repeat-dyad CPF protocols only. If singleton β ≫ repeat β, M4 wins; if repeat β ≥ singleton β, M1-individual wins. ~30 min, re-uses AN-006 / heterogeneity.py logic. Suggested script: source/analysis/an-NNN-cpf-beta-by-dyad-multiplicity.py. (lead from AN-074)
Date-field coverage by sponsor route (data-quality diagnostic). 6 of 7 CPF repeat dyads in AN-074 have missing date_registered on at least one side. Tabulate the date_registered / date_end / date_start missingness rate across the full sponsor parquet by sponsor_route. If CPF-route is systematically more missing, it's a route-specific registration-form artifact worth understanding (also undercuts any AN-073/074 follow-up that wants the time-gap diagnostic). ~30 min. (lead from AN-074)
CPF repeat-dyad β on high-β firms. AN-074 shows the 5 firms with CPF repeat dyads (AR7, CENSUS, EQUACAO, MARIO ELISIO, PAULISTA JUNIOR) are not in the AN-016 high-β tail. Inverse cut: among the AN-016 high-β small firms (METHODUS, CAMARGO E MEDINA, BRASLOPES, etc., per AN-073 low-HHI / high-β cell), tabulate CPF-route repeat-dyad structure. If those firms have zero CPF repeat dyads, the M4 reading of the +19 gains by direct evidence (the high-slant CPF firms have no relational individual-dyad structure). ~30 min. (lead from AN-074)
Leads from an-078-disclosure-rate-by-route-dyad.py — 2026-06-16
AN-NNN: Refine disclosure indicator into PDF-filed vs extraction-success. AN-078 used "in poll_2024" as the disclosure indicator, conflating "campaign withheld" (the M5 signal) with "PDF filed but extraction failed" (data quality). Relatório PDF list is at bi-dropbox:data/TSE/2024/pesquisa_eleitoral/relatorios/. Refining the indicator to "PDF-filed-with-TSE" cleanly isolates the M5 signal; it would push the committee gap p-value from 0.12 closer to clean significance if the bulk of the gap is campaign withholding. ~1 hour. (lead from AN-078)
AN-NNN: Party-name reversed gap (+7.4 pp) descriptive write-up. The party-name route showed repeats more disclosed than singletons, the only positive M1-individual signal in the data. Party-name repeats are the "purest" relational case (AN-074: 87 % > 30 d, n=69). Which firms and which candidates are involved, and are these dyads concentrated in any AN-016 high-β firms? One-page descriptive. ~30 min. (lead from AN-078)
AN-NNN: Selection-corrected β bounds for committee repeat dyads. If M5 is operating, the +14.17 pp β on disclosed committee_repeat polls is upward-biased relative to the full universe. Construct Lee bounds or Imbens-Manski bounds: assume the non-disclosed committee_repeat polls (36 protocols) had β = 0 or β = committee average (~7 pp), recompute. Quantifies how much of the gap M5 can explain. ~1 hour. (lead from AN-078)
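The "assume β = 0 or β = committee average" step in the bounds item above amounts to a poll-weighted average under an imputed value for the undisclosed protocols. The sketch below is that simple imputation bound only; it is not a Lee or Imbens-Manski estimator. The disclosed count is not stated in these notes, so the value used in the usage note is a placeholder.

```python
def imputed_beta(beta_disclosed, n_disclosed, n_undisclosed, beta_assumed):
    """Poll-weighted β when undisclosed polls are assigned an assumed β."""
    total = n_disclosed + n_undisclosed
    if total <= 0:
        raise ValueError("need at least one protocol")
    return (n_disclosed * beta_disclosed + n_undisclosed * beta_assumed) / total


def imputation_range(beta_disclosed, n_disclosed, n_undisclosed, assumed_values):
    """Evaluate the imputed β at each assumed value and return (min, max)."""
    results = [
        imputed_beta(beta_disclosed, n_disclosed, n_undisclosed, b)
        for b in assumed_values
    ]
    return min(results), max(results)
```

For example, `imputation_range(14.17, 64, 36, [0.0, 7.0])` uses the +14.17 pp estimate and the 36 undisclosed protocols from the item, with `n_disclosed=64` as a purely hypothetical count; the real disclosed count has to be substituted before the range means anything.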
Leads from an-079-cost-by-sponsor.py — 2026-06-16
AN-NNN: Decompose the +14.8 % sponsorship cost premium (production-cost vs other reading). Spec 1's premium is precise but its origin is not pinned down. Cross-tab the premium with field-period length, sample size, and the heuristic tracking-contract flag (already on todo from AN-077). ~30 min. (lead from AN-079)
AN-NNN: Cost × sponsor route. Does the +14.8 % premium vary across CPF / committee / party / party-name? Add route dummies to Spec 1. If concentrated in one route, that further refines the production-cost vs menu-price reading. ~15 min. (lead from AN-079)
AN-NNN: Cost × firm-size tertile. Split Spec 1 by AN-018 firm-size tertile. If small firms charge a premium but large firms don't (or vice versa), informs the production-cost-vs-slant-fee interpretation. ~15 min. (lead from AN-079)
AN-NNN: Cost distribution in the undisclosed-protocol subset. value_brl is 100 % populated on the 9,509 disclosed protocols, but ~5,400 polls were registered-not-disclosed. If undisclosed CPF-route polls were systematically cheaper or higher-cost, that would re-open the menu-pricing question on the M5-suspect (selectively-published) subset. Needs the pre-disclosure registration data. ~1 hour. (lead from AN-079)
AN-NNN: Heuristic tracking-contract flag. TSE registry doesn't directly flag tracking contracts. Heuristic: a contratante registering ≥ 3 polls with the same firm within 60 days for the same candidate's race. Build the flag and tabulate route-by-flag. Useful for AN-077 reading and for future analyses on the M5 / contract-structure mechanism. ~30 min. (lead from AN-077)
Paper §sec:policy update — M5 / publication option as a candidate intervention target. AN-077's committee gap, if M5-driven, points policy in a different direction than the current sponsor-aware-disclosure framing: mandate publication of all registered polls within X days of disclosure date, or prohibit selective publication within a tracking contract. Writing item. Bring up alongside the full enforcement-puzzle doc rewrite. (lead from AN-077)
Why do repeat-CPF-dyad candidates have higher baseline errors? AN-076 implies candidate FE absorbs the raw +20.2 vs +3.5 pp gap, i.e. these candidates have systematically higher errors across ALL their polls. Race / candidate profile diagnostic — are they in hard-to-poll races (informal sector, high mobility), with disorganized committees, or something else? One-page descriptive write-up. ~30 min. (lead from AN-076)
AN-NNN: All-routes repeat-vs-singleton decomposition. Apply the AN-075 split to committee (n=359), party (n=32), party-name (n=117), which offer much more sample. If singleton β << repeat β on the larger routes, the M1-individual mechanism is real but AN-075 was just too thin to detect at CPF. If both β stay similar across routes, the route-uniform / strategic-stake reading generalizes. Cheap re-run of AN-075 script with expanded treatment-dummy split. ~30 min. Suggested script: source/analysis/an-NNN-all-routes-repeat-singleton.py. (lead from AN-075)
Descriptive write-up of CENSUS muni 12017 / candidate 5897300. The only "same firm × same race × same candidate polled twice" CPF observation in our data. Both polls slanted (+20.0, +10.1). What was the time gap, the field-period overlap, the coverage class? A single-paragraph case description is one tiny diagnostic on the M1-individual mechanism with no regression machinery. ~30 min. (lead from AN-075)
Paper §sec:policy implications under M4-leaning reading. AN-075's CPF-uniform reading (if it survives AN-076) shifts the policy interpretation: the slant is priced into individual-paid polls without durable relational structure. Disclosure-regime intervention has different marginal product if the mechanism is "personal-stake premium" rather than "durable relational trust." Writing item, not analysis. Bring up after AN-076 lands. (lead from AN-075)
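The heuristic tracking-contract flag listed earlier in this block (a contratante registering ≥ 3 polls with the same firm within 60 days for the same candidate's race) can be sketched as a sliding window over sorted registration dates. The dict keys `contratante`, `firm`, `race`, and `registered` are hypothetical field names, and treating "within 60 days" as an inclusive span is an assumption.

```python
from collections import defaultdict
from datetime import date


def tracking_flags(polls, min_polls=3, window_days=60):
    """Flag (contratante, firm, race) groups that look like tracking contracts.

    A group is flagged when any `min_polls` of its registrations fall
    within `window_days` of each other (inclusive span).
    """
    groups = defaultdict(list)
    for p in polls:
        groups[(p["contratante"], p["firm"], p["race"])].append(p["registered"])
    flags = {}
    for key, dates in groups.items():
        dates.sort()
        # Any window of min_polls consecutive sorted dates short enough?
        flags[key] = any(
            (dates[i + min_polls - 1] - dates[i]).days <= window_days
            for i in range(len(dates) - min_polls + 1)
        )
    return flags
```

Joining the resulting flags back to sponsor_route gives the route-by-flag tabulation the item asks for. Groups with fewer than `min_polls` registrations are never flagged.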
Leads from an-108-sampling-se-detectability.py — 2026-06-19
- Channel-A magnitude bound on the median sponsored poll. AN-108's interpretation says a Channel-A-only story now requires per-poll design tilts averaging ≈ 7 pp. The existing source-of-bias.md probes (coverage, sampling, operations) bound documented levers at ~4.6 pp + 2 pp on average, not necessarily on the median sponsored poll. Open task: enumerate the lever combinations available on a single sponsored poll's documented methodology and bound the maximum within-poll Channel A magnitude. If the bound is < ~5 pp on a typical sponsored poll, the Channel-B residual must close more than the size-mismatch paragraph currently implies. Methodology / writing item. (lead from AN-108)
Leads from an-109-per-poll-z-blind-audit.py — 2026-06-19
AN-NNN: Sample-selection diagnostic — AN-109 vs AN-108 universe. Only 114/450 sponsored protocols carry through to AN-109 (those with same-candidate × race-week independent comparators). Check whether the carry-through subset differs systematically from the AN-108 universe on n, poll_percent, route, firm size, race competitiveness. If the subset is biased toward higher-profile races, the +13–19 pp excess detection in AN-109 may not generalize. ~30 min. (lead from AN-109)
- AN-NNN: Which candidate fails the blind audit on a sponsored protocol? AN-109's protocol-level "detected" flag fires when any candidate's |z| exceeds the threshold. Decompose: among sponsored protocols that fail, is the failing candidate disproportionately the sponsor's? If yes, the +15 pp excess is targeted, a strong Channel-B-leaning signal. If random, the excess is residual cross-poll noise from the sample-selection caveat, not bias. Suggested file: an-NNN-which-candidate-fails-blind-audit.py. ~1 hour. (lead from AN-109)
- §sec:policy: calibrated-detection-threshold framing. AN-109 establishes that a simple binomial blind audit is not a practical policy tool (FP > 30 %). A regulatory blind audit would need (i) empirical-DEFF calibration, (ii) a baseline from non-sponsored polls in the regulatory window, (iii) an excess-detection-rate alarm rather than a per-poll one. Add a paragraph to §sec:policy distinguishing "the sponsor's pollster knows" from "TSE could audit blindly"; the two are different problems. Writing item. (lead from AN-109)
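A minimal sketch of the "which candidate fails" decomposition, assuming each failing sponsored protocol is available as a record with the set of candidates whose |z| exceeded the threshold, the sponsor's candidate, and the number of candidates k. The field names (`failing`, `sponsor_cand`, `k`) and the toy records are hypothetical, not the AN-109 schema. If failures ignored who the sponsor is, the sponsor's candidate would land in the failing set with probability |failing| / k, which gives the baseline to compare against.

```python
def sponsor_targeting(failed_protocols):
    """Return (observed, expected) share of failing protocols whose failing
    set includes the sponsor's candidate.

    expected is the share under failures that ignore who the sponsor is:
    each protocol contributes len(failing) / k.
    """
    n = len(failed_protocols)
    observed = sum(p["sponsor_cand"] in p["failing"] for p in failed_protocols) / n
    expected = sum(len(p["failing"]) / p["k"] for p in failed_protocols) / n
    return observed, expected


# Toy records (hypothetical): three failing sponsored protocols.
toy = [
    {"failing": {"A"}, "sponsor_cand": "A", "k": 4},
    {"failing": {"B"}, "sponsor_cand": "B", "k": 5},
    {"failing": {"C", "D"}, "sponsor_cand": "A", "k": 4},
]
observed, expected = sponsor_targeting(toy)
print(f"observed={observed:.2f} expected_if_random={expected:.2f}")
```

An observed share well above the random baseline would point to the targeted reading; a share near the baseline would point to the residual-noise reading.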
Leads from an-110-empirical-deff.py — 2026-06-19
- AN-NNN: DEFF* decomposition into pure sampling vs drift vs firm/methodology. AN-110's pooled DEFF* = 12.59 absorbs four sources of variance: (a) pure sampling DEFF, (b) within-week share drift, (c) firm-mode-methodology heterogeneity, (d) firm-systematic bias (AN-016). Approach: cells with multiple polls of the same firm in the same race × week isolate sampling DEFF (a); polls separated by 1 vs 7 days within a cell isolate drift (b); cross-firm same-day cells isolate (c)+(d). Likely needs day-level data not all present in cand_poll.parquet. ~3 hours. Suggested file: an-NNN-deff-decomposition.py. (lead from AN-110)
- AN-NNN: Why is early-week DEFF* 2.3× the late-week DEFF*? AN-110 sensitivity: early weeks (<W38) → pooled DEFF* 23.3; late weeks (≥W38) → 10.1. Real share drift dominating early in the cycle is the natural read, but firm-mix changing across weeks is the alternative. Cross-check with AN-073 firm-week distribution and with the headline regression's days_to_election polynomial. ~1 hour. Suggested file: an-NNN-deff-by-week.py. (lead from AN-110)
- Paper §sec:caveats: empirical-noise-floor caveat paragraph. AN-108's "per-poll loud" claim and any related paper prose need a comparator-explicit qualifier. The +7 pp is loud against binomial / truth; not loud against the empirical cross-firm distribution. Once the AN-110-lead-#1 robustness check lands, write a paragraph documenting the empirical noise floor and the SE inflation, so a referee asking "what about the empirical DEFF" has the answer in the text. (writing item; lead from AN-110)
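The "loud against binomial, not loud against the empirical distribution" contrast can be illustrated with a few lines of arithmetic. This is a sketch under two assumptions: DEFF* is treated as a multiplier on the sampling variance (the usual design-effect convention, so the SE inflates by √DEFF*; AN-110's exact definition may differ), and the poll size n = 600 and share p = 0.40 are illustrative values, not panel numbers.

```python
from math import sqrt

DEFF_STAR = 12.59   # pooled empirical DEFF* quoted from AN-110
EFFECT = 0.07       # the +7 pp magnitude, expressed as a share


def z_scores(effect, n, p, deff):
    """z of `effect` against the binomial SE and the DEFF-inflated SE."""
    se_binom = sqrt(p * (1 - p) / n)
    se_emp = se_binom * sqrt(deff)   # assumption: DEFF* scales the variance
    return effect / se_binom, effect / se_emp


# n and p are illustrative assumptions, not values from the panel.
z_binom, z_emp = z_scores(EFFECT, n=600, p=0.40, deff=DEFF_STAR)
print(f"z vs binomial = {z_binom:.2f}, z vs empirical = {z_emp:.2f}")
```

Under these assumptions the same 7 pp gap clears the conventional 1.96 cutoff against the binomial SE and falls below it once the SE is inflated by √12.59 ≈ 3.55.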
Leads from an-111-headline-robustness-empirical-noise.py — 2026-06-19
- AN-001 / current-panel β discrepancy. AN-111 Spec 2 gives β = +6.86 pp; AN-001 cites +7.75 pp. Likely panel updates since AN-001 (more polls in cand_poll, possible sample-restriction shifts). Decompose into (a) added polls, (b) revised cleaners, (c) sample-restriction changes. Not load-bearing (AN-111's robustness conclusion holds either way), but the headline finding's quoted magnitude should be reconciled with the current panel. ~30 min. (lead from AN-111)
- AN-NNN: Spec 3c window sensitivity. AN-111 ran on 286 rows from 46 race-week cells. The strict spec is robust at t=3 but the sample is thin; a 1.5–2× expansion via two-week-window pairing would tighten inference and surface heterogeneity. ~1.5 hours. Suggested file: an-NNN-spec3c-window-sensitivity.py. (lead from AN-111)
Leads from an-109 v2 — 2026-06-19
- AN-NNN: AN-109 v2 with slice-conditional empirical DEFF*. AN-109 v2 uses pooled empirical DEFF* = 12.59. AN-110 surfaced meaningful sensitivity by slice: early weeks (<W38) DEFF* = 23.3, late weeks 10.1; sample-size tertiles also differ. Recompute the v2 detection rates with the matching slice's empirical DEFF* applied to each protocol's row. If early-week sponsored polls have an even higher noise floor → the tail-concentrated excess detection may concentrate in late-week polls; if uniform → tail concentration is mechanism-stable. ~1 hour. Suggested file: an-NNN-an109-v3-slice-conditional-deff.py. (lead from AN-109 v2)
- AN-NNN: Which sponsored protocols does the calibrated blind audit catch? AN-109 v2 catches 12.3 % of sponsored protocols at Bonferroni, i.e. ~14 protocols out of 114. Characterize them: which firms, which routes (CPF / committee / party / party-name), what poll-percent shift, and how they compare to the ≈ 88 % of sponsored protocols the audit does not catch. If concentrated in one route or one set of firms, that's a stronger Channel-B signal localized to specific actors. ~1.5 hours. Suggested file: an-NNN-which-sponsored-protocols-fail.py. (lead from AN-109 v2)
- §sec:policy: calibrated blind audit as policy proposal. AN-109 v2 establishes that a calibrated blind audit using empirical DEFF* + Bonferroni threshold + protocol-level "any share extreme" rollup yields a 2.6× lift on sponsored vs independent polls. Write a §sec:policy paragraph documenting the instrument as a proposed TSE / regulatory tool: (i) pre-compute empirical DEFF* from the registered-polls universe; (ii) for each new poll, test every candidate's reported share against the same-candidate × race-week LOO benchmark with an empirical-DEFF-inflated SE; (iii) flag protocols where any |z| > Bonferroni threshold for the race's k. Lift, FP rate, and operational feasibility documented. Writing item. (lead from AN-109 v2)
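Steps (i) to (iii) of the proposed instrument can be sketched as follows. This is an illustration under stated assumptions, not the AN-109 v2 implementation: the function names, the input layout, and the toy shares are hypothetical; DEFF* is treated as a variance multiplier; the benchmark is the plain mean of the other polls' shares; and the benchmark's own sampling error is ignored.

```python
from math import sqrt
from statistics import NormalDist, mean

DEFF_STAR = 12.59  # pooled empirical DEFF* quoted from AN-110


def bonferroni_z(k, alpha=0.05):
    """Two-sided critical z after Bonferroni correction over k candidates."""
    return NormalDist().inv_cdf(1 - alpha / (2 * k))


def audit_protocol(shares, n, benchmarks, deff=DEFF_STAR, alpha=0.05):
    """Flag a protocol if any candidate's share is extreme vs its benchmark.

    shares:     {candidate: reported share in [0, 1]} for the audited poll
    n:          the audited poll's sample size
    benchmarks: {candidate: [shares from the OTHER polls in the same
                race-week]}; the audited poll must already be left out
    """
    threshold = bonferroni_z(len(shares), alpha)
    z = {}
    for cand, p_hat in shares.items():
        others = benchmarks.get(cand, [])
        if not others:
            continue  # no comparator: this candidate cannot be tested
        p0 = mean(others)
        se = sqrt(deff * p0 * (1 - p0) / n)  # DEFF-inflated binomial SE
        if se > 0:
            z[cand] = (p_hat - p0) / se
    flagged = any(abs(v) > threshold for v in z.values())
    return flagged, z, threshold


# Toy race-week with two candidates (hypothetical numbers).
bench = {"A": [0.40, 0.42, 0.38], "B": [0.35, 0.33, 0.37]}
mild, _, thr = audit_protocol({"A": 0.55, "B": 0.30}, n=600, benchmarks=bench)
loud, _, _ = audit_protocol({"A": 0.60, "B": 0.25}, n=600, benchmarks=bench)
print(mild, loud, round(thr, 2))
```

The `deff` argument is deliberately a parameter, so a slice-specific value (for example the early-week 23.3 or late-week 10.1 reported by AN-110) can be passed per protocol instead of the pooled figure.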
Leads from an-121-iceberg-universe.py — 2026-06-21
- Update paper v2 intro ¶3 with the universe number (writing, paper-load-bearing). Replace the "at least 668 polls" floor from AN-094's top-25 audit with AN-121's universe number: "Of the 14,887 registered 2024 mayoral polls, 12.4% are routed through cover vehicles, namely shell CNPJs (7.7%) and MEI-individual entities (4.7%), whose connection to a candidate cannot be established administratively. The cover-vehicle share grew from 3.8% in 2020 to 12.4% in 2024, a 3.3× increase; the IPOP self-to-shell pattern repeats across 15 distinct pollster firms covering 508 polls in 2024. A further 2.8% (vs 6.4% in 2020) is uncoded: low-volume sponsors the classifier cannot place, many of which are likely sub-threshold cover vehicles (see Appendix Table iceberg)." Cite AN-121. (lead from AN-121)
- Identify top-5 IPOP-pattern pollsters by name (blind spot, paper-load-bearing). The 15 pollster_self → shell/mei transitions are identified only by pollster_cnpj, with no firm names attached. The top two (36348794: 357 → 68 polls; 37658984: 230 → 219 polls) likely include IPOP itself + 1-2 other major operators worth naming in the paper. Cross-reference with pipelines/cnpj/build/clean/ to recover razão social. ~15 min. Suggested file: _an121-ipop-pattern-firms.py. (lead from AN-121)
- Hand-audit the 89-CNPJ rule-extended shell list (extension, ~3-4 hr). AN-121's rule recovers 89 shell CNPJs, 64 of them beyond AN-094's hand-audited 25. Same audit protocol: razão social + capital social + CNAE + cross-state spread + web presence on each of the 64 new CNPJs. Quantifies precision on the full list; surfaces obvious false positives. Defer until paper v2 §2 redraft. (lead from AN-121)
Cross-validate against AN-102 shell bucket on analysis sample— done AN-122 2026-06-21. Shell-β stays null under 2.6× sample expansion.
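Before the AN-121 numbers go into intro ¶3, the quoted figures can be checked for internal consistency with a few lines of arithmetic. The sketch below uses only values quoted in this file; it assumes the 1,149 shell-routed polls cited in the priority overview refer to the 2024 universe of 14,887 polls.

```python
# Shares in percent, as quoted from AN-121.
shell_2024, mei_2024 = 7.7, 4.7
cover_2024, cover_2020 = 12.4, 3.8
total_polls_2024 = 14_887
shell_polls_2024 = 1_149   # assumption: polls routed through the 89 shell CNPJs in 2024

components_sum = round(shell_2024 + mei_2024, 1)        # should equal cover_2024
growth = round(cover_2024 / cover_2020, 1)              # the quoted "3.3x"
shell_share = round(100 * shell_polls_2024 / total_polls_2024, 1)
new_cnpjs = 89 - 25                                     # rule-extended minus hand-audited

print(components_sum, growth, shell_share, new_cnpjs)
```

All four derived values agree with the quoted text: the two components sum to 12.4%, the ratio rounds to 3.3×, the shell count reproduces the 7.7% share, and 64 CNPJs remain to hand-audit.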
Leads from an-122-shell-bucket-expanded.py — 2026-06-21
- Re-run AN-102's GBM-predicted-bias spec with AN-121 shell list (extension, high gain, ~30 min). AN-102's interesting finding was that other_firm-residual lit up the GBM-predicted-bias outcome (+0.024 raw, p<0.01) while shell did not. With the AN-121 expansion the residual shrinks from 2,542 → 1,601 rows. Re-running the GBM-predicted-bias spec tells us whether the predicted-bias signal moved from the residual into the expanded shell bucket (= AN-121 rule caught the slant signal) or stays in the residual (= GBM signal is not driven by volume). Suggested file: an-NNN-shell-bucket-expanded-gbm.py. (lead from AN-122)
- Add a sentence to the iceberg appendix (writing, ~5 min). The construction-Shell paragraph should close the loop with AN-122: "The expanded shell bucket does not produce polls measurably noisier than media polls on the |error| outcome at the per-cand-poll-row level (AN-122; null across every FE spec at n=1,612), consistent with the iceberg-framing claim that cover-vehicle activity does its work at the identification margin rather than the per-poll noise-floor margin." (lead from AN-122)
- Power analysis: what sponsor-row effect would the |error| spec detect on shell polls? (blind spot, ~30 min). Bound the null. If the S3 CI on shell-β at n=1,612 rules out a per-sponsor-row effect of size X, that's an upper bound on the shell-bucket mechanism magnitude under the within-poll-tilt model. Useful for the paper's caveats section. (lead from AN-122)
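Bounding the null reduces to two textbook formulas once the S3 standard error on shell-β is in hand: the upper end of the confidence interval (effects above it are rejected) and the minimum detectable effect at a given power. The sketch below assumes a normal approximation; the inputs are placeholders because AN-122's point estimate and SE are not reproduced in this file, and the function names are hypothetical.

```python
from statistics import NormalDist

_Z = NormalDist()


def minimum_detectable_effect(se, alpha=0.05, power=0.80):
    """Smallest true effect detected with probability `power` at level alpha."""
    return (_Z.inv_cdf(1 - alpha / 2) + _Z.inv_cdf(power)) * se


def ruled_out_above(beta_hat, se, alpha=0.05):
    """Upper end of the two-sided (1 - alpha) CI; larger effects are rejected."""
    return beta_hat + _Z.inv_cdf(1 - alpha / 2) * se


# Placeholder inputs only (in pp): substitute the S3 estimates from AN-122.
beta_hat, se = 0.0, 1.0
mde = minimum_detectable_effect(se)
upper = ruled_out_above(beta_hat, se)
print(f"MDE = {mde:.2f} pp, CI upper bound X = {upper:.2f} pp")
```

With these defaults the MDE is about 2.8 standard errors and the CI bound is the estimate plus about 1.96 standard errors, so the caveats paragraph can quote either once the real SE is substituted.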
Leads from an-120-funding-source-heterogeneity.py — 2026-06-21
- Add "funding source declared" to §sec:roadmap design-inventory table (extension, paper-facing). AN-120 surfaces a sharp disclosure-quality signal: #NULO# (undeclared funding) protocols show β = +10.4 / +13.8 pp vs the +6-7 baseline (p < 10⁻⁵). Parallels the AN-024/AN-021/AN-022 selective-disclosure pattern already in Table 5 (coverage / audit / methodology completeness). Cell shape: "Funding source declared at registration", reporting the share #NULO# on sponsored vs independent. One-line addition to AN-120's script as funding_disclosure_by_sponsor(); new row in the paper.tex design-inventory table. ~1 hr. (lead from AN-120)
- Cross-validate funding signal on pagante row. (blind spot) Contratante is one of two per-protocol funding declarations; pagante may carry different info, especially for Recursos Próprios where the candidate personally reimburses the committee. A pagante-level rerun would test whether the contratante reading is artefactual, and might rescue the untestable Recursos Próprios hypothesis. Suggested script: an-NNN-funding-source-pagante.py. ~30 min. (lead from AN-120)
- #NULO# funding within-firm pattern. (puzzle) Is AN-120's +13.8 pp on committee × #NULO# (n=16) concentrated in specific firms (consistent with AN-018 volume-discipline gradient) or spread across many? If concentrated in low-volume firms, funding-disclosure is collinear with the volume axis and adds no new mechanism. If spread, the funding-disclosure axis is a separable mechanism worth its own paragraph. ~30 min. Suggested file: an-NNN-nulo-funding-by-firm.py. (lead from AN-120)
- Doações Eleitorais low-β puzzle. (puzzle, low priority) Point estimate +3.0 (Spec A) / +5.8 (committee cell) is the lowest among declared categories but underpowered (n=50/39, CI overlaps Fundo Partidário). If the cell expands to 2× in the 2022 cycle, a "donor-funded polls are less biased" reading would emerge as a distinct mechanism (donor-side reputational stake) parallel to pollster-side reputation. Park until cycle extension ships. (lead from AN-120)
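The design-inventory cell proposed in the first item above (share of #NULO# on sponsored vs independent protocols) could be computed along these lines. Only the function name funding_disclosure_by_sponsor() comes from the note; the row layout, the key names `sponsored` and `funding_source`, and the toy rows are assumptions made for illustration, and AN-120's actual script may hold this data in a different shape.

```python
from collections import defaultdict

NULO = "#NULO#"  # marker for an undeclared funding source


def funding_disclosure_by_sponsor(rows):
    """Share of protocols with undeclared funding, by sponsorship status.

    rows: iterable of dicts with hypothetical keys
          'sponsored' (bool) and 'funding_source' (str).
    Returns {True: share_nulo_sponsored, False: share_nulo_independent}.
    """
    counts = defaultdict(lambda: [0, 0])  # sponsored -> [n_nulo, n_total]
    for row in rows:
        cell = counts[bool(row["sponsored"])]
        cell[0] += row["funding_source"] == NULO
        cell[1] += 1
    return {key: n_nulo / n for key, (n_nulo, n) in counts.items()}


# Toy protocols (hypothetical): 4 sponsored, 2 independent.
toy_rows = [
    {"sponsored": True, "funding_source": NULO},
    {"sponsored": True, "funding_source": "Fundo Partidário"},
    {"sponsored": True, "funding_source": "Recursos Próprios"},
    {"sponsored": True, "funding_source": "Doações Eleitorais"},
    {"sponsored": False, "funding_source": "Recursos Próprios"},
    {"sponsored": False, "funding_source": "Recursos Próprios"},
]
shares = funding_disclosure_by_sponsor(toy_rows)
print(shares)
```

Grouping the same counts by firm instead of by sponsorship status would give the concentration check asked for in the #NULO# within-firm item.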
Self-review punch lists from 2026-06-23 — RESOLVED
All three self-review punch lists (35-annotation pass on paper,
27-annotation pass §3.2–§6, 17-annotation pass §6.3–§8) closed
via the 2026-06-23 walk-through. Response logs preserved at
docs/responses_hsigstad_2026-06-23.md,
docs/responses_hsigstad_2026-06-23-pass2.md, and
docs/responses_hsigstad_2026-06-23-pass3.md. All corresponding
hypothes.is annotations deleted from the gh-pages site.
Appendix B restructure (from 2026-06-24 self-review #2) — RESOLVED
- Restructure Appendix B (Selection patterns) — landed
2026-06-24. New appendix is two paragraphs around (a) AN-127
candidate-level LPM of \emph{ever self-sponsored} on quadratic
vote share + log donations + rank, race FE, cluster-robust SE
(table at build/table/an-127-selection-by-candidate.tex); and (b) AN-128 timing-density figure (build/figure/timing_density_by_sponsor.pdf). Headline: within a race, a 10-pp higher final vote share is associated with about 1.5 pp higher commissioning probability (s.e. 0.5); self-sponsored polls' median days-to-election = 13 vs 23 for independents (Mann–Whitney p < 10⁻¹⁷). The quadratic term in vote share is null within race; donations carry no measurable predictive content. Candidate-level expenses are not in the regression because TSE anonymized NR_CPF_CANDIDATO in 2024 despesa_paga (all values = −4 sentinel). §6 selection paragraph updated with concrete numbers.