Done
Completed tasks, moved from todo.md.
2026-06-21 — paper AJPS pre-submission revision
- Paper structural rewrite — empirical-question framing ("are polls captured?"), §1 Intro → §2 Setting → §3 Data & ID → §4 Capture estimate → §5 Selection (standalone) → §6 How capture is produced (Not declared design / Not statistician / Unobserved production stage) → §7 How capture can be prevented (collapsed accountability levers + disclosure design tiers) → §8 Conclusion. Title locked as "Biased Electoral Polls" (dropped "Evidence from Brazil" subtitle).
- Editorial passes — mechanical style check (banned phrases `muni`, `spec ladder`, `load-bearing` purged repo-wide; rules added to `research/rules/writing_style.md`), LLM style review on main text + appendix, AJPS structural-fit check (§7 4→2 subsections; §5 standalone confirmed), vocabulary fit check, table/figure styling pass (\textit{Note:} blocks in minipage).
- Bibliography enrichment — CrossRef API filled vol/no/pages for 8 entries; author names standardized to Last, First; bibstyle switched aer → apsr (AJPS-conformant Chicago author-date).
- Author note + Data Availability Statement — added with placeholders for acknowledgments, funding, IRB, Dataverse DOI.
- Substantive referee review (external agent, 25 findings) — Option B applied (cheap + medium fixes); LLM hand-validation deferred to AN-130.
- Narrative-flow review (external agent, 20 findings) — 17 of 20 applied: intro contribution sentence expanded + selection paragraph added + cross-domain parenthetical moved to §7.2; §2→§3, §4→§5, §5→§6, §6→§7, §6.1→§6.2 one-sentence bridges; "sponsor-conditional bias" anchored at §4.1; "sponsor capture" reused in §4 and §8; "unobserved production stage" settled as canonical; ART/CONFEA picked up in §7.1 statistician paragraph; §7 opener "disciplining" → "preventing" (title match); conclusion opens with empirical summary. Two findings consciously skipped: #3 (Channel A executed/declared — judged useful nuance), #16 (§7.2 internal pacing — would require minor reorder).
- Defensive content trim — §4.1 LOO footnote and §4 d2e inference sentence cut as R&R material.
- Paper at 37 pages, clean compile, 0 undefined refs / cites.
- completed: 2026-06-21
2026-06-21 — AN-123 — Spec 3c diagnostics
- AN-123 (`source/analysis/an-123-spec3c-diagnostics.py`) — referee-driven LOO ranges around the tightest spec (Spec 3c, β = +8.05, SE = 1.51, p < 0.001, n=204, 46 cells, 40 munis, 47 self-sponsored). LOO-cell range [+7.65, +8.43]; LOO-muni range [+7.65, +9.03]. Comparator-cell characterization: median d2e=7 days, race_margin=0.068, rank distribution 43% rank-1 / 34% rank-2 / 19% rank-3; 47 of 450 self-sponsored polls (10.4%) enter a clean-comparator cell, 31.6% in races with any independent poll. Closes the "Spec 3c estimate driven by a few cells/munis" referee concern.
- completed: 2026-06-21
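The sketch below shows one way to compute a leave-one-cluster-out range like the LOO-cell and LOO-muni ranges above. It uses a plain difference in mean error between self-sponsored and comparator rows in place of the Spec 3c regression, which also absorbs race × week FE. The row fields and helper names are hypothetical.

```python
from statistics import mean


def diff_in_means(rows):
    """Mean error of self-sponsored rows minus mean error of comparator rows."""
    treated = [r["error"] for r in rows if r["sponsored"]]
    control = [r["error"] for r in rows if not r["sponsored"]]
    return mean(treated) - mean(control)


def loo_range(rows, cluster_key, estimator=diff_in_means):
    """Drop each cluster in turn, re-estimate, and return (min, max).

    Clusters whose removal would empty either group are skipped.
    """
    estimates = []
    for c in sorted({r[cluster_key] for r in rows}):
        kept = [r for r in rows if r[cluster_key] != c]
        has_treated = any(r["sponsored"] for r in kept)
        has_control = any(not r["sponsored"] for r in kept)
        if has_treated and has_control:
            estimates.append(estimator(kept))
    return min(estimates), max(estimates)


# Toy data: three cells, each with one self-sponsored and one independent row.
rows = [
    {"cell": "a", "sponsored": True, "error": 9.0},
    {"cell": "a", "sponsored": False, "error": 1.0},
    {"cell": "b", "sponsored": True, "error": 7.0},
    {"cell": "b", "sponsored": False, "error": 0.0},
    {"cell": "c", "sponsored": True, "error": 8.0},
    {"cell": "c", "sponsored": False, "error": 2.0},
]
full = diff_in_means(rows)            # 7.0
cell_range = loo_range(rows, "cell")  # (6.5, 7.5)
```

The "not driven by a few cells" reading rests on a LOO range that is narrow relative to the full-sample estimate.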
2026-06-21 — AN-124 — Days-to-election distribution
- AN-124 (`source/analysis/an-124-days-to-election-distribution.py`) — referee-driven d2e distribution by sponsor bucket. Self-sponsored median d2e=9 days vs independent median=16 days, KS p<0.001 (significant difference in fielding timing). β essentially unchanged when d2e is controlled: +6.97 (no d2e control) → +7.02 (linear) → +6.89 (linear + quadratic). The comparator-timing concern does not move the headline.
- completed: 2026-06-21
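The KS comparison of d2e distributions uses the two-sample statistic D, the largest vertical gap between the two empirical CDFs. The stdlib sketch below computes D only. `scipy.stats.ks_2samp` returns both D and the p-value. Which implementation AN-124 uses is not recorded here.

```python
from bisect import bisect_right


def ks_statistic(a, b):
    """Two-sample Kolmogorov-Smirnov D: the largest gap between the two
    empirical CDFs. Both ECDFs only jump at observed values, so checking
    every pooled value is enough."""
    a, b = sorted(a), sorted(b)
    d = 0.0
    for x in sorted(set(a) | set(b)):
        gap = abs(bisect_right(a, x) / len(a) - bisect_right(b, x) / len(b))
        d = max(d, gap)
    return d
```

For example, `ks_statistic([1, 2, 3, 4], [3, 4, 5, 6])` is 0.5, because the two ECDFs are furthest apart between 2 and 4.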
2026-06-21 — AN-122 — AN-102 with expanded shell bucket
- AN-122 (`source/analysis/an-122-shell-bucket-expanded.py`) — refit AN-102's |error| spec ladder with the AN-121 universe-extended shell list (89 CNPJs / 1,612 cand-poll rows) vs AN-094's hand-coded list (14 / 620). Shell-β stays null under 2.6× sample expansion: S0 +0.03 (0.34), S1 −0.39 (0.36), S2 −0.29 (0.31), S3 −0.46 (0.31), every p > 0.14. SE tightens ~45% in S3 (0.57 → 0.31) without pulling point estimates away from zero. AN-102's "professional cover-vehicle polls look like media polls" interpretation is robust. The script reproduces AN-102's coefficients exactly on the AN-094 list (sanity check passes). Implication: the cover-vehicle category does its work at the identification margin (we can't link to a candidate), not at a per-poll noise-floor margin — supporting the iceberg-framing claim. Yellow confidence: |error| is structurally underpowered for the per-sponsor-row mechanism (the iceberg category itself blocks the per-sponsor-row test by construction); the AN-102 GBM-predicted-bias outcome would be more sensitive but is not rerun here.
- completed: 2026-06-21
2026-06-21 — iceberg appendix section in paper
- Write Appendix sec:appendix-iceberg in `paper/paper.tex` — brief intro paragraph + the `iceberg_universe.tex` table (`\input` from `build/table/`) + four construction paragraphs covering (i) the 4-way classifier (CNAE list, capital floor, shell-name veto, calibration via AN-094 hand audit), (ii) the three other_firm sub-buckets (Shell rule with 69% / 91% calibration, MEI porte=01 flag, Uncoded residual with three-way -mix interpretation), (iii) protocol-level aggregation priority ordering, (iv) replication pointers to `source/assemble/poll.py`, `source/analysis/an-121-iceberg-universe.py`, `shell-cnpj-list.csv`, `calibration.csv`. Paper compiles to 36 pages clean. (Ad-hoc request, not tracked as an explicit AN-121 follow-up lead.)
- completed: 2026-06-21
2026-06-21 — iceberg appendix table builder
- `source/table/iceberg_universe.py` — paper-input LaTeX appendix table for the AN-121 sponsor-bucket breakdown. 6 rows (Candidate-linked Routes A–D / Media outlet / Pollster self-contracted / Shell CNPJ / MEI-individual / Other firm non-shell) × {2020 n, 2020 share, 2024 n, 2024 share}, grouped by Sponsor-recoverable vs Cover-vehicle with subtotals (Recoverable: 9,854 → 12,632 polls = 89.8% → 84.9%; Cover: 1,117 → 2,255 polls = 10.2% → 15.1%). Output at `build/table/iceberg_universe.tex`, ready for `\input{../build/table/iceberg_universe.tex}` from paper §2 setting or appendix.
- completed: 2026-06-21
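The subtotal shares can be checked directly from the table counts. The two denominators, 10,971 and 14,887, are the 2020 and 2024 mayoral universes used in AN-121. The helper name is hypothetical.

```python
def shares(recoverable, cover):
    """(recoverable %, cover %) of all polls in a year, to one decimal."""
    total = recoverable + cover
    return round(100 * recoverable / total, 1), round(100 * cover / total, 1)


shares_2020 = shares(9_854, 1_117)   # (89.8, 10.2); total 10,971
shares_2024 = shares(12_632, 2_255)  # (84.9, 15.1); total 14,887
```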
2026-06-21 — AN-121 — Universe-level iceberg quantification
- AN-121 (`source/analysis/an-121-iceberg-universe.py`) — universe-extension of AN-094's top-25 shell hand audit to the full 14,887 mayoral 2024 universe + 10,971 mayoral 2020 universe. Single intro number: 12.4% of 2024 mayoral polls are routed through cover vehicles (shell CNPJs 7.7% + MEI-individual 4.7%), vs 3.8% in 2020 — a 3.3× increase; uncoded residual (low-volume CNPJs the classifier can't place) shrank 6.4% → 2.8%, consistent with cover-vehicle activity consolidating into identifiable patterns. Universe-extended shell list: 89 CNPJs / 1,149 polls (vs AN-094 top-25 floor: 14 CNPJs / 668 polls). IPOP pattern at scale: 15 pollster firms switched from pollster_self in 2020 to shell or MEI-individual in 2024, covering 508 polls in 2024 (top two pollster_cnpjs alone account for 287 of those). Rule calibration vs AN-094: 69% recall on PROBABLE_SHELLs (4 promoted to media/pollster_other by CNAE upgrade), 91% precision on rule-flagged top-25 firms. The 15.1% / 89-CNPJ count is a precision-favoring floor. Load-bearing for paper v2 intro ¶3 (`paper/drafts/intro_v2.tex` lines 55-72) and §2 setting table. Yellow confidence: recall miss is documented, growth direction + IPOP-pattern are robust.
- completed: 2026-06-21
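The 69% recall and 91% precision figures compare the rule's shell flags with the AN-094 hand audit. The sketch below shows that calibration with made-up firm IDs. Restricting both sets to the audited top-25 firms is an assumption about the calibration universe. A rule tuned for precision leaves some true shells unflagged, which is why the resulting shell count reads as a floor.

```python
def precision_recall(flagged, audited_shell):
    """Calibrate a classification rule against a hand audit.

    flagged:       firm IDs the rule marks as shell (within the audited set)
    audited_shell: firm IDs the hand audit labels as shell
    """
    true_pos = len(flagged & audited_shell)
    precision = true_pos / len(flagged) if flagged else float("nan")
    recall = true_pos / len(audited_shell) if audited_shell else float("nan")
    return precision, recall


# Illustrative IDs only.
rule = {"f1", "f2", "f3", "f4"}
audit = {"f1", "f2", "f3", "f5", "f6"}
p, r = precision_recall(rule, audit)  # (0.75, 0.6)
```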
2026-06-21 — AN-120 — Sponsor-type heterogeneity by DS_ORIGEM_RECURSO
- AN-120 (`source/analysis/an-120-funding-source-heterogeneity.py`) — within-candidate β by declared funding source on the contratante row of each candidate-sponsored protocol (n=450 sponsored across 22,256-row panel). Original hypothesis refuted at the relevant unit: Fundo Partidário (party-fund-routed, n=239) β = +6.86 pp ≈ pooled baseline; Recursos Próprios is structurally untestable at the contratante level (n=3 — campaigns route candidate-personal money through the committee, not via the candidate's own CPF). New finding — disclosure-quality signal: #NULO# (undeclared funding) cells show meaningfully elevated β = +10.39 (Spec A, n=25) and +13.77 (committee × #NULO# cell, n=16), parallel to the AN-024/AN-021/AN-022 selective-disclosure pattern. Within route=committee gradient: Doações Eleitorais (+5.8) < Fundo Partidário (+9.2) ≈ Outros (+8.3) < #NULO# (+13.8) — opposite the party-budget-discipline prediction. Yellow confidence: original Recursos Próprios test is structurally not askable; #NULO# elevation is robust across Spec A + Spec B but n=16-25 cells warrant the hedge.
- completed: 2026-06-21
2026-06-21 — AN-119 — Negative-β firm diagnostic
- AN-119 (`source/analysis/an-119-negative-beta-firm-diagnostic.py`) — diagnostic for CENSUS (β=−8.19) and EVA FRANCIELI (β=−8.44), the two firms that drive AN-071's strict-cut Pearson r=−0.64 and AN-073's negative party-HHI × β. Resolution: both firms' AN-016 candidate-FE β values are identified off microscopic within-firm samples — CENSUS: 2 candidates / 6 rows (reproducing β=−8.19 exactly as the naive within-candidate gap); EVA FRANCIELI: 1 candidate / 4 rows (André Luiz Rokoski). The selection-of-losing-sponsors hypothesis is rejected — the two firms select opposite sponsor types (CENSUS 88% rank-1 winners vs universe 64%; EVA FRANCIELI 70% rank-3+ losers vs universe 12%). Negative-β tail in the §within-firm forest plot is a spec-thinness artifact, not a real reputation signal. Volume-discipline narrative (AN-018) is unaffected because it averages many firms per tertile.
- completed: 2026-06-21
2026-06-21 — Silent closures: 3 additional leads done by later AN work
Items the audit flagged as silently closed by subsequent analyses but never marked [x] in todo.md.
AN-016 lead — Heterogeneity by firm size (formal test β on log(n_polls_total)) — closed by AN-018. AN-018 implemented exactly this regression: δ = −4.28 unweighted (p=0.017, R²=0.18); −5.74 WLS (p=0.0005, R²=0.35); monotone tertile split +12 / +9 / −1. The "reportable size-as-discipline finding" the lead anticipated is now the lead-in to paper §sec:within-firm.
- closed by: AN-018 (2026-06-02)
AN-040 lead — sponsor-bias headline decomposed by final_rank × rank-at-commission × |race_margin| — closed by AN-045 + AN-050. AN-045 implemented the 3-way decomposition with final_rank; AN-050 re-ran with rank_at_commission. Results are cited in paper.tex §sec:rank-heterogeneity (line 864) as $\DescRankHetRunnerTightGap$ / $\DescRankHetLeaderTightGap$ / $\DescRankHetLeaderWideGap$ pp.
AN-045 lead — repeat AN-045 with rank_at_commission instead of final_rank — closed by AN-050 (an-050-bias-by-rank-at-commission-margin.md). Test of "bandwagon at leader-in-tight" vs "coordination at runner-up-in-tight" under rank_at_commission framing is the exact subject of AN-050.
- closed by: AN-050 (~2026-06-02)
2026-06-21 — Inline [x] sweep: migrate 27 completed-in-place items
Items that were marked [x] inline in todo.md (with done notes) but never moved to done.md. Collected here so todo.md reflects only open work. Each entry's substantive note was preserved at the original todo.md location until this migration; see git history for prior wording.
Self-review punch list (2026-06-21 annotations)
- #5. Drop "party audits" mention? — Resolved by current state: §6 retains "obstruction of party audits" in the full sanctions list; the intro dropped it. (2026-06-21)
- #14. State which "phenomenon". — §2 paragraph opening rewritten: "Sponsor bias in registered polls is recognized and contested in Brazilian electoral politics." (2026-06-21)
- #15. Re-title §2 "Brazilian context." — Retitled to "Public recognition and prior enforcement." (2026-06-21)
- #29. §6 "pesquisa fraudulenta" paragraph — sharpen claim. — Rewrote paragraph at paper.tex line ~1371. Body ties statutory reading to the empirical finding; Portuguese terms moved to footnote citation. (2026-06-21)
- #16. Route A→D listing in §2 — move to appendix? — Moved. §2 "Sponsor identification" paragraph now condenses the four routes to one sentence with a forward-reference to Appendix sec:appendix-routes; the new appendix carries the full per-route definitions and counts. (2026-06-21)
- #18. Table 1: transpose with checkmark FE indicators. — Rewrote `source/table/note_table.py` to emit the transposed layout: rows are β_self, β_opp, SE, six FE / control checkmark rows, and N; columns are Naive / (0) / (1) / (2) / (3a) / (3b) / (3c). Regenerated `build/table/note_main.tex`. (2026-06-21)
- #22. Scatter: share political-clients × per-firm β. — AN-017 already produced `build/figure/customer_mix_refresh.pdf` (per-firm β vs candidate-sponsored share). Wired into §within-firm as `fig:customer-mix`; no new analysis needed. (2026-06-21)
- #24. Scatter: per-firm β × log(firm scale). — AN-018 already produced `build/figure/firm_size_discipline.pdf` (per-firm β vs log-volume with OLS, n_self-WLS, and DerSimonian–Laird random-effects regression lines). Wired into §within-firm as `fig:firm-size`. (2026-06-21)
- #26. Reframe upstream media filter as "media reports more unbiased polls". — New script `source/analysis/an-118-media-coverage-vs-beta.py` joins `media_amplification.csv` to `customer_mix_refresh.csv` on firm and regresses an any-media-hit indicator on within-firm β with race FE. Within race, each pp of β is associated with a 0.98 pp lower probability of media coverage (slope −0.98, p = 0.006, n = 214 cells across 22 firms); the pooled OLS slope is −1.42 pp (p < 0.001). §within-firm paragraph rewritten to lead with the direct-test framing instead of the volume-as-proxy version. Artifacts: `build/table/an-118-media-vs-beta.csv`, `build/figure/an-118-media-vs-beta.pdf`. (2026-06-21)
- #27. Redesign Table 2 (design-inventory). — Restructured to "Design choice | Sponsored | Indep. | p". Top block has six rows where the per-poll mean comparison is the natural object; bottom block lists four rows whose natural test is not a mean-contrast (within-pair signed difference, universe χ², KS, χ² across strata) with the relevant statistic in the middle columns. Verdict column dropped; reader infers direction from sign + body text. A future pass could redo the top block with FWL race-FE-residualized means. (2026-06-21)
Source-of-bias probe agenda
- Non-response / undecided handling × sponsor quantitative contrast — done AN-043, 2026-06-02. Null-by-data-design: structured field 100% `not_specified` on all 488 pair-sides; 5+2 / 488 free-text grep hits. Escalated to relatório-PDF extraction pipeline (still open).
- Mode × sponsor quantitative contrast — done AN-041, 2026-06-02. Mode-substitution refuted: 0% phone on sponsored vs 10% on independent; χ² p=0.0003 in the opposite direction.
- Question / name-order priming and within-poll scenario selection — AN-051, 2026-06-02. Sponsored polls dramatically under-document rotation (5.4% vs 26.1%, McNemar p≈4e-8); cherry-pick hypothesis refuted (sponsored file FEWER scenarios). Carrier test underpowered.
- Interviewer training / supervision detail × sponsor — AN-042, 2026-06-02. Sponsored describe training MORE (84.4% vs 72.5%, McNemar p=0.002), opposite of opacity prediction. Reframed as selective disclosure.
- Document the size-mismatch problem in the paper — done 2026-06-02. Inserted "Size mismatch and ruled-out levers" + "Selective disclosure" paragraphs into paper.tex §5.
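AN-042 and AN-051 compare the paired sponsored and independent sides with McNemar tests, which depend only on the discordant pairs. The stdlib sketch below offers both the exact binomial version and the uncorrected chi-square version. Which variant the scripts use, and the argument names, are assumptions.

```python
from math import comb, erfc, sqrt


def mcnemar(b, c, exact=True):
    """Two-sided McNemar p-value from discordant pair counts.

    b: pairs where only the sponsored side has the feature
    c: pairs where only the independent side has the feature
    """
    n = b + c
    if n == 0:
        return 1.0
    if exact:
        # Under H0 each discordant pair falls either way with probability 1/2.
        k = min(b, c)
        tail = sum(comb(n, i) for i in range(k + 1)) / 2 ** n
        return min(1.0, 2 * tail)
    chi2 = (b - c) ** 2 / n        # 1-df statistic, no continuity correction
    return erfc(sqrt(chi2 / 2))    # survival function of chi-square(1)
```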
Empirical pipeline + scraping
- Spot-check TSE individual-protocol pages — done 2026-06-02. sig.tse.jus.br blocked by Trustwave JS-token wall; recovery via the CKAN `pesquisas-eleitorais-2024` package (3 unused resources: bairro_municipio, questionario_pesquisa, nota_fiscal).
Robustness ledger leads
- AN-010 K2 puzzle: why does within-protocol renormalization inflate sponsored polls more than independent ones? — closed by AN-014. 98.6% of K2 renormalized-vs-raw gap is mechanical multiplicative scaling on a non-zero raw effect.
- Wild-cluster bootstrap SE on spec 3c — closed by AN-012 on the race-week-FE-only spec (WCR p=0.0175, WCU 95% CI [+1.09, +8.29] at G=51 muni clusters, B=2000). TWFE bootstrap with candidate FE parked as AN-012 lead 1.
- Refresh AN-007's customer-mix sorting on the larger per-firm β table — closed by AN-017. Holds in sign (γ unweighted = +7.6, WLS = +4.3) but does NOT sharpen with 3× sample (p ≈ 0.27–0.54, R² < 0.05).
- AN-018: firm-size discipline test — closed by AN-018. Firm size dominates customer mix. Univariate δ on log(n_total): −4.28 unweighted (p=0.017, R²=0.18); −5.74 WLS (p=0.0005, R²=0.35). Tertile split: small +11.98, medium +8.64, large −0.93.
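The WCU interval cited for spec 3c (AN-012) is a wild-cluster bootstrap percentile CI. The stdlib sketch below computes it for a single-regressor OLS slope, drawing one Rademacher sign per cluster. It leaves out the fixed effects of the actual spec, the restricted-null (WCR) p-value, and small-G refinements. All names are hypothetical. Stata's `boottest` is one established implementation of the full procedure.

```python
import random
from statistics import mean


def ols_fit(x, y):
    """Intercept and slope of a single-regressor OLS fit."""
    mx, my = mean(x), mean(y)
    sxx = sum((xi - mx) ** 2 for xi in x)
    sxy = sum((xi - mx) * (yi - my) for xi, yi in zip(x, y))
    slope = sxy / sxx
    return my - slope * mx, slope


def wild_cluster_ci(x, y, clusters, reps=2000, alpha=0.05, seed=0):
    """Unrestricted wild-cluster bootstrap (Rademacher weights) percentile CI
    for the OLS slope. Every row in a cluster shares one random sign."""
    rng = random.Random(seed)
    a, b = ols_fit(x, y)
    fitted = [a + b * xi for xi in x]
    resid = [yi - fi for yi, fi in zip(y, fitted)]
    groups = sorted(set(clusters))
    draws = []
    for _ in range(reps):
        sign = {g: rng.choice((-1.0, 1.0)) for g in groups}
        y_star = [f + sign[g] * e for f, e, g in zip(fitted, resid, clusters)]
        draws.append(ols_fit(x, y_star)[1])
    draws.sort()
    lo = draws[int(reps * alpha / 2)]
    hi = draws[int(reps * (1 - alpha / 2)) - 1]
    return b, (lo, hi)
```

All rows in a cluster share one sign, so within-cluster residual correlation carries into every bootstrap draw. Capturing that correlation is the purpose of clustering the bootstrap at the muni level.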
Quick-win battery (AN-041 leads)
- AN-042 — interviewer training × sponsor (quick win 2/3) — done 2026-06-02. McNemar p=0.002, OPPOSITE direction. See AN-042.
- AN-043 — non-response handling × sponsor (quick win 3/3) — done 2026-06-02. Null-by-data-design. Escalated to relatório extraction. See AN-043.
Rotation finding follow-ups (AN-051 leads)
- Pollster-FE within-firm check on the rotation contrast — AN-052, 2026-06-02. AN-051 marginal contrast vanishes within firm; AN-051's contrast is a between-firm composition effect.
- Direct candidate-position test — AN-053, 2026-06-02. Refutes priming-exploitation reading; sponsored polls list sponsor's candidate LATER not earlier.
- Loop-closer: AN-054 rotation rate × AN-018 firm-size tertile — AN-054, 2026-06-02. Monotonic 5× gradient matches AN-018 discipline curve; AN-051 fully absorbed into firm-size discipline narrative.
CPF dyad mechanism chain (AN-073/AN-074/AN-075/AN-076/AN-077)
- Sponsor-route × bias interaction (B vs C/D) — covered by AN-006 (2026-06-02, β = +19.12 pp on CPF vs +6-9 pp on other routes, within firm). AN-073 clarified why it's the relevant test for the M1-individual reading.
- AN-076: no-institute-FE robustness for the AN-075 split — done 2026-06-16. βs barely move (repeat +19.11 → +18.34; singleton +16.78 → +16.11). Institute FE was NOT absorbing M1-individual firm-choice mechanism; candidate FE carries the identification.
- AN-NNN: All-routes repeat-vs-singleton β decomposition — done 2026-06-16 as AN-077. Committee route shows β_repeat − β_singleton = +7.43 pp gap (n_r=57, n_s=302, both p<0.001); CPF, party, party-name show no gap. Reading: M5 / contract-structure dynamics for committee gap.
- AN-NNN: Publication rate × repeat-vs-singleton on committee route (M1 vs M5 discriminator) — done 2026-06-16 as AN-078. Committee_repeat disclosure 63.3% vs committee_singleton 71.7% (gap −8.4 pp, χ² p=0.12 — directional M5 but short of significance).
Empirical noise floor + headline robustness (AN-108–AN-111)
- Empirical DEFF by race × week × candidate — AN-110, 2026-06-19. Pooled empirical DEFF* = 12.59. Explains AN-109 miscalibration; qualifies AN-108's per-poll-loud reading.
- Per-poll z-statistic vs comparator — blind multi-candidate audit-ability — AN-109, 2026-06-19. Blind binomial-z audit not calibrated (FP 30–60%); sponsored polls show stable +13 to +19 pp excess detection.
- Headline-regression robustness under empirical noise floor — AN-111, 2026-06-19. Spec 2 β=+6.86 pp robust under cluster_muni/race_week/politico_id/twoway/wcr-bootstrap (t≥6.0 every spec); Spec 3c β=+7.91 pp robust (t≥3.0 analytic, t=6.1 bootstrap).
- AN-109 v2 with empirical DEFF* — done 2026-06-19 as in-place update of [AN-109]. At empirical DEFF*: blind audit calibrated (independent FP 4.8% Bonferroni); sponsored excess detection persists at +7.5 pp Bonferroni / +11.9 pp single-hyp — a 2.6× lift. Mildly Channel-B-leaning.
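AN-110 and AN-109 v2 combine two pieces. The first is an empirical design effect: observed between-poll variance relative to binomial variance. The second is a per-poll z-statistic whose binomial SE is scaled by that DEFF and judged against a Bonferroni cutoff. The stdlib sketch below treats shares as proportions. How cells are pooled into DEFF* and which comparator share is used are assumptions. At DEFF* = 12.59 the SE widens by √12.59 ≈ 3.55, which is consistent with the blind audit's false-positive rate falling from 30–60% to 4.8%.

```python
from math import sqrt
from statistics import NormalDist, mean


def empirical_deff(shares, sample_sizes):
    """Observed variance of poll shares within one race x week x candidate cell,
    divided by the average binomial variance p(1 - p) / n."""
    p_bar = mean(shares)
    observed = sum((s - p_bar) ** 2 for s in shares) / (len(shares) - 1)
    binomial = mean(p_bar * (1 - p_bar) / n for n in sample_sizes)
    return observed / binomial


def audit_z(share, n, ref_share, deff=1.0):
    """z-statistic for one poll against a comparator share, with the binomial
    standard error inflated by the design effect."""
    se = sqrt(deff * ref_share * (1 - ref_share) / n)
    return (share - ref_share) / se


def bonferroni_cutoff(m, alpha=0.05):
    """Two-sided critical |z| for m simultaneous tests at family level alpha."""
    return NormalDist().inv_cdf(1 - alpha / (2 * m))
```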
2026-06-21 — TODO/paper alignment audit + two small writeup edits
Stale TODO entries — most already integrated into paper.tex / theory.md in the intervening weeks. Audited, marked done, and migrated here. Two genuine edits made today and recorded below.
Reference AN-015 in the paper note's data-quality footnote
- notes: added a sentence to the "Outcome construction via LLM extraction" caveat in `paper/paper.tex` (§sec:caveats) — three sponsor-balanced per-cell extraction-quality proxies (match score, count of named candidates, denominator distance) leave Spec 2 β unchanged; the differing proxy (mean denominator) runs opposite to a "sponsored polls extract fewer candidates" artifact. Hand-validation gold-standard pass remains queued under §sec:caveats.
- completed: 2026-06-21
Refresh `docs/briefs/pollster_customer_mix.md` with AN-017 / AN-018
- notes: brief was first-pass dated 2026-06-02, reporting only the 11-institute AN-007 slope. Added a top-of-doc update block citing AN-017's 31-firm refresh (γ on candidate-share still positive but underpowered, R² < 0.05) and AN-018's volume-dominates finding (log n_total δ = −5.74 WLS, R² = 0.35, monotone tertile split +12 / +9 / −1). Original first-pass test preserved below the update block. Reproduce block extended with AN-017 / AN-018 scripts. Frontmatter status flipped to superseded.
- completed: 2026-06-21
Cite AN-013 in the paper note's mechanism section
- notes: superseded — AN-013 grew from a footnote-sized cite into a full subsection §sec:digit (paper.tex lines 819-867) with A-vs-B digit comparison, round-number and tenths-digit tests, and an explicit blind-spots paragraph.
- completed: 2026-06-02 (cleanup recorded 2026-06-21)
Add the industry-segmentation framing to the paper note (AN-016)
- notes: integrated into §sec:within-firm at paper.tex 713-748 — "major established firms (CENSUS, IIP, Instituto Paraná, Verita) sit near zero" / "firms generating the largest individual coefficients are smaller, less-established pollsters" / "the headline +7-8 pp average is concentrated in low-volume firms; the big-name Brazilian polling industry self-disciplines."
- completed: ~2026-06-02 (cleanup recorded 2026-06-21)
Note AN-016 in the data-quality footnote of the paper note
- notes: paper.tex line 1622 cites the 46-pp cross-firm β spread as the within-firm rebuttal to uniform extraction bias.
- completed: ~2026-06-02 (cleanup recorded 2026-06-21)
Cite AN-017 in theory.md § "Pollster reputation"
- notes: cited at theory.md lines 660-676 inside the rewritten "Pollster reputation: volume vs customer mix" section. γ values and R² < 0.05 finding recorded.
- completed: ~2026-06-02 (cleanup recorded 2026-06-21)
Rewrite `docs/theory.md` § "Pollster reputation and customer-mix sorting"
Soften the AN-016 industry-segmentation framing pending AN-018
Reorganize paper-note Section 5/6 around reputation-by-volume (AN-018)
- notes: paper §sec:within-firm at lines 688-787 leads with the monotone tertile split, the log(n_total) WLS slope with random-effects sensitivity, and the within-race media-amplification test — the post-AN-018 reading. Done in the natural reorganization of the within-firm section.
- completed: ~2026-06-02 (cleanup recorded 2026-06-21)
Promote AN-051 (questionnaire rotation) finding into paper §5
- notes: integrated into Table 5 (tab:design-inventory) at paper.tex line 1240 — "Scenario-rotation documentation (questionnaire PDF): 5.4% sponsored vs 26.1% independent (McNemar p ≈ 4 × 10⁻⁸); direct priming test runs opposite (20.4% vs 30.1% list the sponsor's candidate first, p=0.001); verdict: Documentation axis, mechanism not priming." Selective-disclosure framing follows at line 1265. Stronger than the original ~10-line ask.
- completed: ~2026-06-02 (cleanup recorded 2026-06-21)
Setup
- Promote idea to project
- notes: scaffolded `projects/poll-sponsor-bias/` from `research/ideas/poll-sponsor-bias/`. Brought over `summary.md`, `literature.md`, `references.bib`, two analysis briefs (`sp_slice_analysis.md`, `all_brazil_analysis.md`), bulk-extraction audit, `source/`, `build/`. Status flipped from `idea` to `project` in the summary frontmatter.
- completed: 2026-06-02
Empirical pipeline
Build all-Brazil sponsor→candidate join (Routes A+B+C+D)
- notes: `pipelines/politica/source/clean/poll_sponsor_2024_join.py`. A=CPF (18 candidate-poll matches), B=committee CNPJ name parse (364), C=party CNPJ via despesa_partidaria (38), D=party name parse (148). Total 568 candidate-poll rows with `sponsored_by=1` across 793 polls.
- completed: 2026-06-01
All-Brazil LLM extraction
- notes: bulk laptop run finished 2026-06-02 04:59. 174,747 candidate-scenario rows from 9,509 protocols across 26 UFs. See `docs/briefs/bulk_extraction_audit.md` for per-state coverage.
- completed: 2026-06-02
Pre-matched politico_id in poll_response_2024.parquet
- notes: `nome_urna` patch in `pipelines/politica/source/clean/` lifted match rate to 88% on non-aggregate estimulado rows (from ~64% before the patch). Eliminates the per-project candmatch step.
- completed: 2026-06-02
Analysis
All-Brazil headline regression
- notes: spec 3c (clean comparator + race × week FE) gives β = +6.95 (p = 0.008) on 60 cells / 409 rows. Spec 1/2 (within-candidate FE) gives β = +7.6 to +7.8 (p < 0.001) on 30,555 rows. Pre-poll trajectory placebo (n=132): within-candidate jump +6.70 pp (t = 5.21) on median 10-day gap. Three independent specs converge on β ≈ +7 pp (range +6.7 to +7.8). See `docs/briefs/all_brazil_analysis.md`.
- completed: 2026-06-02
Identification fix: contemporaneous independent comparator
- notes: addresses the "candidate commissions when leading" concern. Sponsor-type classifier in `source/analysis/analysis_table.py` flags polls sponsored exclusively by independent media or pollster-self. The clean-comparator restriction + race × week FE moves the estimate from SP-only +15.7 (3 cells) to all-Brazil +6.95 (60 cells). The independent-poll baseline is essentially unbiased at all-Brazil scale (+0.93 pp mean error), so the within-candidate jump of +6.70 is a clean estimate of self-sponsored deviation from unbiased polls of the SAME candidate.
- completed: 2026-06-02
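The sketch below shows the clean-comparator restriction described above. A poll counts as a comparator only if every one of its sponsors is independent media or the pollster itself. Only race × week cells that hold both a self-sponsored poll and a comparator enter the estimate. Field names and type labels are hypothetical, not those of `analysis_table.py`.

```python
INDEPENDENT_TYPES = {"media", "pollster_self"}  # hypothetical type labels


def is_clean_comparator(sponsor_types):
    """True if the poll has at least one sponsor and every sponsor is
    independent media or the pollster itself."""
    return bool(sponsor_types) and set(sponsor_types) <= INDEPENDENT_TYPES


def clean_comparator_cells(polls):
    """Race x week cells holding at least one self-sponsored poll and at
    least one clean comparator; only these cells enter the estimate."""
    sponsored_cells, comparator_cells = set(), set()
    for poll in polls:
        cell = (poll["race"], poll["week"])
        if poll["self_sponsored"]:
            sponsored_cells.add(cell)
        elif is_clean_comparator(poll["sponsor_types"]):
            comparator_cells.add(cell)
    return sponsored_cells & comparator_cells


polls = [
    {"race": "r1", "week": 1, "self_sponsored": True, "sponsor_types": ["candidate"]},
    {"race": "r1", "week": 1, "self_sponsored": False, "sponsor_types": ["media"]},
    {"race": "r2", "week": 1, "self_sponsored": True, "sponsor_types": ["candidate"]},
    # Mixed sponsorship: not a clean comparator, so r2 drops out.
    {"race": "r2", "week": 1, "self_sponsored": False, "sponsor_types": ["media", "other_firm"]},
]
cells = clean_comparator_cells(polls)  # {("r1", 1)}
```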
- AN-009 (`source/analysis/an-009-party-interaction.py`) — party × sponsor interaction joint Wald test: χ²(5) = 2.31, p = 0.80, fails to reject pooled β. The apparent MDB null (β=+0.02) from the subset regression of `robustness.md` §3 was a reduced-power artifact; interaction-spec MDB β is +6.50 with a CI that overlaps the pooled +7.92. Headline +7.98 is a sufficient cross-party summary.
- completed: 2026-06-02
- AN-010 (`source/analysis/robustness_redteam.py`) — red-team battery: four of five highest-leverage attacks cleared, fifth partially attenuates. K1 (media-only comparator): β=+7.59 (SE 4.30, n→253); K2 (raw poll percent, no within-protocol renormalization): β=+5.10 — renormalization contributes ~3 pp; K3 (Route B vice-prefeito): falsified upstream, 0/429 are VICE-PREFEITO; K4 (drop_absorbed audit): 97.3% of candidates absorbed, but race-FE-only refit β=+8.00 (SE 1.04) → FE selection is not generating β; K5 (drop Route D party-name regex): β=+9.30, names are real party organs. Headline survives; paper note should cite both renormalized +7.98 and raw +5.10.
- completed: 2026-06-02
- AN-011 (`source/analysis/an-011-permutation-jackknife.py`) — three FE-stability tests, all pass. Permutation null (B=500): within-(race × week) shuffles of `sponsored_by` give a null distribution centered at −0.14 with sd 1.57; observed +4.68 (race-week FE only) is outside the null mass entirely — 0/500 trials at or beyond observed, p<0.002. Pollster jackknife (top-20 firms): β range [+7.80, +8.13], sd 0.074; Datafolha drop moves β by 0.003 pp. UF jackknife (26 states): β range [+7.42, +8.32], sd 0.180; PI is the only state whose drop moves β by more than 0.5 pp. The FE-structure critique of the headline is exhausted.
- completed: 2026-06-02
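AN-011's permutation null shuffles `sponsored_by` only within race × week strata, so every reshuffle keeps the number of sponsored rows per stratum fixed. In the stdlib sketch below, a plain difference in means stands in for the race-week-FE β. Function names are hypothetical. With the add-one correction, 0 exceedances in B=500 gives p = 1/501 ≈ 0.001996, just under the 0.002 bound quoted above.

```python
import random
from collections import defaultdict
from statistics import mean


def mean_gap(y, d):
    """Mean outcome of flagged rows minus mean outcome of unflagged rows."""
    return (mean(v for v, t in zip(y, d) if t)
            - mean(v for v, t in zip(y, d) if not t))


def stratified_permutation_p(y, d, strata, reps=500, seed=0):
    """Shuffle the sponsorship flag within each stratum `reps` times and return
    (observed gap, one-sided p) with the add-one correction."""
    rng = random.Random(seed)
    observed = mean_gap(y, d)
    members = defaultdict(list)
    for i, s in enumerate(strata):
        members[s].append(i)
    hits = 0
    for _ in range(reps):
        d_perm = list(d)
        for idx in members.values():
            labels = [d[i] for i in idx]
            rng.shuffle(labels)
            for i, label in zip(idx, labels):
                d_perm[i] = label
        if mean_gap(y, d_perm) >= observed:
            hits += 1
    return observed, (hits + 1) / (reps + 1)
```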
- AN-012 (`source/analysis/an-012-spec-se-robustness.py`) — three SE-/spec-level stress tests, all pass. Wild-cluster bootstrap on race-week-FE-only spec (β=+4.68, G=51 muni clusters, B=2000): WCR two-sided p = 0.0175; WCU 95% percentile CI [+1.09, +8.29]. CRVE p on same spec = 0.014 → no under-coverage at this G; rejection extends to the +7.94 headline a fortiori. WLS spec-2 by sample_size: β = +8.15 (SE 1.42), within 0.18 pp of unweighted baseline +7.98. Week-window sensitivity (%U / %V / 10-day / 14-day rolling): β range [+4.16, +5.31], all p<0.02 — spec not brittle to week-boundary definition. Inference-and-spec critique exhausted alongside AN-010/AN-011.
- completed: 2026-06-02
- AN-013 (`source/analysis/an-013-digit-frequency.py`) — crude per-candidate post-fielding tampering ruled out; broader fabrication channels untested. Three groups: A=sponsor's own candidate (n=627), B=other candidate in same sponsored poll (n=1,182), C=independent poll (n=20,720). A vs B z-tests: integer rate p=0.79, mult-of-5 p=0.66; tenths-digit-0 share within values ≥ 5 identical at 20.4 % vs 20.3 %. A and B statistically indistinguishable on every digit metric. Group A's Benford first-digit shift toward 4-5-6 reflects sponsors polling viable candidates (30-60 % range), not numerical manipulation — Group B inside the same sponsored polls follows Benford. Power calc on the specific failure mode: 2 % crude tampering would be detectable. **Blind spots: sophisticated manipulation that preserves digit distributions, proportional within-poll rescaling, and any pre-publication data work (quota reweighting, dropping strata, re-running) leave no digit signature and are NOT addressed. Substantive read: narrows the mechanism space toward whole-poll sample-design slant; doesn't pin Channel A vs B down.**
- completed: 2026-06-02
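The A-vs-B digit comparisons in AN-013 (integer rate, multiple-of-5 rate) are two-proportion z-tests. The first-digit check compares against Benford's law, P(d) = log10(1 + 1/d). The stdlib sketch below covers both. The pooled-variance form of the z-test is an assumption about the variant used.

```python
from math import erfc, log10, sqrt


def two_prop_z(x1, n1, x2, n2):
    """Pooled two-proportion z-test; returns (z, two-sided p)."""
    p1, p2 = x1 / n1, x2 / n2
    pooled = (x1 + x2) / (n1 + n2)
    se = sqrt(pooled * (1 - pooled) * (1 / n1 + 1 / n2))
    z = (p1 - p2) / se
    return z, erfc(abs(z) / sqrt(2))  # 2 * (1 - Phi(|z|))


def benford_expected():
    """Benford first-digit probabilities P(d) = log10(1 + 1/d), d = 1..9."""
    return {d: log10(1 + 1 / d) for d in range(1, 10)}
```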
- AN-014 (`source/analysis/an-014-denominator-audit.py`) — K2 red-team conjecture falsified. Sponsors do NOT list fewer candidates: within-candidate denominator gap (n=226) median +0.15, 50% of candidates have positive gap. At the cell level, sponsored cells have larger denominators (91.74 vs 78.68; Welch p<10⁻¹⁵ in the opposite direction from the red-team conjecture). Decomposition: 98.6 % of the K2 renormalized-vs-raw gap is the mechanical multiplicative scaling (raw +6.08 × avg scale 1.198 = +7.29 predicted vs +7.40 observed; residual +0.11 pp). The renormalization step is not a Channel-A design lever — it's pure multiplicative amplification of the raw within-candidate effect. Both numbers (renormalized +7.94 and raw +5.10) reportable for the paper; the conversion factor is 1.198 ± noise.
- completed: 2026-06-02
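Renormalization rescales each protocol's named-candidate shares so they sum to 100. Every share is therefore multiplied by 100 / denom, and so is any within-candidate gap between polls with the same denominator. The function name below is hypothetical. With an average scale near 1.2, a raw effect of about +6 pp becomes about +7.3 pp, which is the mechanical term in the decomposition above. With denom ≈ 100 the scale is ≈ 1, which is why the AN-015 clean-denominator subset lands near the raw estimate.

```python
def renormalize(raw_shares):
    """Rescale named-candidate shares within one protocol so they sum to 100.

    Returns the rescaled shares and the scale factor 100 / denominator.
    """
    denom = sum(raw_shares)
    scale = 100 / denom
    return [s * scale for s in raw_shares], scale


shares, scale = renormalize([45, 35])  # ([56.25, 43.75], 1.25)
# A 6-point raw gap on a protocol with this denominator becomes 6 * 1.25 = 7.5.
```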
- AN-015 (`source/analysis/an-015-data-quality.py`) — data-quality concern dispatched. Three of four extraction-quality proxies (denom_dist, n_named, mean_match) are statistically indistinguishable between sponsored and independent cells; the fourth (denom) differs +13 pp in the OPPOSITE direction from the "LLM extracts fewer candidates in sponsored polls" hypothesis. Spec 2 + denom_dist + mean_match controls leave β at +8.00. Adding n_named control drops β to +6.47 (plausibly cycle-stage / race-competitiveness, not extraction quality). Clean-denominator subset (denom ∈ [80, 110]): β = +5.39 — close to AN-010 K2 raw +5.10, confirming AN-014's mechanical-scaling story (denom ≈ 100 → scale factor ≈ 1 → renormalization is null). Top-10 outlier audit shows balanced inflation/deflation (poll%=100 and poll%=0 cases, not systematic upward bias). Within-firm test was underpowered on top-5 firms — parked as the natural follow-up (refit on AN-007's customer-mix-sorting set with ≥ 5 sponsored polls per firm).
- completed: 2026-06-02
- AN-016 (`source/analysis/an-016-within-firm-beta.py`) — within-firm β refit on the customer-mix-sorting set, the strongest available data-quality test. Within each firm, PDF style and LLM-extraction pattern are constant by construction; cross-firm β spread must be real sponsor-behavior heterogeneity, not artifact. Result: striking 46-pp dispersion. Mean +6.5 / +7.2 (≥10 / ≥5 cuts), median +4.4 / +6.3, range **[-11, +35], sd 10.3. 19 of 31 firms significant at p<0.05. Big-name firms have β near zero or negative within-firm (CENSUS -2.76, IIP -1.64, INSTITUTO PARANÁ -10.95, Verita +0.55, none significant), while smaller niche firms slant heavily (METHODUS +24.7, CAMARGO +23.6, INTENÇÃO +35.2, DATA SC +16.1, VISÃO +13.9, RADAR +11.6, all significant). Two implications: (i) data-quality concern definitively closed (uniform LLM bias cannot produce 46-pp cross-firm spread when each firm's PDF style is internally consistent); (ii) substantively new finding — the headline +7.85 pp is a cross-firm average masking sharp industry segmentation. Confirms and sharpens AN-007's customer-mix-sorting story.**
- completed: 2026-06-02
AN-017 (source/analysis/an-017-customer-mix-refresh.py) — refresh of AN-007's customer-mix slope on the AN-016 31-firm sample. The customer-mix-sorting prediction holds in SIGN but not in significance: γ (candidate-share) = +7.6 unweighted (SE 6.8, p=0.27, R²=0.04) / +4.3 WLS (SE 7.1, p=0.54, R²=0.01), down from AN-007's noisier +13.6 / +6.3. The pollster_self share has γ=−19.9 (p=0.10, marginal): firms doing mostly marketing polls slant less for sponsors, the expected direction. Substantive caveat: AN-016's striking 46-pp cross-firm β spread is NOT primarily explained by candidate share (R²<0.05). The "big firms low β, small firms high β" pattern likely needs a firm-size / volume explanation that customer share alone misses. Parked as AN-018 follow-up.
- completed: 2026-06-02
AN-018 (source/analysis/an-018-firm-size-discipline.py) — firm size is the load-bearing axis behind AN-016's β dispersion, not customer mix. Univariate regression of firm β on log(n_total): δ=-4.28 unweighted (p=0.017, R²=0.18); δ=-5.74 WLS (p=0.0005, R²=0.35). Joint regression with share_candidate: size SURVIVES (δ=-7.09, p=0.0002 WLS), customer share becomes insignificant (p=0.13). R² leaps from 0.04 (AN-017 customer-mix univariate) to 0.40 (size+mix WLS joint). Monotone tertile split: small firms (n_total ~13) β=+11.98; medium (~41) β=+8.64; large (~118) β=-0.93. 9/12 small firms individually significant; only 2/9 large firms. Substantive: the +7.85 pp headline is concentrated in low-volume, low-reputation firms; the big-name Brazilian polling industry (CENSUS, IIP, Paraná, Verita, AR7, AGILI) appears to self-discipline at the within-candidate level. Replaces AN-007's customer-mix-sorting story with reputation-by-volume as the first-order mechanism.
- completed: 2026-06-02
AN-025 (source/analysis/an-025-media-amplification.py) — within-race FE test of the media-filter mechanism on the runoff-eligible-muni panel (aptos ≥ 200k: 172 races, 889 race × firm cells). Three specs, ALL within race, with race FE absorbing every race-level confound: log(1 + hits) δ = +0.053 (p = 0.09); cap-hit LPM δ = −0.010 (p = 0.46, perversely negative due to a pollster-name query-noise artifact); any-hit LPM δ = +0.019 (p = 0.045): within race, doubling firm volume is associated with a +1.9 pp higher probability of any media hit, ≈ +10 pp across the panel's volume range. Descriptive tier breakdown: any-hit share 80 % for large-tier firms (174 cells) vs 60 % for small/medium. 32 % of cells are right-censored at Google News RSS's 25-result cap; generic pollster-name tokens ("100%", "Igor", "Viva", "Data", "Olhar") cap-hit on unrelated content, which explains the cap-hit perversity. Substantive: media filtering by firm volume is present at modest magnitude on the cleanest spec; combined with AN-018 it closes the reputation-by-volume feedback loop. A paper-clean magnitude estimate requires re-scraping at max_items=100 (parked follow-up).
- completed: 2026-06-02
Paper-macros system landed — source/paper/build_numbers.py reads build/table/regressions.csv, build/table/party_interaction.csv, build/table/robustness.csv, and build/analysis_table.parquet, and emits 44 \Desc... \newcommand definitions to build/paper/numbers.tex. paper/paper.tex now \inputs the macro file in the preamble and cites every load-bearing numeric (sample counts, route counts, all six spec coefficients, placebo statistics, drop-largest + match_score robustness, party-Wald summary) by macro rather than literal. source/paper/check_macros.py verifies the contract; build.sh paper regenerates and checks before latexmk runs. Headline-number drift from a pipeline rerun now propagates automatically. Also resolved all /check ledger gaps: 4 deterministic cross-ref + artifact-index fixes applied.
- completed: 2026-06-02
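A minimal sketch of the macro contract, assuming numbers.tex holds one \newcommand per value and that every numeric macro in the paper shares the Desc prefix. emit_macros and undefined_macros are hypothetical stand-ins, not the code in build_numbers.py or check_macros.py.

```python
import re


def emit_macros(values: dict, fmt: str = "{:+.2f}") -> str:
    """Render one \\newcommand per value, sorted by name (hypothetical output format)."""
    lines = []
    for name, value in sorted(values.items()):
        # LaTeX control words may contain letters only.
        if not re.fullmatch(r"[A-Za-z]+", name):
            raise ValueError(f"invalid macro name: {name!r}")
        lines.append(f"\\newcommand{{\\{name}}}{{{fmt.format(value)}}}")
    return "\n".join(lines) + "\n"


def undefined_macros(paper_tex: str, numbers_tex: str, prefix: str = "Desc") -> set:
    """Macros with the given prefix that paper.tex uses but numbers.tex never defines."""
    p = re.escape(prefix)
    defined = set(re.findall(r"\\newcommand\{\\(" + p + r"[A-Za-z]*)\}", numbers_tex))
    used = set(re.findall(r"\\(" + p + r"[A-Za-z]*)", paper_tex))
    return used - defined


numbers = emit_macros({"DescBetaSpecTwo": 7.85})
paper = r"Spec 2 gives $\beta = \DescBetaSpecTwo$ on \DescNSelf{} polls."
missing = undefined_macros(paper, numbers)  # {'DescNSelf'}
```

A non-empty result is what a build step would treat as a contract failure before latexmk runs.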
AN-026 (source/analysis/an-026-rank-selection-and-bias.py) — coordination-incentive prediction tested on both the selection and bias-gradient halves. Selection FAILS the prediction: rank-1 winners over-commission by +28 pp (self-share 63.1% vs baseline 34.7%; χ²(4) = 169, p < 10⁻³⁵), even more in safe races (+41 pp) than in tight ones (+21 pp). Bias gradient PARTIALLY HOLDS under spec-3c: rank-1 β = +4.88 (p = 0.23, not significant), rank-2 +9.13 (p = 0.01), rank-3 +11.75 (p = 0.02), rank-4 +13.80 (p = 0.02), rank-5+ +5.24 (p = 0.26, n.s.). Compared to AN-004's spec-2 numbers, tightening to spec-3c pushes rank-1 down (+7.55 → +4.88, loses significance) and ranks 3-4 up (+7.8 → +11.8 / +13.8, gain significance). Substantive read: winners commission more polls (resourced campaigns) but their commissioned polls show less slant; what varies by rank is the marginal value of a slanted poll, not the marginal value of a poll. Companion-paper material; SSRN note unchanged.
- completed: 2026-06-02
AN-027 (source/analysis/an-027-rank-at-commission.py) — rank-at-commissioning follow-up to AN-026. AN-026's "winners over-commission by +28 pp" finding shrinks to +8 pp when rank is measured at commissioning (from the most recent prior neutral poll). Stratifying by race-margin tertile RECOVERS the coordination-story prediction in tight races: rank-1 self-share −9.7 pp (sensitivity −16.4 pp, p=0.007); rank-2 +19.2 pp (sensitivity +11.9 pp); rank-3 +14.4 pp. The wide-margin tertile still shows rank-1 over-commissioning by +22 pp. Two distinct mechanisms operate in parallel: coordination demand drives runner-up over-commissioning in competitive races; resourcing without strategic need drives leader over-commissioning in safe races. AN-026's pooled finding was the weighted average. The bias-gradient half is uninformative at current n (rank × tertile cells too thin). Companion-paper material.
- completed: 2026-06-02
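Measuring rank at commissioning reduces to a lookup: for each self-sponsored poll, find the most recent neutral poll in the same race dated strictly before it and read off the candidate's rank there. A sketch under stated assumptions: the Poll record, its anchor date (AN-028 later compares date_start, date_end, and date_registered as anchors), and rank_at_commission are all hypothetical; ties keep the dictionary's insertion order.

```python
from dataclasses import dataclass
from datetime import date
from typing import Dict, List, Optional


@dataclass
class Poll:
    race: str
    anchor: date              # hypothetical: the date that anchors commissioning
    neutral: bool             # True for independent (media / pollster_self) polls
    shares: Dict[str, float]  # candidate -> vote share (%)


def rank_at_commission(poll: Poll, candidate: str, polls: List[Poll]) -> Optional[int]:
    """1-based rank of `candidate` in the latest neutral poll of the same race dated
    strictly before `poll`; None if there is no such poll or the candidate is absent."""
    prior = [p for p in polls
             if p.neutral and p.race == poll.race and p.anchor < poll.anchor]
    if not prior:
        return None
    latest = max(prior, key=lambda p: p.anchor)
    if candidate not in latest.shares:
        return None
    # sorted() is stable even with reverse=True, so tied shares keep insertion order.
    ordered = sorted(latest.shares, key=latest.shares.get, reverse=True)
    return ordered.index(candidate) + 1


polls = [
    Poll("R1", date(2024, 8, 1), True, {"A": 40, "B": 35, "C": 10}),
    Poll("R1", date(2024, 9, 1), True, {"A": 30, "B": 38, "C": 12}),
    Poll("R1", date(2024, 9, 10), False, {"A": 45, "B": 30}),   # self-sponsored
    Poll("R2", date(2024, 9, 5), True, {"A": 99}),               # other race, ignored
]
```

Here candidate A ranks second at commissioning (the 1 September neutral poll), even though A leads the sponsored poll itself.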
AN-028 (source/analysis/an-028-registration-date-rank.py) — three-anchor robustness check for AN-027. Empirical surprise: 26.6% of polls are registered AFTER fielding ends, so date_registered marks the regulatory filing, not commissioning per se; date_start is the institutionally cleaner anchor (commissioning must precede fielding). The tight-race coordination signal (rank-2 over-commissioning +17 to +19 pp primary, +8 to +12 sensitivity) is ROBUST across all three anchors. The pooled "leaders over-commission" finding (+8 to +12 pp) is also robust. NOTABLE SHIFT in the bias gradient: under tighter temporal anchors, rank-1 β rises from AN-027's +6.51 (p=0.08, date_end) to +7.46 (p=0.013, date_start) and +7.84 (p=0.013, date_registered), crossing significance. The AN-027 framing "winners' polls show no significant slant" softens to "leaders slant less than mid-ranks (+7.5 vs +12.2) but still significantly." Companion-paper material.
- completed: 2026-06-02
AN-029 (source/analysis/an-029-money-controlled-selection.py) — money does NOT drive the rank gradient: a NULL on the AN-027 resourcing interpretation. Adding log(receita_2024) to the AN-026-style rank × margin selection regression leaves the rank coefficients essentially unchanged: the baseline rank-2 vs rank-1 gap is −3.6 (tight) / −3.6 (mid) / −6.8 (wide); after the money control, −3.6 / −3.6 / −6.6. log(revenue) main effect +0.23 pp per log-unit (p=0.16 pooled); rank × log(revenue) interactions all <0.012 pp, all p>0.17. The safe-race rank-1 over-commissioning is real and robust to the money control but NOT explained by campaign-finance revenue. The AN-027 "resourcing without strategic need" interpretation needs revision; alternative resourcing channels (campaign sophistication, party institutional support, fixed-cost threshold) remain candidates. Companion-paper material; a possible decisions.md entry on the framing change is queued in follow-ups.
- completed: 2026-06-02
AN-030 (source/analysis/an-030-rank-at-commission-money.py) — joint test of the two-mechanism story. Both halves survive the money control; money is orthogonal to both. Tight-race rank-2 LPM coefficient (vs rank-1 ref): +1.81 pp baseline → +1.81 pp with log(receita) control (p=0.10, n_self=32). Wide-race rank-2: −0.82 baseline → −0.73 with money (p=0.24). Tight-vs-wide rank-2 contrast ≈ 2.6 pp, money-orthogonal. log(receita) coefficient ≤ +0.14 pp per log-unit in race-FE specs, all insignificant. Notable shift from AN-029's final-rank spec: tight rank-2 flips from −3.61 (final rank) to +1.81 (rank at commission), a 5.4 pp swing showing that temporal alignment matters. Substantive read: the coordination-demand mechanism in tight races AND non-money rank-1 dominance in safe races both operate; AN-029's refutation of the AN-027 "resourcing without strategic need" label is confirmed. Companion-paper material.
- completed: 2026-06-02
AN-031 (source/analysis/an-031-route-bootstrap.py) — wild-cluster bootstrap on the AN-006 sponsor-route split. The CPF cell (n_self=18) survives finite-sample inference cleanly: WCR two-sided p = 0.009, WCU percentile 95% CI [+8.68, +29.47] vs CRVE [+7.62, +30.42]; the bootstrap interval is modestly TIGHTER at both ends. Other routes also pass cleanly: committee [+6.16, +11.23] (p<0.001), party [+3.83, +11.51] (p=0.006), party-name [+2.03, +10.29] (p=0.004). β_obs for CPF = +19.02 (within solver tolerance of AN-006's stored +19.12). The §4.4 paper footnote was updated to use the bootstrap CI + p-value; the CRVE pair is now reported as a parenthetical.
- completed: 2026-06-02
Documentation
Institutions doc anchored against shared brazil-institutions reference
- notes: done via commit 67d0b82 (Henrik, host with the shared ref artigos.db). Statute mentions converted to backtick LE.33-style citations (24/24 resolve via cite.py). Added a 7-row table mapping LE.33 subsections → dataset columns / LLM extraction targets, plus subsections on compliance/sanctions (LE.33.§3-§4, LE.34, LE.35), the mayoral race rules grounding 1-prefeito-per-party identification (CF.29.II, LE.3, LE.11), and CONRE statistician registration. The header points to four shared-ref topic files instead of duplicating doctrine.
- completed: 2026-06-02
AN-019 (source/analysis/an-019-coverage-class-by-sponsor-type.py) — coverage_class × sponsor_type on the n=200 methodology subset. Slant-permissive coverage (specific_neighborhoods + urban_only): 12% candidate-touched vs 10% independent; opaque (deferred + not_specified): 72% vs 80%. Direction matches Channel A; n_candidate=25 makes the test underpowered. Decisive cross-tab deferred to the universe extraction.
- completed: 2026-06-02
AN-020 (source/analysis/an-020-coverage-class-by-sponsor-route.py) — coverage_class × sponsor route on n=200. The subset has 0 CPF polls. Committees (n=6) are 83% deferred-to-complement, never selective. Party-route polls (n=4 total) are 50% specific_neighborhoods. Suggestive: committees hide via the complement document, parties use selective coverage. Confidence: red (cell counts this small).
- completed: 2026-06-02
AN-021 (source/analysis/an-021-audit-pct-by-sponsor-type.py) — audit_pct CDF on n=200. KS p=1.00; ~76% of every bucket sits at the 20% legal floor. The right-tail gap is qualitative, not statistical: candidate-touched polls max out at 30% audit, independent polls reach 100%. Channel B might operate AT the floor (legal-minimum-only firms), not by avoiding it.
- completed: 2026-06-02
AN-022 (source/analysis/an-022-methodology-completeness-index.py) — completeness index on n=200. Direction OPPOSITE to Channel A: candidate-touched mean = 0.43 vs independent = 0.39, t = +1.25 (wrong-signed null). If sustained at universe scale, the "candidates minimize disclosure" subprediction is dead and Channel B (residual / fabrication) must carry the load.
- completed: 2026-06-02
AN-023 (source/analysis/an-023-pollster-boilerplate-fingerprint.py) — pollster cov_bucket fingerprint, universe scale. 216 institutes with ≥20 polls. corr(candidate_share, substantive_share) = +0.058, essentially zero. High-candidate-share firms (≥30%, n=39) are 49% substantive vs 44% for low-candidate-share firms (≤10%, n=134). Direction is wrong-signed at the firm level too. Confidence: green.
- completed: 2026-06-02
AN-024 (source/analysis/an-024-coverage-deferral-by-sponsor.py) — coverage deferral by sponsor, UNIVERSE SCALE. n=14,876. Candidate-touched polls defer at 35.5% vs independent at 38.3%. Odds ratio 0.89 (95% CI [0.80, 0.98]), chi-square p = 0.021. The "candidates hide coverage via deferral" subprediction is statistically refuted at universe scale. Closes the D1-D6 batch. Across every measured methodology lever (D1-D6), Channel A is null or wrong-signed; Channel B (residual / fabrication, or quota-distribution slant) must carry the +7 pp.
- completed: 2026-06-02
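For reference, the AN-024 odds ratio and its 95% interval are standard 2×2 quantities. The sketch below assumes the Woolf (log-scale Wald) interval, which the entry does not name, and uses illustrative counts of 1,000 polls per group rather than the actual n=14,876 split; odds_ratio_ci is a hypothetical helper.

```python
import math


def odds_ratio_ci(a: int, b: int, c: int, d: int, z: float = 1.959964):
    """Odds ratio for the 2x2 table [[a, b], [c, d]] (rows: candidate-touched,
    independent; columns: deferred, not deferred) with a Woolf 95% interval."""
    if min(a, b, c, d) == 0:
        raise ValueError("a zero cell needs a continuity correction")
    log_or = math.log((a * d) / (b * c))
    se = math.sqrt(1 / a + 1 / b + 1 / c + 1 / d)
    return math.exp(log_or), math.exp(log_or - z * se), math.exp(log_or + z * se)


# Illustrative counts only: 35.5% vs 38.3% deferral with 1,000 polls per group.
or_, lo, hi = odds_ratio_ci(355, 645, 383, 617)  # or_ ≈ 0.89
```

With these illustrative counts the interval still spans 1; the reported [0.80, 0.98] excludes it because the universe sample is roughly seven times larger.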
AN-033 (source/analysis/an-033-deferral-bias-interaction.py) — Spec B deferral × bias interaction. Within-candidate FE: γ on sponsored_by × deferred is +1.08pp (SE 2.08, p=0.60); +2.05pp (SE 2.18, p=0.35) with institute FE + controls; it flips to −5.63 (SE 5.46, p=0.30) under race × week FE. The headline sponsored_by coefficient replicates AN-001 (+5.9 to +9.3 across specs). Deferral is ruled out as a Channel-A lever in BOTH directions: sponsors don't disproportionately defer (AN-024) AND deferral doesn't amplify bias within-candidate (AN-033). The residual concrete-lever space narrows to population frame, coverage class, and census-setor usage. Confidence: green.
- completed: 2026-06-02
AN-040 (source/analysis/an-040-deferral-rank-heterogeneity.py) — deferral × rank heterogeneity follow-up to AN-033. The 3-way sponsored × deferred × rank1 interaction is null across all four specs (+2.03 / +4.09 / +0.46 / −12.51, SEs 3.3–10.5pp, sign-inconsistent). Closes the deferral lever: neither selection (AN-024), pooled amplification (AN-033), nor rank-conditional amplification (AN-040). A split-sample companion reveals a sharp rank-1/rank-2/rank-3+ gradient on sponsored_by itself: +5.58pp (winners) / +11.52pp (close losers) / −2.81pp (out of contention). The pooled +7pp masks a 2× rank-2 vs rank-1 asymmetry consistent with AN-026/AN-027 coordination; flagged as a follow-up. Confidence: green.
- completed: 2026-06-02
AN-041 (source/analysis/an-041-mode-by-sponsor.py) — Mode × sponsor quick win on 244 curated pairs. Marginals: sponsored 95.1 % in_person, 0 % phone, 4.9 % not_specified; independent 88.5 % in_person, 9.8 % phone, 0.8 % mixed, 0.8 % not_specified. Joint contingency χ² = 18.66 (dof=3), p = 0.0003, but in the opposite direction of the cheap-mode-slant prior (sponsors over-use the gold-standard mode and avoid phone). Among the 35 differing-mode pairs the bias contrast mean is +2.23 pp (sign p=0.18, Wilcoxon p=0.16): underpowered, no mode-specific carrier. Mode substitution is refuted as a Channel-A lever and promoted to the ruled-out table in source-of-bias.md; the 5 % sponsored-side not_specified rate (vs 1 % independent) is an opacity signal, not a design substitution. Confidence: green.
- completed: 2026-06-02
AN-042 (source/analysis/an-042-interviewer-training-by-sponsor.py) — Interviewer training & supervisor role × sponsor on 244 curated pairs. Sponsored polls describe interviewer training MORE (84.4 % vs 72.5 %, McNemar χ²=9.22, p=0.002) and the supervisor role slightly more (92.6 % vs 87.3 %, McNemar p=0.08), the opposite of the blanket-opacity prediction. The bias contrast on the 85 differing-training pairs is +5.1 pp (sp-only) vs +6.5 pp (ind-only), MW p=0.58: interviewer-side documentation does not carry the slant. Refutes interviewer-side opacity as a Channel-A lever AND reframes the opacity story as selective disclosure: sponsored polls under-document sample-shape dimensions (coverage, audit) but over-document visible-rigor dimensions (interviewers, supervisors). The source-of-bias.md opacity-differences table was updated with a direction-of-gap column. Confidence: green.
- completed: 2026-06-02
AN-045 (source/analysis/an-045-sponsor-bias-by-rank-margin.py) — sponsor bias decomposed by final_rank × tight_race (margin ≤ 0.08). 6-cell descriptive: the biggest tight-race amplification is at RANK 1, not rank 2; the sponsor effect at rank 1 jumps from +4.80pp (non-tight) to +12.20pp (tight), while the rank-2 sponsor effect runs +9.54 / +6.81 across the same cut (lower in tight). Pooled OLS confirms: sp × tight is +9.26pp (p<0.001); sp × rank2 × tight is −11.99pp (p=0.001). Within-candidate FE attenuates to n.s. (sponsored-cell n=51-208 per cell limits 3-way power). The AN-040 rank-2 over-statement spans tight and non-tight races roughly equally; the close-race-specific bias is on rank-1 winners. Reframes the demand-side story: rank-2 over-COMMISSIONING (AN-027) and rank-1 over-STATEMENT (AN-045) live in different rank × margin cells (coordination vs bandwagon, respectively). Confidence: yellow (cell-mean descriptive; FE underpowered for 3-way).
- completed: 2026-06-02
AN-043 (source/analysis/an-043-nonresponse-handling-by-sponsor.py) — Nonresponse/undecided handling × sponsor on 244 curated pairs. Closes the source-of-bias quick-win batch. Structured nonresponse_handling is 100 % not_specified on every pair-side (488/488). A diagnostic regex grep across the surrounding free-text methodology fields confirms the substantive null: undecided/refusal vocabulary (Family A: indeciso, recusa, nao respondeu, branco, nulo) appears in only 5 sponsored + 2 independent pair-sides out of 488 (~1.4%); Family B (proporcional, redistribu, descart, pondera) hits 66 % vs 61 % but is contaminated by proporcional (PPS sampling jargon). The single non-trivial signal, a joint A∧B mention, is 5/0 sponsored-vs-independent (McNemar p=0.06), in the same direction as AN-042 but with n=5. The probe is null by data design: TSE registration PDFs are pre-fielding planning documents; nonresponse handling is a post-fielding analytical choice and naturally absent. Testing the nonresponse-handling × sponsor lever requires extending the LLM pipeline to ingest the relatório (post-fielding) PDFs; escalated as a new todo. Closes the three source-of-bias quick-win probes (AN-041 mode, AN-042 interviewer training, AN-043 nonresponse): none surfaced a Channel-A lever that quantitatively closes the +7 pp size-mismatch problem. Confidence: yellow (null at the data-source level, not the substantive level).
- completed: 2026-06-02
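The free-text grep can be sketched as accent-folded prefix matching on the listed stems. This is an assumption about how the diagnostic works (the entry lists only the stems); fold and family_hits are hypothetical. The example shows why proporcional contaminates Family B: ordinary PPS sampling prose matches it without saying anything about nonresponse.

```python
import re
import unicodedata

FAMILY_A = ["indeciso", "recusa", "nao respondeu", "branco", "nulo"]
FAMILY_B = ["proporcional", "redistribu", "descart", "pondera"]


def fold(text: str) -> str:
    """Lower-case and strip diacritics so that 'não' matches the stem 'nao'."""
    decomposed = unicodedata.normalize("NFKD", text.lower())
    return "".join(ch for ch in decomposed if not unicodedata.combining(ch))


def family_hits(text: str, stems: list) -> set:
    """Stems that occur at the start of a word in the folded text."""
    folded = fold(text)
    return {s for s in stems if re.search(r"\b" + re.escape(s), folded)}


text = "Amostragem proporcional ao eleitorado; entrevistados que não respondeu foram descartados."
```

On this sentence Family A hits only "nao respondeu", while Family B hits both "proporcional" (sampling design) and "descart" (discarding respondents).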
paper/paper.tex §5 "Extension in development" — size-mismatch + selective-disclosure paragraphs. Source-of-bias probe item 6 (writeup task). Inserted two new paragraphs into the mechanism section: "Size mismatch and ruled-out levers" (documents the +4.6 pp population-frame interaction + +2 pp coverage-class shift collectively under-fitting the +7 pp pooled β; lists AN-024 + AN-033 deferral, AN-041 mode, AN-042 interviewer-training, AN-043 nonresponse-handling rule-outs with magnitudes and tests) and "Selective disclosure" (documents the AN-042 finding that sponsored polls strategically under-document sample-shape dimensions while over-documenting visible-rigor signals). The paper's mechanism section no longer overclaims that Channel A is the identified mechanism; instead it documents what has been ruled out and where the residual lives. Paper compiles clean at 21 pages. Build verified via pdflatex from paper/ dir.
- completed: 2026-06-02
AN-050 (source/analysis/an-050-bias-by-rank-at-commission-margin.py) — rank-at-commission sharpening of AN-045. Sample of 16,810 rows (candidates with at least one prior neutral poll in the muni). Three takeaways: (a) the AN-045 rac1-tight bandwagon pattern SURVIVES but ATTENUATES (+12.20pp → +6.59pp; n=20 sponsored), so the bandwagon signal is real but about half what AN-045 reported; the difference is Simpson's-style aggregation across heterogeneous within-candidate trajectories; (b) the AN-040 rank-2 over-statement VANISHES under ex-ante rank (rac2 sponsor effect: −0.46 non-tight, +4.20 tight): rank-2 over-COMMISSIONING (AN-027) is real and ex ante, but rank-2 over-STATEMENT (AN-040) was an artifact of ex-post-rank aggregation; (c) thin-cell surprise: rac3+ in tight races shows a sponsor effect of +18.64pp (n=14 sp), a viability-grab pattern (trailing candidates in close races commission polls to push into the top-2 coordination zone) that needs thin-cell triage before it can stand. Within-candidate FE specs are all underpowered. Confidence: yellow (descriptive).
- completed: 2026-06-02
PollQuestionario LLM extractor — new methodology pipeline component. Added a schema (PollQuestionario, QuestionnaireScenario, enums QuestionType/CandidateOrder) to pipelines/politica/source/llm/schemas.py; a new extractor poll_questionario.py parallel to poll_bairro_detail.py; system + user prompts in prompts/. Pilot runner at projects/poll-sponsor-bias/source/llm/pilot_questionario.py with --smoke/--full/--reextract flags. Pilot completed on 338 curated-pair protocols (335 valid, 3 image-only skipped; 19 min wall, ~$1-2 cost at gpt-4o-mini). The cache is kept local to the project (build/llm/questionario_pilot/cache/) so prompt iteration doesn't pollute the shared pipelines/politica/build/llm/poll_questionario/cache.
- completed: 2026-06-02
AN-051 (source/analysis/an-051-questionnaire-rotation-by-sponsor.py) — Questionnaire rotation / scenarios / framing × sponsor on 241 curated pairs. First sharp within-pair Channel-A signal from the source-of-bias quick-win battery. Headline: name_rotation_documented is sp 5.4 % vs ind 20.3 % (McNemar p ≈ 10⁻⁵); the combined randomization indicator (rotation OR random headline order) is sp 5.4 % vs ind 26.1 % (b=12, c=62, McNemar p ≈ 4 × 10⁻⁸). Sponsored polls dramatically under-document candidate-name rotation. Cherry-pick hypothesis refuted: sponsored polls file FEWER vote-intention scenarios than independents (3.73 vs 4.49, Wilcoxon p = 0.002, in the opposite direction). approval_question_present: sp 26.1 % vs ind 32.8 % (p=0.09, marginal); rejection / nonresponse / asks-party-first all null. Bias-carrier MW on the 74 differing-rotation pairs gives an sp-only contrast of +7.34 pp vs ind-only +3.41 pp (p=0.20), direction-consistent with "no rotation → more bias" but underpowered. Same selective-disclosure pattern as AN-042 (sharp documentation contrast, null within-pair carrier). Promoted name rotation to the "Concrete design-choice differences" table in source-of-bias.md alongside population frame / coverage class / census-setor usage. Confidence: green (headline marginal is robust); follow-ups queued for a pollster-FE check, a candidate-position direct test, and a universe-scale rerun (~$50-100). [2026-06-02 update]: the AN-052 and AN-053 follow-ups walked back the Channel-A reading (see entries below); rotation is a firm-tier composition signal, not a within-firm sponsor lever.
- completed: 2026-06-02
AN-052 (source/analysis/an-052-rotation-within-firm.py) — Firm fixed-effects refit of the AN-051 rotation contrast. Walks AN-051 back: the marginal −20.8 pp sp-vs-ind rotation gap VANISHES within firm. LPM with firm FE (45 firms, 384 pair-sides) gives a sponsored coefficient of +0.025 pp (cluster-SE 0.027, p=0.37) vs the marginal logit's −1.84 log-odds (p < 10⁻⁷). Per-firm descriptive: 19 of 20 firms with both sponsored and independent pair-sides have identical rotation rates on both sides (mostly 0% on both, occasionally 100% on both). 0 firms show sp_rot < ind_rot (the AN-051 sign). The AN-051 contrast is a between-firm composition effect, not a within-firm sponsor-choice effect: sponsored polls concentrate on firms (INSTITUTO PARANA, INSTITUTO VERITA, PROMIDIA, etc.) that, as firms, don't document rotation. The lever maps onto the AN-018 firm-size discipline gradient. Logit-FE blew up to +23.9 (NaN SE) from perfect separation; the LPM is the trustworthy estimate. Confidence: green.
- completed: 2026-06-02
AN-053 (source/analysis/an-053-candidate-position-by-sponsor.py) — Direct candidate-position priming test on 196 curated pairs. Tests the sharpest version of the name-order priming hypothesis: does the sponsor's OWN candidate appear earlier in the headline estimulada roster on the sponsored side than on the matched independent side? Refutes priming exploitation. Sponsored polls list the candidate LATER, not earlier: mean position 2.93 vs 2.74 (Wilcoxon raw p=0.0009; normalized p=0.00002, in the WRONG direction). P(candidate listed FIRST) sp 20.4% vs ind 30.1%, McNemar discordant b=6 (sp_first only), c=25 (ind_first only), McNemar p=0.001 in the WRONG direction. The bias-carrier MW IS significant (p=0.019) on the small 17-pair sp-earlier subset (+9.69 pp vs +3.44 pp for sp-later): when priming IS used, bias is higher, but sponsors don't systematically use it. Required upgrading the fuzzy name-matcher with distinctive-token logic for Brazilian nicknames ("JOAO JORGE FADEL FILHO" ↔ "João Fadel"), raising the match rate from 25% to 80% (61 → 196 pairs). Combined with AN-052: the AN-051 rotation finding folds fully into the firm-tier discipline story; sponsors don't manipulate questionnaire design, they choose firms whose templates happen not to randomize. Confidence: green.
- completed: 2026-06-02
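The McNemar p-values in these entries depend only on the discordant counts b and c. A minimal sketch, assuming the exact binomial variant (the entries do not say which variant was run; AN-042 reports an asymptotic χ²); mcnemar_exact is a hypothetical helper. With b=6 and c=25 it gives p ≈ 0.0009, consistent with the reported p=0.001, and the 5/0 split in AN-043 gives 0.0625, matching its p=0.06.

```python
from math import comb


def mcnemar_exact(b: int, c: int) -> float:
    """Two-sided exact McNemar p-value from discordant counts b and c.
    Under H0 each discordant pair falls either way with probability 1/2."""
    n = b + c
    if n == 0:
        return 1.0
    tail = sum(comb(n, k) for k in range(min(b, c) + 1)) / 2 ** n
    return min(1.0, 2 * tail)


p_first = mcnemar_exact(6, 25)   # AN-053 "listed first" discordant pairs
p_joint = mcnemar_exact(5, 0)    # AN-043 joint A∧B mentions
```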
AN-054 (source/analysis/an-054-rotation-by-firm-size-tertile.py) — Rotation rate × firm-size tertile (loop-closer on AN-051/052/053). Tests AN-052's interpretation directly: if sponsors get the no-rotation outcome by choosing low-discipline firms, the rotation rate should track firm size monotonically. Result confirms: on the broad 124-firm panel (universe-volume tertiles from the 2024 TSE registry), the rotation rate is 3.7 % small / 15.2 % medium / 19.4 % large, a 5× monotonic gradient. AN-018's narrower 15-firm subset shows the same direction in muted form (0 / 0 / 4.5 %, sponsored-heavy by selection). The AN-051 contrast is fully absorbed into the AN-018 reputation-by-volume / AN-025 media-filter narrative: low-volume firms don't document rotation, sponsors choose those firms, and the rotation gap follows. No new Channel-A lever; the mechanism lives at firm choice, not at registration/questionnaire design. Confidence: green.
- completed: 2026-06-02
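The AN-054 check amounts to cutting firms into volume tertiles and averaging a rotation flag within each tier. A sketch assuming a dict of firm volumes and per-poll (firm, documents_rotation) rows; rotation_by_tertile is hypothetical, and the cut points use Python's default statistics.quantiles method, which may differ from the one AN-054 used.

```python
from statistics import quantiles
from typing import Dict, List, Tuple


def rotation_by_tertile(volume: Dict[str, int],
                        polls: List[Tuple[str, bool]]) -> Dict[str, float]:
    """Share of polls documenting name rotation, by the firm's volume tertile.
    volume: firm -> universe poll count; polls: (firm, documents_rotation) rows."""
    cut1, cut2 = quantiles(volume.values(), n=3)   # two cut points -> three tiers
    flags = {"small": [], "medium": [], "large": []}
    for firm, rotates in polls:
        v = volume[firm]
        tier = "small" if v <= cut1 else "medium" if v <= cut2 else "large"
        flags[tier].append(rotates)
    return {t: (sum(f) / len(f) if f else float("nan")) for t, f in flags.items()}


volume = {"A": 10, "B": 20, "C": 30, "D": 40, "E": 50, "F": 60}
polls = [("A", False), ("B", False), ("C", True), ("C", False), ("E", True), ("F", True)]
rates = rotation_by_tertile(volume, polls)
```

A monotone increase across small, medium, and large is the pattern AN-054 reports; tertiles are defined over firms, while rates are averaged over polls.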
GPT-5-pro pre-submission review (2026-06-14)
GPT-5-pro pre-submission review for a top poli-sci submission. Review file:
docs/reviews/gpt5-2026-06-14-paper-note-review.md. Items below are completed;
G1 (LLM gold standard), G2 partial (per-poll publication data unavailable),
G4 (universe-scale Channel A; blocked on running batch), and G10 partial
(espontânea-only placebo deferred) remain in todo.md.
G3. Multiway clustering + WCR p-values in main table. Added fit_panel_multiway, _residualize, and a generic wcr_p_value helper to source/analysis/regressions.py (FWL-based wild-cluster restricted bootstrap, B=2000). New CSV rows spec1_mw_muni_pollster, spec2_mw_muni_pollster, spec2_cluster_candidate, spec2_wcr_p, spec3c_wcr_p. Muni × pollster clustering moves the spec-2 SE from 1.04 to 1.07; candidate clustering moves it from 1.04 to 1.14; both still p < 0.001. WCR p-values for spec 2 AND spec 3c (the small-G spec) are 0 of 2,000 (< 0.0005). New macros DescSpecTwoSe, SeMw, SeCand, WcrP, DescSpecThreeCWcrP. §sec:results main-estimates carries a footnote reporting all four sensitivities.
- completed: 2026-06-14
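A minimal sketch of an FWL-based wild-cluster restricted bootstrap for a single regressor, assuming dense controls Z (an intercept plus covariates standing in for the fixed effects), Rademacher weights, and a CR0 cluster-robust t-statistic. wcr_p_value_sketch and its helpers are hypothetical and independent of the wcr_p_value helper in regressions.py.

```python
import numpy as np


def residualize_on(v, Z):
    """FWL step: residual of v after least-squares projection on the controls Z."""
    coef, *_ = np.linalg.lstsq(Z, v, rcond=None)
    return v - Z @ coef


def slope_and_t(x_t, y_t, cluster_idx, n_clusters):
    """OLS slope of y_t on x_t and its CR0 cluster-robust t-statistic."""
    sxx = x_t @ x_t
    beta = (x_t @ y_t) / sxx
    scores = np.bincount(cluster_idx, weights=x_t * (y_t - beta * x_t), minlength=n_clusters)
    return beta, beta / (np.sqrt(scores @ scores) / sxx)


def wcr_p_value_sketch(y, x, Z, clusters, n_boot=2000, seed=0):
    """Wild-cluster restricted bootstrap p-value for H0: coefficient on x is zero."""
    rng = np.random.default_rng(seed)
    _, cluster_idx = np.unique(clusters, return_inverse=True)
    n_clusters = int(cluster_idx.max()) + 1
    x_t, y_t = residualize_on(x, Z), residualize_on(y, Z)
    _, t_obs = slope_and_t(x_t, y_t, cluster_idx, n_clusters)
    exceed = 0
    for _ in range(n_boot):
        # Under H0 the restricted residual is y_t itself; flip its sign cluster by cluster.
        eta = rng.choice([-1.0, 1.0], size=n_clusters)
        y_star = residualize_on(eta[cluster_idx] * y_t, Z)
        _, t_star = slope_and_t(x_t, y_star, cluster_idx, n_clusters)
        exceed += abs(t_star) >= abs(t_obs)
    return exceed / n_boot
```

Because the same variance formula is used for the observed and bootstrap statistics, omitting the small-sample correction rescales every t by the same constant and leaves the p-value unchanged.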
G5. Within-firm meta-regression with inverse-variance weights. Extended AN-018 with two new specs: univariate_iv_wls_fe (pure inverse-variance, fixed-effects meta) and univariate_iv_wls_re (DerSimonian-Laird random-effects meta with between-firm τ²). All four weightings of the firm-size slope: unweighted OLS δ=−4.28 (p=0.017); n_self-WLS δ=−5.74 (p=0.0005); IV-WLS (FE) δ=+7.86 (p<0.001, misleading because homogeneity is badly violated); RE meta (DL) δ=−3.67 (p=0.037), τ²=83.25, the measurement-error-aware version. §sec:within-firm now reports the RE slope and τ² explicitly. New macros DescFirmSizeRe{Slope,Se,P,TauSq}.
- completed: 2026-06-14
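The RE slope comes from a DerSimonian-Laird style meta-regression. A sketch, assuming per-firm estimates y with sampling variances v and a design matrix X that includes an intercept; dl_meta_regression is a hypothetical name, and τ² uses the method-of-moments extension of DL to covariates.

```python
import numpy as np


def dl_meta_regression(y, v, X):
    """DerSimonian-Laird (method-of-moments) random-effects meta-regression.
    y: per-firm estimates; v: their sampling variances; X: k x p design incl. intercept.
    Returns (coefficients, standard errors, tau2)."""
    y, v, X = np.asarray(y, float), np.asarray(v, float), np.asarray(X, float)
    k, p = X.shape
    w = 1.0 / v
    XtW = X.T * w                                     # X'W
    beta_fe = np.linalg.solve(XtW @ X, XtW @ y)       # fixed-effects (IV-WLS) fit
    q_e = np.sum(w * (y - X @ beta_fe) ** 2)          # residual heterogeneity Q_E
    # tr(P) with P = W - W X (X'WX)^{-1} X'W
    trace_p = w.sum() - np.trace(np.linalg.solve(XtW @ X, (X.T * w ** 2) @ X))
    tau2 = max(0.0, (q_e - (k - p)) / trace_p)
    XtWr = X.T * (1.0 / (v + tau2))                   # random-effects weights
    cov = np.linalg.inv(XtWr @ X)
    return cov @ (XtWr @ y), np.sqrt(np.diag(cov)), tau2
```

Setting τ² = 0 recovers the IV-WLS (FE) fit, which is why that spec can mislead when between-firm heterogeneity is as large as the reported τ² = 83.25. With an intercept-only design the τ² formula reduces to the classic DL estimator.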
G6. Same-poll sponsor-vs-opponent DiD. Added source/analysis/an-069-same-poll-did.py + docs/analyses/an-069-same-poll-did.md. Uses the existing pairs_with_extractions.parquet, which already carries contrast (the sponsor's candidate jump) and opp_contrast (the top opponent's jump). DiD = sponsor − opp = +8.17 pp, t = 6.43, n = 239 pairs, 64% positive. Honest scope: the DiD does NOT cleanly separate per-candidate slant from sample-design slanting; it complements digit forensics. §sec:digit now reports the DiD with the honest caveat. New macros DescSamePoll{DiD,DidT,DidN,DidPos}.
- completed: 2026-06-14
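The DiD is a paired comparison within each poll pair. A sketch assuming two equal-length sequences that mirror the contrast and opp_contrast columns; same_poll_did is a hypothetical helper, and its t-statistic is the plain paired t (not clustered).

```python
import math
from statistics import mean, stdev


def same_poll_did(contrast, opp_contrast):
    """Paired DiD: sponsor's-candidate jump minus top opponent's jump, per pair.
    Returns (mean DiD, paired t-statistic, number of pairs, share of positive DiDs)."""
    if len(contrast) != len(opp_contrast):
        raise ValueError("contrast and opp_contrast must be aligned pair by pair")
    did = [s - o for s, o in zip(contrast, opp_contrast)]
    n = len(did)
    m = mean(did)
    t = m / (stdev(did) / math.sqrt(n))
    return m, t, n, sum(d > 0 for d in did) / n


m, t, n, share_pos = same_poll_did([10, 5, 8, 2], [1, 4, 0, 3])
```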
G7. Comparator definition (text-vs-AN-002 conflict). The code (source/assemble/poll.py:65,110: INDEPENDENT_TYPES = {"media", "pollster_self"}) and the paper prose agreed; the AN-002 headline + YAML were inconsistent ("independent-media-only"). Fixed the AN-002 headline / YAML and added a comparator-definition blockquote. A new spec3c_race_week_media_only sensitivity in regressions.py drops pollster-self: β = +8.15 pp, p = 0.036, n = 165 rows / 28 cells, slightly larger than the standard 3c (+8.05), so pollster-self isn't pulling the headline up. A new AN-063 page documents the sensitivity. New macros DescBetaSpecThreeCMedia, DescPSpecThreeCMedia, DescNRowsSpecThreeCMedia. §sec:results main-estimates ¶3 explicitly defines the comparator pool and reports the sensitivity.
- completed: 2026-06-14
G8. β heterogeneity by rank-at-commission × competitiveness. Surfaced AN-050 into a new §sec:rank-heterogeneity subsection at the end of §sec:results, bridging to §sec:selection. Headline cell values (within-cell sponsored-vs-not error gap): leader (rac=1) in tight races +6.6 pp, leader in non-tight races +2.7 pp, trailing candidates (rac≥3) in tight races +18.6 pp (the largest cell). β is amplified exactly where the strategic value of slant is largest: the causal-side counterpart of the competitiveness-driven self-sponsoring pattern in §sec:selection-commission. New macros DescRankHet{Leader,Runner}{Tight,Wide}Gap. A money-moderator regression (AN-029/AN-030) on the same rank × competitiveness cells would close the loop; deferred to a follow-up.
- completed: 2026-06-14
G9. BH-FDR correction on design-inventory levers. Added source/analysis/an-066-fdr-bh.py + docs/analyses/an-066-fdr-bh.md. BH q-values for the 10 displayed within-pair tests: 6 of 10 survive at q < 0.05 and 7 of 10 at q < 0.10. The two same-direction-as-Channel-A rows (population-frame mismatch, methodology completeness) both fail correction (q = 0.15 and 0.24); the strong rejections (rotation, fabrication, phone-mode, partisan-stronghold, training, deferral) survive comfortably. Sharpens the "no single lever carries +7 pp" claim. New macros DescFdrN (=10), DescFdrSurviveFive (=6), DescFdrSurviveTen (=7). The caption of Table 3 was extended with a sentence reporting the BH correction outcome, citing the population-frame and methodology-completeness rows by name.
- completed: 2026-06-14
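The BH q-values follow the standard step-up adjustment. A minimal sketch; bh_qvalues is a hypothetical helper and the p-values in the example are made up, not the 10 lever tests. A lever "survives at q < 0.05" when its q-value falls below 0.05.

```python
def bh_qvalues(pvals):
    """Benjamini-Hochberg adjusted p-values (q-values), returned in input order."""
    m = len(pvals)
    order = sorted(range(m), key=lambda i: pvals[i])
    q = [0.0] * m
    running_min = 1.0
    for rank in range(m, 0, -1):          # walk from the largest p-value down
        i = order[rank - 1]
        running_min = min(running_min, pvals[i] * m / rank)
        q[i] = running_min
    return q


q = bh_qvalues([0.01, 0.04, 0.03, 0.20])   # made-up p-values
```

Here only the first test survives at q < 0.05, although three of the four raw p-values are below 0.05.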
G11. Sponsor-label permutation for Spec 2. Added source/analysis/an-068-spec2-permutation.py + docs/analyses/an-068-spec2-permutation.md. FWL-residualizes y and sponsored_by against (candidate FE + pollster FE + methodology controls), then randomly permutes the residualized sponsored_by_tilde across rows, 500 draws. Observed β = +6.86 pp; null mean = +0.005, sd = 0.62 pp, max |β| over 500 draws = 2.07. 0 of 500 permutations reach the observed magnitude (~11 null SDs from the null mean). Two-sided permutation p < 1/500 = 0.002. The §sec:robust FE-stability paragraph now reports the Spec 2 permutation alongside the Spec 3c race-week permutation. New macros DescPermSpecTwo{NullSd,NullMaxAbs,NPerm,P}.
- completed: 2026-06-14
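A sketch of the residualize-then-permute procedure, assuming dense controls Z (intercept plus covariates) in place of the candidate and pollster fixed effects; fwl_permutation is hypothetical. Because the residualized treatment is shuffled across rows, the null distribution is centered near zero by construction. With zero exceedances the sketch returns p = 0, which the entry reports as p < 1/500.

```python
import numpy as np


def fwl_permutation(y, d, Z, n_perm=500, seed=0):
    """Residualize y and d on controls Z (FWL), then permute the residualized d.
    Returns (observed slope, array of null slopes, two-sided permutation p-value)."""
    def residualize(v):
        return v - Z @ np.linalg.lstsq(Z, v, rcond=None)[0]

    rng = np.random.default_rng(seed)
    y_t, d_t = residualize(y), residualize(d)
    denom = d_t @ d_t                     # unchanged by permuting d_t
    beta_obs = (d_t @ y_t) / denom
    null = np.array([(rng.permutation(d_t) @ y_t) / denom for _ in range(n_perm)])
    p = float(np.mean(np.abs(null) >= abs(beta_obs)))
    return beta_obs, null, p
```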
G12. Vice-prefeito footnote relocation. The footnote was three lines of detail on the committee-CNPJ tag check. Inlined the substantive claim ("vice-mayor mis-attribution is ruled out by the office tag on the committee CNPJ name") and dropped the footnote; no appendix needed.
- completed: 2026-06-14
G13. Court-challenges pilot label as exploratory. §sec:roadmap paragraph heading now reads "Legal challenges target form, not methodology (exploratory)" with "an exploratory pilot extraction" in the opener.
- completed: 2026-06-14
G14. Mechanism table grouping by documentation-only vs sample-shaping; mark pre- vs post-field. Added a sentence in the §sec:roadmap prose introducing Table 3 that groups the 10 levers into sample-shaping (4: partisan stronghold, phone-mode, frame mismatch, fabrication) vs documentation (6) and pre-field (7) vs post-field (3: audit floor, fabrication signature, ponderação specificity). Did NOT restructure the hand-authored table cells to add explicit category columns — too disruptive for an hour-scale cleanup; the prose categorization gives readers the same map.
- completed: 2026-06-14
G15. Route shares + leave-one-route-out diagnostic. Extended the §sec:caveats Route-mix paragraph with explicit shares (Route B ≈ 72%, D ≈ 20%, A ≈ 2%, C ≈ 6%) via 4 new macros (DescNSelfRouteAPct/B/C/D), and pointed readers to §sec:route-heterogeneity for per-route β and to the replication materials for the full leave-one-route-out check on Spec 3c.
- completed: 2026-06-14
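A leave-one-route-out diagnostic can be sketched as below, assuming plain OLS on a dense design matrix and a route label that is empty for independent polls (both assumptions; Spec 3c itself uses race-week FE). The helper names are hypothetical.

```python
import numpy as np


def ols_coef(y, X):
    coef, *_ = np.linalg.lstsq(X, y, rcond=None)
    return coef


def leave_one_route_out(y, X, route, coef_index=0):
    """Re-estimate one coefficient, dropping the sponsored rows of each route in turn.

    route holds a label ("A".."D") for sponsored rows and "" for independent rows;
    independent rows are always kept.
    """
    route = np.asarray(route)
    estimates = {}
    for r in sorted(set(route.tolist()) - {""}):
        keep = route != r
        estimates[r] = ols_coef(y[keep], X[keep])[coef_index]
    return estimates
```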
G16. Binned scatter of β_i vs log(volume) with inverse-variance weights. Extended AN-018's existing scatter (build/figure/firm_size_discipline.pdf) to add the DerSimonian-Laird random-effects meta-regression line + an approximate pointwise 95% CI ribbon, alongside the existing OLS and n_self-WLS lines. Not cited in the paper directly (the paper carries the AN-016 forest plot as the within-firm figure); the scatter is the supplementary view for the §sec:within-firm meta-regression sentence.
- completed: 2026-06-14
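A sketch of a DerSimonian-Laird-type (method-of-moments) random-effects meta-regression with one moderator, plus the pointwise ribbon, assuming per-firm estimates beta with standard errors se and x = log volume. This is a generic implementation, not the AN-018 script; function names are hypothetical.

```python
import numpy as np


def dl_meta_regression(beta, se, x):
    """Method-of-moments random-effects meta-regression of beta on [1, x]."""
    beta, se, x = (np.asarray(a, dtype=float) for a in (beta, se, x))
    v = se ** 2
    X = np.column_stack([np.ones_like(x), x])
    k, p = X.shape

    # Step 1: fixed-effect WLS with inverse-variance weights, residual Q statistic
    w = 1.0 / v
    XtWX = X.T @ (w[:, None] * X)
    coef_fe = np.linalg.solve(XtWX, X.T @ (w * beta))
    q = np.sum(w * (beta - X @ coef_fe) ** 2)

    # Step 2: moment estimator of the between-firm variance tau^2, truncated at 0
    c = w.sum() - np.trace(np.linalg.solve(XtWX, X.T @ ((w ** 2)[:, None] * X)))
    tau2 = max(0.0, (q - (k - p)) / c)

    # Step 3: refit with random-effects weights 1 / (v_i + tau^2)
    w_re = 1.0 / (v + tau2)
    XtWX_re = X.T @ (w_re[:, None] * X)
    coef = np.linalg.solve(XtWX_re, X.T @ (w_re * beta))
    return coef, np.linalg.inv(XtWX_re), tau2


def pointwise_ribbon(coef, cov, grid, z=1.96):
    """Fitted line and approximate pointwise 95% band over a grid of x values."""
    grid = np.asarray(grid, dtype=float)
    G = np.column_stack([np.ones_like(grid), grid])
    fit = G @ coef
    half = z * np.sqrt(np.einsum("ij,jk,ik->i", G, cov, G))  # sqrt(g' cov g) per row
    return fit, fit - half, fit + half
```

The band is pointwise (normal approximation), matching the "approximate" label in the entry above; it is not a simultaneous band.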
G17. Literature additions. Contribution paragraph in §sec:intro now cites Kamenica-Gentzkow 2011 (Bayesian persuasion / public-signal design) and Gentzkow-Shapiro 2006 (media bias and reputation), and frames the contribution explicitly as supply-side complement to the demand-side / voter-response framing of the survey sponsor-effects literature.
- completed: 2026-06-14
Post-review pass + figure + hypothes.is annotations (2026-06-14, second session)
Session continued the pre-submission revision pass on top of the GPT-5-pro work. Two distinct sub-passes:
(a) Event-study figure + intro/abstract tightening on Henrik's follow-up direction;
(b) Walk-through of 15 hypothes.is annotations Henrik left on the local build (build/site/paper/index.html), all on the abstract / introduction / policy paragraphs.
Full annotation-by-annotation log: docs/responses_hsigstad_2026-06-14.md.
Event-study figure (AN-070)
- AN-070 event-study trajectory figure. New analysis script source/analysis/an-070-event-study-trajectory.py + AN page docs/analyses/an-070-event-study-trajectory.md. For each self-sponsored poll, stacks all independent polls of the same candidate in the same race in a $\pm 4$-week window, bins them by weeks-to-event, and plots mean polling error per bin with 95% CIs (event-clustered). Restricted to events with at least one independent neighbor in the window (n = 117 of 450 self-sponsored polls). Independent bins stay within ±2 pp of zero in every pre- and post-event week; the self-sponsored point at t=0 sits at +7.4 pp. No residualization (race × week FE would mechanically erase the discontinuity). Wired into paper.tex §sec:placebo as Figure 1, with a caption presenting it as the picture version of the within-candidate trajectory test. New macros DescEventStudyJump, DescEventStudyEvents.
- completed: 2026-06-14
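The binning and event-clustered CIs described above can be sketched as follows, assuming the stacked rows are already built (one row per neighbor poll per event, with an event id and integer weeks-to-event). Function names and the CR1-style G/(G-1) factor are assumptions, not taken from AN-070.

```python
import numpy as np


def clustered_mean_ci(values, clusters, z=1.96):
    """Mean with an approximate 95% CI whose variance is clustered on `clusters`."""
    values = np.asarray(values, dtype=float)
    n = values.size
    m = values.mean()
    _, inv = np.unique(clusters, return_inverse=True)
    g = int(inv.max()) + 1
    s = np.bincount(inv, weights=values - m, minlength=g)  # demeaned sum per cluster
    var = (g / (g - 1)) * np.sum(s ** 2) / n ** 2 if g > 1 else np.nan
    half = z * np.sqrt(var)
    return m, m - half, m + half


def event_study_bins(event_id, rel_week, error, window=4):
    """Mean polling error per weeks-to-event bin, stacked across events."""
    event_id = np.asarray(event_id)
    rel_week = np.asarray(rel_week)
    error = np.asarray(error, dtype=float)
    out = {}
    for t in range(-window, window + 1):
        mask = rel_week == t
        if mask.any():
            out[t] = clustered_mean_ci(error[mask], event_id[mask])
    return out
```

Clustering on the event matters because the same independent poll can appear as a neighbor of several sponsored polls once rows are stacked.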
§sec:placebo / Within-candidate trajectory test restructure
§sec:placebo restructured around the figure. Lead is the identification logic (what the test rules out) → Figure 1 → one-paragraph read of the figure that lands the "one-day spike would be required for momentum" kill → three-numbers convergence closer. The numerical pre/post jump narrative (full +6.78 / post +6.46 / 7-day +9.07 / drift-bound +2.30) moved to new appendix Section A: "Within-candidate trajectory test: numerical version".
- completed: 2026-06-14
Post-poll trajectory mirror added. Extended note_table.py::compute_placebo with a post-poll mirror (the next independent poll AFTER the sponsored one, mirroring the existing pre-poll comparison). Macros DescPostJump / T / N / GapMedian / PosPct / SevenJump / SevenT / SevenN added in build_numbers.py. The argument now closes off the "implausible momentum" alternative: the sponsored poll sits above BOTH neighbours (pre + post), so the bias is not real momentum that decays.
- completed: 2026-06-14
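The pre/post neighbour comparison can be sketched with a sorted day index per candidate, assuming polls are given as (candidate, day, share) tuples and that the candidate id identifies the race. This layout and the function name are hypothetical, not compute_placebo's actual interface; max_gap mimics the 7-day variant.

```python
from bisect import bisect_left, bisect_right


def neighbour_jumps(sponsored, independent, max_gap=None):
    """(pre_jump, post_jump) for each sponsored poll: its share minus the last
    independent share strictly before it and the first one strictly after it.
    Either element is None if no neighbour exists within max_gap days."""
    by_cand = {}
    for cand, day, share in sorted(independent, key=lambda r: (r[0], r[1])):
        days, shares = by_cand.setdefault(cand, ([], []))
        days.append(day)
        shares.append(share)

    out = []
    for cand, day, share in sponsored:
        days, shares = by_cand.get(cand, ([], []))
        i = bisect_left(days, day)   # days[:i] are strictly earlier
        j = bisect_right(days, day)  # days[j:] are strictly later
        pre = share - shares[i - 1] if i > 0 else None
        post = share - shares[j] if j < len(days) else None
        if max_gap is not None:
            if pre is not None and day - days[i - 1] > max_gap:
                pre = None
            if post is not None and days[j] - day > max_gap:
                post = None
        out.append((pre, post))
    return out
```

Usage: for a sponsored poll at 49 % on day 12 with independent neighbours at 42 % (day 10) and 41 % (day 20), this returns (7.0, 8.0); both positive is the "above both neighbours" pattern.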
Intro tightening (multiple passes)
Abstract reframed. "the estimate holds when comparing only polls of the same candidate in the same race fielded the same week" → "compared both to actual electoral outcomes and to independent polls of the same candidate fielded in the weeks before and after". Closing restructured to legal-then-reputation-then-disclosure flow. "7 to 8 pp" → "7 pp" via new auto-rounded macro \DescBetaHeadlineRound (= round(β_spec2), unsigned). "Sender-specific" tail dropped.
- completed: 2026-06-14
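An auto-rounded macro of this kind might be emitted as below. This is a hypothetical helper, not the actual build_numbers.py code, and it assumes the macros are written as \newcommand lines.

```python
def rounded_headline_macro(beta, name="DescBetaHeadlineRound"):
    """LaTeX macro holding the unsigned headline coefficient rounded to an integer.

    Python's round() sends exact halves to the even integer (6.5 -> 6); use a
    different rounding rule if the paper should round halves up.
    """
    return "\\newcommand{\\%s}{%d}" % (name, round(abs(beta)))
```

With β_spec2 = 6.86 this yields \newcommand{\DescBetaHeadlineRound}{7}, which is what lets the abstract say "7 pp" without hard-coding the number.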
Intro paragraph cluster trimmed. (a) "Pre-election polls are a primary input to voter beliefs, donor decisions, and campaign strategy" → drop "campaign strategy" + add literature footnote citing Granzier 2019, Araújo 2021, McAllister 1991, Farjam 2020, Dahlgaard 2015 (voter-side mechanisms) and Mutz 1995 (donors). (b) "but a hard one to answer with observational data" → "but the existing literature does not answer it directly" + footnote on the Leeper-Thorson / Crabtree / Panagopoulos online-survey / exit-poll gap. (c) Pre-registration paragraph compressed from 4 sentences to 2. (d) "From this universe" dropped, starts "I assemble…". (e) Within-candidate fixed-effects estimate restatement removed (redundant after plain-language headline). (f) "Brazil's 2024 municipal elections" → "Brazil". (g) Measurement-artifact paragraph about within-firm extraction-template constancy moved out of intro (still in §sec:results).
- completed: 2026-06-14
Intro mechanism paragraph rewritten. Dropped "rules out fabrication" + the standardised-error variance-test sentence. Replaced "Channel A predicts" framing with "Another explanation could be survey design. Tests on each registered design choice with a predicted sign return null or reversed…". Dropped the firm-selection rule-out (moved to body). Simplified "by elimination" ending to "This suggests that the source of bias lies in subtle channels --- sample frame composition, interviewer conduct, and strategic field-period choices --- that are not detectable from the formal survey registration."
- completed: 2026-06-14
Intro policy mini-summary restructured. Replaced the "Two policy levers compete" framing with a setup question + "First, legal accountability." paragraph + "Second, reputation." paragraph + tractability closer. All "lever / levers" purged from intro.
- completed: 2026-06-14
Section + figure cross-refs in intro dropped. Per Henrik's rule that the intro should state findings in words, removed the Figure 1 reference, the Table 3 reference in the mechanism paragraph, and the Table 3 mention in the roadmap paragraph. Section refs in the section-tour at the end are kept (they're navigation, not findings).
- completed: 2026-06-14
Body-wide: Channel A/B + lever terminology purge
- Channel A → "survey design" / "design-side"; Channel B →
"fabrication"; lever(s) → "design choice(s)". 28 occurrences of
"Channel A" / "Channel-A" / "Channel B" / "Channel-B" purged across
§sec:results, §sec:roadmap, §sec:caveats, the appendix balance
section, and the title-page footnote. Six additional "lever"
occurrences purged from the same sections. Paragraph headers
renamed: "Channel A vs Channel B decomposition" → "Survey design
versus fabrication decomposition"; "Inventory of directional
Channel-A predictions" → "Inventory of directional survey-design
predictions"; "Channel decomposition not yet possible" →
"Design-vs-fabrication decomposition not yet possible". Table 3
column "Lever" → "Design choice". Final audit: zero remaining
"Channel A/B" or "lever" occurrences in paper.tex.
- completed: 2026-06-14
§sec:policy restructure
- §sec:policy mirrors the intro's two-approach structure: setup + \paragraph{Legal accountability.} (Brazilian legal infrastructure exists but bias is not actually challenged on substance; refers back to the §sec:roadmap 50-case pilot; states what the findings do NOT support: a punitive regime) + \paragraph{Reputation.} (theoretical incentive + reputation-by-volume evidence + within-firm visibility gradient) + \paragraph{Mandatory disclosure of bias.} (sponsor-aware consumer disclosure + per-firm scorecards) + \paragraph{A narrower regulatory path.} (contingent on the design-vs-fabrication decomposition). All "lever" terminology purged.
- completed: 2026-06-14
Other
bairros → neighborhoods in intro (per Henrik). Other bairro occurrences are TSE document names ("Detalhamento de bairro/município") and stay.
- completed: 2026-06-14
mutz1995horserace bibtex entry added to paper/references.bib (Journal of Politics 57(4):1015-1042) for the donor-allocation cite in the intro footnote.
- completed: 2026-06-14
2026-06-19 — AN-109 v2 — calibrated blind audit at AN-110 empirical DEFF*
- AN-109 v2 (update of source/analysis/an-109-per-poll-z-blind-audit.py): sensitivity panel extended to DEFF ∈ {1, 1.5, 2, 2.5, 3, 12.59}, with 12.59 = AN-110 pooled empirical DEFF*. At the empirical DEFF the blind audit is calibrated: the independent-protocol FP rate falls to 4.8 % at Bonferroni, 5.1 % at Holm, and 10.9 % single-hypothesis (right at the nominal 5 % after multiple-testing correction). Sponsored protocols are flagged at 12.3 % vs the 4.8 % independent baseline: a +7.5 pp excess and a 2.6× lift at Bonferroni (+11.9 pp excess single-hypothesis). Row-level: only 4.3 % of sponsor's-candidate rows are individually tail-extreme (the per-poll bias is buried for 95.7 %), but the protocol-level "at least one share extreme" rollup catches 12.3 %, so the bias has tail concentration that a calibrated blind audit catches. v1's +13 to +19 pp excess at DEFF ≤ 3 was inflated by miscalibration; the v2 row is the honest calibrated number. AN page updated in place (no new AN id); confidence upgraded yellow → green. Updates: new blind-audit-detectability finding (policy/reputation framing replacing the earlier mechanism framing; see also today's correction pass on the AN-108/109/110/111 family), artifacts.yaml descriptions, figure redrawn with a log-x DEFF axis + nominal-FP and empirical-DEFF reference lines.
- completed: 2026-06-19
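A sketch of the per-poll z and the protocol-level "at least one share extreme" rollup, assuming the binomial variance is evaluated at the leave-one-out benchmark share and the Bonferroni correction runs over the shares within one protocol. Both choices, and all names, are assumptions rather than AN-109's code.

```python
import math


def z_score(share, bench, n, deff=1.0):
    """z of one candidate share against a benchmark share (both proportions),
    with the binomial variance at the benchmark inflated by a design effect."""
    se = math.sqrt(deff * bench * (1 - bench) / n)
    return (share - bench) / se


def two_sided_p(z):
    return math.erfc(abs(z) / math.sqrt(2))


def protocol_flagged(shares, benches, n, deff=1.0, alpha=0.05, method="bonferroni"):
    """Flag a protocol if at least one candidate share is extreme.

    method="bonferroni" tests each share at alpha / (number of shares);
    method="single" tests each share at alpha with no correction.
    """
    pvals = [two_sided_p(z_score(s, b, n, deff)) for s, b in zip(shares, benches)]
    threshold = alpha / len(pvals) if method == "bonferroni" else alpha
    return any(p <= threshold for p in pvals)
```

The FP rate is then the flagged share of independent protocols and the lift is the sponsored flag rate divided by it. Raising deff from 1 to 12.59 shrinks every z by √12.59 ≈ 3.55, which is what brings the independent FP rate back toward nominal.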
2026-06-19 — Correction pass: detectability findings are policy/reputation, not channel A vs B
- Reframing of AN-108/109/110/111 across all citing docs.
The detectability/auditability results from AN-108 (binomial SE),
AN-109 (blind multi-candidate audit), and AN-110 (empirical DEFF*)
do not bear on whether the +7 pp bias is generated by lawful design
tilt vs fabrication — a design tilt and a fabrication of the same
magnitude have identical detectability properties. Earlier framing
of these results as "Channel-B-leaning" was incorrect and has been
removed. The honest reading is that they speak to reputation
(visibility to a sponsor's own pollster running internal QA) and
policy (deployability of a calibrated regulator-facing blind audit
yielding 2.6× lift). Changes: reverted the three sub-paragraphs
added today to channel-decomposition-open.md; stripped A/B framing from the AN-108/109/110/111 interpretations; created the new blind-audit-detectability.md finding bundling the detectability/policy + reputation story; updated the headline-sponsor-bias.md cross-ref; updated artifacts.yaml cited_in from channel-decomposition-open to blind-audit-detectability.
- completed: 2026-06-19
2026-06-19 — AN-111 — headline robust to empirical-noise-aware SEs
- AN-111 (source/analysis/an-111-headline-robustness-empirical-noise.py): Spec 2 and Spec 3c re-fit under six SE choices: cluster_muni (baseline), cluster_race_week, cluster_politico_id, twoway (muni_id, field_period_week), twoway (muni_id, politico_id), and a wild-cluster restricted bootstrap at race_week (B=2000). Spec 2 β = +6.86 pp is invariant across the analytic SE choices; SE ranges from 0.81 (race_week) to 1.14 (politico_id), t from 6.03 to 8.45; wcr-bootstrap t = 10.6, p ≈ 0. Spec 3c β = +7.91 pp on the 286-row / 46-cell strict subset; SE 2.30–2.64, t 3.00–3.44; wcr-bootstrap t = 6.10. The headline survives every SE choice cleanly: the aggregate β is identified off averaging over n=450 sponsored cand-poll rows and is robust to the empirical noise floor. AN-110's wider empirical noise floor is informative about per-poll detectability comparators (per the AN-108/109/110 family + the new blind-audit-detectability finding), not about aggregate identification. No decisions.md walk-back proposal. Updates: headline-sponsor-bias finding (SE robustness paragraph + AN-111 cite). Confidence: green.
- completed: 2026-06-19
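A minimal wild-cluster restricted (WCR) bootstrap for a single coefficient, assuming a full-column-rank control matrix X and Rademacher weights; the function names are hypothetical and this is not the AN-111 implementation. Because y and d are residualized on X, the restricted model (beta = 0) has residuals equal to residualized y.

```python
import numpy as np


def _coef_and_cluster_t(y_t, d_t, inv, g):
    """Coefficient on d_t and a cluster-robust t-stat (G/(G-1) small-sample factor)."""
    dd = d_t @ d_t
    beta = d_t @ y_t / dd
    scores = np.bincount(inv, weights=d_t * (y_t - beta * d_t), minlength=g)
    se = np.sqrt(g / (g - 1) * np.sum(scores ** 2)) / dd
    return beta, beta / se


def wild_cluster_restricted_bootstrap(y, d, X, clusters, B=2000, seed=0):
    """WCR bootstrap p-value for H0: coefficient on d is zero in y ~ d + X."""
    rng = np.random.default_rng(seed)
    Q, _ = np.linalg.qr(X)  # orthonormal basis for the column space of X

    def resid(v):  # apply the annihilator M_X = I - Q Q'
        return v - Q @ (Q.T @ v)

    y_t, d_t = resid(y), resid(d)
    _, inv = np.unique(clusters, return_inverse=True)
    g = int(inv.max()) + 1
    _, t_obs = _coef_and_cluster_t(y_t, d_t, inv, g)

    u_r = y_t  # restricted residuals: y on X alone, imposing beta = 0
    t_star = np.empty(B)
    for b in range(B):
        w = rng.choice([-1.0, 1.0], size=g)  # one Rademacher weight per cluster
        y_star = resid(u_r * w[inv])  # restricted fitted values vanish under M_X
        _, t_star[b] = _coef_and_cluster_t(y_star, d_t, inv, g)
    return t_obs, float(np.mean(np.abs(t_star) >= abs(t_obs)))
```

Since the same G/(G-1) factor scales the observed and the bootstrap t-statistics, the bootstrap p-value does not depend on that choice.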
2026-06-19 — AN-110 — empirical DEFF* from race × week × candidate cells
- AN-110 (source/analysis/an-110-empirical-deff.py): empirical DEFF* across 673 cells (politico_id × race_week) with ≥3 independent polls, 2,475 polls total. Pooled DEFF* = 12.59 (observed cross-poll var 24.6 pp² / mean binomial var 1.96 pp²); median cell DEFF* = 3.93, P25 1.81, P75 9.59, P90 19.5. Realized cross-firm SD on a candidate's share ≈ 5.0 pp under the pooled estimator, i.e. √12.6 ≈ 3.55× the binomial SE and ~4× the 1.5–3 rule-of-thumb DEFF. Every sensitivity slice exceeds the rule of thumb; the lowest is high-n (8.29), the highest is early weeks (23.3; real share drift dominates pre-W38). Three consequences. (1) Explains AN-109's miscalibration (the binomial blind audit has a 30–60 % FP rate because the realized noise floor is 3.55× the binomial assumption). (2) Qualifies AN-108's per-poll-loud reading: +7 pp is z ≈ 0.79 against the empirical cross-firm SD, not z ≈ 2.8 against the binomial SE; the reading is comparator-dependent (it holds against an unbiased benchmark and fails against the realized cross-firm distribution). (3) The aggregate regression β = +7 pp survives at N = 568 because the noise averages out. The DEFF* decomposition is unidentified: 12.6 absorbs true sampling DEFF + within-week drift + firm-mode-methodology heterogeneity + firm-systematic bias (AN-016). It is an upper bound on pure DEFF and the right number for an outside-observer noise floor. Cited from blind-audit-detectability.md. Confidence: green.
- completed: 2026-06-19
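A sketch of the pooled and median-cell DEFF* computation, assuming each cell holds (share, n) pairs for its independent polls and that pooling means "average observed variance over average binomial variance" (consistent with the 24.6 / 1.96 ratio above, but the exact weighting in AN-110 is an assumption). The function name is hypothetical.

```python
import numpy as np


def empirical_deff(cells, min_polls=3):
    """Pooled and median-cell empirical DEFF* from cells of independent polls.

    cells maps a (candidate, race_week) key to a list of (share, n) pairs with
    shares as proportions.
    """
    obs_var, binom_var, ratios = [], [], []
    for polls in cells.values():
        if len(polls) < min_polls:
            continue  # too few polls for a within-cell variance
        shares = np.array([s for s, _ in polls], dtype=float)
        ns = np.array([n for _, n in polls], dtype=float)
        v_obs = shares.var(ddof=1)  # observed cross-poll variance
        v_bin = np.mean(shares * (1 - shares) / ns)  # mean binomial variance
        obs_var.append(v_obs)
        binom_var.append(v_bin)
        ratios.append(v_obs / v_bin)
    return float(np.mean(obs_var) / np.mean(binom_var)), float(np.median(ratios))
```

With pure binomial noise the pooled ratio sits near 1; any firm-level or drift component in the shares pushes it up, which is why the entry reads 12.6 as an upper bound on pure sampling DEFF.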
2026-06-19 — AN-109 — blind multi-candidate audit-ability (excess detection, miscalibrated v1; v2 added later)
- AN-109 (source/analysis/an-109-per-poll-z-blind-audit.py): per-poll z vs a LOO independent benchmark, with single-hypothesis / Bonferroni / Holm thresholds and DEFF ∈ {1, 1.5, 2, 2.5, 3} sensitivity (v1; v2 with empirical DEFF* = 12.59 added later the same day). Carries through 114 of 450 sponsored protocols (25 %, those with same-candidate × race-week independent comparators) and 1,389 independent protocols. Two v1 headlines. (1) The simple-binomial blind audit is not a calibrated test: the FP rate on independent protocols is 30–60 % across DEFF assumptions (vs nominal 5 %), so cross-poll variance in race × week × cand cells is much larger than binomial + DEFF ≤ 3 predicts. A regulator can't run a calibrated per-poll detector without empirical-DEFF calibration. (2) Despite the miscalibration, sponsored polls fail the test +13 to +19 pp more often than independent polls at Bonferroni, stable across DEFF. Promotes the AN-110 lead (empirical DEFF). External-validity caveat: the 25 % carry-through sample is plausibly higher-profile races; it doesn't directly speak to the 75 % of sponsored polls without within-cell comparators. Cited from blind-audit-detectability.md. Confidence: yellow (v1) → green (v2).
- completed: 2026-06-19
2026-06-19 — AN-108 — sampling-error envelope on biased polls
- AN-108 (source/analysis/an-108-sampling-se-detectability.py): per-poll binomial SE on the sponsor candidate's share, summarized by row class (sponsored / indep-of-same-cand / other). Headline: median sponsored-poll SE is 2.49 pp at median n=360 and median share 54.8 %; 99.3 % of sponsored polls have SE < 3.5 pp, so a +7 pp shift is z > 2 on a single sponsored poll against binomial SE. Even with DEFF=2 the +7 pp shift remains z ≈ 2.0. Against a tight (binomial / unbiased) benchmark, the +7 pp bias is statistically visible per poll; this bears on reputation (the sponsor's pollster running an internal sanity check) and auditability (an observer with sponsor info + an unbiased benchmark). Silent on the channel decomposition (design tilt and fabrication have identical per-poll detectability at the same magnitude). AN-110 walks back the per-poll-loud reading against the empirical cross-firm noise floor (z ≈ 0.79). Side observation: sponsored polls have median n = 360 vs a within-candidate independent median of 600, which feeds the strategic-small-n follow-up. Cited from blind-audit-detectability.md. Confidence: green.
- completed: 2026-06-19
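The z-values quoted in AN-108 and AN-110 follow from a few lines of arithmetic. The sketch below (hypothetical helper names) uses the median sponsored-poll SE of 2.49 pp; note that the median of per-poll SEs need not equal the SE evaluated at the median n and median share (2.62 pp).

```python
import math


def binomial_se_pp(share, n, deff=1.0):
    """Binomial standard error of a poll share, in percentage points."""
    return 100 * math.sqrt(deff * share * (1 - share) / n)


def shift_z(shift_pp, se_pp, deff=1.0):
    """z of a fixed shift against an SE inflated by sqrt(DEFF)."""
    return shift_pp / (se_pp * math.sqrt(deff))


print(round(binomial_se_pp(0.548, 360), 2))       # 2.62: SE at the median inputs
print(round(shift_z(7, 2.49), 2))                 # 2.81: binomial benchmark
print(round(shift_z(7, 2.49, deff=2), 2))         # 1.99: DEFF = 2
print(round(shift_z(7, 2.49, deff=12.59), 2))     # 0.79: AN-110 empirical DEFF*
```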
2026-06-15 — AN-071 — accuracy vs sponsor bias by firm (null)
- AN-071 (source/analysis/an-071-accuracy-vs-bias-by-firm.py): per-firm accuracy MAE vs within-firm sponsor β scatter, 22 firms with ≥5 self-sponsored polls. Pearson r = −0.09 (p = 0.68) unconditional; β vs accuracy on candidate-sponsored polls is also flat (r = +0.06, p = 0.79). The strict cut (≥10 sponsored, n=9) gives r = −0.64 (p = 0.06), but it is driven by two firms with very negative β (CENSUS, EVA FRANCIELI) that under-rate their own sponsors. Volume-control OLS: the log(n_total) coefficient is −4.1 pp per log-doubling (p = 0.02) and the MAE coefficient drops to −0.56 (p = 0.36), so the disciplining variable is volume, not accuracy. Honest null for the accuracy-as-reputation-signal version of the §sec:policy argument. Confidence: yellow. Not cited from the paper; see the AN page for full results and follow-up taxonomy.
- completed: 2026-06-15
2026-06-16 — Renormalization concern resolved at sample level
- Paper-note disclosure of within-candidate denominator gap. Superseded by today's matched_share == 1.0 regression-sample restriction: the within-protocol percent renormalization is a no-op on every row in the sample (every non-aggregate poll candidate matched a TSE registration), so the poll share and the final vote share have the same denominator by construction. The robustness footnote in paper/paper.tex now points at the sample restriction as the fix rather than disclosing the AN-014 denominator gap. The AN-014 decomposition (98.6% mechanical, 1.4% denominator shift) remains recorded above; the dedicated footnote is no longer needed.
- completed: 2026-06-16
2026-06-16 — AN-079 — Cost × sponsorship × bias (M4 menu-pricing direct test)
- AN-079 (source/analysis/an-079-cost-by-sponsor.py): three-spec test of whether slant is priced through the declared value_brl. Spec 1: candidate-sponsored polls cost +14.8 % more than independent polls within institute × muni FE (n=8,879, p < 1e-6, very precise). Spec 2: the firm-level premium does NOT correlate with within-firm β from AN-016 (r = −0.03, n=21). Spec 3: within sponsored polls, cost does NOT predict bias magnitude (sp × log_cost_z interaction β = −0.35 pp, SE 1.26, p = 0.78). The Goiás IPOP "first-place" rate of R$ 6k sits BELOW the universe median (R$ 7,500). M4 as menu pricing reflected in declared cost is refuted at universe scale. The +14.8 % premium is real but behaves like a production-cost differential, not a slant fee. If M4 carries a slant payment, it is off-book. Enforcement-puzzle doc updated with Caveat 4 under the M4 bullet.
- completed: 2026-06-16
2026-06-16 — AN-078 — Disclosure rate by route × dyad (M1 vs M5 discriminator)
- AN-078 (source/analysis/an-078-disclosure-rate-by-route-dyad.py): direct M5 test on AN-077's committee gap. Committee_repeat disclosure rate 63.3 % vs committee_singleton 71.7 % (gap −8.4 pp, χ² p = 0.12, z ≈ 1.65: directional but short of significance). CPF gap −20.8 pp (n=14+11, p=0.53); party essentially zero; party-name +7.4 pp (reversed: repeats are more often disclosed, consistent with the relational reading). The cross-route pattern is internally coherent: M5 signal on the tracking-contract routes (committee, CPF), reversed on the separate-decision route (party-name). The weight of evidence supports M5 for the committee +7.43 pp β gap; no single test passes p<0.05. Enforcement-puzzle doc updated.
- completed: 2026-06-16
2026-06-16 — AN-077 — All-routes repeat-vs-singleton β decomposition
- AN-077 (source/analysis/an-077-all-routes-repeat-singleton.py): extended AN-075/76's repeat-vs-singleton split to all four sponsor routes. The committee route shows a sharp +7.43 pp repeat-vs-singleton gap (β_repeat = +14.17, β_singleton = +6.75, n_r=57, n_s=302, both p<0.001). CPF gap +2.61 (n=6+6, the AN-075 result); party gap −1.27 (n=11+21); party-name gap −0.60 (n=56+61). But AN-074 already showed that committee repeats have a median time gap of 11 days, with only 12 % > 30 d: most committee "repeats" are single-campaign multi-wave tracking contracts, not separate decisions to re-hire. The route most cleanly compatible with M1-individual (separate decisions) shows no gap; the route showing a gap is the contract-structure route. M5 (publication option / contract structure) graduates from footnote to live mechanism for the committee gap. The pure relational story (M1-individual) remains not separately identified. Enforcement-puzzle doc updated with the M5-upgraded reading.
- completed: 2026-06-16
2026-06-16 — AN-076 — AN-075 robustness without institute FE
- AN-076 (source/analysis/an-076-cpf-beta-no-institute-fe.py): re-ran AN-075 dropping institute FE, keeping candidate FE. βs barely move: β_cpf_repeat +19.11 → +18.34 pp; β_cpf_singleton +16.78 → +16.11 pp; other-route changes 0.07–0.41 pp. The institute FE was NOT absorbing an M1-individual firm-choice mechanism; what absorbs the raw repeat/singleton mean-error gap (+20.2 vs +3.5) is the candidate FE (candidates in repeat-CPF dyads have higher baseline errors across all polls). M1-individual at the firm-choice layer is also not separately identified. The CPF-uniform / strategic-stake (M4-leaning) reading of AN-075 sharpens. Next discriminating test: apply the repeat-vs-singleton split to the committee / party / party-name routes (much larger samples).
- completed: 2026-06-16
2026-06-16 — AN-075 — CPF β decomposed by repeat-vs-singleton dyad
- AN-075 (source/analysis/an-075-cpf-beta-by-dyad.py): refit AN-006's spec with the CPF dummy split into sp_cpf_repeat and sp_cpf_singleton (repeat indicator from AN-074 logic). β_repeat = +19.11 pp (n=6, SE 0.21, a CRVE artifact on 5 thin clusters), β_singleton = +16.78 pp (n=6, CI [+2.0, +31.6]). Point estimates are similar within FE; raw means before FE are very different (repeat +20.2 pp vs singleton +3.5 pp), absorbed into candidate × institute baselines. The M1-individual-vs-M4 discriminator does not resolve cleanly at this sample size. Most parsimonious reading at the point estimates: the CPF route as such carries ~+17–19 pp regardless of dyad structure, so strategic individual stake (M4-leaning) is the operative variable. Enforcement-puzzle doc updated to anchor the M4 reading; AN-076 (no-institute-FE robustness) queued as the discriminator for the M1-individual firm-choice channel.
- completed: 2026-06-16
2026-06-16 — AN-074 — CPF cell repeat-dyad test (M1-individual vs M4)
- AN-074 (source/analysis/an-074-cpf-repeat-dyad.py): per-route (firm × candidate) repeat-pair concentration on the 793 candidate-sponsored dedup observations, stratified by sponsor_route. The permutation null shuffles the firm column. Every route is above chance; the CPF gap is largest in absolute terms: observed 56.0 % repeat-pair share vs a null mean of 4.9 % (p < 0.0005, n = 25). But the firms doing CPF repeat dyads (AR7 β −2.4; CENSUS β −8.2) are NOT in the AN-006 high-β tail: the M1-individual structure exists at CPF but does not co-locate with the AN-006 CPF +19 pp slant. Most natural joint reading: durable M1-individual dyads at low-β firms (relational discipline), one-shot singleton CPF transactions at higher-β firms (M4 carries the slant). Key follow-up: decompose the AN-006 CPF +19 pp by repeat-vs-singleton subset. Enforcement-puzzle doc updated.
- completed: 2026-06-16
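A sketch of the repeat-pair statistic and its firm-shuffle null, assuming "repeat-pair share" means the share of observations whose (firm, candidate) pair occurs more than once (the log does not define it precisely). Run it once per sponsor_route subset to get the per-route numbers; function names are hypothetical.

```python
import random
from collections import Counter


def repeat_pair_share(firms, cands):
    """Share of observations whose (firm, candidate) pair occurs more than once."""
    counts = Counter(zip(firms, cands))
    return sum(c for c in counts.values() if c > 1) / len(firms)


def repeat_pair_permutation(firms, cands, n_perm=2000, seed=0):
    """Observed share, null mean, and one-sided permutation p-value.

    Shuffling the firm column breaks the firm-candidate link while keeping each
    firm's volume and each candidate's count fixed.
    """
    rng = random.Random(seed)
    observed = repeat_pair_share(firms, cands)
    shuffled = list(firms)
    null = []
    for _ in range(n_perm):
        rng.shuffle(shuffled)
        null.append(repeat_pair_share(shuffled, cands))
    p = (1 + sum(v >= observed for v in null)) / (1 + n_perm)
    return observed, sum(null) / n_perm, p
```

Under the (1 + count) / (1 + draws) convention, the smallest attainable p is 1 / (n_perm + 1), so a reported p below 0.0005 needs at least about 2,000 draws.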
2026-06-17 — politica cleaner refactor: multi-year, consolidated, registry-cached
- politica/source/clean/ refactor
  - New poll.py (year-loop YEARS=[2020, 2024]) emits build/clean/poll_{year}.parquet: one row per TSE-registered mayoral poll with the full registration metadata (institute, dates, sample size, pollster CNPJ, etc.). Removes the duplicated per-UF CSV parsing that both poll_response_2024.py and poll_sponsor.py were doing locally.
  - poll_sponsor.py consolidates the prior poll_sponsor_2024.py + poll_sponsor_2024_join.py split into one year-parameterized cleaner: raw long-table + Routes A–D candidate classification in one pass. Output is build/clean/poll_sponsor_{year}.parquet with route columns merged in (no more separate _candidate parquet).
  - poll_2020.py / poll_2024.py renamed to poll_response_{year}.py to reflect their (protocol × scenario × candidate) grain. Outputs are build/clean/poll_response_{year}.parquet.
  - Both downstream cleaners now read poll_{year}.parquet for the mayoral universe rather than re-parsing per-UF CSVs.
  - EJ's source/assemble/poll_2024.py renamed to cand_poll_2024.py to reflect its (protocol × candidate) grain; output likewise.
  - Pipeline runs verified end-to-end on educloud (politica): 6 fresh parquets in build/clean/ for 2020 + 2024.
- completed: 2026-06-17
2026-06-16 — AN-073 — firm party / state specialization on candidate-sponsored polls
- AN-073 (source/analysis/an-073-firm-party-specialization.py): per-firm party-HHI, ideological-bloc-HHI, and state-HHI on the candidate-sponsored subsample (793 dedup observations, 38 firms with n≥5). The permutation null draws n_i parties from the universe candidate-sponsored party distribution. State concentration is universal (37/38 firms p<0.05), party concentration uncommon (10/38), bloc concentration even rarer (6/38). Party-HHI × within-firm β (from AN-016, n=22): full-sample r = −0.456, driven entirely by CENSUS / EVA FRANCIELI (the two suspected-selection firms thinking.md already flagged); excluding them r = −0.155, and trimmed tercile means are flat at 7.38 / 8.52 / 8.22. The within-cycle M1/M3 relational-reputation prediction (firm × party dyads track β) is refuted at the point estimate; M4 single-shot pricing gains by elimination. Enforcement-puzzle thinking doc updated with the AN-073 reading.
- completed: 2026-06-16
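A sketch of the per-firm HHI test, assuming the null draws a firm's n_i categories i.i.d. from the universe shares (a multinomial draw), as the entry describes, and a one-sided p-value with the (1 + count) / (1 + draws) convention. Names and the dict layout for universe shares are hypothetical.

```python
import numpy as np


def hhi(labels):
    """Herfindahl-Hirschman index of a list of category labels (result in [0, 1])."""
    _, counts = np.unique(labels, return_counts=True)
    s = counts / counts.sum()
    return float(np.sum(s ** 2))


def hhi_permutation_p(firm_labels, universe_shares, n_draws=5000, seed=0):
    """Observed HHI and p-value against same-size draws from the universe distribution."""
    rng = np.random.default_rng(seed)
    n_i = len(firm_labels)
    obs = hhi(firm_labels)
    probs = np.asarray(list(universe_shares.values()), dtype=float)
    draws = rng.multinomial(n_i, probs / probs.sum(), size=n_draws)  # counts per draw
    null = np.sum((draws / n_i) ** 2, axis=1)  # HHI of each simulated firm
    p = (1 + np.sum(null >= obs)) / (1 + n_draws)
    return obs, float(p)
```

The same function serves party, bloc, or state concentration by swapping the labels and the universe distribution.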