Data-quality concern definitively dispatched **and** a substantively new finding surfaces: dramatic between-firm β heterogeneity. Within-firm β (15 firms ≥ 10 sponsored rows; 31 firms ≥ 5): mean +6.5 / +7.2, median +4.4 / +6.3, **range [−11, +35], sd 10.3**. 19 of 31 firms (61 %) are individually significant at p < 0.05; 22/31 positive. **Big-name firms (CENSUS, IIP, INSTITUTO PARANÁ, Verita) have β near zero or modestly negative within-firm; smaller firms (METHODUS +24.7, CAMARGO +23.6, INTENÇÃO +35.2, DATA SC +16.1) slant heavily.** Because PDF style and LLM-extraction pattern are held strictly fixed within firm, the cross-firm β dispersion (sd 10.3 pp) is *real cross-firm sponsor-behavior heterogeneity*, not data-quality artifact. The data-quality concern is closed.

Confidence
green
Type
robustness
Design
Sample
estimulado-non-aggregate-match2 (31,186 rows, 641 sponsored, 8,431 candidates). Firms ranked by sponsored-row count; primary cut at ≥ 10 sponsored (15 firms), supplementary at ≥ 5 (33 firms).
Specification
Spec 2 within firm: error ~ sponsored_by + opponent_sponsored + log_sample_size + days_to_election + days² | candidate FE; cluster-robust SE at muni. Race-FE-only fallback for firms where the candidate FE absorbs too much.
Comparator
each firm's own non-sponsored polls — PDF style and house methodology held strictly fixed
Cluster
muni
Weights
none
Script
source/analysis/an-016-within-firm-beta.py
Target
build/table/within_firm_beta.csv
Status
interpreted · 2026-06-02
Created
2026-06-02

Question

AN-015 ran a within-firm β test on the top-5 pollsters by total row count but found those firms sponsor too few polls themselves (NaN coefficients). The natural set for the test is the firms that do sponsor polls — AN-007's customer-mix-sorting cohort of institutes with ≥ 5 self-sponsored polls each. Refitting spec 2 within each of those firms is the strongest available "PDF style held fixed" test of the headline: if extraction differences across firms were driving the +7-8 pp result, restricting to within-firm identification should kill or shrink β.

Design

source/analysis/an-016-within-firm-beta.py refits spec 2 on each firm's polls separately. Two cuts:

  1. Primary: firms with ≥ 10 self-sponsored rows (15 firms; cleanest within-firm power).
  2. Supplementary: firms with ≥ 5 self-sponsored rows (33 firms; the AN-007 set).

For each firm: spec 2 = error ~ sponsored_by + opponent_sponsored + log_sample_size + days_to_election + days² | candidate FE + pollster FE (degenerate — only 1 firm in subset), cluster-robust SE at muni. If candidate FE absorbs everything (firm with few candidates appearing in both sponsored and non-sponsored polls), fall back to race FE only.

Tabulate per-firm β, SE, p, n. Compute distribution range and the share of firms with significant β. Forest plot of the per-firm coefficients.

Results

Within-firm sponsor coefficient β with 95 % CI

Primary cut (≥ 10 sponsored rows, 15 firms)

Firm n_self β SE p
CENSUS INSTITUTO DE PESQUISAS 72 −2.76 5.71 0.63
IIP INSTITUTO DE PESQUISAS 66 −1.64 2.24 0.47
INSTITUTO PARANÁ 34 −10.95 10.43 0.29
W J MENDES PESQUISAS 28 +4.05 3.91 0.30
EVA FRANCIELI DE SOUZA 20 −8.68 3.54 0.016
INSTITUTO VERITA 17 +0.55 0.75 0.46
NEXXUS MAIS 16 +7.69 3.23 0.020
INSTITUTO DATA SC 13 +16.07 4.34 0.001
INSTITUTO METHODUS 12 +24.72 3.52 <0.001
VISÃO PESQUISAS 12 +13.92 4.92 0.008
RADAR INTELIGÊNCIA 10 +11.56 2.37 <0.001
INSTITUTO CAMARGO E MEDINA 10 +23.59 3.83 <0.001
PROMÍDIA PESQUISA 10 +2.29 0.54 <0.001
INSTITUTO LJM 10 +4.45 2.77 0.12
3S CONSULTORIA 10 +12.86 1.79 <0.001

Summary: mean β +6.51, median +4.45, range [−10.95, +24.72], sd 10.29. 11 of 15 positive; 9 of 15 significant at p < 0.05.

Supplementary cut (≥ 5 sponsored rows, 31 firms with usable β)

Mean β +7.16, median +6.26, range [−11, +35], sd 10.34. 22 of 31 positive; 19 of 31 significant at p < 0.05.

Notable additions from the supplementary cut:

Firm n_self β SE p
INTENÇÃO INSTITUTO 5 +35.20 3.80 <0.001
OPINIÃO ESTATÍSTICA 6 +19.57 <0.001
TRIÂNGULO MULTIPROJETOS 7 +14.66 3.76 0.001
AR7 PESQUISAS 9 −4.00 1.12 0.001
OPINAR PESQUISAS 9 +7.85 1.47 <0.001

(Some firms had candidate FE fully absorb the within-variation; for those, the table shows the race-FE-only fallback. The race-FE fallback gives slightly larger absolute β by AN-014's multiplicative-scaling argument, but the qualitative cross-firm pattern is robust.)

Interpretation

Two main findings, one negative (the question we set out to answer) and one positive (a substantively new observation):

The data-quality concern is dispatched

Within each firm, PDF layout, extraction template, scenario-label conventions, and the LLM's prompt-response behavior are all constant by construction. If sponsored-poll-vs-independent-poll extraction differences were driving the headline, every firm's within-firm β should be similar (since each firm's PDFs are self-consistent). Instead β ranges from −11 to +35 across firms — a 46-pp spread that cannot come from a single uniform LLM bias. Combined with AN-013 / AN-014 / AN-015, this closes the data-quality flank.

A new substantive finding: firm-level β heterogeneity

The headline +7.85 pp is a cross-firm average masking dramatic firm-level heterogeneity:

This is a clean instance of the AN-007 customer-mix-sorting prediction: firms whose business is mostly candidate-sponsored work face lower reputational cost from slant; firms with mixed business (media + sponsored) discipline themselves to maintain publication-side credibility. The headline number is a weighted average across an industry that is sharply segmented.

Combined with the rest of the battery

AN What's ruled out
AN-010 comparator contamination, FE-selection, route false positives, renormalization-as-design-lever (partially)
AN-011 FE-structure permutation, single-firm dominance, single-state dominance
AN-012 thin-cluster CRVE under-coverage, sample-size weighting, week-window brittleness
AN-013 Channel B per-row fabrication
AN-014 K2 "sponsors list fewer candidates" mechanism
AN-015 differential LLM extraction quality (script proxies)
AN-016 differential LLM extraction quality (within-firm holding PDF style fixed)

Together, the data-quality concern is now closed on six independent fronts:

  1. Within-poll digit symmetry (AN-013)
  2. Within-candidate denominator stability (AN-014)
  3. Clean independent baseline (+0.93 pp; not zero-shifted)
  4. Negative opponent coefficient (zero-sum sponsor effect, not noise)
  5. Quality-proxy distribution + clean-subset β (AN-015)
  6. Within-firm β heterogeneity (AN-016) — the 46-pp cross-firm spread is incompatible with a uniform extraction bias

Follow-ups

  1. Customer-mix-sorting refresh (extension, high paper-value). AN-007 ran a cross-section regression of per-pollster β on candidate-share of customer mix using 11 institutes. AN-016 expands the per-firm β table to 31 firms with usable estimates. Refresh AN-007's slope with the larger sample; the relationship should be sharper and the SE tighter. Suggested edit: rerun source/analysis/pollster_customer_mix.py reading the AN-016 β table.
  2. Industry-segmentation framing for the paper note (blind spot, high paper-value). The β = sd 10.3 cross-firm dispersion is a substantive finding the headline number obscures. The paper note's framing should explicitly distinguish the average effect (+7.85) from the dramatic between-firm dispersion. The implied policy reading shifts from "polling industry has a bias problem" to "polling industry is segmented; the bias problem lives in a subset of niche firms." Suggested footnote: cite AN-016 §Results and the forest plot.
  3. Hand-validation prioritization (refinement of the queued TODO). The hand-validation TODO should prioritize sampling polls from the high-β firms (METHODUS, CAMARGO, INTENÇÃO, RADAR) — if those firms' sponsored-poll PDFs do look qualitatively different, that would be an interesting finding. Conversely, sampling from CENSUS / IIP / Verita where β ≈ 0 would confirm extraction is working as expected on the big firms.
  4. Heterogeneity by firm size (extension). AN-016 shows large firms have low β. Formal test: regress per-firm β on log(n_polls_total). If a clean monotone relationship exists, it's a reportable "size as discipline" finding.