Data

Court opinion extractions

Structured extractions from 25 Supreme Court opinions (18 EP, 7 DP) used to illustrate and ground the model's applications.

Source PDFs

PDFs downloaded from Library of Congress U.S. Reports and supremecourt.gov. Re-downloadable via cases/download.sh. PDFs are gitignored (large binaries).

Extraction pipeline

cases/extract_prompt.md — system prompt defining the extraction schema
cases/extract.py — Python script calling GPT-4o API to extract structured data
cases/extractions/*.md — one markdown file per case (25 total)

Reproducing: python cases/extract.py cases/*.pdf (requires OPENAI_API_KEY in env).

Extraction schema (per case)

Holding ($H_t$): quoted holding + translation to constraint on admissible $(w,c)$ + open questions
Fact vector $z_t$:
- 2a. Raw salient facts (what the Court treats as legally relevant, with quotes)
- 2b. Dimension mapping to EP/DP dimension dictionaries (D1–D8)
- Unmapped facts flagged for potential dictionary expansion
Treatment of prior holdings: status (relied on / extended / distinguished / limited / overruled) + model interpretation
Overruling: constraint removal, justification mapped to stare decisis factors
Breadth: narrow reading, broad reading, breadth ambiguity
Concurrences / dissents: alternative constraint structures
Reasoning revealing implicit weights: quoted passages showing how dimensions are weighted

Notes

Brown v. Board extraction was hand-written as the gold standard; all others generated by GPT-4o with Brown as few-shot example.
Three long opinions (Dobbs, Parents Involved, SFFA) were truncated to fit GPT-4o's context window; concurrences/dissents may be incomplete for these.
The project previously used the Supreme Court Database (SCDB) for an Epstein & Posner (2016) loyalty replication. That work is archived in git history.

Data & Extractions

Data

Court opinion extractions

Source PDFs

Extraction pipeline

Extraction schema (per case)

Notes