Data
Court opinion extractions
Structured extractions from 25 Supreme Court opinions (18 EP, 7 DP) used to illustrate and ground the model's applications.
Source PDFs
PDFs downloaded from Library of Congress U.S. Reports and supremecourt.gov. Re-downloadable via cases/download.sh. PDFs are gitignored (large binaries).
Extraction pipeline
cases/extract_prompt.md— system prompt defining the extraction schemacases/extract.py— Python script calling GPT-4o API to extract structured datacases/extractions/*.md— one markdown file per case (25 total)
Reproducing: python cases/extract.py cases/*.pdf (requires OPENAI_API_KEY in env).
Extraction schema (per case)
- Holding ($H_t$): quoted holding + translation to constraint on admissible $(w,c)$ + open questions
- Fact vector $z_t$:
- 2a. Raw salient facts (what the Court treats as legally relevant, with quotes)
- 2b. Dimension mapping to EP/DP dimension dictionaries (D1–D8)
- Unmapped facts flagged for potential dictionary expansion
- Treatment of prior holdings: status (relied on / extended / distinguished / limited / overruled) + model interpretation
- Overruling: constraint removal, justification mapped to stare decisis factors
- Breadth: narrow reading, broad reading, breadth ambiguity
- Concurrences / dissents: alternative constraint structures
- Reasoning revealing implicit weights: quoted passages showing how dimensions are weighted
Notes
- Brown v. Board extraction was hand-written as the gold standard; all others generated by GPT-4o with Brown as few-shot example.
- Three long opinions (Dobbs, Parents Involved, SFFA) were truncated to fit GPT-4o's context window; concurrences/dissents may be incomplete for these.
- The project previously used the Supreme Court Database (SCDB) for an Epstein & Posner (2016) loyalty replication. That work is archived in git history.