Article   |   Volume 167, Issue 2

Data-Driven Originalism

167 U. Pa. L. Rev. 261 (2019)

January 2019

The threshold question for all originalist methodologies concerns the original communicative content of the words of the Constitution. For too long this inquiry has been pursued through tools that are ill‐suited to the task. Dictionaries generally just define individual words; they don’t typically define phrases or allow for consideration of broader linguistic context. And while dictionaries can provide a list of possible senses, they can’t tell us which sense is the most ordinary (or common).

Originalists have turned to other methods as well, but those too have fallen short. All is not lost, however. Big data, paired with the tools of linguists, has the potential to bring greater rigor and transparency to the practice of originalism. This article explores how corpus linguistic methodology can aid originalism’s inquiry into the original communicative content of the Constitution. We propose to improve this inquiry by using a newly released corpus (or database) of founding‐era texts: the beta version of the Corpus of Founding‐Era American English.

This article shows how the typical tools of a corpus (concordance lines, collocation, clusters or n‐grams, and frequency data) can aid in the search for original communicative content. We also show how corpus data can help determine whether a word or phrase is best understood as an ordinary one or as a legal term of art. To demonstrate corpus linguistic methodology, we analyze important clauses in the Constitution that have generated litigation and controversy over the years (commerce, public use, and natural born citizen) and another whose original meaning has been presumed to be clear (domestic violence). We propose best practices and also discuss the limitations of corpus linguistic methodology for originalism.
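
For readers unfamiliar with the four corpus tools named above, the following is a minimal Python sketch of what each one computes. It is an illustration only, not the authors' method or the interface of the Corpus of Founding‐Era American English: the three-sentence `TOY_CORPUS` is invented for the example, and the function names, the simple letters-only tokenizer, and the three-word context window are all assumptions.

```python
import re
from collections import Counter

# Hypothetical toy corpus invented for illustration; NOT drawn from any real corpus.
TOY_CORPUS = [
    "the commerce of the colonies with foreign nations",
    "to regulate commerce with foreign nations and among the several states",
    "trade and commerce with the indian tribes",
]


def tokenize(text):
    """Lowercase the text and split it into alphabetic word tokens."""
    return re.findall(r"[a-z]+", text.lower())


def concordance(docs, keyword, width=3):
    """Concordance (keyword-in-context) lines: (left context, keyword, right context)."""
    lines = []
    for doc in docs:
        tokens = tokenize(doc)
        for i, tok in enumerate(tokens):
            if tok == keyword:
                left = " ".join(tokens[max(0, i - width):i])
                right = " ".join(tokens[i + 1:i + 1 + width])
                lines.append((left, tok, right))
    return lines


def collocates(docs, keyword, window=3):
    """Count words appearing within `window` tokens on either side of the keyword."""
    counts = Counter()
    for doc in docs:
        tokens = tokenize(doc)
        for i, tok in enumerate(tokens):
            if tok == keyword:
                neighbors = tokens[max(0, i - window):i] + tokens[i + 1:i + 1 + window]
                counts.update(neighbors)
    return counts


def ngrams(docs, n=2):
    """Count every run of n consecutive tokens (clusters, or n-grams)."""
    counts = Counter()
    for doc in docs:
        tokens = tokenize(doc)
        counts.update(tuple(tokens[i:i + n]) for i in range(len(tokens) - n + 1))
    return counts


def frequency(docs, keyword):
    """Raw count of the keyword across all documents."""
    return sum(tokenize(doc).count(keyword) for doc in docs)


if __name__ == "__main__":
    for left, kw, right in concordance(TOY_CORPUS, "commerce"):
        print(f"{left:>20} [{kw}] {right}")
    print(collocates(TOY_CORPUS, "commerce").most_common(3))
    print(ngrams(TOY_CORPUS, 2).most_common(3))
    print(frequency(TOY_CORPUS, "commerce"))
```

In practice the interpretive work lies in reading and coding the concordance lines by sense; the counts above only organize the evidence for that reading.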
