
Research Agenda of Matthew Sag

* This page was last revised in October 2018.

My primary specialty is intellectual property, and I am particularly well known for my contributions to copyright theory and to empirical studies of intellectual property. Building on my six years in practice working with technology clients, the majority of my research grapples with how the law influences the development of new technology and how technology in turn changes the law. Much of my work focuses on how digitization and the Internet have changed the meaning and implications of the act of copying, and what this implies for copyright law. My recent work has also dealt with issues relating to text data mining, machine learning, big data, and private ordering through algorithmic enforcement.

In general terms, my research can be organized into two broad themes: (1) an institutional approach to understanding how the law responds to the challenge of new technology; and (2) the use of empirical methodologies to study litigation and judicial behavior. These themes frequently overlap and reinforce one another. The following discussion is thematic rather than strictly chronological.

(1) The relationship between law and technology

(a) Institutional accounts of how the law responds to technological change

There is much to be said about the application of old laws to new technology. New technologies can expose latent ambiguities in existing legal regimes, they can introduce new stakeholders whose interests need to be accommodated in regulatory compromises, and they can reveal gaps in existing laws. My research has addressed all of these phenomena, but it also looks more deeply at issues of uncertainty, information costs, and information asymmetries in the formation and articulation of legal rules.

In a series of papers, I use an institutional lens to examine how the law deals with technological change across a variety of issues in intellectual property law. In Beyond Abstraction: The Law and Economics of Copyright Scope and Doctrinal Efficiency (2006), I explored the limitations of the dominant economic model of copyright and its surprising failure to take transaction costs into account in the formulation of optimal legal rules. In Patent Reform and Differential Impact (2007), I argued that, given the empirical uncertainty as to whether patent rights are too weak or too strong, Congress should prioritize reform proposals that would have a differential impact on bad patents. I then applied this differential impact test to a number of reforms being proposed at the time.

In An Information-Gathering Approach to Copyright Policy (2012), my coauthor and I examined the role of different government institutions in resolving disputes between the content and technology industries. Using a variety of case studies, from the player piano of the early 20th century to webcasting, our account focused on the varying capacities of government institutions and on problems of information and uncertainty. In Promoting Innovation (2015), my coauthor and I challenged the view that the economist Joseph Schumpeter's famous theory of "creative destruction" implies only a limited, laissez-faire role for antitrust law (competition law). We accepted the Schumpeterian premise that innovation is the key to economic growth and that creative destruction is a vital source of innovation. However, we argued that creative destruction is an ongoing process and that the law must preserve opportunities and incentives for creative destruction at all stages of innovation, rather than allowing the first creative destroyer an enduring monopoly.

One of my most recent papers examining how legal institutions adapt to new technology, Internet Safe Harbors and the Transformation of Copyright Law (2017), highlights the recursive and evolutionary nature of the law/technology interaction. The article shows how the substantive content of copyright law has become less relevant in the online environment because of the incentives created by the Internet safe harbors enacted as part of the Digital Millennium Copyright Act (DMCA) in 1998. The article further shows how these safe harbors are now being eclipsed by a new wave of private ordering implemented through algorithms and automatic filters that supersede both the substantive content of copyright law and the procedural safeguards of the DMCA. This has obvious implications for copyright law and policy, but beyond that it raises fundamental issues about the rule of law in an age of big data and black-box algorithms.

(b) The structural role of copyright’s fair use doctrine

Copyright law's primary instrument of flexibility and adaptability in the face of technological disruption is the fair use doctrine. However, there is a common trope among lawyers, academics, and judges that copyright law's fair use doctrine is unpredictable and essentially ad hoc. Many would go further and argue that, as an exception to the rights of the copyright owner, the fair use doctrine should be narrowly construed. In my research, I have shown that fair use has an internal logic that can be derived from copyright law's most fundamental principles and that, once these principles are understood, fair use is rational and reasonably predictable. Moreover, on this principled approach, the doctrine is no mere exception to the rights of copyright owners: the right to make a fair use of a copyrighted work is a right of equal status to those of the copyright owner.

In God in the Machine: A New Structural Analysis of the Fair Use Doctrine in Copyright Law (2005), I argued that, rather than taking away from the rights of copyright owners, the fair use doctrine actually benefits copyright owners by enabling a broader, more abstract statement of their rights than would otherwise be possible. My historical research on the fair use doctrine in The Pre-History of Fair Use (2011) reinforced this structural analysis by showing how the fair use doctrine actually arose out of an expansion of the rights of copyright owners in the early English copyright cases dealing with fair abridgment. My focus on the structural role of the fair use doctrine was also the foundation for one of my most important empirical projects. In Predicting Fair Use (2012), I directly confronted the unpredictability critique by empirically assessing the predictability of fair use outcomes in litigation. Concentrating on characteristics of the contested use that would be apparent to litigants pre-trial, I tested a number of doctrinal assumptions, claims, and intuitions that had not previously been subject to any empirical scrutiny. Broadly speaking, the study showed that the fair use doctrine is more rational and consistent than is commonly assumed.

(c) Copyright and copy-reliant technology – 2009 to 2016

Prior to the digital age, the only plausible reason to reproduce an expressive work would have been to allow some human, at some stage, to appreciate the expressive qualities of that work. Nowadays, however, a host of technologies rely on intermediate copying as part of a broader analytical process but do not ultimately communicate the copied expression; they communicate only more abstract metadata relating to that expression. Such copy-reliant technologies include Internet search engines, plagiarism detection software, machine learning, and various other forms of text data mining. These uses are "non-expressive" in the sense that they are not intended to enable human enjoyment, appreciation, or comprehension of the copied expression.

In a series of articles and amicus briefs over the past decade, I have proposed and elaborated a particular theory as to how copyright law should be applied to the non-expressive use of expressive works. I first outlined my theory in Copyright and Copy-Reliant Technology (2009) and then refined and updated it in an article focusing specifically on library digitization, Orphan Works as Grist for the Data Mill (2012). The core of my argument in these articles was that the rights of the copyright owner are defined by, and limited to, the communication of original expression to the public. It follows that because non-expressive uses do not communicate original expression to the public (i.e., to any human reading audience for the purpose of being read, understood, or appreciated), such uses do not conflict with the copyright owner's exclusive rights. In these articles I also explained how this theory should be implemented through copyright's fair use doctrine.

In 2012, I teamed up with English professor Matthew Jockers and clinical law professor Jason Schultz to reprise these arguments in a two-page commentary in Nature, Digital Archives: Don't Let Copyright Block Data Mining (2012), and in a series of amicus briefs in the landmark Authors Guild cases (Authors Guild v. HathiTrust and Authors Guild v. Google). The defendants prevailed in both of the Authors Guild cases, and we have reason to think that our briefs played some part in that victory. Since 2016, I have served on the Advisory Board of the HathiTrust Research Council, advising on copyright and related issues affecting academic digitization and text data mining research.

(d) Copyright and copy-reliant technology – Current and future projects

My work on text data mining and my theory of the relationship between non-expressive use and fair use figure prominently in my research plans for the near future. I am currently working on a comprehensive survey of the legal issues relating to text data mining projects in light of the favorable fair use precedent established in the landmark Authors Guild cases. This draft article, The Legal Landscape for Text Data Mining Research, synthesizes the theory of non-expressive use discussed above and also considers the implications of the many issues left unresolved by the Authors Guild cases. It engages with issues of online contract formation, anti-hacking laws, the Digital Millennium Copyright Act, and cross-border copyright issues, among other concerns. Influenced and informed by my own first-hand experience with text data mining research (see below), I have developed a four-stage model of the lifecycle of text data mining research, and I use this model to identify and explain the relevant legal issues beyond the core holdings of the Authors Guild cases on non-expressive use.

I am also working with a number of other scholars from the library and text mining communities on a grant application. We are seeking funding to host an advanced institute in digital humanities research to integrate legal literacies with digital humanities workflows. Our aim is to help text data mining researchers and their supporting institutions understand the legal landscape for text data mining research and develop actionable strategies for navigating these legal issues.

(e) Book Project

I expect to complete my book manuscript, The Modern Law of Fair Use, by the summer of 2019. The book is a comprehensive restatement of fair use that builds on my earlier works. The concept of “transformative use” has dominated fair use jurisprudence since it was adopted by the Supreme Court in the 1994 case of Campbell v. Acuff-Rose. Campbell was a case about parody in which it was obvious that the original work had been substantially changed and that those changes cast a critical eye back on the original. Since Campbell, however, courts have applied transformative use to a wide variety of fair use claims, many of which involve new technologies that are far removed from traditional fair uses, such as parody, commentary, and criticism.

My aim is to provide a sound theoretical foundation for modern fair use law that addresses some of the current tensions evident in the concept of transformative use. I show how, reasoning from copyright law's most basic principles, the essential consideration in fair use cases should be whether the defendant's use poses a risk of expressive substitution. This focus on expressive substitution explains both the fair use status of non-expressive uses, such as text data mining, and that of more traditional fair uses, including parody, commentary, criticism, illustration, and explanation.

(2) Empirical Analysis of Litigation and Judicial Behavior

(a) Empirical Analysis of Intellectual Property Litigation

In 2012, I began working on ways to better understand intellectual property litigation by analyzing case filings rather than simply looking at litigated cases, which are well understood to be an unrepresentative sample. This research ultimately led to my article IP Litigation in US District Courts (2016), which undertook a broad-based empirical review of IP litigation as a whole. The article focused on changes in IP litigation over time and in its regional distribution, and it presented key findings with respect to the changing nature of file sharing litigation, the true impact of patent trolls on the level of patent litigation, and the extent of forum shopping and forum selling in patent litigation.

In the course of building a database for IP Litigation in US District Courts, I made some startling discoveries about the nature and prevalence of copyright trolling lawsuits. This process of discovery began when I sat down to read 50 randomly selected copyright docket files from the Northern District of Illinois in 2012. Almost every case asserted that some unknown person at a particular IP address had used BitTorrent (a method of peer-to-peer file sharing) to infringe the copyright in a pornographic film. At that time, some law review articles had commented on aspects of this type of litigation, but no one seemed to understand how prevalent it had become.

In Copyright Trolling, An Empirical Study (2015), I established, for the first time, the astonishing extent to which copyright litigation in the U.S. had become dominated by a handful of owners of pornographic content suing in multi-defendant John Doe lawsuits (i.e., suits filed against defendants identified only by an Internet Protocol, or IP, address). At the time, more than half of all new copyright cases in the U.S. were pornography-related John Doe lawsuits. In this article, I argued that the systematic opportunism reflected in these lawsuits, many of which bordered on outright extortion, called for a rethinking of status-based definitions of "trolling" in both the copyright and patent contexts. The article also explored how copyright's statutory damages regime and the federal district courts' permissive approach to the joinder of multiple, essentially unrelated defendants had enabled the current wave of copyright trolling.

My coauthor and I wrote Defense Against the Dark Arts of Copyright Trolling (2018) as an act of public service. In the typical copyright trolling case, the plaintiff’s claims of infringement rely on a poorly substantiated form pleading and are targeted indiscriminately at non-infringers as well as infringers. Our deep dive into this issue revealed a world of shell companies, false fronts, dubious foreign technology consultants, at least one entirely fictitious expert witness, and a general pattern of unsavory behavior. Copyright trolling is rewarding because the plaintiffs are able to target thousands of defendants with the offer of a quick settlement for a few thousand dollars and the threat of statutory damages in the hundreds of thousands of dollars. In Defense Against the Dark Arts, my coauthor and I took the time to do what few defense lawyers can afford to: we systematically analyzed all of the weaknesses of the typical plaintiff’s case and integrated that analysis into a strategy roadmap for both defense lawyers and pro se defendants.

(b) Empirical analysis of judicial behavior in IP cases

My interest in empirical analysis of judicial behavior began with two articles at the intersection of political science and intellectual property (IP). In Ideology and Exceptionalism in Intellectual Property – An Empirical Study (2009), Tonja Jacobi, Maxim Sytch, and I conducted an empirical study of Supreme Court decisions to determine the effect of judicial ideology on IP case outcomes. The article was the first comprehensive study of judicial ideology in the field of IP; it is also one of very few studies to extend the "attitudinal model" to a field of economic regulation. Our key finding, that measures of judicial ideology on a coarse left-right spectrum predicted votes in IP cases at the Supreme Court, ran almost entirely against the orthodoxy of the IP academy at the time. Tonja Jacobi and I followed up on Ideology and Exceptionalism with another paper published the same year, Taking the Measure of Ideology: Empirically Measuring Supreme Court Cases (2009). This article demonstrated how continuous measures of case outcomes could be derived from fundamental behavioral assumptions. We tested the intuitive plausibility of competing measures as they applied in Supreme Court IP cases, and in doing so we were able to assess the plausibility of some fundamental behavioral assumptions about how justices work together and form coalitions on the Supreme Court.

(c) Empirical analysis of Supreme Court oral argument using text data mining

Tonja Jacobi and I have recently embarked on a new series of articles using text data mining methods to analyze Supreme Court oral argument. My previous work on the legal issues relating to text data mining of copyrighted materials (discussed above) and my ongoing involvement with the HathiTrust Research Council suggested to me the enormous potential of systematically analyzing oral argument. I also foresaw that actually engaging in my own text data mining research would give me a better understanding of the legal issues confronting this community. This vein of research will comprise a number of articles, and we have also launched a website to highlight our ongoing work.

In our first joint project in this area, The New Oral Argument: Justices as Advocates (forthcoming in 2019), we analyzed the text of every Supreme Court oral argument from 1960 to 2015 and demonstrated that judicial activity, as measured by words spoken or time speaking, increased significantly over that period. We established that almost all of this increase was attributable to judicial commentary and not to an increased number of questions. Furthermore, we found very strong support for our hypothesis that these changes were centered on the mid-1990s and can be plausibly ascribed to the dramatic increase in political polarization that took place at that time. We have several additional joint projects in process.