Today I will be presenting on “The Missing Theory of Transformative Use” at the Intellectual Property Scholars Conference at DePaul University in Chicago. My presentation is basically a distillation of the first three chapters of a book I am writing on the modern law of fair use.
NAFTA must include fair use commitments
I joined with over seventy international copyright law experts today in calling for NAFTA and other trade negotiators to support a set of balanced copyright principles.
Policies like fair use, online safe harbors, and other exceptions and limitations to copyright permit and encourage access to knowledge, flourishing creativity, and innovation.
The following copyright principles are essential to ensure consumers’ digital rights. Copyright law should:
- Protect and promote copyright balance, including fair use
- Provide technology-enabling exceptions, such as for search engines and text- and data-mining
- Include safe harbor provisions to protect online platforms from users’ infringement
- Ensure legitimate exceptions for anti-circumvention, such as documentary filmmaking, cybersecurity research, and allowing assistive reading technologies for the blind
- Adhere to existing multilateral commitments on copyright term
- Guarantee proportionality and due process in copyright enforcement
Measuring the value of copyright and the value of copyright exceptions is methodologically challenging, but if we use the same criteria that WIPO adopts to estimate the value of copyright, then in the U.S., fair use industries represent 16% of annual GDP and employ 18 million American workers.
The Washington Principles on Copyright Balance in Trade Agreements and the new research on Measuring the Impact of Copyright Balance are located at http://infojustice.org/flexible-use
Text Mining, Non-Expressive Use and the Technological Advantage of Fair Use
On March 29, 2017, I attended a fantastic conference on “Globalizing Fair Use: Exploring the Diffusion of General, Open and Flexible Exceptions in Copyright Law” hosted by American University Washington College of Law’s Program on Information Justice and Intellectual Property. As part of that event we held a webcast Q&A session moderated by Sasha Moss of the R Street Institute. The following is a rough transcript of my comments in response to Sasha’s questions about the legality of the non-expressive use of copyrighted works.
Copyright Questions For the Digital Age
There is no country in the world where simply reading a book and giving someone information about the book, such as its subject or themes, whether it uses particular words or particular combinations of words, the number of words, the number of pages, the ratio of female to male pronouns, etc., would amount to copyright infringement.
Why? Because information about the book is not the book. It is metadata. The question for the digital age is, “Can we use computers to produce that kind of data?” This question is important because although I can read a few books and produce some useful metadata, I can’t read a million books. But a computer can.
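To make the idea of metadata concrete, here is a minimal Python sketch of the kind of information described above: word counts, vocabulary size, and pronoun ratios. This is my own illustration, not drawn from any actual text-mining project; the point is that everything it returns is information *about* a text, and none of it reproduces the text’s expression.

```python
import re
from collections import Counter

def book_metadata(text):
    """Compute simple non-expressive metadata about a text.

    The returned statistics describe the work without
    communicating any of its original expression.
    """
    words = re.findall(r"[a-z']+", text.lower())
    counts = Counter(words)
    female = sum(counts[w] for w in ("she", "her", "hers"))
    male = sum(counts[w] for w in ("he", "him", "his"))
    return {
        "word_count": len(words),
        "vocabulary_size": len(counts),
        "female_to_male_pronouns": (female, male),
    }

sample = "She opened the book. He watched her read it, and she smiled."
print(book_metadata(sample))
```

Run over a single sentence this is trivial; run over a million digitized books, the same kind of computation is what lets researchers ask questions no human reader could answer.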
We have the technology
We have the technology to digitize large collections of books in order to produce data that enables computer scientists, linguists, historians, English professors, and the like, to answer important research questions. The data and the questions it can be used to answer do nothing to communicate the original expression of all those millions of books. However, technically speaking, this kind of digitization is still copying.
But is this the kind of copying that copyright law should be concerned about? If a tree falls in an empty forest, does it truly make a sound? If something is copied but only read by a computer, and the computer only communicates metadata about the work, is that the kind of copying that should amount to copyright infringement?
Text mining is vital for machine learning, automatic translation, and language models
It seems to me that once you phrase the question that way, the answer is clear. We all use this amazing technology on a daily basis when we rely on Internet search engines, but text mining is about much more than this. By data mining vast quantities of scientific papers, researchers have been able to identify new treatments for diseases. Text mining has also allowed humanities scholars to identify patterns in vast libraries of literature. Text mining is vital for machine learning, automatic translation, and developing the language models that power dictation software.
Fair use and technological advantage
The United States is a world leader in various applications of text mining, starting with Internet search, but going far beyond that. In the United States, once people realized what was possible they more or less started doing it. If Larry Page and Sergey Brin had had the idea for the Google Internet search engine in Canada, Australia, England, or Germany in the 1990s, it would have been crystal-clear that because their search engine relied on making copies of other people’s HTML webpages and there was no realistic way to obtain permission from all those people, building a search engine would be illegal. In countries with a closed list of copyright exceptions and limitations, or with fair dealing provisions that are tied to specific narrowly defined purposes, a lawyer would have looked at the list and said, “I don’t see Internet search or data mining on that list, so you can’t do it.”
The fair use doctrine reinforces copyright rather than negating it
In the United States, we have the fair use doctrine, which means that the list is not closed. In the United States, the fair use doctrine means you at least get a chance to explain why your particular use of a copyrighted work is for a purpose that promotes the goals of copyright, is reasonable in light of that purpose, and is unlikely to harm the interests of copyright owners. The fair use doctrine reinforces copyright rather than negating it; fair use doesn’t mean that you get to do whatever you want. Fair use is a system for determining how copyright should apply in new situations. That is especially important when the law was written decades ago and society and technology are changing fast.
Without something like fair use, other countries can only follow the United States
Without something like fair use, other countries can only follow the United States. Non-expressive uses of copyrighted works such as text mining, building an Internet search engine, or running plagiarism detection software have all been held to be fair use in the United States and are slowly becoming more accepted around the world. Of course, now that it is readily apparent that these activities are immensely beneficial and entirely non-prejudicial to the interests of copyright owners, we could probably write some specific amendments to the copyright act to make them legal. The problem is, we didn’t know this two decades ago when we actually needed those rules. I don’t know what the next thing that we don’t know is, but I do know that experience has shown that the flexibility of the fair use doctrine—which has been part of copyright law virtually since the English Statute of Anne in 1710, by the way—has worked better than a system of closed lists.
The fair use doctrine is a real source of competitive advantage for technologists and academic researchers in the United States. Right now, there are technologies being developed and research being done in the United States that either can’t be done in other countries, or can only be done by particular people subject to various arbitrary restrictions. Whether it’s Internet search, digital humanities research, machine learning or cloud computing, other countries have followed the United States in adopting technologies that make non-expressive use of copyrighted works, because some of the copyright risks begin to look less daunting once the practice has become accepted. The Europeans, for example, are pretty sure building a search engine must be legal, but they can’t quite agree why. But the thing to understand is that you can follow this way, but you can never lead. It’s much harder to do the new thing if by the letter of the law it is illegal and you have no forum to argue that it should be allowed.
The future doesn’t have a lobby group
Of course, that’s not quite true, you have one forum … you can spend a vast amount of money on lobbyists and go to the government, go to Congress, and try to get some favorable rules written. But even if that is successful from time to time, those rules have a particular character. A company that spends millions of dollars on a lobbying campaign to change the law is always going to try and make sure that those new rules only benefit its business. Special interests will get some laws changed, but usually in ways that disadvantage their competitors or exclude alternative technologies that might one day compete with them. The fundamental problem with relying on static lists of copyright exceptions and lobbying to get those lists revised as needed is that the future doesn’t have a lobby group.
If you would like to read more about these topics:
- Matthew Jockers, Matthew Sag & Jason Schultz, Brief of Digital Humanities and Law Scholars in Support of Defendants-Appellees and Affirmance in Authors Guild v. Google (13-4829) (July 10, 2014)
- Matthew Jockers, Matthew Sag & Jason Schultz, Digital Archives: Don’t Let Copyright Block Data Mining, 490 Nature 29-30 (October 4, 2012)
- Matthew Sag, Orphan Works as Grist for the Data Mill, 27 Berkeley Technology Law Journal 1503–1550 (2012)
- Matthew Sag, Copyright and Copy-Reliant Technology, 103 Northwestern University Law Review 1607–1682 (2009)
Internet Safe Harbors and the Transformation of Copyright Law will be published in the Notre Dame Law Review
My article, Internet Safe Harbors and the Transformation of Copyright Law, will be published in the Notre Dame Law Review, Vol. 93, 2017, later this year.
This Article shows how the substantive balance of copyright law has been overshadowed online by the system of intermediary safe harbors enacted as part of the Digital Millennium Copyright Act (“DMCA”) in 1998. The Internet safe harbors and the system of notice-and-takedown fundamentally changed the incentives of platforms, users, and rightsholders in relation to claims of copyright infringement. These different incentives interact to yield a functional balance of copyright online that diverges markedly from the experience of copyright law in traditional media environments. This article also explores a second divergence: the DMCA’s safe harbor system is being superseded by private agreements between rightsholders and large commercial Internet platforms made in the shadow of those safe harbors. These agreements relate to automatic copyright filtering systems, such as YouTube’s Content ID, that not only return platforms to their gatekeeping role, but encode that role in algorithms and software.
The normative implications of these developments are contestable. Fair use and other axioms of copyright law still nominally apply online; but in practice, the safe harbors and private agreements made in the shadow of those safe harbors are now far more important determinants of online behavior than whether that conduct is, or is not, substantively in compliance with copyright law. The diminished relevance of substantive copyright law to online expression has benefits and costs that appear fundamentally incommensurable. Compared to the offline world, online platforms are typically more permissive of infringement, and more open to new and unexpected speech and new forms of cultural participation. However, speech on these platforms is also more vulnerable to over-reaching claims by rightsholders. There is no easy metric for comparing the value of non-infringing expression enabled by the safe harbors to that which has been unjustifiably suppressed by misuse of the notice-and-takedown system. Likewise, the harm that copyright infringement does to rightsholders is not easy to calculate, nor is it easy to weigh against the many benefits of the safe harbors.
DMCA-plus agreements raise additional considerations. Automatic copyright enforcement systems have obvious advantages for both platforms and rightsholders; they may also allow platforms to be more hospitable to certain types of user content. However, automated enforcement systems may also place an undue burden on fair use and other forms of non-infringing speech. The design of copyright enforcement robots encodes a series of policy choices made by platforms and rightsholders and, as a result, subjects online speech and cultural participation to a new layer of private ordering and private control. In the future, private interests, not public policy, will determine the conditions under which users get to participate in online platforms that adopt these systems. In a world where communication and expression are policed by copyright robots, the substantive content of copyright law matters only to the extent that those with power decide that it should matter.
Prenda is gone, but copyright trolling continues
A pattern of “brazen misconduct and relentless fraud”
Like many, I took great satisfaction from reading that John L. Steele had pleaded guilty and acknowledged his role in the “copyright trolling” scheme that took in millions of dollars in settlements from 2010 to 2012.
For a short while, the lawyers at Prenda—Paul Duffy, John L. Steele, and Paul R. Hansmeier—were the public face of copyright trolling. According to the courts, Duffy, Steele and Hansmeier engaged in “vexatious litigation designed to coerce settlement” in a pattern of “brazen misconduct and relentless fraud.” They lied to the courts, forged documents, practiced identity theft, placed their own content online so that they could sue people for stealing it, and generally behaved badly.
The scheme worked as follows: lawyers would file copyright suits alleging that some unknown person (John Doe), identified only by an IP address, had infringed copyright by using BitTorrent, an online file-sharing protocol. The lawyers would file a case against “John Does 1–1000” and persuade the court to let them subpoena ISPs to get the subscriber details that matched those IP addresses. They would then threaten the newly unmasked John Does and demand payment to drop the suit. Otherwise, the Doe’s alleged pornography-viewing habits would be exposed to the world and he would probably end up paying tens of thousands of dollars in statutory damages.
Prenda is gone, but copyright trolling continues
Duffy died in 2015. Steele and Hansmeier were placed under federal indictment in December 2016 for “an elaborate scheme to fraudulently obtain millions of dollars in copyright lawsuit settlements by deceiving state and federal courts throughout the country.” And now that Steele has pleaded guilty to those charges, Hansmeier’s conviction seems like a certainty.
It is satisfying to see justice finally catch up with Steele and Hansmeier, but anyone who thinks that this is the end of copyright trolling has not been paying attention. In fact, other than a brief hiccup in early 2016, the filing of lawsuits designed to extract settlements from alleged online pirates has only increased since Prenda went out of business.
As my co-author, Jake Haskell, and I will show in a paper to be made public next week (we are proofreading right now), in the post-Prenda era, lawsuits filed against John Doe defendants made up more than 52% of all copyright cases in the United States in 2014 and 58% in 2015. The number of suits dropped slightly after Malibu Media lost a case on summary judgment in January 2016, but the rate of filing is increasing again. Even so, between 2014 and 2016 copyright trolling accounted for 49.8% of the federal copyright docket.
Our analysis of the federal court filing records indicates that in 2016, the average number of defendants in each of the John Doe cases was 4.7 on a conservative estimate. In other words, although there were 1,362 John Doe copyright cases filed last year, 6,483 individual defendants were targeted. Without doubt, some of those people were illegally downloading movies, but a great many were not.
The new breed of plaintiffs who filled Prenda’s shoes are different from Prenda, but not different enough. The plaintiffs’ claims of infringement still rely on poorly substantiated form pleading and are targeted indiscriminately at non-infringers as well as infringers. Plaintiffs have realized that there is no need to invest in a case that could actually be proven in court, or in forensic systems that reliably identify infringement without a large ratio of false positives. Their lawsuits are filed primarily to generate a list of targets for collection and are unlikely, in our view, to withstand the scrutiny of contested litigation.
The human cost of copyright trolling is significant. It is true that sometimes the plaintiffs get lucky and target an actual infringer who is motivated to settle. But even when the infringement has not occurred or where the infringer has been misidentified, some combination of the threat of statutory damages of up to $150,000 for a single download, tough talk, and technological doublespeak are usually enough to intimidate even innocent defendants into settling.
In our paper, titled “Defense Against the Dark Arts of Copyright Trolling” (available on ssrn.com next week, if all goes to plan), we undertake a detailed analysis of the legal and factual underpinnings of these online file sharing cases against John Doe defendants. We analyze the weaknesses of the typical plaintiff’s case and integrate that analysis into a comprehensive strategy roadmap for defense lawyers and pro se defendants. In short, as our title suggests, we aim to provide a comprehensive and useful guide to the defense against the dark arts of copyright trolling.
Copyright Trolling in Chicago (17980 IP addresses and counting)
WBEZ ran a story on Thursday, based in part on my research first published in the Iowa Law Review. The story, Why Are So Many People In Northern Illinois Being Sued For Downloading Porn? by Miles Bryan is an excellent overview of a complicated topic.
Focus on Chicago
Although these John Doe lawsuits are a nation-wide phenomenon, Chicago (technically, the Northern District of Illinois) is the leading destination for what many people regard as ‘copyright trolling’. The Northern District of Illinois has accounted for roughly 15% of all copyright John Doe lawsuits nationwide since 2013.
The Northern District of Illinois covers 18 counties across the northern tier of Illinois, with a population of about nine million people. The Southern District of New York, which encompasses New York City, and the Southern District of California, which includes San Diego, are much larger in terms of population, yet the SDNY has had only 531 John Doe cases in the same period that Chicago has seen 1,603. The Southern District of California has seen a mere 165.
Since 2010 (up until June 2016) lawyers in the greater Chicago area (technically the Northern District of Illinois) have filed over 1600 John Doe copyright cases (1603 at last count). This practice is now so common in Chicago that these suits outnumber regular copyright lawsuits by a ratio of more than 4 to 1 (there were 385 regular copyright suits in the same period.)
Because of the way these suits are filed, one lawsuit can sweep in a large number of IP addresses. Based on court records, my conservative estimate of the number of IP addresses swept into these suits in the Northern District of Illinois since 2010 is 17,980. Not all of these cases involve pornography, but the vast majority do: 73% in the Northern District of Illinois.
In 2015 alone, the Chicago court saw just 48 regular copyright lawsuits filed, compared with 395 John Doe copyright lawsuits.
John Doe copyright lawsuits accounted for 58% of all copyright cases filed in 2015
Across the entire country, John Doe copyright lawsuits have risen from just under 4% of all copyright filings in 2010 to more than 19% in 2011, 43% in 2012, 46% in 2013, 51% in 2014 and just under 58% in 2015.
One pornography company, Malibu Media, accounted for 40% of all federal copyright cases filed in 2014 and 2015. However, data collected for the first four months of 2016 shows that Malibu Media’s influence is declining (it accounts for only a quarter of all federal copyright cases filed in 2016 so far) and that there may be fewer John Doe cases filed this year if current trends continue. Last year there were 2,930 cases filed; so far this year there have been only 690. John Doe cases for the year to date account for only 39.5% of all federal copyright cases.
Why Chicago?
One of the questions that Miles asked me to think about is why this phenomenon is so prevalent in Chicago.
The first thing to note is that Chicago is not alone. New Jersey saw nearly as many of these cases in 2015, and the Southern District of New York also had a substantial number. The five leading federal districts for John Doe copyright cases in 2015 were:
- Illinois (ND) – 395
- New Jersey – 386
- New York (SD) – 248
- Maryland – 194
- Virginia (ED) – 153
But the Chicago cases involved many more IP addresses (almost 10 times as many!) and thus affected many more people.
Part of the answer to the question of why Chicago is that Chicago is a large metro area with a lot of potential targets, so economies of scale make it attractive to set up shop here. But that does not fully explain it. I think that another important part of the story is that judges in Chicago have not been as hostile to these suits as some judges in New York and Los Angeles.
Judges in the Northern District of Illinois are not exactly thrilled about John Doe litigation; however, they have not closed the door to this kind of litigation, and they are more tolerant of joining large numbers of IP addresses in a single lawsuit.
Related Publications:
Matthew Sag, IP Litigation in US District Courts: 1994 to 2014, 101 Iowa Law Review 1065–1112 (2016) (download from SSRN); data updated for 2015 (http://ssrn.com/abstract=2711326)
Matthew Sag, Copyright Trolling, An Empirical Study, 100 Iowa Law Review 1105–1146 (2015) (download from SSRN)
Loyola is hosting the Society for Economic Research on Copyright Issues Annual Meeting Today
The SERCI Annual Congress 2016 is being held at Loyola University Chicago School of Law, Chicago, 7-8th July and is co-hosted by University of Illinois College of Law.
The Society for Economic Research on Copyright Issues or SERCI was established in 2001 to provide a solid academic platform for the application of economic theory to copyright policy.
The complete program is posted online at http://www.serci.org/congress.htm.
My slides for my presentation on empirical studies of copyright litigation are available here.
Getty’s high-resolution competition law complaint against Google in the EU
Getty Images has filed a competition law complaint against Google
- See Getty’s press release
- Getty Images files antitrust charges against Google over image scraping – Ars Technica 20160427 http://arstechnica.com/tech-policy/2016/04/google-eu-antitrust-getty-images-complaint/
- EU hits Google with second antitrust charge – reuters.com 20160420 http://www.reuters.com/article/us-eu-google-antitrust-idUSKCN0XH0VX
I have some initial thoughts as follows:
Image Search vs. High-Resolution Image Search
Google’s rationale for image search in general is that displaying the image is necessary for the user to assess how well the image corresponds to their search. This practice has been litigated at least twice in the U.S. in relation to thumbnail images and has easily passed the test of fair use.
Getty’s complaint is directed more specifically at the creation of high-resolution galleries. Although Google could make a similar argument that you need to see the image in high resolution to properly evaluate it, my view is that this argument is not nearly so compelling. The high-resolution display is more expressive and less informational, and the potential adverse effect on the copyright owner is greater for that reason and because, as Getty points out, once consumers see an image on the Web, they aren’t likely to go to the source and look at it again. Getty argues that Google Images’ creation of high-res galleries of copyrighted content is thus impacting Getty’s own image licensing business; promoting piracy and copyright infringement; and bolstering Google’s monopoly over site traffic, engagement data and ad spend.
[Update: It is worth noting that other design changes may have reduced the flow on traffic from Google Image search see this 2013 Search Engine Land story]
Basically, Getty is worried that it is far too easy to right-click and copy an image from Google Image Search and that, as a result, people won’t click through to Getty’s own site.
Illustration: Right-Click Options On Getty’s Own Press Release
Is there a copyright case?
Getty is an American company complaining about another American company’s treatment of its copyrighted properties and so it is less than obvious that the complaint would be dealt with in the EU under antitrust law rather than in the US under copyright law. I would need to understand the technology behind the high resolution image display to say whether Getty has a strong case for copyright infringement under U.S. law, but I think there are clear differences between the fair use status of low-resolution thumbnails and the high resolution images available on Google image search today.
Is there a competition law case?
It seems to me that providing high resolution galleries makes it at least marginally less likely that users will click through to the original site. This in turn makes it marginally more likely that users will copy and paste without authorization. But I would note that Google is under no obligation to design its information services in a way that drives traffic to a particular website, or external websites in general. The design of Google image search would make a poor antitrust case in the US but it might go further in the EU because they take a broader view of “abuse of dominance”. The main problem I see is that even if Google image search increases the unauthorized use of images, it probably does not affect the market for licensed images. I think that people looking to license stock footage will go to stock photo sites, but this is an empirical question so we would need to look at how the market actually works.
Getty Images says that, when it first raised concerns about this with Google, it was told to accept Google’s presenting of images in high-res format or opt out of image search. Since its founding, Google has relied on the fact that people can opt out of search to address complaints about the way search is run. Participation in Google image search is voluntary – on an opt-out basis – but that does not give Getty, or anyone else, the right to say exactly how they would like the search engine to run. In 2013 Google agreed with the FTC to make changes to the way it scrapes the content of rivals like Yelp. Sites like Yelp can now opt out of having their content scraped without opting out of search entirely. This is probably what Getty is looking for. But the issue for Getty is that it does not compete with Google the way Yelp does, so its case is not as strong.
It’s hard to say how EU regulators will view Getty’s complaint. It depends on some facts that we don’t have access to at the moment; I have only seen the press release, not the complaint itself. It also depends on whether the EU case against Google has more to do with politics and protectionism than it does with the merits of competition law. If Google settles with the European Commission, the settlement might include modifications to the way image search works, and that would be a significant win for Getty. From Google’s point of view, the worst-case scenario is that it would be forced to make a change to image display within the EU. I don’t think Google will change image search in the US unless it thinks Getty has a case under American copyright or antitrust law.
CREATIVE DIGITAL ARTS — Event Announcement
Some thoughts on Malibu Media’s recent loss and its implications
(Malibu Media LLC v. Doe, Docket No. 1:13-cv-06312 (N.D. Ill. Sept. 4, 2013))
Malibu Media’s case against yet another John Doe defendant was tossed out of court on February 8th by United States Magistrate Judge Geraldine Soat Brown.
Malibu v. Doe, Memorandum, Opinion and Order of Feb 8, 2016
The defendant in this case prevailed on summary judgment because Malibu was unable to establish that he had ever used BitTorrent or that its films had ended up on his hard drive. Malibu had been relying on experts from its technology vendor, but it failed to follow the rules on disclosure of expert witnesses. (See Fed. R. Civ. P. 26(a)(2).) Malibu also tried to add vital paragraphs containing new opinions to another witness’s original declaration in a manner not permitted by the Federal Rules of Civil Procedure.
On the surface, the loss does not appear to have broad implications for Malibu Media’s campaign against illegal file sharing; after all, it should not be too hard to avoid these particular procedural slip-ups in the future.
Maybe, maybe not?
Malibu is engaged in a litigation campaign of unprecedented scope — last year Malibu Media alone was responsible for 39% of all copyright litigation in the US. (See Matthew Sag, IP Litigation in United States District Courts—2015 Update (January 5, 2016). Available at SSRN: http://ssrn.com/abstract=2711326.) John Doe litigation, by Malibu Media and others, made up almost 58% of the federal copyright docket (2930 cases out of 5076) in 2015. Malibu’s recent loss in the Northern District of Illinois illustrates, yet again, how ill-suited federal court litigation is to resolving what should be relatively low-stakes copyright disputes.
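As a quick sanity check, the “almost 58%” figure follows directly from the 2015 case counts quoted above (2,930 John Doe cases out of 5,076 federal copyright cases); this trivial calculation just shows where the percentage comes from:

```python
# 2015 totals for the federal copyright docket, as quoted in this post.
john_doe_cases = 2930
all_copyright_cases = 5076

# The John Doe share of all federal copyright filings.
share = john_doe_cases / all_copyright_cases
print(f"{share:.1%}")  # prints "57.7%"
```

The same division applied to earlier years reproduces the trend reported above: a steady climb from under 4% of filings in 2010 to a majority of the docket by 2014.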
It is time for an entirely new forum to deal with the routine infringements that occur on BitTorrent and similar networks. The Copyright Office has suggested a small claims court for copyright but we probably need something far more targeted.

