The Supreme Court is addressing the wrong question in #Aereo.

I will be speaking about the ongoing Aereo litigation tomorrow at the Northwestern Journal of Technology and Intellectual Property’s Annual Symposium.

On April 22nd the Supreme Court will hear arguments as to whether a company “publicly performs” a copyrighted television program when it retransmits a broadcast of that program to thousands of paid subscribers over the Internet. My view is that this case should not be about public performance. The court in Cablevision got that one right; no other reading of the Copyright Act makes sense.

The Supreme Court should hold that transmission by a single aerial (or N aerials) that copies and transmits at an N:N ratio is not 1 performance to N people; it is N performances to N people and thus not “public”.

This does not mean Aereo is off the hook. Cablevision’s device is consistent with the Supreme Court’s Sony Betamax decision from 1984: copying made possible by a remote DVR is fair use. However, to the extent the Aereo system is designed to offer what is, in effect, live or almost live ‘rebroadcast’ beyond the authorized reception range of the original broadcast, it may not be fair use. This is an open question, but the Supreme Court can’t decide it because it has not been briefed on the issue.

Whether a company “publicly performs” a copyrighted television program when it retransmits a broadcast is not the right question. Critics of Cablevision seem to think that if there is no public performance right for an R-DVR, then there is no tolling point at which creators get paid. But avoiding public performance does not avoid the initial broadcast or copying.

I have some slides that go into this in a bit more detail. Comments welcome.

Amici warn Supreme Court of the dangers of abstract software patents. #CLSbank

Jason Schultz, Brian Love, Jim Bessen and Mike Meurer have put together an excellent “Brief of Amici Curiae Law, Business, and Economics Scholars” in Alice Corp. v. CLS Bank, a case about to be argued before the US Supreme Court.

I signed this brief because I believe that the experience of the last 20 years shows that extending patent protection to abstract ideas and software functions does far more to impede innovation than it does to encourage it.

The U.S. Court of Appeals for the Federal Circuit has expanded the scope of patentable subject matter for abstract ideas over the last 20 years (see In re Alappat, 33 F.3d 1526 (Fed. Cir. 1994) and State Street Bank & Trust Co. v. Signature Financial Group, Inc., 149 F.3d 1368 (Fed. Cir. 1998)). This expansion has led to an explosion of software patenting and software patent litigation. Abstract patent claims award rights beyond the scope of actual invention, their boundaries are unclear, they don’t provide notice to third parties and, for all these reasons, they invite opportunistic litigation.


(See U.S. Gov’t Accountability Office, GAO-13-465, Intellectual Property: Assessing Factors That Affect Patent Infringement Litigation Could Help Improve Patent Quality 13 (2013), available at

The Supreme Court granted cert in this case to decide “Whether claims to computer-implemented inventions—including claims to systems and machines, processes, and items of manufacture—are directed to patent-eligible subject matter within the meaning of 35 U.S.C. § 101.” I expect the court to rule that computer software is patent eligible, but that patent examiners should reject over-broad software patent claims on the basis of lack of patentable subject matter. As the divisions in the Federal Circuit’s en banc decision show, it won’t be easy to develop a standard to determine whether a computer-implemented invention is a patent-ineligible, abstract idea.



Dish Network v. ABC Amicus Brief argues No #fairuse difference btw VCR & DVR

The Brief Amicus Curiae of Intellectual Property Scholars in Dish Network L.L.C. v. American Broadcasting Companies, Inc., et al. has just been filed. The case is on appeal from the U.S. District Court for the Southern District of New York. Shubha Ghosh (U. Wisconsin Law School) wrote the brief and several IP academics signed it because we are concerned that

“ABC’s interpretation of copyright law would undermine longstanding fair use precedent. We urge the Court to reject ABC’s attempt to render Sony obsolete and re-litigate the public’s interest in making fair use copies with the aid of time-shifting technology.”

This case is significant because it will affect the future of private noncommercial time-shifting of television programs – a fair use right expressly recognized by the Supreme Court in Sony Corp. of America v. Universal City Studios, Inc., 464 U.S. 417 (1984). Our view, expressed in the brief, is that the

“advancement of technology from the videotape recorder (“VTR”) to the videocassette recorder (“VCR”) considered in Sony, to today’s digital video recorder (“DVR”) and the technological enhancements of the DVR has not – nor should it – affect the scope of protection expressly recognized in Sony.”

Download the brief here: Paginated Brief Amicus Curiae of Intellectual Property Scholars filed 1_29_14 [D155]


Pornography and copyright trolling, some data.

“the Northern District of Illinois can be proud to be the pornography copyright trolling capital of the United States”


Compared to patent trolls, copyright trolls have received little attention. Admittedly, patent trolls may have broader economic significance, but copyright trolling raises its own unique set of issues that deserve to be addressed. Defining exactly what makes an individual or an organization a troll is inevitably controversial. As initially invoked, the term was meant to be a disparaging caricature. However, over the years, the term patent troll has come to mean something more than merely “someone whose lawsuit is inconvenient to me,” although it is certainly still invoked in that way from time to time.

The paradigmatic patent troll is a nonpracticing entity that asserts patent infringement against companies that have clearly not copied its technology and seeks either (a) to extract a series of nuisance value settlements or (b) to extract a large settlement from a deep pocket (eBay, Microsoft, etc.) that is entirely out of proportion with the contribution of the patented technology. This latter approach has become significantly more difficult after the Supreme Court’s decision in eBay.

The nature of copyright trolling is a reflection of the economics of statutory damages and the principal technologies of infringement. The Electronic Frontier Foundation defines a copyright troll as a person or organization that files a copyright infringement lawsuit against as many defendants as possible for the purposes of extracting the most settlements with the least court costs. Generally these suits take the form of “Copyright Owner v. John Does 1–1000” or some other large number. Not all of these suits are related to pornography, but a very large number of them are.

The theory behind these multi-party John Doe lawsuits is that every participant in a BitTorrent swarm is engaged in an act of copyright infringement and that each participant is jointly liable for the resulting infringement. Courts and commentators have formed the distinct impression that such lawsuits are never intended to go to trial – they are simply a mechanism to compel Internet service companies to give the plaintiffs names and addresses to match the IP addresses that they already have. With this information in hand, the plaintiff can negotiate hundreds, even thousands of settlements. Reports indicate that $3000 is a typical settlement figure. This is a lot to pay for an adult movie, but it’s a small fraction of the potential statutory damages for willful copyright infringement, which could be as high as $150,000 per work infringed. The threat of statutory damages and the threat of exposure and embarrassment drive many settlements.

The Data

Just how widespread is this practice? I examined all copyright cases filed in the federal district courts associated with the Second, Seventh and Ninth circuits between January 1, 2001 and August 31, 2013. I identified “John Doe” lawsuits by looking for those words in the case title and I differentiated pornography copyright trolls from other plaintiffs by reviewing at least one underlying complaint per plaintiff. Figure 1, below, breaks the filing data down by state and into three-year time periods based on the year of filing, beginning with the year 2001. This figure shows the prevalence of all “John Doe” actions as a percentage of all copyright filings. The figure highlights the recent growth of “John Doe” lawsuits and their uneven geographic concentration. It’s particularly noteworthy that in 2013 the suits make up the majority of filings in Illinois, Indiana, Washington and Wisconsin.
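For readers who want to see the mechanics, here is a minimal sketch of the kind of tabulation described above. It is not the code I actually used: the case records, party names and the helper names (`is_john_doe`, `period_start`, `john_doe_share`) are all hypothetical, and the title match is a simple regular expression.

```python
import re
from collections import defaultdict

# Hypothetical records: (case title, state, filing year). Not real docket data.
cases = [
    ("Example Films LLC v. John Does 1-25", "IL", 2013),
    ("Acme Music Inc. v. Smith", "IL", 2013),
    ("Sample Pictures Inc. v. John Doe", "IL", 2012),
    ("Widget Press v. Jones", "WI", 2005),
]

# Matches "John Doe" or "John Does" anywhere in the title, ignoring case.
JOHN_DOE = re.compile(r"\bjohn\s+does?\b", re.IGNORECASE)


def is_john_doe(title):
    """Flag a case as a John Doe suit based only on its title."""
    return bool(JOHN_DOE.search(title))


def period_start(year, first_year=2001, width=3):
    """Return the first year of the three-year period containing `year`."""
    return first_year + ((year - first_year) // width) * width


def john_doe_share(cases):
    """Percentage of filings that are John Doe suits, by (state, period)."""
    totals = defaultdict(int)
    does = defaultdict(int)
    for title, state, year in cases:
        key = (state, period_start(year))
        totals[key] += 1
        if is_john_doe(title):
            does[key] += 1
    return {key: 100.0 * does[key] / totals[key] for key in totals}


shares = john_doe_share(cases)
```

Note that a title search like this only identifies John Doe suits in general. Separating pornography plaintiffs from other plaintiffs still required reading at least one complaint per plaintiff.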

Figure 1: Percentage of John Doe Law Suits by State



Figure 2, below, is based on the same data except that it differentiates between pornography related John Doe litigation and other John Doe litigation.

Figure 2: Percentage of John Doe (Pornography) Law Suits by State



Figure 3, below, is similar to Figure 2 except that it focuses on the last four years. The previous figures illustrate the prevalence of John Doe lawsuits as a percentage of all copyright lawsuits.

Figure 3: Percentage of John Doe (Pornography) Law Suits by State 2010-2013



Figure 4, below, shows the raw numbers for John Doe pornography and other John Doe copyright litigation.

Figure 4: John Doe (Pornography) Law Suits by District 2001 – 2013


I have not examined the data from district courts beyond the Second, Seventh and Ninth circuits, but for the moment it appears that the Northern District of Illinois can be proud to be the pornography copyright trolling capital of the United States.

United States Is versus United States Are

When Matthew Jockers, Jason Schultz and I were writing the Digital Humanities Amicus Briefs relating to the Google Books and HathiTrust cases, we searched for an illustration that would concisely explain why data mining expressive works was (a) socially valuable and (b) no threat to the copyright interests of the authors of the underlying works. We came across a graph produced using the Google n-gram tool that perfectly fit the bill. The graph below was part of the Digital Humanities Amicus Brief in both the HathiTrust and Google Books cases.



This graph is a reconstruction of data generated using Google Ngram, sampled at five-year intervals. The y-axis is scaled to 1/100,000 of a percent, such that 1 = 0.00001%.

The graph was referred to by the District Court in Authors Guild v. HathiTrust and in last week’s decision in Authors Guild v. Google. As we explained in our brief, “[the figure] compares the frequency with which authors of texts in the Google Book Search database refer to the United States as a single entity (“is”) as opposed to a collection of individual states (“are”). As the chart illustrates, it was only in the latter half of the Nineteenth Century that the conception of the United States as a single, indivisible entity was reflected in the way a majority of writers referred to the nation. This is a trend with obvious political and historical significance, of interest to a wide range of scholars and even to the public at large. But this type of comparison is meaningful only to the extent that it uses as raw data a digitized archive of significant size and scope.”

Metadata like this can only be collected by digitizing the entire contents of books, and it clearly does not communicate any author’s original expression to the reading public.
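To make the point concrete, here is a rough sketch of how a comparison like this could be computed. It is not how the Google Ngram tool works internally; the three-sentence corpus and the function name `phrase_counts_by_year` are made up for illustration, and the plain substring match is a simplification.

```python
from collections import Counter

# Hypothetical mini-corpus of (publication year, text); not real Google Books data.
corpus = [
    (1820, "The United States are bound by treaty."),
    (1825, "The United States are at peace. The United States is young."),
    (1900, "The United States is a great power."),
]


def phrase_counts_by_year(corpus, phrases):
    """Count how often each phrase appears in the texts published each year."""
    counts = {phrase: Counter() for phrase in phrases}
    for year, text in corpus:
        lowered = text.lower()
        for phrase in phrases:
            # Plain substring count: a simplification that ignores tokenization.
            counts[phrase][year] += lowered.count(phrase)
    return counts


counts = phrase_counts_by_year(
    corpus, ["the united states is", "the united states are"]
)
```

The whole text has to be read by the machine to produce these counts, but the only thing that comes out the other end is a table of numbers by year. None of the authors' sentences are shown to anyone.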

I decided that the graph deserved its own post.

Google Books held to be fair use

Authors Guild v. Google: library digitization as fair use vindicated, again.

After more than eight years of litigation, the legality of the Google Books search engine has finally been vindicated.


Authors Guild v Google Summary Judgement (Nov. 14, 2013)

The heart of the decision

The key to understanding Authors Guild v. Google is not in the court’s explanation of any of the individual fair use factors (although there is a great deal here for copyright lawyers to mull over) but rather in the court’s description of its overall assessment of how the statutory factors should be weighed together in light of the purposes of copyright law.

“In my view, Google Books provides significant public benefits. It advances the progress of the arts and sciences, while maintaining respectful consideration for the rights of authors and other creative individuals, and without adversely impacting the rights of copyright holders. It has become an invaluable research tool that permits students, teachers, librarians, and others to more efficiently identify and locate books. It has given scholars the ability, for the first time, to conduct full-text searches of tens of millions of books. It preserves books, in particular out-of-print and old books that have been forgotten in the bowels of libraries, and it gives them new life. It facilitates access to books for print-disabled and remote or underserved populations. It generates new audiences and creates new sources of income for authors and publishers. Indeed, all society benefits.”  (Authors Guild v. Google, p.26)

Even before last year’s HathiTrust decision (Authors Guild v. HathiTrust), the case law on transformative use and market effect was stacked in Google’s favor. Nonetheless, Judge Chin’s rulings in other cases (e.g. WNET, THIRTEEN v. Aereo, Inc.) suggest that he takes the rights of copyright owners very seriously and that it was essential to persuade him that Google was not merely evading the rights of authors through clever legal or technological structures. The court’s conclusion that the Google Library Project “advance[d] the progress of the arts and sciences, while maintaining respectful consideration for the rights of authors and other creative individuals, and without adversely impacting the rights of copyright holders” pervades all of its more specific analysis.

Data mining, text mining and digital humanities

An entire page of the judgment is devoted to explaining how digitization enables data mining. This discussion relies substantially on the Amicus Brief of Digital Humanities and Law Scholars signed by over 100 academics last year.

“Second, in addition to being an important reference tool, Google Books greatly promotes a type of research referred to as “data mining” or “text mining.”  (Br. of Digital Humanities and Law Scholars as Amici Curiae at 1 (Doc. No. 1052)).  Google Books permits humanities scholars to analyze massive amounts of data — the literary record created by a collection of tens of millions of books.  Researchers can examine word frequencies, syntactic patterns, and thematic markers to consider how literary style has changed over time.  …

Using Google Books, for example, researchers can track the frequency of references to the United States as a single entity (“the United States is”) versus references to the United States in the plural (“the United States are”) and how that usage has changed over time.  (Id. at 7).  The ability to determine how often different words or phrases appear in books at different times “can provide insights about fields as diverse as lexicography, the evolution of grammar, collective memory, the adoption of technology, the pursuit of fame, censorship, and historical epidemiology.”  Jean-Baptiste Michel et al., Quantitative Analysis of Culture Using Millions of Digitized Books, 331 Science 176, 176 (2011) (Clancy Decl. Ex. H)” (Authors Guild v. Google, p.9-10)

The court held that Google Books was “[transformative] in the sense that it has transformed book text into data for purposes of substantive research, including data mining and text mining in new areas, thereby opening up new fields of research. Words in books are being used in a way they have not been used before. Google Books has created something new in the use of book text -- the frequency of words and trends in their usage provide substantive information.”

A snippet of new law

Last year, the court in HathiTrust ruled that library digitization for the non-expressive use of text mining and the expressive use of providing access to the visually disabled was fair use. Today’s decision in Authors Guild v. Google supports both of those conclusions; it further holds that the use of snippets of text in search results is also fair use. The court noted that displaying snippets of text as search results is similar to the display of thumbnail images of photographs as search results and that these snippets may help users locate books and determine whether they may be of interest.

The judgment clarifies something that confuses a lot of people: the difference between “snippet” views on Google Books and more extensive document previews. Google has scanned over 20 million library books to create its search engine, mostly without permission. However, Google has agreements with thousands of publishers and authors who authorize it to make far more extensive displays of their works, presumably because these authors and publishers understand that even greater exposure on Google Books will further drive sales.

The court was not convinced that Google Books poses any threat of expressive substitution because, although it is a powerful tool for learning about books individually and collectively, “it is not a tool to be used to read books.”

The Authors Guild had attempted to show that an accumulation of individual snippets could substitute for books, but the court found otherwise: the kind of accumulation of snippets that the plaintiffs were suggesting was both technically infeasible because of certain security measures and, perhaps more importantly, was bizarre and unlikely: “Nor is it likely that someone would take the time and energy to input countless searches to try and get enough snippets to comprise an entire book.  Not only is that not possible as certain pages and snippets are blacklisted, the individual would have to have a copy of the book in his possession already to be able to piece the different snippets together in coherent fashion.”


Today’s decision is an important victory for Google and the entire United States technology sector; it also confirms the recent victory for libraries, academics and the visually disabled in Authors Guild v. HathiTrust.

Unless today’s decision is overruled by the Second Circuit or the Supreme Court (something I personally think is very unlikely), it is now absolutely clear that technical acts of reproduction that facilitate purely non-expressive uses of copyrighted works such as books, manuscripts and webpages do not infringe United States copyright law. This means that copy-reliant technologies including plagiarism detection software, caching, search engines and data mining more generally now stand on solid legal ground in the United States. Copyright law in the majority of other nations does not provide the same kind of flexibility for new technology.

All in all, an excellent result.

* Updated at 4.57pm. The initial draft of this post contained several dictation errors which I will now endeavor to correct. My apologies. Updated at 5.17pm with additional links and minor edits. 





University of Iowa presentation on copyright, mass digitization and the digital humanities

I am giving a talk today on copyright, mass digitization and the digital humanities at the University of Iowa law school. My talk will focus on the ongoing litigation between the Authors Guild and Google and the separate case of Authors Guild v. HathiTrust. The case against Google began in 2005 shortly after Google launched its ambitious library digitization project. The case against the HathiTrust, a digital library that pulls together the resources of a number of American universities, began much later in September 2011.

These cases raise complicated issues about standing, the scope of class actions, statutory interpretation, the interaction of general and specific limitations and exceptions to copyright under the Copyright Act of 1976, and probably a few others besides. However, at the heart of both cases is actually a very simple question — does copying for non-expressive use require the express approval of the copyright owner?

A non-expressive use is one that involves some technical act of copying, but for which the resultant copy is not read by any human being. For example, checking work for plagiarism involves comparing the suspect work against a database of potential sources. It is certainly valuable to know that work A is suspiciously like work B, but that knowledge is entirely independent of the expressive value of either of the underlying works.
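As a rough illustration of what a plagiarism check does, here is a minimal sketch that compares two texts by their overlapping three-word sequences. I am not describing any particular commercial product; the technique (word shingles with Jaccard similarity) is just one simple option, and the function names and sample sentences are invented for the example.

```python
def shingles(text, k=3):
    """Return the set of overlapping k-word sequences in the text."""
    words = text.lower().split()
    return {tuple(words[i:i + k]) for i in range(len(words) - k + 1)}


def jaccard(a, b):
    """Share of k-word sequences the two texts have in common (0.0 to 1.0)."""
    sa, sb = shingles(a), shingles(b)
    if not sa and not sb:
        return 0.0  # Both texts are too short to compare.
    return len(sa & sb) / len(sa | sb)


# Hypothetical suspect work and potential source.
work_a = "the quick brown fox jumps"
work_b = "the quick brown fox sleeps"
score = jaccard(work_a, work_b)
```

Both works have to be copied into the system for the comparison to run, but the output is a single similarity score. Nobody reads either work as a result of the copying.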

Non-expressive use was not a particularly pressing concern before the digital era – from the printing press to the photocopier, the only plausible reason to copy a work was in anticipation of reading it. In the present, however, scanning technology, computer processing power and powerful software tools make it possible to crunch the numbers on the written word in all sorts of remarkable ways. The non-expressive use that most people will be familiar with relates to Internet search engines. Search engines direct users to sites of interest based on a complicated set of algorithms, but underlying those algorithms is an extraordinary database describing the contents of billions of individual webpages. Building such a database requires copying and indexing billions of individual webpages.
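To give a sense of why indexing requires copying, here is a toy sketch of an inverted index, the basic data structure behind keyword search. Real search engines are vastly more sophisticated; the page contents, the example.com addresses and the function names here are all hypothetical.

```python
from collections import defaultdict

# Hypothetical pages: address -> full text that the crawler has copied.
pages = {
    "https://example.com/a": "copyright and fair use",
    "https://example.com/b": "fair weather forecast",
}


def build_index(pages):
    """Map each word to the set of pages on which it appears."""
    index = defaultdict(set)
    for url, text in pages.items():
        for word in text.lower().split():
            index[word].add(url)
    return index


def search(index, query):
    """Return the pages that contain every word in the query."""
    words = query.lower().split()
    if not words:
        return set()
    results = set(index.get(words[0], set()))
    for word in words[1:]:
        results &= index.get(word, set())
    return results


index = build_index(pages)
hits = search(index, "fair use")
```

The index can only be built by processing the full text of every page, yet what the user gets back is a pointer to the page, not a substitute for it.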

Authors Guild v. Google will determine whether it was legitimate for Google to extend its Internet search model to the off-line world and apply it to paper-based works which had never been digitized. However, the significance of this case goes well beyond building a better library catalog (although the importance of that should not be casually dismissed). Authors Guild v. Google and Authors Guild v. HathiTrust will shape the future of the digital humanities. If the District Court ruling in HathiTrust stands, as I believe it should, academics who wish to combine data science and a love of literature will not be shackled to the pre-1923 public domain. They will be able to apply the same analytical techniques to the works of William Faulkner as to those of William Shakespeare. More importantly, distant reading empowered by computational analysis will allow scholars to extend their gaze beyond a narrow literary canon, or even the few thousand works that most of us can hope to read in our lifetime, and address questions on a broader scale.

Slides are available here: Copyright and Mass Digitization, Iowa 2013


Some thoughts on the use of bio photos

I have noticed over the years that whenever someone puts together a bio page for me in relation to a talk or a conference presentation, they tend to grab just any old photo from the Internet. Quite frankly, some of these photographs are more flattering than others. Most of them are not as good as the selfie I took on my iPhone this morning. Photos from 10 years ago might be considered too flattering in terms of hairline.

Perhaps with some strategic tagging and linking I can get this to the top of the Google search engine.

Photo of Prof. Matthew Sag 2013

Matthew Sag


I also have a full bio page at which contains all sorts of useful information.

Archives & Copyright: Developing An Agenda For Reform starts tomorrow #dh #archivescopyright

Archives & Copyright: Developing An Agenda For Reform

This is a one-day symposium, co-organised by CREATe and the Wellcome Library. The symposium considers forthcoming changes to the copyright regime in the UK as it impacts the work of archives, as well as the role that risk-management plays in copyright compliance for archival digitization projects.

I will be speaking on a panel along with Professors Peter Jaszi and Peter Hirtle. We will discuss how cultural heritage institutions in the US work with copyright law, and in particular the ongoing Authors Guild v. HathiTrust case (currently on appeal).

I plan to talk about my experience bringing together (along with Jason Schultz and Matthew Jockers) the digital humanities amicus briefs for Authors Guild v. HathiTrust I and II and Authors Guild v. Google. My slides are available right here.

The #hashtag for the symposium is #archivescopyright