Matthew Sag

September 16, 2014

Kienitz v. Sconnie Nation — transformative uses and derivative works. #Fairuse

Some additional thoughts on the 7th Circuit’s decision in Kienitz v. Sconnie Nation LLC, No. 13-3004 (7th Cir. Sept. 15, 2014).

Judge Easterbrook expressed some skepticism today over the Second Circuit’s decision in Cariou v. Prince, 714 F. 3d 694 (2d Cir. 2013) because …

asking exclusively whether something is “transformative” not only replaces the list in §107 but also could override 17 U.S.C. §106(2), which protects derivative works. To say that a new use transforms the work is precisely to say that it is derivative and thus, one might suppose, protected under §106(2).

Easterbrook complains that

Cariou and its predecessors in the Second Circuit do not explain how every “transformative use” can be “fair use” without extinguishing the author’s rights under §106(2).

Ok, so let me explain.

First, Cariou and its predecessors don’t say that every transformative use is fair use. Second, more importantly, transformative use and derivative work are both important terms of art in copyright law. They are not the same thing. Nobody thinks they are.

Section 106(2) of the Copyright Act gives copyright owners an exclusive right to prepare derivative works based on the copyright owner’s original work. As defined in the statute, a derivative work takes a preexisting work and “recasts, transforms, or adapts” that work. The kind of transformations referred to here are not necessarily ‘transformative’ as that term was intended by the Supreme Court in the context of fair use. And yes, obviously, using a word that is not a stem of ‘transform’ would have helped.

A transformative work, in the fair use sense, is one which imbues the original “with a further purpose or different character, altering the first with new expression, meaning, or message.” [Campbell v. Acuff-Rose Music, Inc., 510 U.S. 569, 579 (1994) (internal citations omitted).] Thus, the assessment of transformativeness is not merely a question of the degree of difference between two works; rather it requires a judgment of the motivation and meaning of those differences.

The difference between a non-infringing transformative use and an infringing derivative work can be illustrated as follows: if Pride and Prejudice were still subject to copyright protection, the novel Pride and Prejudice and Zombies, which combines Austen’s original work with scenes involving zombies, cannibalism, and ninjas, would be considered a transformative parody of the original, and thus fair use rather than infringement. In contrast, a more traditional sequel would merely be an infringing derivative work.

The term transformative use has been applied to cases of literal transformation where it overlaps with the kinds of manipulations that might also create a derivative work. Thus in Suntrust Bank v. Houghton Mifflin Co, substantial copying of a novel in the service of criticism was regarded as transformative.

The term transformative use has been applied to cases of copying without modification, but for a good reason. For example, in Savage v. Council on American-Islamic Relations, Inc., the Islamic organization copied and distributed anti-Islamic statements made by Michael Savage as part of a fund-raising exercise. Recontextualization without modification from one expressive context to another was seen as transformative in Bill Graham Archives v. Dorling Kindersley Ltd.

In addition to these cases, courts have also found a number of non-expressive uses to be transformative. In particular, several cases have held that automated processing and display of copyrighted photos as part of a visual search engine is a transformative use and thus a fair use. In A.V. v. iParadigms, LLC, the Fourth Circuit found that the automated processing of the plaintiff students’ work in defendant’s plagiarism detection software was fair use. More recently, Authors Guild v. HathiTrust (SDNY), Authors Guild v. HathiTrust (2d Cir) and Authors Guild v. Google (SDNY) held that library digitization to create a search engine was transformative use and fair use.

Maybe we would be better off with different words for all these situations. David Nimmer suggests that in the hands of some judges, transformative use has no content at all and that it is simply synonymous with a finding of fair use. According to Pamela Samuelson, a better approach would be to distinguish transformative critiques, such as parodies, from productive uses for critical commentary. Samuelson also suggests that courts should not label orthogonal uses—uses wholly unrelated to the use made or envisaged by the original author—as transformative uses. But she does think that these are good candidates for fair use.

My personal preference would be for the term transformative use to be confined to expressive uses of copyrighted works and that non-expressive use (as exemplified by search engines, plagiarism detection software, text mining, etc) should be recognized as a distinct category of preferred use. Nonetheless, transformative use is the term of art most courts use and we should probably learn to live with it.

Even when Judge Easterbrook is right, he is wrong. #fairuse #copyright #blaaah

I have been wanting to blog about the 7th Circuit’s appalling decision in Kienitz v. Sconnie Nation LLC, No. 13-3004 (7th Cir. Sept. 15, 2014) since I read it — exactly ~~seven~~ twenty minutes ago. However, ~~two~~ fifteen minutes ago I discovered that Prof. Rebecca Tushnet (Georgetown Law) has already said most of what I wanted to say.

The case is about the transformative use of a photo. The case for transformation is pretty easy here because there is both substantive transformation (see below) and an obvious shift in purpose in that the original photo is a PR shot of a politician opposed to a street party and the new use is a caricature of the same politician on tee-shirts and tank tops.

The court of appeals took this easy case as an opportunity to try to unsettle the law of fair use by casting stones at the concept of transformativeness. The court notes that transformativeness doesn’t appear in the statute, and says it was “mentioned” in Campbell. What the Supreme Court actually said in Campbell was “The central purpose of this investigation is to see whether the new work merely supersedes the objects of the original creation, or instead adds something new, with a further purpose or different character, altering the first with new expression, meaning, or message; it asks, in other words, whether and to what extent the new work is ‘transformative.’” Campbell v. Acuff-Rose Music, Inc., 510 U.S. 569 (1994) (internal citations and quotations omitted).

This is a bit more than a mention.

Now I’ll just quote Rebecca:

…Having not quoted either the Supreme Court or the Second Circuit’s definition of transformativeness (which might allow one to assess whether there is too great an overlap with the derivative works right, or for that matter with the reproduction right since that’s what the majority of Second Circuit transformativeness findings deal with), the Seventh Circuit tells us to stick to the statute. But it doesn’t tell us what the first factor does attempt to privilege and deprivilege. Instead, the court goes to its own economic lingo-driven test: “whether the contested use is a complement to the protected work (allowed) rather than a substitute for it (prohibited).” Where this appears in the statute is left as an exercise for the reader, though by placement in the opinion we might possibly infer that it is the appropriate rephrasing of factor one, as opposed to inappropriate transformativeness (though the court later says that factor one isn’t relevant at all). However, complement/substitute requires some baseline for understanding the appropriate scope of the copyright right—the markets to which copyright owners are entitled—just like transformativeness does.

The Seventh Circuit reached the right result, but its reasoning is shallow, its disagreement with the Second Circuit is captious, and its wanton disregard of the jurisprudence of the last twenty years (beginning with the Supreme Court’s decision in Campbell v. Acuff-Rose Music, Inc.) is profoundly unfortunate. These are smart judges who could have helped further develop and clarify the law, but chose not to.

August 28, 2014 / August 29, 2014

Less Cancer, More Birthdays. A call for donations.

There are many amazing charities worthy of your support — the American Cancer Society is one of them. I am raising money for the ACS again this year because I believe in their mission and I think they do a great job.

Cancer research is saving lives and making lives better every single day. My Chicago Marathon race will help save lives and I’m hoping you’ll support me in my efforts.

More than 11 million Americans who have a history of cancer will celebrate another birthday this year. My uncle Ivan Sag died of cancer last year. Ivan was a powerful intellect, a leader in his academic field, a music lover and a party animal. We miss him.

Please support me with a donation so that together, with the ACS, we can help save lives and create a world with less cancer and more birthdays.

Last year I ran the Chicago Marathon in 3:44:48 and raised over $2,000 — please help me run even faster and do even more to support this great cause this year.

Every donation of $13.10 (50 cents per mile) or more entitles you to nominate one song* for my Marathon playlist. Try to think of something awesome that I might not have considered. I will be listening and thinking of you during the race and during my many many miles of training runs.

You can donate by following this DONATIONS link.

Thanks for your support!

* Songs longer than 7 minutes and 30 seconds, songs featuring Katy Perry, country music, opera or that are otherwise unbearable will be included solely at the runner’s discretion.

UPDATES:

The Marathon Playlist so far:

Eye of the Tiger – Survivor
In The Air Tonight – Phil Collins

August 7, 2014

Copyright and Pornography — Is now the time to panic?

There were 2004 copyright lawsuits filed in federal district courts in the United States in the period from January 1st to June 30th 2014. Just under 48% of these suits were filed by copyright owners against anonymous IP addresses accused of copyright infringement online. This is not surprising given the extent of online piracy, but what is more than a little surprising is that almost all of these lawsuits relate to pornographic films. Lawsuits alleging illegal file sharing of pornography were virtually non-existent before 2010; they now (Jan-Jun 2014) account for more than 41% of all copyright suits filed.

In my talk tomorrow at the 14th Annual Intellectual Property Scholars Conference at Berkeley Law School I will address this phenomenon and answer three fundamental questions: (1) When did this happen? (2) How did it happen? and (3) Is now the time to panic?

Here are some of the slides from my talk (below); the full paper is available here (download Copyright Trolling, An Empirical Study).

August 1, 2014 / August 7, 2014

Why digital humanities researchers support Google’s fair use defense

I posted a guest-blog over at the Authors Alliance explaining why digital humanities researchers support Google’s fair use defense in Authors Guild v. Google. The Authors Alliance supports Google’s fair use defense because it helps authors reach readers. In my post, I explained another reason why this case is important to the advancement of knowledge and scholarship.

Earlier this month a group of more than 150 researchers, scholars and educators with an interest in the ‘Digital Humanities’ joined an amicus brief urging the Second Circuit Court of Appeals to side with Google in this dispute. Why would so many teachers and academics from fields ranging from Computer Science, English Literature, History, Law, to Linguistics care about this lawsuit? It’s not because they are worried about Google—Google surely has the resources to look after itself—but because they are concerned about the future of academic inquiry in a world of ‘big data’ and ubiquitous copyright.

For decades now, physicists, biologists and economists have used massive quantities of data to explore the world around them. With increases in computing power, advances in computational linguistics and natural language processing, and the mass digitization of texts, researchers in the humanities can apply these techniques to the study of history, literature, language and so much more.

Conventional literary scholars, for example, rely on the close reading of selected canonical works. Researchers in the ‘Digital Humanities’ are able to enrich that tradition with a broader analysis of patterns emergent in thousands, hundreds of thousands, or even millions of texts. Digital Humanities scholars fervently believe that text mining and the computational analysis of text are vital to the progress of human knowledge in the current Information Age. Digitization enhances our ability to process, mine, and ultimately better understand individual texts, the connections between texts, and the evolution of literature and language.

A Simple Example of the Power of the Digital Humanities

The figure below is an Ngram-generated chart that compares the frequency with which authors of texts in the Google Book Search database refer to the United States as a single entity (“is”) as opposed to a collection of individual states (“are”). As the chart illustrates, it was only in the latter half of the Nineteenth Century that the conception of the United States as a single, indivisible entity was reflected in the way a majority of writers referred to the nation. This is a trend with obvious political and historical significance, of interest to a wide range of scholars and even to the public at large. But this type of comparison is meaningful only to the extent that it uses as raw data a digitized archive of significant size and scope.

The United States is/are

There are two very important things to note here. First, the data used to produce this visualization can only be collected by digitizing the entire contents of the relevant books–no one knows in advance which books to look in for this kind of search. Second, not a single sentence of the underlying books has been reproduced in the finished product. The original authors’ expression was an input to the process, but it was not a recognizable part of the output. This is the fundamental distinction that the Digital Humanities Amici are asking the court to preserve–the distinction between ideas and expression.
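To make the input/output distinction concrete, here is a minimal sketch in Python of the kind of counting that sits behind a chart like this. It is not how the Ngram viewer is actually built; the three-entry corpus, its sentences, and the function name phrase_counts_by_year are all invented for illustration. The point is only that whole texts go in and nothing but year-by-year tallies come out.

```python
import re
from collections import Counter

# Hypothetical toy corpus of (publication year, text) pairs. A real study
# would use a large digitized archive; these sentences are invented.
corpus = [
    (1820, "The United States are bound by treaty. The United States are many."),
    (1850, "The United States are divided. The United States is growing."),
    (1890, "The United States is a nation. The United States is at peace."),
]

def phrase_counts_by_year(docs, phrases):
    """Return {year: Counter(phrase -> occurrences)}, ignoring case."""
    counts = {}
    for year, text in docs:
        lowered = text.lower()
        tally = counts.setdefault(year, Counter())
        for phrase in phrases:
            # Word boundaries keep "is" from matching inside "island".
            pattern = r"\b" + re.escape(phrase.lower()) + r"\b"
            tally[phrase] += len(re.findall(pattern, lowered))
    return counts

trend = phrase_counts_by_year(
    corpus, ["the united states is", "the united states are"]
)
for year in sorted(trend):
    print(year, dict(trend[year]))
```

The full text of every document has to be read by the program, yet the only thing it reports is a handful of numbers per year.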

Will Copyright Law Prevent the Computational Analysis of Text?

The computational analysis of text has opened the door to new fields of inquiry in the humanities–it allows researchers to ask questions that were simply inconceivable in the analog era. However, the lawsuit by the Authors Guild threatens to slam that door shut.

For over 300 years Copyright has balanced the author’s right to control the copying of her expression with the public’s freedom to access the facts and ideas contained within that expression. Authors get the chance to sell their books to the public, but they don’t get to say how those books are read, how people react to them, whether they choose to praise them or pan them, how they talk to their friends about them. Copyright protects the author’s expression (for a limited time and subject to a number of exceptions and limitations not relevant here) but it leaves the information within that expression and information about that expression “free as the air to common use.” The protection of expression and the freedom of non-expression are both fundamental pillars of American Copyright law. However, the Authors Guild’s long-running campaign against library digitization threatens to erase that distinction in the digital age and fundamentally alter the balance of copyright law.

In the pre-digital era, the only reason to copy a book was to read it, or at least preserve the option of reading it. But this is no longer true. There are a host of modern technologies that literally copy text as an input into some larger data-processing application that has nothing to do with reading. For want of a better term, we call these ‘non-expressive uses’ because they don’t necessarily involve any human being reading the author’s original expression at the end of the day.

Most authors, if asked, support making their works searchable because they want them to be discovered by new generations of readers. But this is not our central point. Our point is that if it is permissible for a human to pick up a book and count the number of occurrences of the word “whale” (1119 times in Moby Dick) or the ratio of male to female pronouns (about 2:1 in A Game of Thrones Book 1—A Song of Ice and Fire), etc., then there is no reason the law should prevent researchers doing this on a larger and more systematic basis.
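For readers who want to see how little is involved, here is a minimal Python sketch of both counts. The sample sentence, the pronoun lists, and the helper names word_counts and pronoun_ratio are my own assumptions for illustration, and the tokenizer is deliberately naive (it treats “whale’s” as a different word from “whale”), so it should not be expected to reproduce the exact figures quoted above.

```python
import re
from collections import Counter

def word_counts(text):
    """Count lowercase word tokens; apostrophes stay inside words."""
    return Counter(re.findall(r"[a-z']+", text.lower()))

def pronoun_ratio(counts):
    """Ratio of male to female pronouns, or None if no female pronouns occur."""
    male = sum(counts[w] for w in ("he", "him", "his", "himself"))
    female = sum(counts[w] for w in ("she", "her", "hers", "herself"))
    return male / female if female else None

# Invented sample text standing in for the full text of a novel.
sample = (
    "Call me Ishmael. He saw the whale; she saw the Whale too, "
    "and the whale saw him."
)
counts = word_counts(sample)
print(counts["whale"])        # occurrences of "whale", ignoring case
print(pronoun_ratio(counts))  # male-to-female pronoun ratio
```

Pointing the same two functions at a whole library instead of one sentence changes the scale, not the nature, of what is being done.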

Digitizing a library collection to make it searchable or to allow researchers to create and analyze metadata does not interfere with the interests that copyright owners have in the underlying expression in their books.

Who knows what the next generation of humanities researchers will uncover about literature, language, and history if we let them?

You can download the Brief of Digital Humanities and Law Scholars as Amici Curiae here.

July 12, 2014

Digital Humanities and Legal Scholars in Authors Guild v. Google filed

On Thursday this week, we filed a brief on behalf of over 150 researchers, scholars and educators in Authors Guild v. Google, currently on appeal to the Second Circuit Court of Appeals.

The Brief of Digital Humanities and Legal Scholars argues that Copyright law is not, and should not be, an obstacle to the computational analysis of text. Copyright law has long recognized the distinction between protecting an author’s original expression and the public’s right to access the facts and ideas contained within that expression.

We are confident that the Second Circuit will vote to maintain that distinction in the digital age so that library digitization, internet search and related non-expressive uses of written works remain legal.

The final version of the brief is available on the free online repository ssrn.com at this link address: http://ssrn.com/abstract=2465413.

We are grateful for the support of so many wonderful scholars in this important case and we are even more grateful for all the fascinating research that these computer scientists, English professors, historians, linguists, and all those working in the digital humanities do to enrich our lives.

We would also like to thank The Association for Computers and the Humanities and the Canadian Society of Digital Humanities/Société canadienne des humanités numériques for their support as institutions.

Matthew Jockers

Matthew Sag

Jason Schultz

July 3, 2014

Copyright Trolling Data, Updated to June 30 2014

Copyright Trolls, Pornography, Statutory Damages…

[Revised at 5:43pm to account for an idiotic mistake in Excel – Just going to show that you should not use Excel for even the most simple things]

The gifts that keep on giving.

I have updated my data on copyright trolling to include cases filed up to June 30, 2014. The data is now available to anyone interested in replication. I have also revised my paper Copyright Trolling, An Empirical Study (download the full paper from ssrn) with the following table that shows the phenomenal influence of Malibu Media.

Bottom line: Malibu Media accounted for 10% of all copyright suits filed in 2012, 27% in 2013 and 40% in the first half of 2014.

Copyright Suits Filed in U.S. District Courts – 2001 to June 30 2014

The top section of the table shows how many cases were filed under the 820 code for Copyright in U.S. Federal District Courts in the years 2003 to 2014. The bottom section of the table translates the same information into percentages. The “Copyright – All” category includes all copyright cases. “Copyright – John Doe” includes all copyright cases where the defendant was a John Doe, without differentiating as to the underlying subject matter of the complaint. “Copyright – John Doe (Porn)” is a subset of the previous category and includes all cases identified as relating to pornography. The final category, “Malibu Media v. Doe(s)” includes every case filed by Malibu Media against one or more John Does.

June 30, 2014

Call for signatories: Digital Humanities Amicus in Authors Guild v. Google

Matthew Jockers, Jason Schultz and I have written an amicus brief in the upcoming Court of Appeals round of Authors Guild v. Google, Inc.

Download the draft here: DH Amicus AG v Google CA2

Background

Since we started working on this project just over two years ago, two district courts and the Court of Appeals for the Second Circuit have rejected the Authors Guild’s attacks on library digitization and the legality of text-mining. We are confident that the Second Circuit will uphold Judge Chin’s decision last year where he rejected (on a motion for summary judgment) the Authors Guild’s copyright infringement claim against Google over its Google Book Search product. The rulings in Authors Guild v. Google and the parallel case of Authors Guild v. Hathitrust are a critical moment in the fight to define fair use for the Digital Humanities.

In Authors Guild v. Google, Judge Chin expressly based his ruling in part on the fact that

“Google Books … has transformed book text into data for purposes of substantive research, including data mining and text mining in new areas, thereby opening up new fields of research. Words in books are being used in a way they have not been used before. Google Books has created something new in the use of book text — the frequency of words and trends in their usage provide substantive information.”

In his decision, Judge Chin cites the Brief of Digital Humanities and Law Scholars as Amici Curiae that we submitted on behalf of more than 100 researchers and scholars last year. Chin wrote that

“Google Books permits humanities scholars to analyze massive amounts of data — the literary record created by a collection of tens of millions of books.”

The Authors Guild is now appealing Judge Chin’s decision (on this and other grounds). A different panel of that same court has already upheld the decision in Authors Guild v. Hathitrust. We believe that these cases will have a dramatic effect on research in fields ranging from computer science to linguistics, history, literature and the digital humanities.

Argument in a nutshell

According to the U.S. Constitution, the purpose of copyright is “To promote the Progress of Science and useful Arts”. Copyright law should not be an obstacle to statistical and computational analysis of the millions of books owned by university libraries. Copyright law has long recognized the distinction between protecting an author’s original expression and the public’s right to access the facts and ideas contained within that expression. That distinction must be maintained in the digital age so that library digitization, internet search and related non-expressive uses of written works remain legal.

What can you do?

If you are a legal academic or student, academic or researcher who would be affected by this issue, you can help preserve the balance of copyright law by joining our brief as a signatory (we need your name and affiliation e.g. Associate Professor, Jane Doe, Springfield University).

Does this concern you?

If you are still reading this post, the answer is probably YES. We are collecting signatures from a wide range of fields, including computer science, English, history, law, linguistics and philosophy. We need your name etc., by July 9, 2014. Please enter your details directly via this online tool:

https://docs.google.com/forms/d/1QSA_fUSaRpw47wwRcXh0SXkZFx1NQ2NbjhBbfTrICnA/viewform?usp=send_form

Please feel free to share this invitation with other interested academics and PhD students.

Thank you!

June 19, 2014 / June 30, 2014

Matthew Jockers explains why you can’t read a book through snippets

The Authors Guild’s war on search engines, text-mining and academic research is in its final throes. Over the last two years two different US Federal District Courts have held that library digitization for the purpose of building a search index and running a search engine is fair use. See Authors Guild v. Hathitrust, 902 F. Supp. 2d 445 (S.D.N.Y. 2012) and Authors Guild v. Google, 954 F. Supp. 2d 282 (S.D.N.Y. 2013). The Hathitrust decision was upheld on appeal on June 10 this year (Authors Guild v. Hathitrust, 2nd Circuit 2014) and the parties and interested amici are gearing up for a final showdown in the appeal of Authors Guild v. Google.

In the Guild’s latest legal salvo it argues – by repeated assertion – that the text snippets Google displays to users allow 78% of the contents of any book to be reconstructed. (e.g., at p.10 “The scanning process resulted in an index that contains the complete text of all the books copied in the Library Project.”)

My sometime co-author and accomplished Digital Humanities researcher, Matthew Jockers, tested out the Guild’s claims on his own book and … it turns out that you can’t read a book through snippets, unless you already have the book, and that even then it takes about 30 minutes to trick the search engine into giving you the next 100 words beyond the free-view.

As Matt explains:

“Reading 78% of my book online, as the Guild asserts, requires that the reader anticipate what words will appear in the concealed sections of the book.”

He concludes

“Well, this certainly is reading Macroanalysis the hard way. I’ve now spent 30 minutes to gain access to exactly 100 words beyond what was offered in the initial preview. And, of course, my method involved having access to the full text! Without the full text, I don’t think such a process of searching and reading is possible, and if it is possible, it is certainly not feasible!

But let’s assume that a super savvy text pirate, with extensive training in English language syntax could guess the right words to search and then perform at least as well as I did using a full text version of my book as a crutch. My book contains, roughly, 80,000 words. Not counting the ~5k offered in the preview, that leaves 75,000 words to steal. At a rate of 200 words per hour, it would take this super savvy text pirate 375 hours to reconstruct my book. That’s about 47 days of full-time, eight-hour work.”
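Matt’s arithmetic is easy to check. The short Python sketch below simply restates the figures from the passage above (80,000 words in the book, roughly 5,000 in the preview, 100 words recovered in 30 minutes, an eight-hour working day); the variable names are mine.

```python
# Figures taken from the quoted passage; variable names are illustrative.
book_words = 80_000        # approximate length of the book
preview_words = 5_000      # approximate words already shown in the preview
words_recovered = 100      # words obtained beyond the preview in the test
minutes_spent = 30         # time that test took
hours_per_workday = 8

remaining_words = book_words - preview_words
rate_per_hour = words_recovered / (minutes_spent / 60)
hours_needed = remaining_words / rate_per_hour
workdays_needed = hours_needed / hours_per_workday

print(remaining_words)   # words left to reconstruct
print(rate_per_hour)     # words per hour
print(hours_needed)      # total hours
print(workdays_needed)   # eight-hour days
```

The numbers come out as stated: 75,000 words at 200 words per hour is 375 hours, or just under 47 eight-hour days.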

Matt’s book is Macroanalysis: Digital Methods and Literary History and — as seen on the screen shot I just made of Google Books — you can buy the eBook version, linked to from the Google Books web page, for $14.95.

June 10, 2014

Authors Guild v. HathiTrust — Libraries 3 : Authors Guild 0

The Second Circuit Court of Appeals has upheld the most important parts of the District Court decision in Authors Guild v. HathiTrust. Here is a link to the decision: AGvHathiTrust_CA2_2013.

Along with the district court decision in this case and the one in Authors Guild v. Google, this makes the current score Libraries 3 : Authors Guild 0.

The decision confirms that library digitization (as performed by Google in conjunction with the University of Michigan, University of Illinois and many others) does not infringe copyright if it is done for the purpose of allowing blind and visually disabled people to read books.

Access to the Print‐Disabled
The HDL also provides print‐disabled patrons with versions of all of the works contained in its digital archive in formats accessible to them. In order to obtain access to the works, a patron must submit documentation from a qualified expert verifying that the disability prevents him or her from reading printed materials, and the patron must be affiliated with an HDL member that has opted‐into the program. Currently, the University of Michigan is the only HDL member institution that has opted‐in. We conclude that this use is also protected by the doctrine of fair use.

The decision confirms that library digitization does not infringe copyright if it is done for the purpose of text-mining or creating a search engine. This is the core of the non-expressive use argument that Matthew Jockers, Jason Schultz and I made in the Digital Humanities Amicus Brief (http://ssrn.com/abstract=2274832). That brief was joined by over 100 professors and scholars who teach, write, and research in computer science, the digital humanities, linguistics or law, and two associations that represent Digital Humanities scholars generally.

The crux of our argument was that mass digitization of books for text-mining purposes is a form of incidental or “intermediate” copying that enables ultimately non-expressive, non-infringing, and socially beneficial uses without unduly treading on any expressive—i.e., legally cognizable—uses of the works. The Court of Appeals appears to have agreed.

Full‐Text Search
It is not disputed that, in order to perform a full‐text search of books, the Libraries must first create digital copies of the entire books. Importantly, as we have seen, the HDL does not allow users to view any portion of the books they are searching. Consequently, in providing this service, the HDL does not add into circulation any new, human‐readable copies of any books. Instead, the HDL simply permits users to “word search”—that is, to locate where specific words or phrases appear in the digitized books. Applying the relevant factors, we conclude that this use is a fair use.

The Court left itself some room to maneuver if it turns out that, for some reason, digitization for non-expressive uses like text mining causes unforeseen harm in different circumstances. For example, a digitization project that did not bother with any kind of security might not be fair use.

Without foreclosing a future claim based on circumstances not now predictable, and based on a different record, we hold that the balance of relevant factors in this case favors the Libraries. In sum, we conclude that the doctrine of fair use allows the Libraries to digitize copyrighted works for the purpose of permitting full‐text searches.

With that appropriate caveat, this is a great win for humanity and for the Digital Humanities.

I am proud to have played my small part in this case over the years.
