Category Archives: fair use

Second Circuit clears the last hurdle for Google Book Search

The Second Circuit ruled today that, in its present form, the library digitization that Google began over ten years ago does not infringe US copyright law. Although this decision was entirely predictable given the court’s ruling in the related Hathitrust litigation, it is nonetheless momentous. Judge Leval’s cogent explanation of the law and the facts is an exemplary piece of legal writing. The decision is available here (AG v Google October 16, 2015) and merits careful reading.

This is a great win for Google, but more importantly, it confirms a balanced approach to copyright law that will ultimately benefit authors, researchers, the reading public and the developers of new forms of information technology.

I have written several law review articles on the issues raised in this case: Orphan Works as Grist for the Data Mill, 27 Berkeley Technology Law Journal 1503–1550 (2012), The Google Book Settlement and the Fair Use Counter-factual, and Copyright and Copy-Reliant Technology, 103 Northwestern University Law Review 1607–1682 (2009). However, I believe that it was only when I teamed up with Matthew Jockers (a professor of English literature) and Jason Schultz (a law professor with deep experience in public interest litigation in addition to expertise in copyright) to write the Brief Amicus Curiae of Digital Humanities and Law Scholars that my work truly became influential. The court did not cite any amicus briefs in the case, but they were cited in the district court case and the related Hathitrust cases. Reading Judge Leval’s decision, I think it is clear that the excellent briefing by Google’s lawyers and the many public interest groups who contributed was helpful and influential.

This case is a great victory for the public interest; it is also a great illustration of how a deep commitment to scholarship complements law school clinical programs and helps us serve the public interest.

Some thoughts on Fair use, Transformative Use and Non-Expressive Use

Fair use, Transformative Use and Non-Expressive Use

Or,

Campbell v. Acuff-Rose and the Future of Digital Technologies, notes on a short presentation at the Fair Use In The Digital Age: The Ongoing Influence of Campbell v. Acuff-Rose’s “Transformative Use Test” Conference, April 17 & 18, 2015, University of Washington School of Law.

Copyright and disintermediation technologies

Copyright policy was hit by an analog wave of disintermediation technology in the post-war era and a digital wave of disintermediation technologies beginning in the 1990s. These successive waves of technology have forced us to reevaluate the foundational assumption of copyright law: that any reproduction of the work should be seen as an exchange of value passing from the author (or copyright owner) to the consumer.

Technologies such as the photocopier and the videocassette recorder and then later the personal computer significantly destabilized copyright policy because these inventions, for the first time, placed commercially significant copying technology directly in the hands of large numbers of consumers. This challenge has only been accelerated by digitalization and the Internet. Digitalization allows for perfect reproduction such that the millionth copy of an MP3 file sounds just as good as the first copy.

The implications of the copying that these devices enabled were not clear-cut. In some cases, the new copying technology simply enabled greater flexibility in consumption; in others, it generated new copies to be released into the stream of commerce as competitors with the author’s original authorized versions. The Internet has connected billions of people together, leading to an outpouring of creativity and user-generativity, but from the perspective of the entertainment industry it also brought people together to undertake piracy on a massive scale.

The significance of Campbell v. Acuff-Rose Music, Inc., 510 U.S. 569 (1994)

The Supreme Court in Sony v. Universal[1] had already shown that it was willing to apply fair use in a flexible manner in situations where the use was personal and immaterial to the copyright owner. The significance of the Court’s decision in Campbell[2] was that, by reorienting the fair use doctrine around the concept of transformative use, the Court prepared the way for a flexible consideration of technical acts of reproduction that do not have the usual copyright significance.

Internet search engines, plagiarism detection software, text mining software and other copy-reliant technologies do not read, understand, or enjoy copyrighted works, nor do they deliver these works directly to the public.  They do, however, necessarily copy them in order to process them as grist for the mill, raw materials that feed various algorithms and indices. Campbell arrived just in time to provide a legal framework far more hospitable to copy-reliant technology than had previously existed. Even in its broadest sense, transformative use is not the be all and end all of fair use. At the risk of over-simplification, Sony v. Universal safeguarded the future of the mp3 player, whereas Campbell secured the future of the Internet and reading machines.

Copy-reliant technology and non-expressive use

Some of the most important recent technological fair use cases can be summarized as follows: Copying that occurs as an intermediate technical step in the production of a non-infringing end product is a ‘non-expressive’ use and thus ordinarily constitutes fair use.[3] The main examples of non-expressive use I have in mind are the construction of search engine indices,[4] the operation of plagiarism detection software[5] and, most recently, library digitization to make paper books text-searchable.[6]
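To make "copying as an intermediate technical step" concrete, here is a minimal sketch of how a text search index might be built. It is an illustration only, not a description of how any particular search engine works; the `build_index` function and the two-document collection are hypothetical and invented for this example.

```python
import re
from collections import defaultdict


def build_index(documents):
    """Map each lowercased word to the sorted list of document ids containing it."""
    index = defaultdict(set)
    for doc_id, text in documents.items():
        # The full text is read (copied into memory) only to extract word tokens.
        for word in re.findall(r"[a-z']+", text.lower()):
            index[word].add(doc_id)
    return {word: sorted(ids) for word, ids in index.items()}


# Hypothetical two-document collection, invented for illustration.
documents = {
    "doc1": "Call me Ishmael.",
    "doc2": "It is a truth universally acknowledged.",
}

index = build_index(documents)
print(index["ishmael"])  # which documents mention this word
```

The finished index records which documents contain which words. It does not store the sentences themselves, which is the sense in which the copying is an input to the process rather than something delivered to a reader.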

To have a coherent concept of fair use, or any particular category of fair use, one needs a coherent concept of copyright. As expressed in the U.S. Constitution, copyright’s motivating purpose is “to promote the Progress of Science and useful Arts.”[7] Ever since the Statute of Anne in 1710, the purpose of copyright law has been to encourage the creativity of authors and to promote the creation and dissemination of works of authorship. Copyright is not a guarantee of total control; in general, the copyright owner’s rights are limited and defined in reference to the communication of the expressive aspects of the work to the public. This is evident in the idea-expression distinction, the way courts determine whether two works are substantially similar and the focus of fair use cases on expressive substitution. Thus, subsequent authors may not compete with the copyright owner by offering her original expression to the public as a substitute for the copyright owner’s work, but they are free to compete with their own expression of the same facts, concepts and ideas. They are also free to expose, criticize and even vilify the original work. Genuine parodies, critiques and illustrative uses are fair use so long as the copying they partake in is reasonable in light of those purposes.

If public communication and expressive substitution are rightly understood as copyright’s basic organizing principles, then it follows that non-expressive uses (i.e., uses that involve copying, but don’t communicate the expressive aspects of the work to be read or otherwise enjoyed) must be fair use. In fact, they are arguably the purest essence of fair use. Grokking the concept of non-expressive use simply involves taking the well understood distinction between expressive and nonexpressive works and making the same distinction in relation to potential acts of infringement.

The legal status of actual copying for nonexpressive uses was not a burning issue before digital technology. Outside the context of reading machines like search engines, plagiarism software and the like, courts have quite reasonably presumed that every copy of an expressive work is for an expressive purpose. But this assumption no longer holds. At a minimum, preserving the functional force of the idea-expression distinction in the digital context requires that copying for purely non-expressive purposes, such as the automated extraction of data, should not be infringing.

Some limits to the non-expressive use framework

Non-expressive use is a sufficient but not necessary condition of fair use. For example, parody is an expressive use, but it is fair use because it does not tend to threaten expressive substitution. Even within the realm of recent technology cases, non-expressive use is not the right framework for addressing important man-machine interaction questions such as disability access, also a key issue in the HathiTrust litigation, but it does tie together a number of disparate threads.

The cases which hold that software reverse engineering is fair use are grounded firmly in the idea-expression distinction,[8] but they are not exactly non-expressive use cases for the reasons that follow.[9] The non-expressive use framework is also not the right tool in cases where software is copied in order to access its functionality: after all, software is primarily functional and its primary (perhaps exclusive) value comes from the function it performs. Software piracy can’t be justified as a non-expressive use, because to do so would defeat the statutory scheme wherein Congress chose to graft computer software protection onto copyright. However, the reverse engineering cases still follow the logic of non-expressive use. In those cases copying to access certain APIs and other unprotectable elements enabled the copyists to either independently recreate that functionality (akin to conveying the same ideas with different expression) or to develop programs or machines that would complement the original software.

Non-expressive use versus transformative use?

The main issue left to resolve in terms of copy-reliant technology and non-expressive use seems to be one of nomenclature. Is non-expressive use simply a subset of transformative use? Or is it a separate species of fair use with implications similar to those of transformative use?

Non-expressive use, as I have defined and elucidated in a series of law review articles and amicus briefs, is a clear, coherent concept that ties a broad set of fair use cases directly to one of copyright’s core principles, the idea-expression distinction. Transformative use, as explained by Pierre Leval and adopted by the Supreme Court, is rooted in the constitutional imperative for copyright protection – the creation of new works and the promotion of progress in culture, learning, science and knowledge. But for all that, if transformative use is invoked as an umbrella term, it is often hard to see what holds the category together.

The Campbell Court did not posit transformative use as a unified, exhaustive theory, but it did say that “[a]lthough such transformative use is not absolutely necessary for a finding of fair use, the goal of copyright, to promote science and the arts, is generally furthered by the creation of transformative works. Such works thus lie at the heart of the fair use doctrine’s guarantee of breathing space within the confines of copyright, …”[10] No doubt, when the Supreme Court spoke of transformative use, it had various communicative and expressive uses, such as parody, the right of reply, public comment and criticism in mind. But since Campbell, lower courts have applied the same purposive interpretation of copyright to a broader set of challenges. Campbell was decided in a different technological context and it is true that many of today’s technological fair use issues were entirely unimaginable before the birth of the World Wide Web and our modern era of big data, cloud computing, social media, mobile connectivity and the “Internet of Things”.

Non-expressive use is a useful concept because it provides a way for courts to recognize the legitimacy of copying that is inconsequential in terms of expressive substitution, but does not necessarily lead to the creation of the type of new expression that the Supreme Court had in mind in Campbell. The use of reading machines in digital humanities research is easy to justify, both in terms of the lack of expressive substitution and in the obvious production of meaning, new insights and potentially new and utterly transformative works of authorship. But what of less generative non-expressive uses? For example, in the future a robot might ‘read’ a copyrighted poster on a subway wall advertising a rock concert in Central Park. The robot might then ‘decide’ to change its travel plans in light of the predictable disruption. The acts of ‘reading’ and ‘deciding’ are both simply computational. Even if reading involves making a copy of the work inside the brain of a machine, it seems nonsensical to conclude that the robot was used to infringe copyright. In the age of the printing press, copying a work had clear and obvious implications. Copying was invariably for expressive ends and it was almost always the point of exchange of value between author and reader. The copyright implications of copying are much more contingent in the digital age.

There is much clarity to be gained by talking directly in terms of non-expressive use rather than relying on transformative use as a broad umbrella for a range of expressive and non-expressive fair uses. Such clear thinking would hopefully ease the anxieties of the entertainment industry that still fears that fair use is simply a stalking horse for dismantling copyright. Nonetheless, it would not be surprising if courts were more comfortable sticking with the language of transformativeness that Judge Pierre Leval gave us in “Toward a Fair Use Standard”,[11] and the Supreme Court adopted in Campbell.

This is a sketch of some ideas; no doubt revisions will follow after this exciting conference.

Related Publications:

Matthew Sag, Copyright and Copy-Reliant Technology 103 Northwestern University Law Review 1607–1682 (2009)

Matthew Sag, Orphan Works as Grist for the Data Mill, 27 Berkeley Technology Law Journal 1503–1550 (2012)

Matthew Jockers, Matthew Sag & Jason Schultz, Digital Archives: Don’t Let Copyright Block Data Mining, 490 Nature 29-30 (October 4, 2012)

Somewhat Related Publications:

Peter DiCola & Matthew Sag, An Information-Gathering Approach to Copyright Policy, 34 Cardozo Law Review 173–247 (2012)

Matthew Sag, Predicting Fair Use 73 Ohio State Law Journal 47–91 (2012)

Matthew Sag, The Pre-History of Fair Use 76 Brooklyn Law Review 1371–1412 (2011)

 

[1] Sony Corp. of America v. Universal City Studios, Inc., 464 U.S. 417 (1984).

[2] Campbell v. Acuff-Rose Music, Inc., 510 U.S. 569 (1994).

[3] See generally, Matthew Sag, Copyright and Copy-Reliant Technology 103 Northwestern University Law Review 1607–1682 (2009)

[4] There is no case addressing the legality of the process of making a text-based search index (as opposed to caching or display of search results), but the proposition naturally flows from Kelly v. Arriba Soft Corp., 336 F.3d 811 (9th Cir. 2003) and Perfect 10, Inc. v. Amazon.com, Inc., 508 F.3d 1146 (9th Cir. 2007) and is a necessary implication of Authors Guild, Inc. v. HathiTrust, 755 F.3d 87 (2d Cir. 2014) and Authors Guild, Inc. v. Google Inc., 954 F. Supp. 2d 282 (S.D.N.Y. 2013).

[5] A.V. ex rel. Vanderhye v. iParadigms, LLC, 562 F.3d 630 (4th Cir. 2009).

[6] Authors Guild, Inc. v. HathiTrust, 755 F.3d 87 (2d Cir. 2014); Authors Guild, Inc. v. Google Inc., 954 F. Supp. 2d 282 (S.D.N.Y. 2013). See also Matthew Sag, Orphan Works as Grist for the Data Mill, 27 Berkeley Technology Law Journal 1503–1550 (2012); Matthew Jockers, Matthew Sag & Jason Schultz, Digital Archives: Don’t Let Copyright Block Data Mining, 490 Nature 29-30 (October 4, 2012).

[7] U.S. Const. art. I, § 8, cl. 8.

[8] Sega Enter. Ltd. v. Accolade, Inc., 977 F.2d 1510 (9th Cir. 1992); Sony Computer Entm’t, Inc. v. Connectix Corp., 203 F.3d 596, 606 (9th Cir. 2000).

[9] These reasons are more fully elaborated in Matthew Sag, Copyright and Copy-Reliant Technology 103 Northwestern University Law Review 1607–1682 (2009).

[10] Campbell v. Acuff-Rose Music, Inc., 510 U.S. 569, 579 (1994)(citation omitted).

[11] 103 Harv. L. Rev. 1105 (1990)

#Aereo was always doomed to fail

and today it filed for Chapter 11 bankruptcy.

I have added some final thoughts on fair use to my review of the Aereo decision. You can download the article from SSRN at this link (http://ssrn.com/abstract=2529047 …).

The main addition to my post from a few days ago is the following paragraph:

Unlike Cablevision’s remote-DVR, Aereo was an unlikely candidate for fair use. Holding that a remote DVR is fair use would be a logical extension of the Supreme Court’s 1984 Sony Betamax decision. Like a VCR, a DVR simply allows the consumer to do that which they were already authorized to do more conveniently. No doubt, Aereo would make the same argument with respect to its service, but there is one critical difference. Judge Chin’s intuition that Aereo’s design was a mere “Rube Goldberg-like contrivance, over-engineered in an attempt to avoid the reach of the Copyright Act,” was spot on; however, a technological contrivance should not be the foundation for a legal contrivance. The very fact of Aereo’s contrivance to avoid the public performance right is the reason why its fair use claim should fail. Congress amended the Copyright Act in 1976 to make the retransmission of free-to-air television broadcasts an additional copyright tolling point. There could not be a better argument against fair use than the fact that Aereo’s service was designed to defeat the purpose of the statute. There was no need for the Supreme Court to adopt such a tortured and opaque reading of the transmit clause of the public performance right. Aereo could have been decided as a simple fair use case.

Authors Guild v. Google will be argued in the Second Circuit on December 3rd

This means we could get a decision before the case’s 10-year anniversary!

Docket Number: 13-04829 in United States Court of Appeals for the Second Circuit
Title: The Authors Guild v. Google, Inc.
Date Filed: 12/23/2013
Docket Proceedings
Entry #143, filed 09/30/2014: CASE CALENDARING, for argument on 12/03/2014 at 2:00pm, SET.[1332695] [13-4829]

 Kienitz v. Sconnie Nation — transformative uses and derivative works. #Fairuse

Some additional thoughts on the 7th Circuit’s decision in Kienitz v. Sconnie Nation LLC, No. 13-3004 (7th Cir. Sept. 15, 2014).

Judge Easterbrook expressed some skepticism today over the Second Circuit’s decision in Cariou v. Prince, 714 F. 3d 694 (2d Cir. 2013) because …

asking exclusively whether something is “transformative” not only replaces the list in §107 but also could override 17 U.S.C. §106(2), which protects derivative works. To say that a new use transforms the work is precisely to say that it is derivative and thus, one might suppose, protected under §106(2).

Easterbrook complains that

Cariou and its predecessors in the Second Circuit do not explain how every “transformative use” can be “fair use” without extinguishing the author’s rights under §106(2).

Ok, so let me explain.

First, Cariou and its predecessors don’t say that every transformative use is fair use. Second, more importantly, transformative use and derivative work are both important terms of art in copyright law. They are not the same thing. Nobody thinks they are.

Section 106(2) of the Copyright Act gives copyright owners an exclusive right to prepare derivative works based on the copyright owner’s original work. As defined in the statute, a derivative work takes a preexisting work and “recasts, transforms, or adapts” that work. The kind of transformations referred to here are not necessarily ‘transformative’ as that term was intended by the Supreme Court in the context of fair use. And yes, obviously, using a word that is not a stem of ‘transform’ would have helped. 

A transformative work, in the fair use sense, is one which imbues the original “with a further purpose or different character, altering the first with new expression, meaning, or message.” [Campbell v. Acuff-Rose Music, Inc., 510 U.S. 569, 579 (1994) (internal citations omitted).] Thus, the assessment of transformativeness is not merely a question of the degree of difference between two works; rather it requires a judgment of the motivation and meaning of those differences.

The difference between a non-infringing transformative use and an infringing derivative work can be illustrated as follows: if Pride and Prejudice were still subject to copyright protection, the novel Pride and Prejudice and Zombies, which combines Austen’s original work with scenes involving zombies, cannibalism, and ninjas, would be considered a transformative parody of the original, and thus fair use rather than infringement. In contrast, a more traditional sequel would merely be an infringing derivative work.

The term transformative use has been applied to cases of literal transformation where it overlaps with the kinds of manipulations that might also create a derivative work. Thus in Suntrust Bank v. Houghton Mifflin Co., substantial copying of a novel in the service of criticism was regarded as transformative.

The term transformative use has also been applied to cases of copying without modification, but for a good reason. For example, in Savage v. Council on American-Islamic Relations, Inc., the Islamic organization copied and distributed anti-Islamic statements made by Michael Savage as part of a fund-raising exercise. Recontextualization without modification from one expressive context to another was seen as transformative in Bill Graham Archives v. Dorling Kindersley Ltd.

In addition to these cases, courts have also found a number of non-expressive uses to be transformative. In particular, several cases have held that automated processing and display of copyrighted photos as part of a visual search engine is a transformative use and thus a fair use. In A.V. v. iParadigms, LLC, the Fourth Circuit found that the automated processing of the plaintiff students’ work in defendant’s plagiarism detection software was fair use. More recently, Authors Guild v. HathiTrust (SDNY), Authors Guild v. HathiTrust (2d Cir) and Authors Guild v. Google (SDNY) held that library digitization to create a search engine was transformative use and fair use.

Maybe we would be better off with different words for all these situations. David Nimmer suggests that in the hands of some judges, transformative use has no content at all and that it is simply synonymous with a finding of fair use. According to Pamela Samuelson, a better approach would be to distinguish transformative critiques, such as parodies, from productive uses for critical commentary. Samuelson also suggests that courts should not label orthogonal uses—uses wholly unrelated to the use made or envisaged by the original author—as transformative uses. But she does think that these are good candidates for fair use.

My personal preference would be for the term transformative use to be confined to expressive uses of copyrighted works and that non-expressive use (as exemplified by search engines, plagiarism detection software, text mining, etc) should be recognized as a distinct category of preferred use. Nonetheless, transformative use is the term of art most courts use and we should probably learn to live with it.

Further Reading

  • David Nimmer, Nimmer on Copyright § 13.05[A][1]
  • Matthew Sag, Copyright and Copy-Reliant Technology, 103 Nw. U. L. Rev. 1607 (2009)
  • Pamela Samuelson, Unbundling Fair Uses, 77 Fordham L. Rev. 2537 (2009).

Even when Judge Easterbrook is right, he is wrong. #fairuse #copyright #blaaah

I have been wanting to blog about the 7th Circuit’s appalling decision in Kienitz v. Sconnie Nation LLC, No. 13-3004 (7th Cir. Sept. 15, 2014) since I read it, exactly twenty minutes ago. However, fifteen minutes ago I discovered that Prof. Rebecca Tushnet (Georgetown Law) has already said most of what I wanted to say.

The case is about the transformative use of a photo. The case for transformation is pretty easy here because there is both substantive transformation (see below) and an obvious shift in purpose in that the original photo is a PR shot of a politician opposed to a street party and the new use is a caricature of the same politician on tee-shirts and tank tops.

sconnie

The court of appeals took this easy case as an opportunity to try to unsettle the law of fair use by casting stones at the concept of transformativeness. The court notes that transformativeness doesn’t appear in the statute, and says the Supreme Court “mentioned” it in Campbell. What the Supreme Court actually said in Campbell was “The central purpose of this investigation is to see whether the new work merely supersedes the objects of the original creation, or instead adds something new, with a further purpose or different character, altering the first with new expression, meaning, or message; it asks, in other words, whether and to what extent the new work is ‘transformative.’” Campbell v. Acuff-Rose Music, Inc., 510 U.S. 569, 579 (1994) (internal citations and quotations omitted).

This is a bit more than a mention.

Now I’ll just quote Rebecca:

…Having not quoted either the Supreme Court or the Second Circuit’s definition of transformativeness (which might allow one to assess whether there is too great an overlap with the derivative works right, or for that matter with the reproduction right since that’s what the majority of Second Circuit transformativeness findings deal with), the Seventh Circuit tells us to stick to the statute.  But it doesn’t tell us what the first factor does attempt to privilege and deprivilege. Instead, the court goes to its own economic lingo-driven test: “whether the contested use is a complement to the protected work (allowed) rather than a substitute for it (prohibited).”  Where this appears in the statute is left as an exercise for the reader, though by placement in the opinion we might possibly infer that it is the appropriate rephrasing of factor one, as opposed to inappropriate transformativeness (though the court later says that factor one isn’t relevant at all).  However, complement/substitute requires some baseline for understanding the appropriate scope of the copyright right—the markets to which copyright owners are entitled—just like transformativeness does.

The Seventh Circuit reached the right result, but its reasoning is shallow, its disagreement with the Second Circuit is captious, and its wanton disregard of the jurisprudence of the last twenty years (beginning with the Supreme Court’s decision in Campbell v. Acuff-Rose Music, Inc.) is profoundly unfortunate. These are smart judges who could have helped further develop and clarify the law, but chose not to.

 

 

Why digital humanities researchers support Google’s fair use defense

I posted a guest-blog over at the Authors Alliance explaining why digital humanities researchers support Google’s fair use defense in Authors Guild v. Google. The Authors Alliance supports Google’s fair use defense because it helps authors reach readers. In my post, I explained another reason why this case is important to the advancement of knowledge and scholarship.

Earlier this month a group of more than 150 researchers, scholars and educators with an interest in the ‘Digital Humanities’ joined an amicus brief urging the Second Circuit Court of Appeals to side with Google in this dispute. Why would so many teachers and academics from fields ranging from Computer Science, English Literature, History, Law, to Linguistics care about this lawsuit? It’s not because they are worried about Google—Google surely has the resources to look after itself—but because they are concerned about the future of academic inquiry in a world of ‘big data’ and ubiquitous copyright.

For decades now, physicists, biologists and economists have used massive quantities of data to explore the world around them. With increases in computing power, advances in computational linguistics and natural language processing, and the mass digitization of texts, researchers in the humanities can apply these techniques to the study of history, literature, language and so much more.

Conventional literary scholars, for example, rely on the close reading of selected canonical works. Researchers in the ‘Digital Humanities’ are able to enrich that tradition with a broader analysis of patterns emergent in thousands, hundreds of thousands, or even millions of texts. Digital Humanities scholars fervently believe that text mining and the computational analysis of text are vital to the progress of human knowledge in the current Information Age. Digitization enhances our ability to process, mine, and ultimately better understand individual texts, the connections between texts, and the evolution of literature and language.

A Simple Example of the Power of the Digital Humanities

The figure below is an Ngram-generated chart that compares the frequency with which authors of texts in the Google Book Search database refer to the United States as a single entity (“is”) as opposed to a collection of individual states (“are”). As the chart illustrates, it was only in the latter half of the Nineteenth Century that the conception of the United States as a single, indivisible entity was reflected in the way a majority of writers referred to the nation. This is a trend with obvious political and historical significance, of interest to a wide range of scholars and even to the public at large. But this type of comparison is meaningful only to the extent that it uses as raw data a digitized archive of significant size and scope.

The United States is/are

There are two very important things to note here. First, the data used to produce this visualization can only be collected by digitizing the entire contents of the relevant books–no one knows in advance which books to look in for this kind of search. Second, not a single sentence of the underlying books has been reproduced in the finished product. The original authors’ expression was an input to the process, but it was not a recognizable part of the output. This is the fundamental distinction that the Digital Humanities Amici are asking the court to preserve–the distinction between ideas and expression.
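The kind of comparison described above can be sketched in a few lines of code. This is a minimal illustration under stated assumptions, not the method used by the Ngram viewer: the `phrase_counts_by_year` function is hypothetical, and the three-entry corpus consists of sentences invented for this example rather than text from any digitized book.

```python
from collections import Counter


def phrase_counts_by_year(corpus, phrases):
    """Count how often each phrase appears in texts from each year."""
    counts = {phrase: Counter() for phrase in phrases}
    for year, text in corpus:
        lowered = text.lower()
        for phrase in phrases:
            counts[phrase][year] += lowered.count(phrase)
    return counts


# Hypothetical corpus of (year, text) pairs; the sentences are invented.
corpus = [
    (1820, "The United States are bound by treaty. The United States are at peace."),
    (1900, "The United States is a growing power."),
    (1900, "Some say the United States are divided, but the United States is one nation."),
]

counts = phrase_counts_by_year(
    corpus, ["the united states is", "the united states are"]
)
for phrase, by_year in counts.items():
    print(phrase, dict(by_year))
```

Every text has to be read in full to produce the counts, yet the output is only a table of numbers per year. None of the input sentences appears in it.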

Will Copyright Law Prevent the Computational Analysis of Text?

The computational analysis of text has opened the door to new fields of inquiry in the humanities–it allows researchers to ask questions that were simply inconceivable in the analog era. However, the lawsuit by the Authors Guild threatens to slam that door shut.

For over 300 years Copyright has balanced the author’s right to control the copying of her expression with the public’s freedom to access the facts and ideas contained within that expression. Authors get the chance to sell their books to the public, but they don’t get to say how those books are read, how people react to them, whether they choose to praise them or pan them, how they talk to their friends about them. Copyright protects the author’s expression (for a limited time and subject to a number of exceptions and limitations not relevant here) but it leaves the information within that expression and information about that expression “free as the air to common use.” The protection of expression and the freedom of non-expression are both fundamental pillars of American Copyright law. However, the Authors Guild’s long running campaign against library digitization threatens to erase that distinction in the digital age and fundamentally alter the balance of copyright law.

In the pre-digital era, the only reason to copy a book was to read it, or at least preserve the option of reading it. But this is no longer true. There are a host of modern technologies that literally copy text as an input into some larger data-processing application that has nothing to do with reading. For want of a better term, we call these ‘non-expressive uses’ because they don’t necessarily involve any human being reading the author’s original expression at the end of the day.

Most authors, if asked, support making their works searchable because they want them to be discovered by new generations of readers. But this is not our central point. Our point is that if it is permissible for a human to pick up a book and count the number of occurrences of the word “whale” (1119 times in Moby Dick) or the ratio of male to female pronouns (about 2:1 in A Game of Thrones Book 1—A Song of Ice and Fire), etc., then there is no reason the law should prevent researchers doing this on a larger and more systematic basis.

Game of Thrones Pronouns Etc
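
To make the point concrete, here is a minimal Python sketch of the kind of counting described above. The function names, the pronoun lists and the sample sentence are all assumptions made for illustration; this is not the code behind the figures quoted in this post, and different choices (for example, whether "whales" counts as "whale") would give different totals.

```python
import re
from collections import Counter

# Hypothetical pronoun lists, chosen for illustration only.
MALE_PRONOUNS = {"he", "him", "his", "himself"}
FEMALE_PRONOUNS = {"she", "her", "hers", "herself"}


def word_counts(text):
    """Lowercase the text and tally every word.

    Only the tallies are kept; no sentence of the input survives in the output.
    """
    return Counter(re.findall(r"[a-z']+", text.lower()))


def count_word(text, word):
    """Count exact matches of one word (so "whales" is not counted as "whale")."""
    return word_counts(text)[word.lower()]


def pronoun_counts(text):
    """Return (male, female) pronoun totals for the text."""
    counts = word_counts(text)
    male = sum(counts[p] for p in MALE_PRONOUNS)
    female = sum(counts[p] for p in FEMALE_PRONOUNS)
    return male, female


# A made-up sample sentence, standing in for the full text of a book.
sample = "The whale rose. He saw the whale, and she saw him."
print(count_word(sample, "whale"))
print(pronoun_counts(sample))
```

Run over an entire library rather than one sentence, the same few lines produce metadata about the books without reproducing any of their expression.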

Digitizing a library collection to make it searchable or to allow researchers to create and analyze metadata does not interfere with the interests that copyright owners have in the underlying expression in their books.

Who knows what the next generation of humanities researchers will uncover about literature, language, and history if we let them?

You can download the Brief of Digital Humanities and Law Scholars as Amici Curiae here.

Digital Humanities and Legal Scholars brief filed in Authors Guild v. Google

On Thursday this week, we filed a brief on behalf of over 150 researchers, scholars and educators in Authors Guild v. Google, currently on appeal to the Second Circuit Court of Appeals.
The Brief of Digital Humanities and Legal Scholars argues that Copyright law is not, and should not be, an obstacle to the computational analysis of text. Copyright law has long recognized the distinction between protecting an author’s original expression and the public’s right to access the facts and ideas contained within that expression.
We are confident that the Second Circuit will vote to maintain that distinction in the digital age so that library digitization, internet search and related non-expressive uses of written works remain legal.
The final version of the brief is available on the free online repository ssrn.com at this link: http://ssrn.com/abstract=2465413.
We are grateful for the support of so many wonderful scholars in this important case and we are even more grateful for all the fascinating research that these computer scientists, English professors, historians, linguists, and all those working in the digital humanities do to enrich our lives.
We would also like to thank The Association for Computers and the Humanities and the Canadian Society of Digital Humanities/Société canadienne des humanités numériques for their support as institutions.
Matthew Jockers
Matthew Sag
Jason Schultz

Call for signatories: Digital Humanities Amicus in Authors Guild v. Google

Matthew Jockers, Jason Schultz and I have written an amicus brief in the upcoming Court of Appeals round of Authors Guild v. Google, Inc.

Download the draft here: DH Amicus AG v Google CA2

Background

Since we started working on this project just over two years ago, two district courts and the Court of Appeals for the Second Circuit have rejected the Authors Guild’s attacks on library digitization and the legality of text-mining. We are confident that the Second Circuit will uphold Judge Chin’s decision last year, in which he rejected (on a motion for summary judgment) the Authors Guild’s copyright infringement claim against Google over its Google Book Search product. The rulings in Authors Guild v. Google and the parallel case of Authors Guild v. Hathitrust are a critical moment in the fight to define fair use for the Digital Humanities. In Authors Guild v. Google, Judge Chin expressly based his ruling in part on the fact that

“Google Books … has transformed book text into data for purposes of substantive research, including data mining and text mining in new areas, thereby opening up new fields of research. Words in books are being used in a way they have not been used before. Google Books has created something new in the use of book text — the frequency of words and trends in their usage provide substantive information.”
In his decision, Judge Chin cites the Brief of Digital Humanities and Law Scholars as Amici Curiae that we submitted on behalf of more than 100 researchers and scholars last year. Chin wrote that
“Google Books permits humanities scholars to analyze massive amounts of data — the literary record created by a collection of tens of millions of books.”

The Authors Guild is now appealing Judge Chin’s decision (on this and other grounds) to the Second Circuit. A different panel of that same court has already upheld the decision in Authors Guild v. Hathitrust. We believe that these cases will have a dramatic effect on research in fields ranging from computer science to linguistics, history, literature and the digital humanities.

Argument in a nutshell

According to the U.S. Constitution, the purpose of copyright is “To promote the Progress of Science and useful Arts”. Copyright law should not be an obstacle to statistical and computational analysis of the millions of books owned by university libraries. Copyright law has long recognized the distinction between protecting an author’s original expression and the public’s right to access the facts and ideas contained within that expression. That distinction must be maintained in the digital age so that library digitization, internet search and related non-expressive uses of written works remain legal.

What can you do?

If you are a legal academic, or any academic, researcher or student who would be affected by this issue, you can help preserve the balance of copyright law by joining our brief as a signatory (we need your name and affiliation, e.g. Associate Professor, Jane Doe, Springfield University).

Does this concern you?

If you are still reading this post, the answer is probably YES. We are collecting signatures from a wide range of fields, including computer science, English, history, law, linguistics and philosophy. We need your name and affiliation by July 9, 2014. Please enter your details directly via this online tool:

https://docs.google.com/forms/d/1QSA_fUSaRpw47wwRcXh0SXkZFx1NQ2NbjhBbfTrICnA/viewform?usp=send_form

Please feel free to share this invitation with other interested academics and PhD students.

Thank you!

Matthew Jockers explains why you can’t read a book through snippets

The Authors Guild’s war on search engines, text-mining and academic research is in its final throes. Over the last two years two different US Federal District Courts have held that library digitization for the purpose of building a search index and running a search engine is fair use. See Authors Guild v. Hathitrust 902 F. Supp. 2d 445 (S.D.N.Y. 2012) and Authors Guild v. Google 954 F. Supp. 2d 282 (S.D.N.Y. 2013). The Hathitrust decision was upheld on appeal on June 10 this year (Authors Guild v. Hathitrust, 2nd Circuit 2014) and the parties and interested amici are gearing up for a final showdown in the appeal of Authors Guild v. Google.

In the Guild’s latest legal salvo it argues – by repeated assertion – that the text snippets Google displays to users allow 78% of the contents of any book to be reconstructed. (e.g., at p.10 “The scanning process resulted in an index that contains the complete text of all the books copied in the Library Project.”)

My sometime co-author and accomplished Digital Humanities researcher, Matthew Jockers, tested out the Guild’s claims on his own book and … it turns out that you can’t read a book through snippets, unless you already have the book, and that even then it takes about 30 minutes to trick the search engine into giving you the next 100 words beyond the free-view.

As Matt explains:

“Reading 78% of my book online, as the Guild asserts, requires that the reader anticipate what words will appear in the concealed sections of the book.”

He concludes:

“Well, this certainly is reading Macroanalysis the hard way. I’ve now spent 30 minutes to gain access to exactly 100 words beyond what was offered in the initial preview. And, of course, my method involved having access to the full text! Without the full text, I don’t think such a process of searching and reading is possible, and if it is possible, it is certainly not feasible!

But let’s assume that a super savvy text pirate, with extensive training in English language syntax could guess the right words to search and then perform at least as well as I did using a full text version of my book as a crutch. My book contains, roughly, 80,000 words. Not counting the ~5k offered in the preview, that leaves 75,000 words to steal. At a rate of 200 words per hour, it would take this super savvy text pirate 375 hours to reconstruct my book. That’s about 47 days of full-time, eight-hour work.”
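
Matt's back-of-the-envelope numbers are easy to check. The short Python sketch below simply redoes the arithmetic from the quote; the function and variable names are my own, and the only inputs are the figures he gives (roughly 80,000 words in the book, about 5,000 in the preview, 100 words recovered in 30 minutes, and an eight-hour working day).

```python
def reconstruction_estimate(book_words, preview_words, words_recovered,
                            minutes_spent, hours_per_day=8):
    """Estimate how long it would take to rebuild a book from snippets.

    All inputs are the rough figures from the quoted passage; the names
    are hypothetical and used only for this illustration.
    """
    # 100 words in 30 minutes works out to a rate in words per hour.
    rate_per_hour = words_recovered / (minutes_spent / 60)
    # Words that are not already visible in the free preview.
    remaining_words = book_words - preview_words
    hours = remaining_words / rate_per_hour
    days = hours / hours_per_day
    return rate_per_hour, remaining_words, hours, days


rate, remaining, hours, days = reconstruction_estimate(80_000, 5_000, 100, 30)
print(rate, remaining, hours, days)  # 200.0 75000 375.0 46.875
```

The result, 46.875 eight-hour days, rounds to the "about 47 days" in the quote.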

Matt’s book is Macroanalysis: Digital Methods and Literary History and — as seen on the screen shot I just made of Google Books — you can buy the eBook version, linked to from the Google Books web page, for $14.95.

Screen Shot 2014-06-19 at 11.48.07 AM