
Not everything is the same as everything else – Authors Guild v Hathitrust (pt. 2)

Introduction and Necessary Disclaimer 

This is one of a series of posts concerning the Authors Guild v. Hathitrust case; specifically, these posts take the form of commentary on the Authors Guild Appeal Brief (February 25, 2013). Although I am one of the authors of the Digital Humanities and Law Scholars Amicus Brief, the views expressed on this site are purely my own. My comments on the Authors Guild’s Appeal Brief will not be comprehensive; rather, my aim is to review the aspects of the brief that I found interesting.

Today’s topic …

Not everything is the same as everything else 

Legal argument is the art of analogizing and distinguishing, drawing out the implications of things already decided in ways that suggest a favorable outcome for matters still in dispute. Thus, in copyright cases it is quite common to read that x (new thing) is the same as/totally different from y (old thing). The Authors Guild’s brief engages in quite a bit of this kind of argument, but mostly without saying so explicitly. In particular, their brief contains three examples of false equivalence that simply don’t add up.

  1. The Authors Guild implicitly suggests that the defendants’ orphan works project is the same as the Authors Guild’s own proposal to deal with orphan works in the Google Book Search Settlement. It isn’t.
  2. The Authors Guild argues that the defendants’ orphan works project is a substitute for orphan works legislation. It isn’t.
  3. The Authors Guild brief proceeds as though library digitization were the same as library photocopying. It isn’t.

The Universities’ Orphan Works Project v. the Google Book Search Settlement

Most of the Authors Guild’s ink is spilt on the universities’ proposed orphan works project (OWP). The idea behind the defendants’ OWP appears to be that out-of-print books published in the U.S. between 1923 and 1963 should be made available for educational use if the rights holders cannot reasonably be located. The University of Michigan proposed a method to automate the identification of orphan works for this purpose in 2011. However, the exact nature of this particular project is yet to be determined because, after the Authors Guild filed suit against the HathiTrust et al., the University of Michigan announced that the OWP would be temporarily suspended. The University of Michigan candidly admitted that the procedures used to identify orphan works had allowed some works to make their way onto the Orphan Works Lists in error.

The Authors Guild Appeal Brief contains the implicit suggestion that the defendants’ OWP is the same as the audacious exploitation of orphan works that the Authors Guild itself proposed under its Settlement Agreement with Google.

It is true that, as noted at page 10 of the Guild’s Appeal Brief, “a mechanism to help resolve the orphan works issue was one of the key aspects of the attempted settlement of the Google Books case”.

It is also undeniable that Judge Chin commented “the establishment of a mechanism for exploiting unclaimed books is a matter better suited for Congress than this Court”. (Authors Guild v. Google, Inc., 770 F. Supp. 2d 666 (S.D.N.Y. 2011))

But Judge Chin was evaluating the fairness of the private settlement between Google and the Authors Guild; he was not commenting on the question of whether the display of any orphan works under any circumstance could be fair use, nor was he reviewing anything remotely like the libraries’ much more limited orphan works program.

The Authors Guild proceeds as though the modest orphan works program announced by the university defendants is the same in substance as the universal bookstore rejected by Judge Chin in 2011. (See e.g., Authors Guild, page 10 “Unhappy with Judge Chin’s decision, [University of Michigan] decided to take the law into its own hands by unilaterally initiating its own program.”) This strikes me as false equivalence.

Under the default settings of the now defunct settlement (proposed 2008, amended 2009, rejected 2011) Google would have been allowed to display up to 20% of a non-fiction work to the entire world and to sell books through consumer purchases and institutional subscriptions. Funds from the sale of orphan works were to be held by a ‘book rights registry’ for safe keeping and eventual distribution to worthy causes. [Under the original Settlement Agreement, the revenues attributable to orphan or unclaimed works would have flowed in part to the ‘book rights registry’ and in part to registered authors and publishers.]

The details of the OWP that the defendants may or may not eventually undertake are unclear, but their public statements indicate that any such project would be grounded on non-commercial, limited, educational use. Moreover, whereas the settlement would have treated all books whose copyright owners failed to notify the registry of their interests as orphan works, the University of Michigan is working on a method to reliably determine a much smaller subset of true orphan works.

Whatever it turns out to be, the Universities’ orphan works project will not be the same as the Authors Guild’s own proposal to deal with orphan works in the Google Book Search Settlement.

The Universities’ Orphan Works Project v. Orphan Works Legislation

The Authors Guild Appeal Brief also conflates the universities’ OWP with various legislative solutions that have been proposed over the years in relation to the widely recognized orphan works problem. See, for example, Authors Guild Ap. Br. at page 15: “Despite clear indications by courts and the Copyright Office that the treatment of orphan works should be left to Congress, the Libraries insist that the OWP is legal.” (There is another example on page 10.)

Does it really make sense that Congress’ failure to comprehensively or partially legislate a solution to the problem of orphan works means that the use of orphan works is never allowed under any circumstances, no matter how limited or irrespective of the reason? Congress could act to make out of print works universally available under terms similar to the Authors Guild’s proposal in the Google Book Search settlement, but so what? The mere fact that Congress could in theory set out a system that is broader than the limited scope for orphan works display that would be viable as fair use does not mean that there is no fair use.

Whatever it turns out to be, there is no basis to think that the university defendants’ orphan works project is a substitute for orphan works legislation.

Library Digitization v. Library Photocopying

If you proceed from the assumption that all unauthorized uses of a book are piracy, then it makes sense that every new technology is just a new version of the photocopier. The Authors Guild Appeal Brief can certainly be read as adopting that view.

The brief argues that “[t]he mechanical conversion of printed books into digital form is not transformative because it does not add any ‘new information, new aesthetics, [or] new insights and understandings,’ to the books.” (citing Pierre Leval, Toward a Fair Use Standard, 103 Harv. L. Rev. 1105, 1111 (1990).) True, there is solid authority that photocopying and cable retransmission are not per se transformative (i.e., without looking at the reasons), but to suggest that library digitization offers no new insights is unsustainable.

Library digitization raises several different issues depending on the purpose behind that digitization and the uses that are subsequently made of the digitized texts. Library digitization could be motivated by any or all of the following:

  1. to preserve existing volumes
  2. to facilitate text-mining, data analysis and digital searching of the contents of books
  3. to facilitate access to electronic versions of books

The legal issues relating to each of these purposes must be considered separately, but the Authors Guild’s brief muddles them all together. Digitization does look a bit like other forms of copying if the motivating purpose is access or display of expressive works (i.e., #3 above). However, the argument in favor of a limited, non-commercial and education-focused orphan works project turns not on transformative use, but on other considerations such as the lack of market harm [See Jennifer M. Urban, How Fair Use Can Help Solve the Orphan Works Problem (June 18, 2012)].

Likewise, the argument in favor of library digitization to facilitate disabled access is much broader than the details of the underlying technology. Whether we use the label transformative or not, this is clearly a favored purpose under the first fair use factor. The provision of equal access to copyrighted information for print-disabled individuals is mandated by the Americans with Disabilities Act (ADA). The HathiTrust provides print-disabled individuals with access to millions of items within library collections, whereas in the past they had access to a few thousand at best. “Making a copy of a copyrighted work for the convenience of a blind person is expressly identified by the House Committee Report as an example of a fair use, with no suggestion that anything more than a purpose to entertain or to inform need motivate the copying.” (Sony Corp. of Am. v. Universal City Studios, Inc., 464 U.S. 417, 455 n.40 (1984)).

The claim that library digitization is just like photocopying and does not offer any new insights crumbles completely when one considers the non-expressive uses such digitization makes possible. Library digitization makes it possible to extract meta-data from books and to create a useful search engine. Search indexing, text-mining and other computational uses of text could not be more different from mere photocopying; the “new information” and “new aesthetics” they offer include:

  • Text-based searching
  • Research on the structure of language
  • Research on the use of language.

The database as a whole serves a different purpose than each of the constituent works that have been scanned and indexed. The individual works provide content to readers; they convey the author’s original expression. The database as a whole provides a means of searching for and identifying books or analyzing the language within books.

Labels like transformative use and nonexpressive use can be helpful in grouping like cases together, but they can also be distracting. The issue of fair use is directly tied to a purposive reading of the Copyright Act and the purpose of copyright is clearly articulated in the U.S. Constitution—“[t]o promote the Progress of Science and useful Arts. . . .”  As the Supreme Court stated in Campbell, the “central purpose” of the fair use investigation is to see, “whether the new work merely supersedes the objects of the original creation, or instead adds something new, with a further purpose or different character, altering the first with new expression, meaning, or message…”

The plaintiffs argue that library digitization is utterly untransformative, but in fact, digitization enabling book search and text-mining clearly leads to “new information, new aesthetics, new insights and understandings.”

For example, as we explained in the Digital Humanities Amicus Brief:

“Google’s “Ngram” tool provides another example of a nonexpressive use enabled by mass digitization—this time easily visualized. Figure 1, below, is an Ngram-generated chart that compares the frequency with which authors of texts in the Google Book Search database refer to the United States as a single entity (“is”) as opposed to a collection of individual states (“are”).


As the chart illustrates, it was only in the latter half of the Nineteenth Century that the conception of the United States as a single, indivisible entity was reflected in the way a majority of writers referred to the nation.  This is a trend with obvious political and historical significance, of interest to a wide range of scholars and even to the public at large.  But this type of comparison is meaningful only to the extent that it uses as raw data a digitized archive of significant size and scope. To be absolutely clear, 1) the data used to produce this visualization can only be collected by digitizing the entire contents of the relevant books, and 2) not a single sentence of the underlying books has been reproduced in the finished product. In other words, this type of nonexpressive use only adds to our collective knowledge and understanding, without in any way replacing, damaging the value of, or interfering with the market for, the original works.”
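To make the mechanics of this kind of nonexpressive use concrete, here is a minimal sketch in Python of the counting that sits behind a chart like the one described. It is not Google's Ngram implementation. The toy corpus, the names `TOY_CORPUS`, `verb_counts_by_year` and `share_singular`, and the simple regular expression are all hypothetical, chosen only for illustration. The point to notice is that the program must read the full text of every work, yet its output consists solely of counts and proportions.

```python
import re
from collections import Counter, defaultdict

# Hypothetical toy corpus of (publication year, full text) pairs.
# A real study would use millions of digitized volumes.
TOY_CORPUS = [
    (1820, "The United States are a confederation. The United States are growing."),
    (1820, "Some say the United States is one nation."),
    (1900, "The United States is a great power. The United States is united."),
    (1900, "Formerly the United States are was common usage."),
]

# Matches "the United States is" or "the United States are", case-insensitively.
PATTERN = re.compile(r"\bthe united states (is|are)\b", re.IGNORECASE)


def verb_counts_by_year(corpus):
    """Count occurrences of 'is' and 'are' after 'the United States', per year."""
    counts = defaultdict(Counter)
    for year, text in corpus:
        for match in PATTERN.finditer(text):
            counts[year][match.group(1).lower()] += 1
    return {year: dict(counter) for year, counter in counts.items()}


def share_singular(counts):
    """Return, per year, the fraction of matches that used the singular 'is'."""
    shares = {}
    for year, counter in counts.items():
        total = counter.get("is", 0) + counter.get("are", 0)
        shares[year] = counter.get("is", 0) / total if total else None
    return shares


if __name__ == "__main__":
    counts = verb_counts_by_year(TOY_CORPUS)
    for year, share in sorted(share_singular(counts).items()):
        print(year, counts[year], f"singular share: {share:.2f}")
```

Nothing the sketch prints contains a sentence from the underlying texts; only the aggregated frequencies leave the program, which is the sense in which the use is nonexpressive.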

Library digitization is not the same as library photocopying.

The Authors Guild Does Not Speak for Academic Authors

Academic authors are being asked to stand by and watch as the Authors Guild litigates against their wishes and interests, but supposedly on their behalf.

This hubris is not exactly unprecedented. The plaintiffs in Hansberry v. Lee, 311 U.S. 32 (1940), sought to enforce a racially restrictive covenant on behalf of a broad class of landowners, including African-Americans who would be harmed by enforcement and whites who simply objected. Like the landowners in Hansberry, many academic authors disagree with the Authors Guild’s crusade against book digitization. The Supreme Court did not allow the plaintiffs to hijack the class in Hansberry; hopefully the Second Circuit will not allow the Authors Guild to do so in Authors Guild v. Google.

Pamela Samuelson and David Hansen (both of the University of California, Berkeley – School of Law) have filed a very important amicus brief on behalf of over 150 academic authors* in the Second Circuit Court of Appeals in Authors Guild v. Google. (Available on SSRN)

The brief in support of defendant-appellant Google argues that class certification should have been denied by the District Court because the named plaintiffs don’t represent the interests of academic authors who comprise a large proportion of the class.

The Authors Guild cloaks its lawsuit in the mantle of authorship, yet in reality it represents only a small fraction of the class it has constructed. Most of the books that Google scanned from major research library collections were written by academics.

The basic problem is that the three individual plaintiffs who claim to be class representatives are not academics and do not share the commitment to broad access to knowledge that predominates among academics.

The plaintiffs’ request for an injunction to stop Google from making the Book Search corpus available would be harmful to academic author interests. The only way for the interests of academic authors to be vindicated in this litigation, given the positions that the plaintiffs have taken thus far, is for Google to prevail on its fair use defense and for the named plaintiffs to lose.

As we explained in the Digital Humanities Amicus Brief in the district court, “[m]ass digitization, like that employed by Google, is a key enabler of socially valuable computational and statistical research (often called “data mining” or “text mining”),”  which allows researchers to discover and use the non-copyrightable facts and ideas that are contained within the collection of copyrighted works themselves.

The Authors Guild is a bad representative of the interests of academic authors because:

  1. Academic authors would generally prefer their books be findable using Google Book Search.
  2. If the Authors Guild wins, academic authors will be deprived of a valuable resource, in the form of the Google Book Search Engine and the HathiTrust Digital Library.
  3. If the Authors Guild wins, text mining — the most basic tool of the Digital Humanities — will have been declared to be prima facie illegal.
* I was one of the signatories.



HathiTrust Wins on Fair Use, and just about everything else

Landmark Fair Use Win

Yesterday, District Judge Harold Baer, Jr., handed down his decision in Authors Guild v. HathiTrust, a case that spins out of the long-running Google Books dispute. The decision is a landmark win for the HathiTrust, the University defendants, people with print disabilities, Google, the Digital Humanities and, I would argue, for humanity in general.

Essential Background

The HathiTrust is a digital repository of millions of scanned university library books that became available to various universities by virtue of the Google Books project.  About 3/4 of the books are still in copyright. In 2011 HathiTrust announced plans to embark on an innovative orphan works program (OWP), but dropped (or at least shelved) the plan soon after in light of criticism as to its implementation. Spurred into action by the OWP, in September 2011 the Authors Guild filed a copyright lawsuit against HathiTrust, five universities, and multiple university officials.

The Authors Guild suit alleged that library digitization for any purpose amounts to copyright infringement. The purposes specifically under attack in this case were (i) preservation; (ii) enabling non-expressive uses such as conducting word searches; and (iii) facilitating access by persons who are blind or visually impaired.

There is a key fact in this case that media reports will probably get wrong. This is not about scanning books to make extra copies for the public at large. As the Court explained, “No actual text from the book is revealed except to print-disabled library patrons at [University of Michigan].” Authors Guild v. HathiTrust, p 16. This case was about library digitization for three specific purposes: preservation, disabled access and non-expressive uses such as text searching and computational analysis.

The Score Card

Here is a quick and dirty summary of the key copyright issues:

  • Digitization to provide access for the print-disabled held to be transformative use and, on balance, fair use.
  • Digitization to provide for print-disabled students held to be (i) an obligation of universities under the ADA, (ii) fair use under section 107 of the Copyright Act and (iii) enabled by section 121 of the Copyright Act.
  • Section 108 of the Copyright Act was held to expand the rights of libraries, not limit the scope of their fair use rights in any way, shape or form. Given that the text says “Nothing in this section . . . in any way affects the right of fair use as provided by section 107”, any ruling to the contrary would have been pretty shocking.
  • Digitization to create a search index held to be a transformative use, and, on balance, fair use.
  • Alleged security risks created by library digitization: dismissed as speculative and unproven. The judge noted the strong evidence to the contrary. It is still an open question whether the risk of a subsequent illegal act by a third party could ever render an initial lawful copy not fair use. The whole notion strikes me as rather odd.
  • The market effect of library digitization: the court found there was none to speak of in this case. The court rejected the CCC’s magic toll-booth arguments, i.e., there were some wild assertions about future licensing revenue that the court rejected as “conjecture”.
  • The court also notes that a copyright holder cannot preempt a transformative market merely by offering to license it.
  • The market effect of enabling print-disabled access to library books: the court found there was no market for this under-served group, nor was one likely to develop.

Did the Authors Guild win anything?
Not really, but two issues could have been even worse.

  • The court held that the issue of the Orphan Works Program was not ripe for adjudication. This was inevitable in my opinion, but the judge could have added unfavorable dicta indicating that the AG had no case here either. Wisely, the judge said only what needed to be said.
  • On the issue of library digitization for the purpose of preservation, the court found that the argument that “preservation on its own is transformative is not strong.”

The Digital Humanities

The court appeared to accept the arguments in the Digital Humanities amicus brief, written by Matthew Jockers, Jason Schultz and myself with the assistance of many others. The brief extended arguments I made in Orphan Works as Grist for the Data Mill, 27 Berkeley Technology Law Journal (forthcoming) and Copyright and Copy-Reliant Technology, 103 Northwestern University Law Review 1607–1682 (2009).

Following Second Circuit precedent, the court explained that

“a transformative use may be one that actually changes the original work. However, a transformative use can also be one that serves an entirely different purpose.”

The court concluded that

“The use to which the works in the HDL are put is transformative because the copies serve an entirely different purpose than the original works: the purpose is superior search capabilities rather than actual access to copyrighted material. The search capabilities of the HDL have already given rise to new methods of academic inquiry such as text mining.”

The court even cites an illustration from our brief!

“Mass digitization allows new areas of non-expressive computational and statistical research, … One example of text mining is research that compares the frequency with which authors used “is” to refer to the United States rather than “are” over time. See Digital Humanities Amicus Br. 7 (“[I]t was only in the latter half of the Nineteenth Century that the conception of the United States as a single, indivisible entity was reflected in the way a majority of writers referred to the nation.”).”

Google Ngram Visualization Comparing Frequency of “The United States is” to “The United States are”

You can reconstruct the figure on Google Ngram yourself!

The court also cites our brief for the proposition that the use of metadata and text mining “could actually enhance the market for the underlying work, by causing researchers to revisit the original work and reexamine it in more detail”

Non-expressive use is fair use

The court did exactly what the amicus briefs urged it to do. As Matthew Jockers, Jason Schultz and I argued in our article in Nature last week (Digital Archives: Don’t Let Copyright Block Data Mining, 490 Nature 29-30 (October 4, 2012)):

“It is time for the US courts to recognize explicitly that, in the digital age, copying books for non-expressive purposes is not infringement.”

Courts have already applied this logic in internet search engine cases and in a case involving plagiarism detection software. As we hoped, Judge Baer’s ruling demonstrates that digitization for text mining and other forms of computational analysis is, unequivocally, fair use.

“Plaintiffs assert that the decisions in Perfect 10 and Arriba Soft are distinguishable because in those cases the works were already available on the internet, … I fail to see why that is a difference that makes a difference.”

This was not a close case

“Although I recognize that the facts here may on some levels be without precedent, I am convinced that they fall safely within the protection of fair use such that there is no genuine issue of material fact. I cannot imagine a definition of fair use that would not encompass the transformative uses made by Defendants’ MDP and would require that I terminate this invaluable contribution to the progress of science and cultivation of the arts that at the same time effectuates the ideals espoused by the ADA.”


A significant win for the National Federation for the Blind

My focus in this case has always been on the technological side; that is my academic interest. However, the most important issue in this case is not about search engines, the digital humanities or non-expressive use; it is about reading, humanity and expressive use. I am of course referring to those aspects of the decision relating to fair use and persons with disabilities.

“[m]aking a copy of a copyrighted work for the convenience of a blind person is expressly identified by the House Committee Report as an example of a fair use, with no suggestion that anything more than a purpose to entertain or to inform need motivate the copying.”

As Kenny Crews summarizes:

“The opinion provides a strong opinion about fair use as applied to serving persons with disabilities, especially when an educational institution is mandated to serve needs under the Americans With Disabilities Act.  The court goes further and resolves a long-time quandary that arose under Section 121 of the Copyright Act.  That statute permits an “authorized entity” to make formats of certain works available to persons who are visually impaired.  An “authorized entity” is one that has a “primary mission” to serve those needs.  Libraries and universities have many functions, so is that service a “primary mission”?  The court said yes.”



Google Book Search: Digital Humanities still needs answers

Google has settled with the publishers, but not the Authors Guild. This is good news for the Digital Humanities because it means that we may still get a substantive ruling on the big fair use question underlying the entire litigation.

Human life is short; none of us can hope to read more than a smattering of the literary record. Fortunately, massive digitization efforts like those undertaken by Google allow scholars to apply large-N computerized methods to millions of works. Computational and statistical analysis of literature will be a big part of humanities research for years to come. However, legal actions like those of the Authors Guild could bar scholars from studying as much as two-thirds of the literary record.

In a comment published in Nature today [paywall] [Nature Vol. 490, pages 29–30 (04 October 2012) doi:10.1038/490029a], Matthew Jockers (an English professor), Jason Schultz (a law professor) and I (also a law professor) explain why the Association for Computers and the Humanities and a large group of scholars chose to file an amicus curiae brief on behalf of the digital humanities in the Authors Guild v. Google and Authors Guild v. HathiTrust cases.

In the brief we explain why U.S. courts should recognize that copying books for non-expressive purposes is not infringement.

My view is that the settlement between Google and the publishers makes such a ruling more likely because it provides further evidence that the ability to make non-expressive uses of copyrighted books works hand in hand with the commercialization of expressive uses, which is what copyright law is all about.




more coverage of digital humanities amicus

I just read James Grimmelmann’s amusing and insightful post, Google Books: Even Friends of the Court Have Enemies. He concludes that “The opposition, overall, is a litigation tactic for the sake of tactics; I don’t see how it helps the plaintiffs either substantively or strategically.”

I won’t comment every time James says something worthwhile about the Google Books litigation; it happens far too often.

Authors Guild Unable to Silence Amici

The Judge presiding over Authors Guild v. Google granted leave to file for the Digital Humanities brief and an amicus brief by the American Library Association, the Association of College and Research Libraries, the Association of Research Libraries, and the Electronic Frontier Foundation. The Judge also ordered the Plaintiffs to respond to the amici curiae briefs by September 17, 2012 in a memorandum of law not to exceed 40 pages.

40 pages seems like quite a bit, so it should give the Authors Guild a chance to address all the case law they have conveniently ignored until now. This might be an indication that the court is taking the arguments of the amici seriously, or just that Judge Chin did not want to hear any more cause for complaint from the Authors Guild et al.

Oral argument on the motions for summary judgment is set to proceed on October 9, 2012 at 10 AM. Oral argument on the motions for summary judgment shall proceed on December 4, 2012 at 2PM (this was order #4 on 2012-08-17).


Second Circuit to Hear Google’s Appeal of Class Action Certification

The order is here. It is likely that the rest of the case will be put on hold while this question is addressed, and yet the Hathitrust litigation is rolling on. The Authors Guild, which represents only 8500 authors, is trying to cash a billion dollar check (they hope) on behalf of a class of millions. Class action lawyers and copyright lawyers will be watching this case very closely.

Great panel at IPSC on orphan works, library digitization and fair use

In “The Orphans, the Market, and the Copyright Dogma,” Ariel Katz notes that extended collective licensing (ECL) proposals will do nothing to solve the underlying orphan works problem. Like “Indulgences,” ECL solutions merely absolve the “sin” of using works without permission, but actually do nothing to pay the absent owners.

In “How Fair Use Can Help Solve the Orphan Works Problem,” Jennifer Urban does a great job of explaining how the rest of us have under-analyzed the second fair use factor in relation to library digitization. She points out that the Senate Report on the 1976 Copyright Act says directly that market availability is part of the nature of the work.

In my own paper “Orphan Works as Grist for the Data Mill” I explain why copyright does not stand in the way of nonexpressive uses. My argument is that, just as the distinction between expressive and nonexpressive works is well recognized, the same distinction should generally be made in relation to potential acts of infringement.

Copying for purely nonexpressive purposes, such as the automated extraction of data, should not be regarded as infringing.  Automated reproduction for nonexpressive uses (such as search engines, plagiarism detection, and macro-literary analysis) does not communicate the author’s original expression to the public, there is no expressive substitution, and thus there is no infringement. For more on Copyright and Copy-Reliant Technology, read my 2009 article of the same name.

What is at stake in Authors Guild v. Google; Authors Guild v. HathiTrust

Now that the Google Book Settlement is well and truly dead, attention is turning back to the underlying legal controversy. There are many issues in Authors Guild v. Google and the parallel case of Authors Guild v. HathiTrust, but the main one is simple. Does copying books so that computers can analyze them infringe copyright even if no one ever reads that copy?

If the answer is yes, then, through the magic of class action law, the Authors Guild gets to sue Google for a minimum of $750 x several million books. Who would get these billions of dollars is unclear.
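As a rough illustration of the scale involved, the arithmetic can be sketched in a few lines of Python. The $750 figure is the ordinary statutory minimum per infringed work under 17 U.S.C. § 504(c)(1). The numbers of works below are purely hypothetical stand-ins for "several million" and are not figures from the litigation, and the function name `minimum_statutory_damages` is mine.

```python
# Ordinary statutory minimum per infringed work, 17 U.S.C. § 504(c)(1).
STATUTORY_MINIMUM_PER_WORK = 750


def minimum_statutory_damages(num_works, per_work=STATUTORY_MINIMUM_PER_WORK):
    """Return the minimum statutory damages, in dollars, for num_works works."""
    if num_works < 0:
        raise ValueError("num_works must be non-negative")
    return num_works * per_work


# Hypothetical class sizes, for illustration only.
for num_works in (1_000_000, 3_000_000, 5_000_000):
    total = minimum_statutory_damages(num_works)
    print(f"{num_works:,} works x $750 = ${total:,}")
```

On these assumed inputs, three million works at the statutory minimum already comes to $2.25 billion, which is why the class action mechanism matters so much here.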

If the answer is no, then the Authors Guild would have to point to instances where Google has made a nontrivial portion of a book available to the public without permission or justification such as fair use. There might be one or two of these, but I think Google won’t lose sleep about statutory damages for a handful of books.

I recently wrote an amicus brief, along with Matthew Jockers (Assistant Professor of English at the University of Nebraska, Lincoln) and Jason Schultz (Assistant Clinical Professor of Law; Faculty Co-Director, Samuelson Law, Technology & Public Policy Clinic), arguing that such non-expressive use is fair use, i.e., that text-mining is not copyright infringement.

More than 60 professors and researchers in the digital humanities joined our brief because, as we said:

“If libraries, research universities, non-profit organizations, and commercial entities like Google are prohibited from making non-expressive use of copyrighted material, literary scholars, historians, and other humanists are destined to become 19th-centuryists; slaves not to history, but to the public domain. History does not end in 1923. But if copyright law prevents Digital Humanities scholars from using more recent materials, that is the effective end date of the work these scholars can do.”

This is what is at stake.