Google Book, Copyright & the Digital Humanities

Google Book Search – What is it?

Google entered the world of library digitization in 2004, when it began scanning and digitizing the collections of a number of prestigious private and public academic libraries to make their contents searchable in the same way it makes Internet websites searchable. In response to a search query, the Google Book search engine typically displays three-line “snippets” from selected books – just enough to show the searcher whether the text is really responsive to their search term. Google does not need permission to digitize works in the public domain, and the company has also obtained permission from several publishers to include their works in the Google Book search engine under agreed terms. However, Google is also digitizing millions of in-copyright works without the prior authorization of the relevant copyright owners, and therein lies the core of the dispute.

Google Book Search Litigation

Google has been mired in copyright litigation regarding its library digitization project since 2005, when it was sued by the Authors Guild in a class action on behalf of all authors. A controversial settlement of that class action, proposed in 2008, generated a maelstrom of objections. The settlement was revised in 2009, but was ultimately rejected by Judge Denny Chin in the Southern District of New York in March 2011.

In September 2011, the Authors Guild filed claims for copyright infringement against the universities of Michigan, California, Wisconsin, Indiana and Cornell University for participating in Google Book. The Guild’s complaint with respect to the universities is, first, that they allowed Google to digitize their library collections, and second, that the universities accepted corresponding digital files from Google and have consolidated those files into a shared digital repository known as the HathiTrust digital library. The HathiTrust service enables a large collection of universities and research libraries to store, secure and search their digital collections using a shared infrastructure; it does not make the contents of individual in-copyright books available to the general public.

Copyright and non-expressive use

The Authors Guild’s case is misguided. It is true that copyright owner permission would usually be required to scan a library book and then display a significant part of the contents to an end-user, although if the book is an orphan work, even that might be fair use. But to copy a work as an intermediate step in making a non-infringing end product, in this case a search engine, is not copyright infringement.

Our usual rules and intuitions about copying are based on mechanical technologies like the printing press and the photocopier. It is a fair inference that somebody would only photocopy a journal article in order to read it later, or at least to have the option of reading it later. Presumably, no one would photocopy War and Peace just to listen to the hum of the copy machine.

However, copy-reliant technologies, such as Internet search engines and plagiarism detection software, are different. They do not read, understand, or enjoy copyrighted works, nor do they make these works available to the public. Instead, they copy text in order to process it as grist for the mill: raw material that feeds various algorithms and indices. Courts have consistently found that such non-expressive uses do not violate copyright.
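To make the idea of text as raw material for an index a little more concrete, here is a minimal sketch of an inverted index in Python. It is only an illustration under my own assumptions: the function names, the two-book corpus and the sample sentences are all hypothetical, and nothing here describes how Google or HathiTrust actually build their search systems.

```python
import re
from collections import defaultdict


def tokenize(text):
    """Lowercase the text and split it into word tokens."""
    return re.findall(r"[a-z0-9']+", text.lower())


def build_index(corpus):
    """Map each word to the set of book IDs whose text contains it."""
    index = defaultdict(set)
    for book_id, text in corpus.items():
        for word in tokenize(text):
            index[word].add(book_id)
    return index


def search(index, query):
    """Return the IDs of books that contain every word in the query."""
    words = tokenize(query)
    if not words:
        return set()
    results = set(index.get(words[0], set()))
    for word in words[1:]:
        results &= index.get(word, set())
    return results


# Hypothetical two-book corpus; the sentences are invented placeholders.
corpus = {
    "book-1": "The whale surfaced near the ship at dawn.",
    "book-2": "At dawn the army marched toward the city.",
}

index = build_index(corpus)
print(sorted(search(index, "dawn")))        # books mentioning "dawn"
print(sorted(search(index, "whale dawn")))  # books mentioning both words
```

In this sketch the full text is copied into memory in order to build the index, but a search returns only book identifiers. The copying is a means of producing information about the books, not a way of delivering their expression to a reader.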

Copyright protects expression at the point of human consumption. Even where machine copying is rampant and indiscriminate, the non-expressive use of text by machines does not, by itself, infringe copyright.

Brief of Digital Humanities and Law Scholars as Amici Curiae

Matthew Jockers, Jason Schultz and I have written three (and counting?) amicus briefs in the Google Book litigation. The most recent was filed in Authors Guild v. HathiTrust (Second Circuit Court of Appeals, filed June 4, 2013) (download http://ssrn.com/abstract=2274832). We also filed similar briefs in Authors Guild v. Google in the District Court in 2012 and in Authors Guild v. HathiTrust (District Court, 2012).

Relevant Articles and Presentations

  • Orphan Works as Grist for the Data Mill, 27 Berkeley Technology Law Journal (2012) (download from SSRN) (slides)
  • The Google Book Settlement and the Fair Use Counter-factual (download from SSRN)
  • Copyright and Copy-Reliant Technology, 103 Northwestern University Law Review 1607–1682 (2009) (download from SSRN)

Other Appeal Briefs in Authors Guild v. HathiTrust (Second Circuit)

Selected documents in Authors Guild v. Google:

Over 1,000 documents have been filed in this case; thepublicindex.org has most of them. In addition …

Key Documents in the HathiTrust litigation (links via thepublicindex.org):