HathiTrust Wins on Fair Use, and Just About Everything Else

Landmark Fair Use Win

Yesterday, District Judge Harold Baer, Jr., handed down his decision in Authors Guild v. HathiTrust, a case that spun out of the long-running Google Books dispute. The decision is a landmark win for the HathiTrust, the university defendants, people with print disabilities, Google, the Digital Humanities and, I would argue, humanity in general.

Essential Background

The HathiTrust is a digital repository of millions of scanned university library books that became available to various universities by virtue of the Google Books project. About three-quarters of the books are still in copyright. In 2011 HathiTrust announced plans to embark on an innovative orphan works program (OWP), but dropped (or at least shelved) the plan soon after in response to criticism of its implementation. Spurred into action by the OWP, the Authors Guild filed a copyright lawsuit in September 2011 against HathiTrust, five universities, and multiple university officials.

The Authors Guild suit alleged that library digitization for any purpose amounts to copyright infringement. The purposes specifically under attack in this case were (i) preservation; (ii) enabling non-expressive uses such as word searches; and (iii) facilitating access by persons who are blind or visually impaired.

There is a key fact in this case that media reports will probably get wrong. This is not about scanning books to make extra copies for the public at large. As the Court explained, “No actual text from the book is revealed except to print-disabled library patrons at [University of Michigan].” Authors Guild v. HathiTrust, p. 16. This case was about library digitization for three specific purposes: preservation, access for the print-disabled, and non-expressive uses such as text searching and computational analysis.
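To make “non-expressive use” concrete, here is a minimal Python sketch of a page-level full-text index. It is a hypothetical illustration, not HathiTrust's actual system: the function names `build_index` and `search`, the tokenizer and the sample pages are all assumptions. What it shows is that the whole text must be copied to build the index, yet a query in this sketch returns only page numbers and hit counts, never the words of the book.

```python
import re
from collections import Counter, defaultdict

# Hypothetical sketch of a non-expressive search index (not HathiTrust's code).

def tokenize(text: str) -> list[str]:
    # Lowercase and keep runs of letters, digits and apostrophes as words.
    return re.findall(r"[a-z0-9']+", text.lower())

def build_index(pages: list[str]) -> dict[str, dict[int, int]]:
    # Map each word to {page number: number of occurrences on that page}.
    # Building this requires reading (copying) every page in full.
    index: dict[str, dict[int, int]] = defaultdict(dict)
    for page_no, text in enumerate(pages, start=1):
        for word, count in Counter(tokenize(text)).items():
            index[word][page_no] = count
    return dict(index)

def search(index: dict[str, dict[int, int]], term: str) -> list[tuple[int, int]]:
    # Return only (page number, hit count) pairs; no text is disclosed.
    return sorted(index.get(term.lower(), {}).items())

# Made-up sample pages for illustration.
pages = [
    "The United States are a union of states.",
    "The United States is one nation. The United States is indivisible.",
]
index = build_index(pages)
print(search(index, "united"))
```

Searching the sample for “united” yields [(1, 1), (2, 2)]: the term appears once on page 1 and twice on page 2, and nothing more is disclosed.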

The Score Card

Here is a quick and dirty summary of the key copyright issues:

  • Digitization to provide access for the print-disabled held to be transformative use and, on balance, fair use.
  • Digitization to serve print-disabled students held to be (i) an obligation of universities under the ADA, (ii) fair use under section 107 of the Copyright Act and (iii) permitted under section 121 of the Copyright Act.
  • Section 108 of the Copyright Act was held to expand the rights of libraries, not to limit the scope of their fair use rights in any way, shape or form. Given that the statute says “Nothing in this section . . . in any way affects the right of fair use as provided by section 107,” any ruling to the contrary would have been pretty shocking.
  • Digitization to create a search index held to be a transformative use, and, on balance, fair use.
  • Alleged security risks created by library digitization: dismissed as speculative and unproven. The judge noted the strong evidence to the contrary. It is still an open question whether the risk of a subsequent illegal act by a third party could ever take an otherwise lawful copy outside fair use. The whole notion strikes me as rather odd.
  • The market effect of library digitization: the court found there was none to speak of in this case. It rejected the Copyright Clearance Center’s (CCC’s) magic toll-booth arguments, dismissing some wild assertions about future licensing revenue as “conjecture.”
  • The court also noted that a copyright holder cannot preempt a transformative market merely by offering to license it.
  • The market effect of enabling print-disabled access to library books: the court found there was no market for this under-served group, nor was one likely to develop.

Did the Authors Guild win anything?
Not really, although on two issues it could have fared even worse.

  • The court held that the issue of the Orphan Works Program was not ripe for adjudication. This was inevitable in my opinion, but the judge could have added unfavorable dicta indicating that the Authors Guild had no case here either. Wisely, the judge said only what needed to be said.
  • On the issue of library digitization for the purpose of preservation, the court found that the argument that “preservation on its own is transformative is not strong.”

The Digital Humanities

The court appeared to accept the arguments in the Digital Humanities amicus brief, written by Matthew Jockers, Jason Schultz and myself with the assistance of many others. The brief extended arguments I made in Orphan Works as Grist for the Data Mill, 27 Berkeley Technology Law Journal (forthcoming), and Copyright and Copy-Reliant Technology, 103 Northwestern University Law Review 1607–1682 (2009).

Following Second Circuit precedent, the court explained that

“a transformative use may be one that actually changes the original work. However, a transformative use can also be one that serves an entirely different purpose.”

The court concluded that

“The use to which the works in the HDL are put is transformative because the copies serve an entirely different purpose than the original works: the purpose is superior search capabilities rather than actual access to copyrighted material. The search capabilities of the HDL have already given rise to new methods of academic inquiry such as text mining.”

The court even cites an illustration from our brief!

“Mass digitization allows new areas of non-expressive computational and statistical research, … One example of text mining is research that compares the frequency with which authors used ‘is’ to refer to the United States rather than ‘are’ over time. See Digital Humanities Amicus Br. 7 (‘[I]t was only in the latter half of the Nineteenth Century that the conception of the United States as a single, indivisible entity was reflected in the way a majority of writers referred to the nation.’).”

[Figure: Google Ngram visualization comparing the frequency of “The United States is” with “The United States are”]

You can reconstruct the figure on Google Ngram yourself!
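The “is” versus “are” comparison boils down to counting two phrases by publication year. Here is a minimal Python sketch, assuming a tiny made-up corpus of (year, text) pairs; it is not the methodology of our brief or of Google Ngram, and the function name `is_share_by_year` is hypothetical. Rather than raw frequencies, it reports what share of “the United States is/are” uses take the singular verb in each year.

```python
import re
from collections import defaultdict

# Made-up corpus of (publication year, text) pairs, for illustration only.
# A real study would run over millions of digitized volumes.
CORPUS = [
    (1850, "The United States are bound by treaty. The United States are at peace."),
    (1850, "The United States is large."),
    (1890, "The United States is a nation. The United States is growing."),
    (1890, "The United States are mentioned once here."),
]

# Capture the verb that follows "the United States", ignoring case.
PATTERN = re.compile(r"\bthe united states (is|are)\b", re.IGNORECASE)

def is_share_by_year(corpus):
    """Return {year: fraction of 'the United States is/are' uses that are 'is'}."""
    counts = defaultdict(lambda: {"is": 0, "are": 0})
    for year, text in corpus:
        for verb in PATTERN.findall(text):
            counts[year][verb.lower()] += 1
    return {
        year: c["is"] / (c["is"] + c["are"])
        for year, c in sorted(counts.items())
        if c["is"] + c["are"] > 0
    }

print(is_share_by_year(CORPUS))
```

Only aggregate numbers leave the function; none of the underlying text is reproduced, which is the sense in which this kind of analysis is non-expressive.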

The court also cites our brief for the proposition that the use of metadata and text mining “could actually enhance the market for the underlying work, by causing researchers to revisit the original work and reexamine it in more detail.”

Non-expressive use is fair use

The court did exactly what the amicus briefs urged it to do. As Matthew Jockers, Jason Schultz and I argued in Nature just last week (Digital Archives: Don’t Let Copyright Block Data Mining, 490 Nature 29–30 (October 4, 2012)):

“It is time for the US courts to recognize explicitly that, in the digital age, copying books for non-expressive purposes is not infringement.”

Courts have already applied this logic in internet search engine cases and in a case involving plagiarism detection software. As we hoped, Judge Baer’s ruling demonstrates that digitization for text mining and other forms of computational analysis is, unequivocally, fair use. He also rejected the plaintiffs’ attempt to distinguish the search engine precedents:

“Plaintiffs assert that the decisions in Perfect 10 and Arriba Soft are distinguishable because in those cases the works were already available on the internet, … I fail to see why that is a difference that makes a difference.”

This was not a close case

“Although I recognize that the facts here may on some levels be without precedent, I am convinced that they fall safely within the protection of fair use such that there is no genuine issue of material fact. I cannot imagine a definition of fair use that would not encompass the transformative uses made by Defendants’ MDP and would require that I terminate this invaluable contribution to the progress of science and cultivation of the arts that at the same time effectuates the ideals espoused by the ADA.”

A significant win for the National Federation of the Blind

My focus in this case has always been on the technological side; that is my academic interest. However, the most important issue in this case is not about search engines, the digital humanities or non-expressive use; it is about reading, humanity and expressive use. I am, of course, referring to the aspects of the decision relating to fair use and persons with disabilities.

Quoting the Supreme Court, the court emphasized that

“[m]aking a copy of a copyrighted work for the convenience of a blind person is expressly identified by the House Committee Report as an example of a fair use, with no suggestion that anything more than a purpose to entertain or to inform need motivate the copying.”

As Kenny Crews summarizes:

“The opinion provides a strong opinion about fair use as applied to serving persons with disabilities, especially when an educational institution is mandated to serve needs under the Americans With Disabilities Act. The court goes further and resolves a long-time quandary that arose under Section 121 of the Copyright Act. That statute permits an “authorized entity” to make formats of certain works available to persons who are visually impaired. An “authorized entity” is one that has a “primary mission” to serve those needs. Libraries and universities have many functions, so is that service a “primary mission”? The court said yes.”

