HathiTrust Wins on Fair Use, and just about everything else

Landmark Fair Use Win

Yesterday, District Judge Harold Baer, Jr., handed down his decision in Authors Guild v. HathiTrust, a case that spins out of the long-running Google Books dispute. The decision is a landmark win for the HathiTrust, the University defendants, people with print-disabilities, Google, the Digital Humanities and, I would argue, for humanity in general.

Essential Background

The HathiTrust is a digital repository of millions of scanned university library books that became available to various universities by virtue of the Google Books project. About 3/4 of the books are still in copyright. In 2011 HathiTrust announced plans to embark on an innovative orphan works program (OWP), but dropped (or at least shelved) the plan soon after in light of criticism of its implementation. Spurred into action by the OWP, in September 2011 the Authors Guild filed a copyright lawsuit against HathiTrust, five universities, and multiple university officials.

The Authors Guild suit alleged that library digitization for any purpose amounts to copyright infringement. The purposes specifically under attack in this case were (i) preservation; (ii) enabling non-expressive uses such as word searches; and (iii) facilitating access by persons who are blind or visually impaired.

There is a key fact in this case that media reports will probably get wrong. This is not about scanning books to make extra copies for the public at large. As the Court explained, “No actual text from the book is revealed except to print-disabled library patrons at [University of Michigan].” Authors Guild v. HathiTrust, p. 16. This case was about library digitization for three specific purposes: preservation, disabled access, and non-expressive uses such as text searching and computational analysis.
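To make the distinction concrete, here is a minimal Python sketch of a search index that answers queries with page numbers and hit counts only, never with the surrounding text. It is a toy built on stated assumptions: the function names and sample pages are hypothetical, and it is not a description of HathiTrust's actual software.

```python
import re
from collections import defaultdict


def build_index(pages):
    """Map each lowercase word to {page_number: occurrence_count}.

    Building the index requires reading (copying) the full text of every page.
    """
    index = defaultdict(lambda: defaultdict(int))
    for page_no, text in enumerate(pages, start=1):
        for word in re.findall(r"[a-z']+", text.lower()):
            index[word][page_no] += 1
    return index


def search(index, term):
    """Return sorted (page_number, hits) pairs for a term; no text is returned."""
    hits = index.get(term.lower(), {})
    return sorted(hits.items())


if __name__ == "__main__":
    pages = ["The whale surfaced.", "No whale here, just a whale of a tale.", "Ishmael spoke."]
    index = build_index(pages)
    print(search(index, "whale"))  # [(1, 1), (2, 2)]
```

The copying happens when the index is built, but what a user gets back is only a location and a count, which is the sense in which the use is non-expressive.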

The Score Card

Here is a quick and dirty summary of the key copyright issues:

  • Digitization to provide access for the print-disabled held to be transformative use and, on balance, fair use.
  • Digitization to provide for print-disabled students held to be (i) an obligation of universities under the ADA, (ii) fair use under section 107 of the Copyright Act and (iii) enabled by section 121 of the Copyright Act.
  • Section 108 of the Copyright Act was held to expand the rights of libraries, not to limit the scope of their fair use rights in any way, shape or form. Given that the text says “Nothing in this section . . . in any way affects the right of fair use as provided by section 107,” any ruling to the contrary would have been pretty shocking.
  • Digitization to create a search index held to be a transformative use, and, on balance, fair use.
  • Alleged security risks created by library digitization were dismissed as speculative and unproven; the judge noted the strong evidence to the contrary. It is still an open question whether the risk of a subsequent illegal act by a third party could ever render an initial lawful copy not fair use. The whole notion strikes me as rather odd.
  • The market effect of library digitization: the court found there was none to speak of in this case. The court rejected the Copyright Clearance Center’s (CCC’s) magic toll-booth arguments, i.e., some wild assertions about future licensing revenue that the court dismissed as “conjecture.”
  • The court also noted that a copyright holder cannot preempt a transformative market merely by offering to license it.
  • The market effect of enabling print-disabled access to library books: the court found there was no market for this under-served group, nor was one likely to develop.

Did the Authors Guild win anything?
Not really, though on two issues things could have gone even worse for the Guild.

  • The court held that the issue of the Orphan Works Program was not ripe for adjudication. This was inevitable in my opinion, but the judge could have added unfavorable dicta indicating that the AG had no case here either. Wisely, the judge said only what needed to be said.
  • On the issue of library digitization for the purpose of preservation, the court found that the argument that “preservation on its own is transformative is not strong.”

The Digital Humanities

The court appeared to accept the arguments in the Digital Humanities amicus brief, written by Matthew Jockers, Jason Schultz and myself with the assistance of many others. The brief extended arguments I made in Orphan Works as Grist for the Data Mill, 27 Berkeley Technology Law Journal (forthcoming) and Copyright and Copy-Reliant Technology, 103 Northwestern University Law Review 1607–1682 (2009).

Following Second Circuit precedent, the court explained that

“a transformative use may be one that actually changes the original work. However, a transformative use can also be one that serves an entirely different purpose.”

The court concluded that

“The use to which the works in the HDL are put is transformative because the copies serve an entirely different purpose than the original works: the purpose is superior search capabilities rather than actual access to copyrighted material. The search capabilities of the HDL have already given rise to new methods of academic inquiry such as text mining.”

The court even cites an illustration from our brief!

“Mass digitization allows new areas of non-expressive computational and statistical research, … One example of text mining is research that compares the frequency with which authors used “is” to refer to the United States rather than “are” over time. See Digital Humanities Amicus Br. 7 (“[I]t was only in the latter half of the Nineteenth Century that the conception of the United States as a single, indivisible entity was reflected in the way a majority of writers referred to the nation.”).”

Google Ngram Visualization Comparing Frequency of “The United States is” to “The United States are”

You can reconstruct the figure on Google Ngram yourself!
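For readers curious about what this kind of non-expressive analysis involves, here is a minimal Python sketch of the counting behind the is/are comparison. It assumes a hypothetical corpus of (year, text) pairs; the function names and sample sentences are invented for illustration, and this is not how Google Ngram is actually built.

```python
import re
from collections import Counter


def phrase_counts_by_decade(corpus, phrases):
    """Count whole-phrase occurrences per decade.

    corpus: iterable of (year, text) pairs (a hypothetical input format).
    Returns {decade: Counter({phrase: count})}.
    """
    result = {}
    for year, text in corpus:
        decade = year - year % 10
        lowered = text.lower()
        counts = result.setdefault(decade, Counter())
        for phrase in phrases:
            # Word boundaries stop "is" from matching inside words like "issued".
            pattern = r"\b" + re.escape(phrase.lower()) + r"\b"
            counts[phrase] += len(re.findall(pattern, lowered))
    return result


def share_singular(counts, singular, plural):
    """Fraction of uses that are singular, or None if neither phrase occurs."""
    total = counts[singular] + counts[plural]
    return counts[singular] / total if total else None


if __name__ == "__main__":
    corpus = [
        (1845, "The United States are a union of states."),
        (1890, "The United States is a nation."),
    ]
    phrases = ["The United States is", "The United States are"]
    by_decade = phrase_counts_by_decade(corpus, phrases)
    print(share_singular(by_decade[1840], *phrases))  # 0.0
    print(share_singular(by_decade[1890], *phrases))  # 1.0
```

Note that the output is a set of numbers about the corpus, not any author's expression.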

The court also cites our brief for the proposition that the use of metadata and text mining “could actually enhance the market for the underlying work, by causing researchers to revisit the original work and reexamine it in more detail.”

Non-expressive use is fair use

The court did exactly what the amicus briefs urged it to do. As Matthew Jockers, Jason Schultz and I argued in our article in Nature last week (Digital Archives: Don’t Let Copyright Block Data Mining, 490 Nature 29-30 (October 4, 2012)):

“It is time for the US courts to recognize explicitly that, in the digital age, copying books for non-expressive purposes is not infringement.”

Courts have already applied this logic in internet search engine cases and in a case involving plagiarism detection software. As we hoped, Judge Baer’s ruling demonstrates that digitization for text mining and other forms of computational analysis is, unequivocally, fair use.

“Plaintiffs assert that the decisions in Perfect 10 and Arriba Soft are distinguishable because in those cases the works were already available on the internet, … I fail to see why that is a difference that makes a difference.”

This was not a close case

“Although I recognize that the facts here may on some levels be without precedent, I am convinced that they fall safely within the protection of fair use such that there is no genuine issue of material fact. I cannot imagine a definition of fair use that would not encompass the transformative uses made by Defendants’ MDP and would require that I terminate this invaluable contribution to the progress of science and cultivation of the arts that at the same time effectuates the ideals espoused by the ADA.”

A significant win for the National Federation of the Blind

My focus in this case has always been on the technological side; that is where my academic interest lies. However, the most important issue in this case is not about search engines, the digital humanities or non-expressive use; it is about reading, humanity and expressive use. I am of course referring to those aspects of the decision relating to fair use and persons with disabilities.

“[m]aking a copy of a copyrighted work for the convenience of a blind person is expressly identified by the House Committee Report as an example of a fair use, with no suggestion that anything more than a purpose to entertain or to inform need motivate the copying.”

As Kenny Crews summarizes:

“The opinion provides a strong opinion about fair use as applied to serving persons with disabilities, especially when an educational institution is mandated to serve needs under the Americans With Disabilities Act.  The court goes further and resolves a long-time quandary that arose under Section 121 of the Copyright Act.  That statute permits an “authorized entity” to make formats of certain works available to persons who are visually impaired.  An “authorized entity” is one that has a “primary mission” to serve those needs.  Libraries and universities have many functions, so is that service a “primary mission”?  The court said yes.”

Google Book Search: Digital Humanities still needs answers

Google has settled with the publishers, but not the Authors Guild. This is good news for the Digital Humanities because it means that we may still get a substantive ruling on the big fair use question underlying the entire litigation.

Human life is short; none of us can hope to read more than a smattering of the literary record. Fortunately, massive digitization efforts like those undertaken by Google allow scholars to apply large-N computerized methods to millions of works. Computational and statistical analysis of literature will be a big part of humanities research for years to come. However, legal actions like those of the Authors Guild could bar scholars from studying as much as two-thirds of the literary record.

In a comment published in Nature today [Nature Vol. 490, pages 29–30 (04 October 2012) doi:10.1038/490029a], Matthew Jockers (an English professor), Jason Schultz (a law professor) and I (also a law professor) explain why the Association for Computers and the Humanities and a large group of scholars chose to file an amicus curiae brief on behalf of the digital humanities in the Authors Guild v. Google and Authors Guild v. HathiTrust cases.

In the brief we explain why U.S. courts should recognize that copying books for non-expressive purposes is not infringement.

My view is that the settlement between Google and the publishers makes such a ruling more likely because it provides further evidence that the ability to make non-expressive uses of copyrighted books works hand in hand with the commercialization of expressive uses, which is what copyright law is all about.

For more on this topic, see https://matthewsag.com/projects/google-book-copyright-the-digital-humanities/

Global Research Network on Copyright Flexibilities in National Legal Reform Meeting in DC

I am in DC today at the Global Research Network on Copyright Flexibilities in National Legal Reform Meeting.

Copyright reform is under active discussion at the national level in numerous countries. The goal of the Global Research Network on Copyright Flexibilities in National Legal Reform is to produce draft language for a flexible limitation and exception that could be included in national legislation. We expect to offer this language, which may include more than one model provision, to legislators and civil society advocates in countries contemplating copyright reform. Additionally, we aim to develop an online “tool kit” to assist these deliberations.

Our Robot Overlords

There is a great story today on io9.com illustrating just why automatic copyright filtering can never be a complete solution to online copyright issues. In short,

Dumb robots, programmed to kill any broadcast containing copyrighted material, had destroyed the only live broadcast of the Hugo Awards.

Apparently, a licensed clip from Dr. Who (which would have been fair use even if it had not been licensed) triggered the filtering software and exterminated the webcast. Companies like Ustream are of course free to implement whatever dumb software they like, but if filtering becomes the norm we will all be subject to prior restraint by mindless automatons. I, for one, do not welcome our new robot overlords.

Australia, Copyright and the Digital Economy

The Australian Law Reform Commission has just published a thought-provoking issues paper on Copyright and the Digital Economy, ALRC Issues Paper 42, August 2012.

The ALRC has been charged with considering whether existing exceptions under Australian law are appropriate and whether further exceptions should recognize fair use of copyright material; allow transformative, innovative and collaborative use of copyright materials to create and deliver new products and services of public benefit; and allow appropriate access, use, interaction and production of copyright material online for social, private or domestic purposes.

This is not the first time that Australia has considered adopting a more open-ended approach to copyright limitations and exceptions. The 2005 “Fair Use Review” by the Attorney-General’s Department also looked at the appropriateness of introducing a general fair use exception. That review led to some piecemeal reforms, but left Australia with its complicated labyrinth of exceptions keyed to particular uses of particular types of works under particular circumstances, all subject to a balancing test not unlike the U.S. fair use doctrine.

The new ALRC report makes some interesting observations about how things have changed since the last time this issue was considered.

At pages 78–79, the ALRC notes:

“There has been a noticeable degree of change with respect to technology and social uses of it, even since the Fair Use Review. In its preliminary discussions with some stakeholders and others with an interest in copyright, the ALRC heard that there may now be more of an appetite for a broad, flexible exception to copyright—perhaps based on US-style fair use—than in late 2006.

In January 2008, Barton Beebe’s empirical study of US fair use case law through to the year 2005 was published. (B Beebe, ‘An Empirical Study of US Copyright Fair Use Opinions, 1978–2005’ (2008) 156 University of Pennsylvania Law Review 549). He argued that the results ‘show that much of our conventional wisdom about that case law is mistaken’.

In 2009, [Pamela] Samuelson published her ‘qualitative assessment’ of the fair use case law, which was built upon Beebe’s study (P Samuelson, ‘Unbundling Fair Uses’ (2009) 77 Fordham Law Review 2537). Samuelson has argued that ‘fair use is both more coherent and more predictable than many commentators have perceived once one recognizes that fair use cases tend to fall into common patterns’.

Earlier in 2012, Matthew Sag published his work that built upon these two studies (M Sag, ‘Predicting Fair Use’ (2012) 73 Ohio State Law Journal 47).  He went further than Samuelson and ‘assesse[d] the predictability of fair use in terms of case facts which exist prior to any judicial determination’. He argued that his work demonstrates that “the uncertainty critique is somewhat overblown: an empirical analysis of the case law shows that, while there are many shades of gray in fair use litigation, there are also consistent patterns that can assist individuals, businesses, and lawyers in assessing the merits of particular claims to fair use protection.”

In my view, if Australian companies are going to have a fair chance to compete in the global digital economy, Australia needs to adopt a more flexible approach to copyright exceptions and limitations. When new issues arise in the United States, they are dealt with by the courts. Litigation is far from perfect, but it beats waiting around for a slow-moving, special-interest-beholden, narrowly focused legislative process. More often than not, Australian copyright law tends to echo the results of United States cases, just with a significant delay that limits the innovation opportunities of Australian companies.

Without a fair use doctrine, Australian innovators need to wait for permission regardless of how fair their intended use might be. In contrast, their American counterparts can back their own judgment as to fair use and, if necessary, defend that judgment in court.

Online submissions can be made through the ALRC website. The final report is due to be delivered by 30 November 2013.

Authors Guild Unable to Silence Amici

The Judge presiding over Authors Guild v. Google granted leave to file the Digital Humanities brief and an amicus brief by the American Library Association, the Association of College and Research Libraries, the Association of Research Libraries, and the Electronic Frontier Foundation. The Judge also ordered the Plaintiffs to respond to the amici curiae briefs by September 17, 2012 in a memorandum of law not to exceed 40 pages.

Forty pages seems like quite a bit, so it should give the Authors Guild a chance to address all the case law it has conveniently ignored until now. This might be an indication that the court is taking the arguments of the amici seriously, or just that Judge Chin did not want to give the Authors Guild et al. any further cause for complaint.

Oral argument on the motions for summary judgment is set to proceed on October 9, 2012 at 10 AM. Oral argument on the motions for summary judgment shall proceed on December 4, 2012 at 2 PM (this was order #4 on 2012-08-17).

Is Fair Use a Viable Defense for Stolen Photos?

A majority of the Ninth Circuit Court of Appeals does not seem to think so, thus raising the question of the extent to which copyright allows newsworthy public figures to control their images in the press.

Singer/model Noelia Lorenzo Monge secretly married her manager and music producer Jorge Reynoso in a Las Vegas chapel in 2007. The couple kept their marriage a secret, even from their parents, for two years until a memory chip in a borrowed car found its way to TVNotas magazine. The magazine published six of the stolen photos: three of the wedding ceremony and three of the wedding night.

The majority decision in Monge v. Maya Magazines, 2012 U.S. App. LEXIS 16947 (9th Cir. Aug. 14, 2012) takes a narrow view of the scope of fair use. Compared to the district court, the majority takes a stronger view on the right of first publication and a less generous view on the significance of news reporting:

“The tantalizing and even newsworthy interest in the photos does not trump a balancing of the fair use factors”

The majority dismissed the defendant’s claim to transformative use and distinguished the similar case of Núñez on the basis that in Núñez “the pictures were the story, and the newspaper in Núñez did not seek to manufacture newsworthiness, nor did it scoop the story.” The majority suggested that whereas one photo as evidence of the event may have been fair use, publishing six “undoubtedly supplanted Plaintiffs’ right to control the first public appearance of the photographs.”

The dissent saw matters a little differently:

The majority contends that the public interest in a free press cannot trump a celebrity’s right to control his image and works in the media—even if that celebrity has publicly controverted the very subject matter of the works at issue. Under the majority’s analysis, public figures could invoke copyright protection to prevent the media’s disclosure of any embarrassing or incriminating works by claiming that such images were intended only for private use.

The implications of this analysis undermine the free press and eviscerate the principles upon which copyright was founded. Although newsworthiness alone is insufficient to invoke fair use, public figures should not be able to hide behind the cloak of copyright to prevent the news media from exposing their fallacies.

Journalists and Fair Use

A new study by Patricia Aufderheide, Peter Jaszi, Katie Bieze and Jan Lauren Boyles explores the problems that journalists face engaging with copyright and employing the doctrine of fair use. The study was based on open-ended interviews with 80 journalists across a range of media platforms.

The good news is that in situations that journalists have been dealing with for years, their collective intuitions about good practice map pretty closely onto the fair use case law. This is striking because the journalists appear to know very little about copyright law or the ins and outs of fair use. Sometimes the journalists quoted in the study were comically wrong:

“When somebody dies, … it’s in the public domain” … “If you can find it on the Web, then anybody can use it, and anybody can take it.”

So how do journalists get it right? Journalists’ mission to report the news, space constraints, and norms of attribution and originality all lead them to use their source material transformatively and to limit that use to what is necessary. As the authors note:

“They routinely asked themselves if they were merely appropriating information in order to avoid work, or whether they were repurposing that information in a way matched to their mission to inform the public.”

More troubling is the finding that in new media situations journalists did not understand their fair use rights. In this context,

“interviewees were often unable to make a timely decision or justify it to a gatekeeper. They operated from risk analysis, without knowledge of actual risk or of their actual rights.”

The study finds that, when in doubt, journalists routinely self-censor, causing delays, increasing costs, and even failing their journalistic mission.

No Injunction in Cambridge University Press v. Becker, the Georgia State e-reserves case

Back Story

There is an excellent summary of the district court’s decision from May 2012 over at James Grimmelmann’s blog.

The abbreviated story is that in April 2008 three publishers (Cambridge University Press, Oxford University Press, and SAGE Publications) brought suit against Georgia State in relation to the school’s electronic reserve policy. The suit was backed by the Association of American Publishers and the Copyright Clearance Center (CCC), a licensing company.

This was a complicated case, not least because the publishers had a surprisingly hard time showing they owned the relevant copyrights. Judge Orinda Evans of the U.S. District Court in Atlanta handed down a 350-page decision in May. Of all the books the plaintiffs said were infringed, many did not even make it to a fair use analysis because the only copying that had taken place was to confirm the files were on reserve.

In terms of fair use, the highlights of the decision were:

(1) The educational purpose of the use favored the defendant.
(2) The informational nature of the works favored the defendant.
(3) The third fair use factor, the amount and substantiality of the portion copied, turned out to be quite interesting.

  • The court rejected the Classroom Guidelines as incompatible “with the language and intent of § 107” and suggested its own quantitative test: 10% of the total page count for books of nine chapters or fewer, and one chapter for books with ten or more chapters. Going above this limit is not fatal to the defendant, but staying below it is highly favorable.

(4) In terms of the fourth factor, the effect of the defendant’s use on the market for or value of the plaintiff’s work, the court found that this factor favored the plaintiffs where digital licensing was available through the CCC.

  • But in the majority of cases the fourth factor still favored Georgia State because there was “no evidence in the record to show that digital excerpts from this book were available for licensing” as of the date of infringement. Note that photocopying licenses were not seen to be a close substitute for digital reserve licenses. Another important piece of context here is that students would not have bought the assigned books as a substitute for the excerpts posted on the e-reserve system. Thus, no harm, no foul. Consistent with a “market failure” analysis, this suggests a broad scope for fair use in the context of orphan works.

Only five of the 74 course reserve listings fell on the wrong side of this fair use analysis.
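The third-factor benchmark described above is simple enough to express as a short function. What follows is a hypothetical Python sketch of a simplified reading of that benchmark, not the court's own formulation: the function name is invented, it treats "one chapter" as a simple count, and a False result only means the excerpt falls outside the safe zone, not that the use is unfair.

```python
def within_chapter_page_benchmark(total_pages: int, num_chapters: int,
                                  pages_copied: int, chapters_copied: int) -> bool:
    """Simplified reading of the district court's third-factor benchmark.

    Books with nine chapters or fewer: up to 10% of the total page count.
    Books with ten or more chapters: up to one chapter.
    """
    if num_chapters < 10:
        # Integer form of pages_copied <= 0.10 * total_pages, avoiding float rounding.
        return pages_copied * 10 <= total_pages
    return chapters_copied <= 1


if __name__ == "__main__":
    print(within_chapter_page_benchmark(300, 8, 30, 1))   # True
    print(within_chapter_page_benchmark(400, 12, 60, 2))  # False
```

Because going over the line is "not fatal," a result like this would feed into the third-factor weighing rather than decide the case.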

The Update – No Injunction

As reported in the Chronicle of Higher Education, on Friday afternoon Judge Evans issued an order denying the plaintiffs’ request for injunctive and declaratory relief. The only remedy the publishers got was an order that the defendants fine-tune their copyright policy to make it “not inconsistent” with the Judge’s ruling.

The Judge determined that Georgia State University was, on balance, the prevailing party (they won 69:5 after all), and was thus entitled to “reasonable attorney’s fees”.

This is a massive vindication for Georgia State University. The institution was depicted as a copyright outlaw by the plaintiffs, but the court was “convinced that defendants did try to comply with the copyright laws,” and found that they mostly succeeded. It is also further real-world evidence that, with the right legal advice, fair use can be somewhat predictable. See my article on Predicting Fair Use for an empirical study to this effect.

Great panel at IPSC on orphan works, library digitization and fair use

In “The Orphans, the Market, and the Copyright Dogma,” Ariel Katz notes that extended collective licensing (ECL) proposals will do nothing to solve the underlying orphan works problem. Like indulgences, ECL solutions merely absolve the “sin” of using works without permission, but do nothing to pay the absent owners.

In “How Fair Use Can Help Solve the Orphan Works Problem,” Jennifer Urban does a great job of explaining how the rest of us have under-analyzed the second fair use factor in relation to library digitization. She points out that the Senate Report on the 1976 Copyright Act says directly that market availability is part of the nature of the work.

In my own paper “Orphan Works as Grist for the Data Mill” I explain why copyright does not stand in the way of nonexpressive uses. My argument is that, just as the distinction between expressive and nonexpressive works is well recognized, the same distinction should generally be made in relation to potential acts of infringement.

Copying for purely nonexpressive purposes, such as the automated extraction of data, should not be regarded as infringing.  Automated reproduction for nonexpressive uses (such as search engines, plagiarism detection, and macro-literary analysis) does not communicate the author’s original expression to the public, there is no expressive substitution, and thus there is no infringement. For more on Copyright and Copy-Reliant Technology, read my 2009 article of the same name.
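To see why this kind of copying need not communicate expression to anyone, consider a minimal Python sketch of the comparison a plagiarism detector might run. The word-shingle and Jaccard-similarity approach is a common textbook technique chosen here for illustration; the function names are hypothetical and this is not a description of any particular product.

```python
import re


def shingles(text, n=5):
    """Set of overlapping word n-grams ("shingles") from the text."""
    words = re.findall(r"[a-z']+", text.lower())
    return {tuple(words[i:i + n]) for i in range(len(words) - n + 1)}


def overlap_score(a, b, n=5):
    """Jaccard similarity of two documents' shingle sets, between 0 and 1."""
    sa, sb = shingles(a, n), shingles(b, n)
    if not sa and not sb:
        return 0.0
    return len(sa & sb) / len(sa | sb)


if __name__ == "__main__":
    print(overlap_score("one two three four five six",
                        "one two three four five seven"))  # 0.3333333333333333
```

Both documents are copied in full to compute the score, yet the only thing reported is a single number, so no one reads the author's expression as a result.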