Brief of Digital Humanities and Legal Scholars in Authors Guild v. Google filed

On Thursday this week, we filed a brief on behalf of over 150 researchers, scholars and educators in Authors Guild v. Google, currently on appeal to the Second Circuit Court of Appeals.
The Brief of Digital Humanities and Legal Scholars argues that copyright law is not, and should not be, an obstacle to the computational analysis of text. Copyright law has long recognized the distinction between protecting an author’s original expression and the public’s right to access the facts and ideas contained within that expression.
We are confident that the Second Circuit will vote to maintain that distinction in the digital age so that library digitization, internet search and related non-expressive uses of written works remain legal.
The final version of the brief is available on the free online repository at this link address:
We are grateful for the support of so many wonderful scholars in this important case, and we are even more grateful for all the fascinating research that these computer scientists, English professors, historians, linguists, and all those working in the digital humanities do to enrich our lives.
We would also like to thank The Association for Computers and the Humanities and the Canadian Society of Digital Humanities/Société canadienne des humanités numériques for their support as institutions.
Matthew Jockers
Matthew Sag
Jason Schultz

Archives & Copyright: Developing An Agenda For Reform starts tomorrow #dh #archivescopyright

Archives & Copyright: Developing An Agenda For Reform

This is a one day symposium, co-organised by CREATe and the Wellcome Library. The symposium considers forthcoming changes to the copyright regime in the UK as it impacts the work of archives, as well as the role that risk-management plays in copyright compliance for archival digitization projects.

I will be speaking on a panel along with Professors Peter Jaszi and Peter Hirtle. We will discuss how cultural heritage institutions in the US work with copyright law, and in particular the ongoing Authors Guild v. HathiTrust case (currently on appeal).

I plan to talk about my experience bringing together (along with Jason Schultz and Matthew Jockers) the digital humanities amicus briefs for Authors Guild v. HathiTrust I and II and Authors Guild v. Google. My slides are available right here.

The hashtag for the symposium is #archivescopyright

A Collection of Briefs in Authors Guild v. HathiTrust

I have collected all the briefs in Authors Guild v. HathiTrust for anyone who is interested.

The leading number refers to the court docket. There are some briefs in support of the plaintiffs, but the majority are in support of the defendants.

You can download the whole set as a zip file (26 MB) here: AG v. Ht Appeal Briefs as filed 2013 …

Or individually from the links below:










Please Support the Digital Humanities Amicus Brief in Authors Guild v. HathiTrust

Call for Support
We are seeking your support for our amicus brief in the Court of Appeals in Authors Guild v. HathiTrust. We believe that this case will have a dramatic effect on research in fields ranging from computer science to linguistics, history, literature and the digital humanities.


In 2005, the Authors Guild, a lobby group with about 8,500 members including published authors, literary agents and lawyers, filed a class-action lawsuit claiming that Google’s library digitization project was a “massive copyright infringement”. A settlement was proposed in that case in 2008, modified after strenuous objections from academics, other author groups and several foreign governments in 2009, and rejected by the court in 2011. In September 2011, in a separate case, the Authors Guild sued several universities and the HathiTrust for participating in Google’s book-scanning project. On July 7, 2012 the Association for Computers and the Humanities and more than 60 scholars from disciplines ranging from law and computer science to linguistics, history and literature filed an amicus curiae brief in Authors Guild v. HathiTrust on behalf of the digital humanities.

District Court Decision

On October 10, 2012, Judge Baer (Southern District of New York) ruled against the Authors Guild and their fellow plaintiffs and held that library digitization for uses such as text-mining is “transformative” as that term of art is used in copyright law and, on balance, fair use (i.e., not copyright infringement). Judge Baer’s opinion cites our amicus brief, adopts one of our examples and appears to follow the basic structure of our legal argument.


The Authors Guild is now appealing Judge Baer’s decision (on this and other grounds) and we would like your support in drafting a new brief for the U.S. Court of Appeals for the Second Circuit.

Argument in a nutshell

According to the U.S. Constitution, the purpose of copyright is “To promote the Progress of Science and useful Arts”. Copyright law should not be an obstacle to statistical and computational analysis of the millions of books owned by university libraries. Copyright law has long recognized the distinction between protecting an author’s original expression and the public’s right to access the facts and ideas contained within that expression. That distinction must be maintained in the digital age so that library digitization, internet search and related non-expressive uses of written works remain legal.

Draft available on request

Email for a full draft, or download our previous effort (in related district court litigation) here. The final brief will be very similar to this. 

How you can help preserve the balance of copyright law
(1) You can let us know that you would like to join our brief (we need your name and affiliation e.g. Associate Professor, Jane Doe, Springfield University). We would also like to add a one line description of any aspect of your work that is relevant to the brief, e.g. ___ Grant to study ___ in ____ literary corpus or a relevant publication.
Please note that ours is not the only amicus brief being filed in this case. Jennifer Urban (U.C. Berkeley) will also be filing a brief arguing that the plaintiffs do not represent the interests of academic authors, who comprise a large proportion of the class. YOU CAN’T SIGN BOTH. Please consider endorsing whichever brief speaks most closely to your concerns as an academic.
We need your name etc., by June 3, 2013. Please email or enter your details directly via this online tool.
(2) You can point us toward easy to understand and compelling examples of the kind of research enabled by mass-digitization (we can’t include all your wonderful work, but we would like to understand it better).
(3) You can send this link to other academics and PhD students.
Thank you!
Matthew Sag, Matthew Jockers and Jason Schultz

Fair use and (non-)compliance with other statutory exceptions (Authors Guild v. Hathitrust)

Introduction and Necessary Disclaimer 

This is the last in a series of posts commenting on the Authors Guild Appeal Brief (February 25, 2013) in Authors Guild v. HathiTrust. The views expressed on this site are purely my own.

Has the Authors Guild Discovered a new fair use factor? 

The plaintiffs in Authors Guild v. HathiTrust had a lot to say about Section 108 of the Copyright Act when this case was in the district court. Section 108 gives libraries the right to make a limited number of copies of certain works for specified purposes. Section 108(f)(4) explicitly states that “[n]othing in this section. . . in any way affects the right of fair use as provided by section 107”; nonetheless, the plaintiffs argued that any fair-use defense was in fact precluded by Section 108. Intriguingly, the plaintiffs also argued that Section 121 of the Act, which expressly authorizes the reproduction of books for the blind, was also preempted by Section 108’s general provisions on library photocopying. Not surprisingly, the district court rejected these arguments, pointing to “the clear language that Section 108 provides rights to libraries in addition to fair-use rights that might be available.”

In their Appeal Brief to the Second Circuit the plaintiffs have focused on a different line of argument: they now contend that the district court erred in failing to consider the express limitations of Section 108 in its evaluation of fair use. In the plaintiffs’ words, “library copying – is specifically addressed by another statute, Section 108, which therefore should guide the fair use analysis” (Authors Guild Ap. Br. Page 30).

Cute. So every time Congress creates a public interest exception to copyright, that exception becomes a limitation on the balancing function of the fair use doctrine and thus, in effect, an expansion of the rights of copyright owners. I don’t buy it and I am pretty sure the court of appeals won’t either.

The argument can also be flipped the other way. Jonathan Band argues in a recent paper that

Courts should consider a defendant’s substantial compliance with a specific exception in Title 17 when applying the fair use privilege. As the court assesses the first fair use factor, the purpose and character of the use, the court should give great weight to the defendant’s substantial compliance with the exception. The court should recognize that Congress determined that uses of similar purpose and character did not constitute infringement.

See, Jonathan Band, The Impact of Substantial Compliance with Copyright Exceptions on Fair Use.

The phantom tollbooth — Are workable markets for library digitization licenses just around the corner?

Introduction and Necessary Disclaimer 

This is one of a series of posts concerning the Authors Guild v. HathiTrust case; specifically, these posts take the form of commentary on the Authors Guild Appeal Brief (February 25, 2013). The views expressed on this site are purely my own.

The phantom tollbooth — Are workable markets for library digitization licenses just around the corner?

What if? … One of the interesting questions arising out of the Authors Guild suit against the HathiTrust is, what happens if the Authors Guild wins? Is there any way that library digitization programs could continue if the court ruled that any kind of digitization for any reason at all required permission from the copyright owner in advance?

Naturally the plaintiffs and the defendants see this issue differently. The plaintiffs argue that copying without permission is wrong and that they will find a way to sell permission to people like the defendants so that digitization can continue. The defendants argue that copying without permission is permitted for specific reasons and that in this case there is no practical way to obtain the permissions the plaintiffs insist are required.

Whether a market for digitization licenses (either in general or for educational access to orphan works in particular) would work in practice goes to the heart of the fourth fair use factor under Section 107 of the Copyright Act.

The fourth fair use factor is “the effect of the use upon the potential market for or value of the copyrighted work.” The Authors Guild argues that “the District Court erred in failing to recognize the actual and potential harm to the Authors caused by the Libraries’ unlicensed mass digitization.” (Authors Guild Ap. Br. page 38). My guess is that the defendants will argue the exact opposite: that the district court was right to conclude that copyright owners were not likely to suffer any harm from the defendants’ library digitization project. The district court never got to the issue of the defendants’ orphan works project for reasons covered in previous posts, so the best the plaintiffs could hope for on that issue is a favorable remand from the Second Circuit.

Market effect and market definition

The question of market effect risks collapsing into tautology because every use by a defendant represents something that could, in theory, be licensed to the defendant if the court rules that it is not fair use. Courts try to avoid the trap of circular reasoning by limiting market effect to effects that are (a) cognizable under copyright and (b) not too remote or speculative.

(a) Cognizable Injury

Not every perceived harm will count towards the court’s assessment of market effect in fair use cases. A few examples from the leading cases are illustrative:

In Campbell v. Acuff-Rose, the Supreme Court said:

[W]hen a lethal parody, like a scathing theater review, kills demand for the original, it does not produce a harm cognizable under the Copyright Act.  Because parody may quite legitimately aim at garroting the original, destroying it commercially as well as artistically, the role of the courts is to distinguish between biting criticism that merely suppresses demand and copyright infringement, which usurps it. [Campbell v. Acuff-Rose Music, 510 U.S. 569, 591–92 (1994) (internal quotation marks and citations omitted).]

To take an example closer to library digitization, consider the software reverse engineering cases. Courts have consistently held that making unauthorized copies of a computer program, as a necessary step in reverse engineering, is fair use. In Sony v. Connectix, the Ninth Circuit held that although the defendant’s Virtual Game Station console directly competed with Sony in the market for platforms capable of playing Sony Playstation games, the Virtual Game Station was a “legitimate competitor” in that market. The court concluded that Sony’s desire to control the market for gaming platforms was understandable but that

“copyright law . . . does not confer such a monopoly.” [Sony Computer Entm’t, Inc. v. Connectix Corp., 203 F.3d 596, 607 (9th Cir. 2000)]

In Bill Graham Archives v. Dorling Kindersley Ltd., 448 F.3d 605, 615 (2d Cir. 2006), the court of appeals held that the defendant’s use of the plaintiff’s images was

“transformatively different from their original expressive purpose. In a case such as this, a copyright holder cannot prevent others from entering fair use markets merely by developing or licensing a market for parody, news reporting, educational or other transformative uses of its own creative work.” (citations and quotations omitted).

In the iParadigms case, the Fourth Circuit addressed claims of copyright infringement by students who objected to their papers being put through an automated plagiarism detection system.

“Clearly no market substitute was created by iParadigms, whose archived student works do not supplant the plaintiffs’ works in the “paper mill” market so much as merely suppress demand for them, by keeping record of the fact that such works had been previously submitted. … In our view, then, any harm here is not of the kind protected against by copyright law.” (AV ex rel. Vanderhye v. iParadigms, LLC, 562 F.3d 630, 644 (4th Cir. 2009))

What is the lesson here?

Before deciding whether an alleged commercial injury has occurred or is likely to occur, a court has to figure out whether it is the kind of injury that copyright is supposed to prevent. Copyright law is not meant to prevent the harm of critical review or mockery, nor is it meant to prevent competition based on interoperable technology, nor is it meant to prevent computerized analysis of text.

Authors have certain rights in relation to original expression, but they don’t have the right to prevent other people from building metadata out of that expression. Section 102(b) of the Copyright Act makes it quite clear that

“In no case does copyright protection for an original work of authorship extend to any idea, procedure, process, system, method of operation, concept, principle, or discovery, regardless of the form in which it is described, explained, illustrated, or embodied in such work.”

So, it is hard to see a copyright injury in this case if the only harm the plaintiffs can point to is that the defendants can now generate valuable data about library books (this is what you get out of text-mining, computational analysis of text and search engines in general).
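To make "data about library books" a little more concrete, here is a minimal sketch of one non-expressive use: reducing a passage to word counts. It is only an illustration, not how HathiTrust or any real text-mining project works; the function name `word_frequencies` and the sample sentence are hypothetical and chosen purely for this example.

```python
from collections import Counter
import re


def word_frequencies(text: str) -> Counter:
    """Count how often each word occurs, discarding word order entirely."""
    # Lowercase the text and pull out runs of letters and apostrophes.
    words = re.findall(r"[a-z']+", text.lower())
    return Counter(words)


# Hypothetical sample passage, standing in for a digitized book.
sample = "The whale surfaced. The crew watched the whale dive."
freq = word_frequencies(sample)

# The two most frequent words and their counts.
print(freq.most_common(2))
```

The resulting table of counts says something about the passage, but the original sentences cannot be read from it, since word order is thrown away. That is the sense in which this kind of output is data about a work rather than a copy of its expression.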

(b) Non-speculative Injury

Even if a court were to decide that in an ideal world digitization for non-expressive use should be an exclusive right of the copyright owner, that same court might still think that in the real world library digitization qualifies as a fair use because there is no functioning market for such rights.

Most of what I have said above applies to the non-expressive use aspects of library digitization and not to the important question of access to orphan works for expressive use. But on this issue of market effect, the two issues begin to merge again. [It is hard to say much about the market effect of any particular orphan works project without seeing the details, which is why the district court was almost certainly right that the issue was not ripe for adjudication.]

What do the plaintiffs need to show in terms of future harm?

The plaintiffs don’t need to show that there is, right here and now, a well-functioning licensing system that would enable libraries to obtain permission to digitize on a book-by-book basis. According to the Supreme Court in Sony Corp. of America v. Universal City Studios, Inc., 464 U.S. 417, 451 (1984)

“What is necessary is a showing by a preponderance of the evidence that some meaningful likelihood of future harm exists.”

If there is no market at present, plaintiffs must show that such a market is “likely to be developed”. (American Geophysical Union v. Texaco, 60 F.3d at 930 (2d Cir. 1994))

In American Geophysical Union v. Texaco, the Second Circuit held that a corporate library’s systematic photocopying of scientific journal articles for research and archival purposes was not fair use, in part because the then recently created Copyright Clearance Center (“CCC”) constituted

“a workable market for institutional users to obtain licenses for the right to produce their own copies of individual articles via photocopying.” [60 F.3d at 915]

As I have already noted in a previous post, you can only take the photocopying cases so far in relation to library digitization. The court in Texaco was not dealing with an orphan works problem, nor was it dealing with a situation where literally millions of permissions would be required.

A deep dive into the “undisputed evidence”

“The undisputed evidence establishes that the Libraries’ activities will harm the Authors by undermining existing and emerging licensing opportunities.” (Authors Guild Ap. Br. Page 41)

What exactly is the plaintiffs’ evidence of “existing and emerging licensing opportunities”?

After reviewing the Texaco journal photocopying case, the plaintiffs’ Appeal Brief announces (at page 41) that

“Here, too, there are several existing and likely to be developed workable markets for the Libraries to obtain licenses to digitize, store and make various uses of the copyrighted books in their collections.” (Authors Guild Ap. Br. Page 41 (citing the Plaintiffs’ Expert Report by Professor Gervais [Daniel Gervais in Authors Guild v HT(2012)]))

At paragraph 33 of his report, Professor Gervais states:

“I believe that if the Defendants’ uses are not determined to be fair uses, the market will provide a collective licensing system for the types of uses that the Defendants have been making so that they would not have to negotiate a transactional license for each book or other work they wish to use.”

No one can doubt that Professor Gervais believes this, but the evidence is far from indisputable. In the American context, Gervais points to two sets of voluntary organizations where the copyright owner has to opt in to collective management – performing rights organizations like ASCAP, BMI and SESAC and also the Copyright Clearance Center. These are opt-in organizations and it is hard to see how anything like the ASCAP model would work to address the orphan works issue.

Gervais also points to several compulsory licensing regimes created by statute in the U.S. and overseas. Section 115 of the Copyright Act establishes a compulsory license for making and distributing phonorecords, sometimes called the compulsory cover license. This license came into being when Congress broadened the definition of copying under the 1909 Copyright Act. In addition, the Digital Performance Right in Sound Recordings Act set up several different compulsory licenses for the digital audio transmission of sound recordings in 1995. The compulsory cover license has worked fairly well for the past hundred years; the DPRSR has been an almost unmitigated disaster. [See Peter DiCola & Matthew Sag, An Information-Gathering Approach to Copyright Policy, Cardozo Law Review (2012).]

But that is really beside the point; neither of these examples shows that collective licensing schemes spontaneously emerge! All they show is that collective licensing schemes can emerge when Congress deliberately creates them, in both cases as the quid pro quo for extending a new right to copyright holders.

Likewise, the fact that a handful of Nordic countries have legislated for extended collective licensing for mass-digitization or that serious British academics have recommended the same does not prove much either. It might suggest that the U.S. government should consider this option, but it in no way shows that market based solutions to the transaction costs problems associated with library digitization will spring forth without direct action by Congress.

My favorite part of the Authors Guild’s brief on this point is where it says, without any hint of irony, that

“The ASA [Amended Settlement Agreement] exemplifies how a mass digitization licensing system could operate …” (Authors Guild Ap. Br. page 43).

The ASA would have turned the Google Book search engine into the world’s largest book store – the orphan works problem was solved by making virtually everything for sale by default. The court rejected this settlement as going beyond the scope of a permissible class action, so it seems a bit perverse to argue that the defeat of the ASA offers any hope for a solution to the transaction costs problems inherent in digitizing old library books.

How would a market for library digitization licenses actually work?

Back to Professor Gervais’ belief that “if the Defendants’ uses are not determined to be fair uses, the market will provide a collective licensing system for the types of uses that the Defendants have been making so that they would not have to negotiate a transactional license for each book or other work they wish to use.” (Gervais para 33)

This might work in some limited fashion for new works, but there is nothing in the Gervais report that explains how the CCC or some equivalent organization would be able to deal with the unclear allocation of digital rights between authors and publishers or address the orphan works issues that plague older out of print works.

If library digitization for non-expressive uses or to meet the needs of the print-disabled requires a license, that is not a license most publishers will have the authority to give. We know from the Georgia State Copyright case (Cambridge University Press et al. v. Becker et al. 863 F. Supp. 2d 1190 – Dist. Court, ND Georgia, 2012) that only a small percentage of the works that are available through the CCC’s academic permission system are currently licensed for electronic distribution. In general the figure is about 12%. In the GSU case specifically, digital licenses were available in only 44 of the 75 claimed instances of infringement, and those 75 claims were hand-picked by publishers as their best evidence of infringement.

In 2001 a federal court in the Southern District of New York ruled that a widely used publishing contract between authors and publishers, which granted the publisher the right to publish a work “in book form”, did not include electronic rights to the book. Random House, Inc. v. Rosetta Books LLC, 150 F. Supp. 2d 613 (S.D.N.Y. 2001). Publishers now write better contracts (better for them), but for all but relatively recent works, publishers do not typically own the rights to most books in digital formats; those rights typically belong to individual authors. It is curious that Professor Gervais’ expert report does not address this most salient of facts.

The Authors Guild’s Appeal Brief attempts to gloss over these fundamental problems by noting simply that:

“In the United States, organizations like the CCC presently license the same general type of copyrighted content as the material copied through the Google Library Project.” (Authors Guild Ap. Br. page 41)

The fact that the CCC licenses some books (which is what they mean by “the same general type of copyrighted content”) for some uses does nothing to show that the CCC could license the millions of books at issue here for the purpose of library digitization.

The Authors Guild also points to the fact that

“The CCC has existing licenses with academic institutions like the Libraries that allow for the rights of thousands of works to be collectively negotiated.” (Authors Guild Ap. Br. page 42).

“…there are existing mechanisms that would allow the Libraries to obtain licenses for thousands of works at once.” (page 43).

This is hardly persuasive when the problem of library digitization relates to millions, not thousands.

Was there any other evidence to which the Authors Guild could have cited?

In one of their submissions to the district court the plaintiffs stated that “defendants also permit the Infringed Books to be used for non-consumptive research, an emerging field that represents another potential licensing stream for orphans.” (Memorandum Of Law In Support Of Plaintiffs’ Motion For Summary Judgment, June 29, 2012, Page 28).

The plaintiffs’ memorandum cites UF130 as evidence. This is a reference to the Declaration of T.J. Stiles (Case 1:11-cv-06351-HB Document 83, Filed June 29, 2012), one of the individual author plaintiffs and his associated deposition (Case 1:11-cv-06351-HB Document 114-3 Filed June 29, 2012). [Declaration of Stiles in Authors Guild v. Hathitrust]

I have extracted some of the highlights:

  • Declaration of T.J. Stiles, paragraph 13:

“From what I’ve learned about it, non-consumptive research represents a potentially exciting field for academics and therefore an emerging licensing opportunity for authors at a time when revenues are decreasing.”

  • Declaration of T.J. Stiles, paragraph 15:

 “Moreover, as a copyright owner, I (not Defendants) should be allowed to decide whether or not my works are copied and included in a database used for non-consumptive research, full text search indexing or other uses.” …

  • Deposition of T.J. Stiles, Page 35, lines 9-20:

“Despite my own uncertainty about the actual means and uses of non-consumptive research, I must add that I think it’s a very exciting field, it’s very interesting and as an author, it sounds like a tremendous emerging market that I would love to license my book to be — my books to be taken advantage of in such a manner. And so as a potentially important emerging market, I certainly have no difficulties or problems with non-consumptive research. In fact I find that there may be great market potential in it. And so it interests me as a creator of works in terms of potential licensing.”

  • Deposition of T.J. Stiles, Page 168 lines 11-19

“So non-consumptive research again, if I understand it correctly, allows scholars or others to conduct searches across large quantities of books that have been digitized and then to generate statistical or otherwise interesting analysis of multiple works in the same piece of research. And, again, I would be very interested in licensing my book for use in non-consumptive research. So I’m very excited to have this emerging market identified for me.”

  • Deposition of T.J. Stiles, Pages 168-169

 “Question. You said you’re interested in licensing your book for non-consumptive research; is that correct?

Answer. Absolutely.

Question. Why have you not done that so far?

Answer. Not all markets are as mature as others. And not every market is determined by the — in this case the owner of an intellectual property going out and seeking to create that market.

In non-consumptive research, my book, this individual title, would be by definition, you know, one small element. Non-consumptive research, to my understanding, requires large numbers of books. So for — it would not make any sense at all for me to take a backlist title of mine and try to go out and create a market. But that fact that it makes no sense for me to spend a lot of time trying to create this market does not mean that I in any way give up my rights to a potential market as it emerges, to take part as a licensor of rights. I think that it’s very interesting and potentially very powerful. But it’s certainly not incumbent upon me to create that market in order to preserve my rights within it.”

Wishful thinking is a far cry from evidence.

Props from a Potemkin village

It seems to me, the plaintiffs in Authors Guild v. HathiTrust are asking the court to create a new and unprecedented right—to give copyright holders the right to control intermediate non-expressive uses—and to simply assume that an efficient market to allocate these new rights will spontaneously appear. It would be just as wrong to allow copyright holders to license non-expressive uses of copyrighted works as it would be to allow them to license quotation for the purpose of criticism and transformative copying for the purpose of parody.

You can’t just wish a new legal right into existence by establishing a set of tollbooths to collect on it. And in this case the tollbooths are just props from a Potemkin village. The ‘injury’ the plaintiffs assert strikes me as both speculative and beyond the scope of their statutory entitlements.

The inherent dangers of digitized collections have been massively over-hyped

Introduction and Necessary Disclaimer 

This is one of a series of posts concerning the Authors Guild v. HathiTrust case; specifically, these posts take the form of commentary on the Authors Guild Appeal Brief (February 25, 2013). The views expressed on this site are purely my own.

Today’s topic …

Digital Copies Are Dangerous!

They must be, the Authors Guild keeps saying it.

“ … the Libraries expose the Authors’ property to immense security risks by digitizing, copying, transferring, storing and allowing various levels of online access to millions of copyright-protected books. … A breach has the potential to cannibalize the book market through the same type of widespread Internet piracy that decimated the music industry. … Although a breach has yet to occur, the Libraries are playing with fire.” (Authors Guild Appeal Brief, Page 40)

Whose property?

The first thing to notice about the above quote is that the grounding assertion about the “authors’ property” is a little off base. The actual physical books were purchased by the defendants at a cost of hundreds of millions of dollars. It is convenient for the Authors Guild to overlook this fact, but in the most literal (but not the literary) sense, the books are the property of the libraries.

Assuming, as I believe and as the district court held, that the defendants are entitled to scan their paper collections to enable disabled access, text-mining, computational analysis, and full-text searching, then there is no sense in which the copies so made are the property of the copyright owners. Of course, other uses of the digitized corpus might be infringing, but that is no more remarkable than the fact that placing a book on a library shelf exposes it to certain acts of piracy.

Tension, if not blatant contradiction

The ‘digital is dangerous’ argument exposes a tension in the Authors Guild’s legal argument. The Guild argues that plaintiff-approved digitization will indeed take place following the spontaneous appearance of collective licensing organizations to manage these new rights (more on this in a future post); and yet, the Guild also argues that digitization will lead inexorably to massive piracy, which suggests that they would not approve digitization under any terms. Can these both be true?

The inherent dangers of digitized collections have been massively over-hyped

The inherent dangers of digitized collections have been massively over-hyped. (See, I said it twice, so it must be true.) Yes, as e-book readers become more and more widely used, there is a risk that pirate copies will substitute for legitimate sales. But this risk is entirely independent of library digitization. It is not just that it is usually bad policy to restrict A’s freedom based on the hypothetical illegal acts of B further down the track. The real issue is that library digitization is totally irrelevant to e-book piracy – works of commercial value for which sales could be lost are already available in digital form on scofflaw file-sharing websites. Run a search for “bittorrent harry potter book” on Google and you will see what I mean.

Prohibiting universities from digitizing library books to advance the state of human knowledge will do nothing to stem the problem of online piracy, except perhaps to give people one more reason to reject the legitimacy of copyright law altogether.

The Authors Guild argues that, even if the defendants’ security practices are in fact adequate, a fair use ruling in favor of the universities would “encourage far less sophisticated providers to digitize, copy, store and make similar uses of books.” (Authors Guild Ap. Br. Page 40) This appears to fall well short of establishing the likelihood of any harm cognizable under copyright law. Mass digitization is a resource-intensive undertaking; the public and private institutions that undertake it will no doubt be mindful of their tort liability for any failure to take standard precautions.

The Multiple Copy Argument – Some thoughts on #fairuse and Authors Guild v. Hathitrust (pt4)

Introduction and Necessary Disclaimer 

This is one of a series of posts concerning the Authors Guild v. HathiTrust case; specifically, these posts take the form of commentary on the Authors Guild Appeal Brief (February 25, 2013). The views expressed on this site are purely my own.

Today’s topic …

The Multiple Copy Argument

The Authors Guild Appeal Brief contains an interesting argument that is hard to summarize with perfect fidelity because it appears in so many places throughout the document (illustrations to follow). Essentially, the plaintiffs now appear to argue that even if some copying would be allowed for certain library digitization purposes, the defendants created too many copies and that these copies, or their retention, exceed the parameters of any fair use claim.

Examples from the Authors Guild Appeal Brief

The multiple copy argument first appears in the plaintiffs’ Statement of Issues Presented:

“3. Did the District Court err by failing to recognize that the Libraries’ online storage of multiple copies of the unauthorized digital library goes far beyond what is necessary to accomplish any transformative purpose of the MDP?” (Authors Guild Ap. Br. page 4)

However, it also appears on pages 8, 9-10, 12, 18, 30, 31, 32, 33, 36, 37 and 38.

“Each digital replica would include a set of image files representing every page of the work and a text file of the book’s words generated through an optical character recognition process.” (Authors Guild Ap. Br. page 8)

“the Libraries receive their own digital copies of the works to store and use.” (Authors Guild Ap. Br. page 8)

“In addition to the copies retained by Google, four digital copies of each book are maintained in the HDL, with two such copies stored on servers located in Michigan and Indiana and two additional copies stored on backup tapes.” (Authors Guild Ap. Br. 9-10)

“Moreover, even if certain of the Libraries’ uses are deemed transformative, their online storage of multiple digital duplicates of the books goes far beyond what is necessary to fulfill that purpose.” (Authors Guild Ap. Br. 12)

“[I]n analyzing whether the Mass Digitization Program is fair use under Section 107, the District Court failed to consider whether the Libraries could have made the uses the court found to be transformative – facilitating search and access for the print-disabled – without keeping multiple copies of the Authors’ works online and subjecting them to unauthorized access and widespread distribution.” (Authors Guild Ap. Br. 18)

“Moreover, to the extent that there is any transformative or other legitimate purpose to the Libraries’ actions, the making of multiple copies of the works and then storing the full text and image files online where they are susceptible to theft and widespread distribution goes far beyond what is needed to satisfy such purpose.” (Authors Guild Ap. Br. 30)

“the District Court erred by failing to recognize that the Libraries are able to facilitate text searching and to provide access to the print-disabled without creating and storing so many digital copies online.” (Authors Guild Ap. Br. page 31)

“(ii) Even if Copying Millions of Books to Facilitate Search is Transformative, There is No Justification for Storing Multiple Copies of the Image and Text Files Online” … “The Authors maintain, as they did below, that the Libraries have no right to copy and use millions of books without authorization or payment. If the Libraries want to scan print books in order to create indices or to facilitate text mining or other research tools, they should be required to ask for and obtain permission for their copying. But more importantly for purposes of this appeal, to the extent that any of the Libraries’ goals fit within the rubric of fair use, the Libraries should be permitted to do no more than is necessary to accomplish that particular purpose.” (Authors Guild Ap. Br. 32)

“Moreover, unlike HathiTrust’s perpetual storage of high resolution image files and text files of every book, the Web pages copied by a search engine are incidental to the search function.” (Authors Guild Ap. Br. page 33)

“[O]nce a book’s text is recorded in the index, the image and text files are no longer necessary for the operation of the search engine.” (Authors Guild Ap. Br. page 37)

“[E]ven if it is necessary to digitize an entire work in order to index the contents for facilitating search, the third factor weighs heavily against the Libraries because they are unnecessarily retaining complete image and text files comprising every page of every book.” (Authors Guild Ap. Br. page 36)

Some thoughts on the Multiple Copy Argument

It is entirely plausible that a plaintiff might look at a defendant who has made lots and lots of copies and argue that the very multiplicity of the copying is evidence that the real purpose was not the transformative use claimed, but some other use. For example, if Borders (1971-2011) had scanned its whole inventory and made 60,000 copies of the collection in DVD bundles, we might have begun to suspect they were planning on selling them.

However, in the context of the library digitization being litigated in Authors Guild v. HathiTrust, there is no similar mystery about the extent of copying. The libraries maintain the original scan images because those images are needed to quality-check the OCR (optical character recognition) text versions. The images are also needed so that the collection can be re-digitized when, inevitably, someone invents a smarter OCR program that is less prone to error. A biologist would not throw out an original specimen after taking their initial notes; a social scientist would not delete her original data after running her initial set of regressions. It would be somewhere between reckless and crazy to throw out the original scans.

The same applies to the OCR-text files. It might be true that once you create a search index you don’t need the original text files to actually implement search. But as anyone with any experience in software development or working with data will tell you, there are always new and better ways to process information. It would be hubris, almost a crime against knowledge, to pretend that search indexing or optical character recognition in 2013 are as good as they will ever be.

The Authors Guild Appeal Brief appears (to me) to be deliberately obtuse when it says “… even if this Court were to hold that HathiTrust in its current configuration satisfies these criteria, the Libraries still have not demonstrated their need to retain the digital image files in order to facilitate access to the print-disabled, as the assistive technology uses text files to convert the text from the book into speech.” (Authors Guild Ap. Br. page 38). Does the Authors Guild seriously intend that the print-disabled should be held hostage to the state of the art in OCR and text-to-speech as of 2013?

Any library digitization exercise should generate a handful of copies per book – you have to keep the original image and OCR files safe; you have to duplicate them so people can examine them; you have to store everything in multiple locations in case of flood, fire, terrorist attack or simple human error, and if scientists are regularly testing new equations against the original data you might need to mirror some of that data to increase the speed of the network. There is no reason why the universities should treat these digitized files any more cavalierly than Facebook treats the 267 photos of my dog I have posted to the social network.