digital humanities – Page 2

May 22, 2013 (updated May 27, 2013)

Please Support the Digital Humanities Amicus Brief in Authors Guild v. Hathitrust

Call for Support

We are seeking your support for our amicus brief in the Court of Appeals in Authors Guild v. Hathitrust. We believe that this case will have a dramatic effect on research in fields ranging from computer science to linguistics, history, literature and the digital humanities.

Background

In 2005, the Authors Guild, a lobby group with about 8,500 members including published authors, literary agents and lawyers, filed a class-action lawsuit claiming that Google’s library digitization project was a “massive copyright infringement”. A settlement was proposed in that case in 2008, modified in 2009 after strenuous objections from academics, other author groups and several foreign governments, and rejected by the court in 2011. In September 2011, in a separate case, the Authors Guild sued several universities and the HathiTrust for participating in Google’s book-scanning project. On July 7, 2012 the Association for Computers and the Humanities and more than 60 scholars from disciplines ranging from law and computer science to linguistics, history and literature filed an amicus curiae brief in Authors Guild v. HathiTrust on behalf of the digital humanities.

District Court Decision

On October 10, 2012, Judge Baer (Southern District of New York) ruled against the Authors Guild and their fellow plaintiffs and held that library digitization for uses such as text-mining is “transformative” as that term of art is used in copyright law and, on balance, fair use (i.e., not copyright infringement). Judge Baer’s opinion cites our amicus brief, adopts one of our examples and appears to follow the basic structure of our legal argument.

Appeal

The Authors Guild is now appealing Judge Baer’s decision (on this and other grounds) and we would like your support in drafting a new brief for the U.S. Court of Appeals for the Second Circuit.

Argument in a nutshell

According to the U.S. Constitution, the purpose of copyright is “To promote the Progress of Science and useful Arts”. Copyright law should not be an obstacle to statistical and computational analysis of the millions of books owned by university libraries. Copyright law has long recognized the distinction between protecting an author’s original expression and the public’s right to access the facts and ideas contained within that expression. That distinction must be maintained in the digital age so that library digitization, internet search and related non-expressive uses of written works remain legal.

Draft available on request

Email matthewsag@gmail.com for a full draft, or download our previous effort (in related district court litigation) here. The final brief will be very similar to this.

How you can help preserve the balance of copyright law

(1) You can let us know that you would like to join our brief (we need your name and affiliation e.g. Associate Professor, Jane Doe, Springfield University). We would also like to add a one line description of any aspect of your work that is relevant to the brief, e.g. ___ Grant to study ___ in ____ literary corpus or a relevant publication.

Please note that ours is not the only amicus brief being filed in this case. Jennifer Urban (U.C. Berkeley) will also be filing a brief arguing that the plaintiffs do not represent the interests of academic authors, who comprise a large proportion of the class. YOU CAN’T SIGN BOTH. Please consider endorsing whichever brief speaks most closely to your concerns as an academic.

We need your name etc., by June 3, 2013. Please email matthewsag@gmail.com or enter your details directly via this online tool.

(2) You can point us toward easy to understand and compelling examples of the kind of research enabled by mass-digitization (we can’t include all your wonderful work, but we would like to understand it better).

(3) You can send this link to other academics and PhD students.

Thank you!

Matthew Sag, Matthew Jockers and Jason Schultz

May 21, 2013

Fair use and (non-)compliance with other statutory exceptions (Authors Guild v. Hathitrust)

Introduction and Necessary Disclaimer

This is the last in a series of posts commenting on the Authors Guild Appeal Brief (February 25, 2013) in Authors Guild v. Hathitrust. The views expressed on this site are purely my own.

Has the Authors Guild Discovered a new fair use factor?

The plaintiffs in Authors Guild v. HathiTrust had a lot to say about Section 108 of the Copyright Act when this case was in the district court. Section 108 gives libraries the right to make a limited number of copies of certain works for specified purposes. Section 108(f)(4) explicitly states that “[n]othing in this section. . . in any way affects the right of fair use as provided by section 107”. Nonetheless, the plaintiffs argued that any fair-use defense was in fact precluded by Section 108. Intriguingly, the plaintiffs also argued that Section 121 of the Act, which expressly authorizes the reproduction of books for the blind, was also preempted by section 108’s general provisions on library photocopying. Not surprisingly, the district court rejected these arguments, relying on “the clear language that Section 108 provides rights to libraries in addition to fair-use rights that might be available.”

In their Appeal Brief to the Second Circuit the plaintiffs have focused on a different line of argument: they now contend that the district court erred in failing to consider the express limitations of section 108 in its evaluation of fair use. In other words, they argue that “library copying – is specifically addressed by another statute, Section 108, which therefore should guide the fair use analysis” (Authors Guild Ap. Br. Page 30).

Cute. So every time Congress creates a public interest exception to copyright, that exception becomes a limitation on the balancing function of the fair use doctrine and thus, in effect, an expansion of the rights of copyright owners. I don’t buy it and I am pretty sure the court of appeals won’t either.

The argument can also be flipped the other way. Jonathan Band argues in a recent paper that

Courts should consider a defendant’s substantial compliance with a specific exception in Title 17 when applying the fair use privilege. As the court assesses the first fair use factor, the purpose and character of the use, the court should give great weight to the defendant’s substantial compliance with the exception. The court should recognize that Congress determined that uses of similar purpose and character did not constitute infringement.

See Jonathan Band, The Impact of Substantial Compliance with Copyright Exceptions on Fair Use.

May 10, 2013

The phantom tollbooth — Are workable markets for library digitization licenses just around the corner?

Introduction and Necessary Disclaimer

The phantom tollbooth — Are workable markets for library digitization licenses just around the corner?

What if? … One of the interesting questions arising out of the Authors Guild suit against the HathiTrust is, what happens if the Authors Guild wins? Is there any way that library digitization programs could continue if the court ruled that any kind of digitization for any reason at all required permission from the copyright owner in advance?

Naturally the plaintiffs and the defendants see this issue differently. The plaintiffs argue that copying without permission is wrong and that they will find a way to sell permission to people like the defendants so that digitization can continue. The defendants argue that copying without permission is permitted for specific reasons and that in this case there is no practical way to obtain the permissions the plaintiffs insist are required.

Whether a market for digitization licenses (either in general or for educational access to orphan works in particular) would work in practice goes to the heart of the fourth fair use factor under Section 107 of the Copyright Act.

The fourth fair use factor is “the effect of the use upon the potential market for or value of the copyrighted work.” The Authors Guild argues that “the District Court erred in failing to recognize the actual and potential harm to the Authors caused by the Libraries’ unlicensed mass digitization.” (Authors Guild Ap. Br. page 38). My guess is that the defendants will argue the exact opposite – that the district court was right to conclude that copyright owners were not likely to suffer any harm from the defendants’ library digitization project. The district court never got to the issue of the defendants’ orphan works project for reasons covered in previous posts, so the best the plaintiffs could hope for on that issue is a favorable remand from the Second Circuit.

Market effect and market definition

The question of market effect risks collapsing into tautology because every use by a defendant represents something that could, in theory, be licensed to the defendant if the court rules that it is not fair use. Courts try to avoid the trap of circular reasoning by limiting market effect to effects that are (a) cognizable under copyright and (b) not too remote or speculative.

(a) Cognizable Injury

Not every perceived harm will count towards the court’s assessment of market effect in fair use cases. A few examples from the leading cases are illustrative:

In Campbell v. Acuff-Rose the Supreme Court said:

[W]hen a lethal parody, like a scathing theater review, kills demand for the original, it does not produce a harm cognizable under the Copyright Act. Because parody may quite legitimately aim at garroting the original, destroying it commercially as well as artistically, the role of the courts is to distinguish between biting criticism that merely suppresses demand and copyright infringement, which usurps it. [Campbell v. Acuff-Rose Music, 510 U.S. 569, 591–92 (1994) (internal quotation marks and citations omitted).]

To take an example closer to library digitization, consider the software reverse engineering cases. Courts have consistently held that making unauthorized copies of a computer program, as a necessary step in reverse engineering, is fair use. In Sony v. Connectix, the Ninth Circuit held that although the defendant’s Virtual Game Station console directly competed with Sony in the market for platforms capable of playing Sony PlayStation games, the Virtual Game Station was a “legitimate competitor” in that market. The court concluded that Sony’s desire to control the market for gaming platforms was understandable but that

“copyright law . . . does not confer such a monopoly.” [Sony Computer Entm’t, Inc. v. Connectix Corp., 203 F.3d 596, 607 (9th Cir. 2000)]

In Bill Graham Archives v. Dorling Kindersley Ltd., 448 F.3d 605, 615 (2d Cir. 2006) the court of appeals held that the defendant’s use of the plaintiff’s images was

“transformatively different from their original expressive purpose. In a case such as this, a copyright holder cannot prevent others from entering fair use markets merely by developing or licensing a market for parody, news reporting, educational or other transformative uses of its own creative work.” (citations and quotations omitted).

In the Turnitin.com case the Fourth Circuit addressed claims of copyright infringement by students who objected to their papers being put through an automated plagiarism detection system.

“Clearly no market substitute was created by iParadigms, whose archived student works do not supplant the plaintiffs’ works in the “paper mill” market so much as merely suppress demand for them, by keeping record of the fact that such works had been previously submitted. … In our view, then, any harm here is not of the kind protected against by copyright law.” (A.V. ex rel. Vanderhye v. iParadigms, LLC, 562 F.3d 630, 644 (4th Cir. 2009))

What is the lesson here?

Before deciding whether an alleged commercial injury has or is likely to occur, a court has to figure out whether it is the kind of injury that copyright is supposed to prevent. Copyright law is not meant to prevent the harm of critical review or mockery, nor is it meant to prevent competition based on interoperable technology, nor is it meant to prevent computerized analysis of text.

Authors have certain rights in relation to original expression – but they don’t have the right to prevent other people building meta data out of that expression. Section 102(b) of the Copyright Act makes it quite clear that

“In no case does copyright protection for an original work of authorship extend to any idea, procedure, process, system, method of operation, concept, principle, or discovery, regardless of the form in which it is described, explained, illustrated, or embodied in such work.”

So, it is hard to see a copyright injury in this case if the only harm the plaintiffs can point to is that the defendants can now generate valuable data about library books (this is what you get out of text-mining, computational analysis of text and search engines in general).

(b) Non-speculative Injury

Even if a court were to decide that in an ideal world digitization for non-expressive use should be an exclusive right of the copyright owner, that same court might still think that in the real world library digitization qualifies as a fair use because there is no functioning market for such rights.

Most of what I have said above applies to the non-expressive use aspects of library digitization and not to the important question of access to orphan works for expressive use. But on this issue of market effect, the two issues begin to merge again. [It is hard to say much about the market effect of any particular orphan works project without seeing the details, which is why the district court was almost certainly right that the issue was not ripe for adjudication.]

What do the plaintiffs need to show in terms of future harm?

The plaintiffs don’t need to show that there is, right here and now, a well-functioning licensing system that would enable libraries to obtain permission to digitize on a book-by-book basis. According to the Supreme Court in Sony Corp. of America v. Universal City Studios, Inc., 464 U.S. 417, 451 (1984)

“What is necessary is a showing by a preponderance of the evidence that some meaningful likelihood of future harm exists.”

If there is no market at present, plaintiffs must show that such a market is “likely to be developed”. (American Geophysical Union v. Texaco, 60 F.3d at 930 (2d Cir. 1994))

In American Geophysical Union v. Texaco, the Second Circuit held that a corporate library’s systematic photocopying of scientific journal articles for research and archival purposes was not fair use, in part because the then recently created Copyright Clearance Center (“CCC”) constituted

“a workable market for institutional users to obtain licenses for the right to produce their own copies of individual articles via photocopying.” [60 F.3d at 915]

As I have already noted in a previous post, you can only take the photocopying cases so far in relation to library digitization. The court in Texaco was not dealing with an orphan works problem, nor was it dealing with a situation where literally millions of permissions would be required.

A deep dive into the “undisputed evidence”

“The undisputed evidence establishes that the Libraries’ activities will harm the Authors by undermining existing and emerging licensing opportunities.” (Authors Guild Ap. Br. Page 41)

What exactly is the plaintiffs’ evidence of “existing and emerging licensing opportunities”?

After reviewing the Texaco journal photocopying case, the plaintiffs’ Appeal Brief announces (at page 41) that

“Here, too, there are several existing and likely to be developed workable markets for the Libraries to obtain licenses to digitize, store and make various uses of the copyrighted books in their collections.” (Authors Guild Ap. Br. Page 41 (citing the Plaintiffs’ Expert Report by Professor Gervais [Daniel Gervais in Authors Guild v HT (2012)]))

At paragraph 33 of his report, Professor Gervais states:

“I believe that if the Defendants’ uses are not determined to be fair uses, the market will provide a collective licensing system for the types of uses that the Defendants have been making so that they would not have to negotiate a transactional license for each book or other work they wish to use.”

No one can doubt that Professor Gervais believes this, but the evidence is far from indisputable. In the American context, Gervais points to two sets of voluntary organizations where the copyright owner has to opt in to collective management – performing rights organizations like ASCAP, BMI and SESAC and also the Copyright Clearance Center. These are opt-in organizations and it is hard to see how anything like the ASCAP model would work to address the orphan works issue.

Gervais also points to several compulsory licensing regimes created by statute in the U.S. and overseas. Section 115 of the Copyright Act establishes a compulsory license for making and distributing phonorecords, sometimes called the compulsory cover license. This license came into being when Congress broadened the definition of copying under the 1909 Copyright Act. In addition, the Digital Performance Right in Sound Recordings Act set up several different compulsory licenses for the digital audio transmission of sound recordings in 1995. The compulsory cover license has worked fairly well for the past hundred years; the DPRSR has been an almost unmitigated disaster. [See Peter DiCola & Matthew Sag, An Information-Gathering Approach to Copyright Policy, Cardozo Law Review (2012).]

But that is really beside the point; neither of these examples shows that collective licensing schemes spontaneously emerge! All they show is that collective licensing schemes can emerge when Congress deliberately creates them, in both cases as the quid pro quo for extending a new right to copyright holders.

Likewise, the fact that a handful of Nordic countries have legislated for extended collective licensing for mass-digitization or that serious British academics have recommended the same does not prove much either. It might suggest that the U.S. government should consider this option, but it in no way shows that market based solutions to the transaction costs problems associated with library digitization will spring forth without direct action by Congress.

My favorite part of the Authors Guild’s brief on this point is where it says, without any hint of irony, that

“The ASA [Amended Settlement Agreement] exemplifies how a mass digitization licensing system could operate …” (Authors Guild Ap. Br. page 43).

The ASA would have turned the Google Book search engine into the world’s largest book store – the orphan works problem was solved by making virtually everything for sale by default. The court rejected this settlement as going beyond the scope of a permissible class action, so it seems a bit perverse to argue that the defeat of the ASA offers any hope for a solution to the transaction costs problems inherent in digitizing old library books.

How would a market for library digitization licenses actually work?

Back to Professor Gervais’ belief that “if the Defendants’ uses are not determined to be fair uses, the market will provide a collective licensing system for the types of uses that the Defendants have been making so that they would not have to negotiate a transactional license for each book or other work they wish to use.” (Gervais para 33)

This might work in some limited fashion for new works, but there is nothing in the Gervais report that explains how the CCC or some equivalent organization would be able to deal with the unclear allocation of digital rights between authors and publishers or address the orphan works issues that plague older out of print works.

If library digitization for non-expressive uses or to meet the needs of the print-disabled requires a license, that is not a license most publishers will have the authority to give. We know from the Georgia State Copyright case (Cambridge University Press et al. v. Becker et al. 863 F. Supp. 2d 1190 – Dist. Court, ND Georgia, 2012) that only a small percentage of the works that are available through the CCC’s academic permission system are currently licensed for electronic distribution. In general the figure is about 12%. In the GSU case specifically, digital licenses were available in only 44 of the 75 claimed instances of infringement, even though those 75 claims were hand-picked by publishers as their best evidence of infringement.

In 2001 a federal court in the Southern District of New York ruled that a widely used publishing contract between authors and publishers, which granted the publisher the right to publish a work “in book form”, did not include electronic rights to the book. Random House, Inc. v. Rosetta Books LLC, 150 F. Supp. 2d 613 (S.D.N.Y. 2001). Publishers now write better contracts (better for them), but for all but relatively recent works, publishers do not typically own the rights to most books in digital formats – those rights typically belong to individual authors. It is curious that Professor Gervais’ expert report does not address this most salient of facts.

The Authors Guild’s Appeal Brief attempts to gloss over these fundamental problems by noting simply that:

“In the United States, organizations like the CCC presently license the same general type of copyrighted content as the material copied through the Google Library Project.” (Authors Guild Ap. Br. page 41)

The fact that the CCC licenses some books (which is what they mean by “the same general type of copyrighted content”) for some uses does nothing to show that the CCC could license the millions of books at issue here for the purpose of library digitization.

The Authors Guild also points to the fact that

“The CCC has existing licenses with academic institutions like the Libraries that allow for the rights of thousands of works to be collectively negotiated.” (Authors Guild Ap. Br. page 42).

“…there are existing mechanisms that would allow the Libraries to obtain licenses for thousands of works at once.” (page 43).

This is hardly persuasive when the problem of library digitization relates to millions, not thousands.

Was there any other evidence to which the Authors Guild could have cited?

In one of their submissions to the district court the plaintiffs stated that “defendants also permit the Infringed Books to be used for non-consumptive research, an emerging field that represents another potential licensing stream for orphans.” (Memorandum Of Law In Support Of Plaintiffs’ Motion For Summary Judgment, June 29, 2012, Page 28).

The plaintiffs’ memorandum cites UF130 as evidence. This is a reference to the Declaration of T.J. Stiles (Case 1:11-cv-06351-HB Document 83, Filed June 29, 2012), one of the individual author plaintiffs and his associated deposition (Case 1:11-cv-06351-HB Document 114-3 Filed June 29, 2012). [Declaration of Stiles in Authors Guild v. Hathitrust]

I have extracted some of the highlights:

Declaration of T.J. Stiles, paragraph 13:

“From what I’ve learned about it, non-consumptive research represents a potentially exciting field for academics and therefore an emerging licensing opportunity for authors at a time when revenues are decreasing.”

Declaration of T.J. Stiles, paragraph 15:

“Moreover, as a copyright owner, I (not Defendants) should be allowed to decide whether or not my works are copied and included in a database used for non-consumptive research, full text search indexing or other uses.” …

Deposition of T.J. Stiles, Page 35, lines 9-20:

“Despite my own uncertainty about the actual means and uses of non-consumptive research, I must add that I think it’s a very exciting field, it’s very interesting and as an author, it sounds like a tremendous emerging market that I would love to license my book to be — my books to be taken advantage of in such a manner. And so as a potentially important emerging market, I certainly have no difficulties or problems with non-consumptive research. In fact I find that there may be great market potential in it. And so it interests me as a creator of works in terms of potential licensing.”

Deposition of T.J. Stiles, Page 168, lines 11-19:

“So non-consumptive research again, if I understand it correctly, allows scholars or others to conduct searches across large quantities of books that have been digitized and then to generate statistical or otherwise interesting analysis of multiple works in the same piece of research. And, again, I would be very interested in licensing my book for use in non-consumptive research. So I’m very excited to have this emerging market identified for me.”

Deposition of T.J. Stiles, Pages 168-169:

“Question. You said you’re interested in licensing your book for non-consumptive research; is that correct?

Answer. Absolutely.

Question. Why have you not done that so far?

Answer. Not all markets are as mature as others. And not every market is determined by the — in this case the owner of an intellectual property going out and seeking to create that market.

In non-consumptive research, my book, this individual title, would be by definition, you know, one small element. Non-consumptive research, to my understanding, requires large numbers of books. So for — it would not make any sense at all for me to take a backlist title of mine and try to go out and create a market. But that fact that it makes no sense for me to spend a lot of time trying to create this market does not mean that I in any way give up my rights to a potential market as it emerges, to take part as a licensor of rights. I think that it’s very interesting and potentially very powerful. But it’s certainly not incumbent upon me to create that market in order to preserve my rights within it.”

Wishful thinking is a far cry from evidence.

Props from a Potemkin village

It seems to me, the plaintiffs in Authors Guild v. HathiTrust are asking the court to create a new and unprecedented right—to give copyright holders the right to control intermediate non-expressive uses—and to simply assume that an efficient market to allocate these new rights will spontaneously appear. It would be just as wrong to allow copyright holders to license non-expressive uses of copyrighted works as it would be to allow them to license quotation for the purpose of criticism and transformative copying for the purpose of parody.

You can’t just wish a new legal right into existence by establishing a set of tollbooths to collect on it. And in this case the tollbooths are just props from a Potemkin village. The ‘injury’ the plaintiffs assert strikes me as both speculative and beyond the scope of their statutory entitlements.

May 10, 2013

The inherent dangers of digitized collections have been massively over-hyped

Introduction and Necessary Disclaimer

Today’s topic …

Digital Copies Are Dangerous!

They must be, the Authors Guild keeps saying it.

“ … the Libraries expose the Authors’ property to immense security risks by digitizing, copying, transferring, storing and allowing various levels of online access to millions of copyright-protected books. … A breach has the potential to cannibalize the book market through the same type of widespread Internet piracy that decimated the music industry. … Although a breach has yet to occur, the Libraries are playing with fire.” (Authors Guild Appeal Brief, Page 40)

Whose property?

The first thing to notice about the above quote is that the grounding assertion about the “authors’ property” is a little off base. The actual physical books were purchased by the defendants at a cost of hundreds of millions of dollars. It is convenient for the Authors Guild to overlook this fact, but in the most literal (but not the literary) sense, the books are the property of the libraries.

Assuming, as I believe and as the district court held, that the defendants are entitled to scan their paper collections to enable disabled access, text-mining, computational analysis, and full-text searching, then there is no sense in which the copies so made are the property of the copyright owners. Of course, other uses of the digitized corpus might be infringing, but that possibility is no different in kind from the way that placing a book on a library shelf exposes it to certain acts of piracy.

Tension, if not blatant contradiction

The ‘digital is dangerous’ argument exposes a tension in the Authors Guild’s legal argument. The Guild argues that plaintiff-approved digitization will indeed take place following the spontaneous appearance of collective licensing organizations to manage these new rights (more on this in a future post); and yet, the Guild also argues that digitization will lead inexorably to massive piracy, which suggests that they would not approve digitization under any terms. Can these both be true?

The inherent dangers of digitized collections have been massively over-hyped

The inherent dangers of digitized collections have been massively over-hyped. (See, I said it twice, so it must be true.) Yes, as e-book readers become more and more widely used, there is a risk that pirate copies will substitute for legitimate sales. But this risk is entirely independent of library digitization. It is not just that it is usually bad policy to restrict A’s freedom because of hypothetical illegal acts by B further down the track. The real issue is that library digitization is totally irrelevant to e-book piracy – works of commercial value for which sales could be lost are already available in digital form on scofflaw file sharing websites. Run a search for “bittorrent harry potter book” on Google and you will see what I mean.

Prohibiting universities from digitizing library books to advance the state of human knowledge will do nothing to stem the problem of online piracy, except perhaps giving people one more reason to reject the legitimacy of copyright law altogether.

The Authors Guild argues that, even if the defendants’ security practices are in fact adequate, a fair use ruling in favor of the universities would “encourage far less sophisticated providers to digitize, copy, store and make similar uses of books.” (Authors Guild Ap. Br. Page 40) This appears to fall well short of establishing the likelihood of any harm cognizable under copyright law. Mass digitization is a resource-intensive undertaking, and the public and private institutions that undertake it will no doubt be mindful of their tort liability for any failure to take standard precautions.

May 7, 2013

The Multiple Copy Argument – Some thoughts on #fairuse and Authors Guild v. Hathitrust (pt4)

Introduction and Necessary Disclaimer

Today’s topic …

The Multiple Copy Argument

The Authors Guild Appeal Brief contains an interesting argument that is hard to summarize with perfect fidelity because it appears in so many places throughout the document (illustrations to follow). Essentially the plaintiffs now appear to argue that even if some copying would be allowed for certain library digitization purposes, the defendants created too many copies and that these copies, or their retention, exceed the parameters of any fair use claim.

Examples from the Authors Guild Appeal Brief

The multiple copy argument first appears in the plaintiffs’ Statement of Issues Presented:

“3. Did the District Court err by failing to recognize that the Libraries’ online storage of multiple copies of the unauthorized digital library goes far beyond what is necessary to accomplish any transformative purpose of the MDP?” (Authors Guild Ap. Br. page 4)

However, it also appears on pages 8, 9-10, 12, 18, 30, 31, 32, 33, 36, 37 and 38.

“Each digital replica would include a set of image files representing every page of the work and a text file of the book’s words generated through an optical character recognition process.” (Authors Guild Ap. Br. page 8)

“the Libraries receive their own digital copies of the works to store and use.” (Authors Guild Ap. Br. page 8)

“In addition to the copies retained by Google, four digital copies of each book are maintained in the HDL, with two such copies stored on servers located in Michigan and Indiana and two additional copies stored on backup tapes.” (Authors Guild Ap. Br. 9-10)

“Moreover, even if certain of the Libraries’ uses are deemed transformative, their online storage of multiple digital duplicates of the books goes far beyond what is necessary to fulfill that purpose.” (Authors Guild Ap. Br. 12)

“[I]n analyzing whether the Mass Digitization Program is fair use under Section 107, the District Court failed to consider whether the Libraries could have made the uses the court found to be transformative – facilitating search and access for the print-disabled – without keeping multiple copies of the Authors’ works online and subjecting them to unauthorized access and widespread distribution.” (Authors Guild Ap. Br. 18)

“Moreover, to the extent that there is any transformative or other legitimate purpose to the Libraries’ actions, the making of multiple copies of the works and then storing the full text and image files online where they are susceptible to theft and widespread distribution goes far beyond what is needed to satisfy such purpose.” (Authors Guild Ap. Br. 30)

“the District Court erred by failing to recognize that the Libraries are able to facilitate text searching and to provide access to the print-disabled without creating and storing so many digital copies online.” (Authors Guild Ap. Br. page 31)

“(ii) Even if Copying Millions of Books to Facilitate Search is Transformative, There is No Justification for Storing Multiple Copies of the Image and Text Files Online” … “The Authors maintain, as they did below, that the Libraries have no right to copy and use millions of books without authorization or payment. If the Libraries want to scan print books in order to create indices or to facilitate text mining or other research tools, they should be required to ask for and obtain permission for their copying. But more importantly for purposes of this appeal, to the extent that any of the Libraries’ goals fit within the rubric of fair use, the Libraries should be permitted to do no more than is necessary to accomplish that particular purpose.” (Authors Guild Ap. Br. 32)

“Moreover, unlike HathiTrust’s perpetual storage of high resolution image files and text files of every book, the Web pages copied by a search engine are incidental to the search function.” (Authors Guild Ap. Br. page 33)

“[O]nce a book’s text is recorded in the index, the image and text files are no longer necessary for the operation of the search engine.” (Authors Guild Ap. Br. page 37)

“[E]ven if it is necessary to digitize an entire work in order to index the contents for facilitating search, the third factor weighs heavily against the Libraries because they are unnecessarily retaining complete image and text files comprising every page of every book.” (Authors Guild Ap. Br. page 36)

Some thoughts on the Multiple Copy Argument

It is entirely plausible that a plaintiff might look at a defendant who has made lots and lots of copies and argue that the very multiplicity of the copying is evidence that the real purpose was not the transformative use claimed, but some other use. For example, if Borders (1971-2011) had scanned its whole inventory and made 60,000 copies of the collection in DVD bundles, we might have begun to suspect they were planning on selling them.

However, in the context of the library digitization being litigated in Authors Guild v. HathiTrust, there is no similar mystery about the extent of copying. The libraries maintain the original scan images because those images are needed to quality-check the OCR (optical character recognition) text versions. The images are also needed so that the collection can be re-digitized when, inevitably, someone invents a smarter OCR program that is less prone to error. A biologist would not throw out an original specimen after taking their initial notes; a social scientist would not delete her original data after running her initial set of regressions. It would be somewhere between reckless and crazy to throw out the original scans.

The same applies to the OCR-text files. It might be true that once you create a search index you don’t need the original text files to actually implement search. But as anyone with any experience in software development or working with data will tell you, there are always new and better ways to process information. It would be hubris, almost a crime against knowledge, to pretend that search indexing or optical character recognition in 2013 are as good as they will ever be.
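
To make the point concrete, here is a minimal Python sketch of why an index cannot stand in for the text it was built from. It is an illustration only: the two tokenizers, the volume ids and the sample sentences are invented for this example and say nothing about how HathiTrust or any real search engine actually works. The first index splits hyphenated words apart; a later, better tokenizer keeps them whole, and the improved index can only be built by going back to the stored text.

```python
import re
from collections import defaultdict


def build_index(texts, tokenize):
    """Map each token to the set of volume ids whose text contains it."""
    index = defaultdict(set)
    for doc_id, text in texts.items():
        for token in tokenize(text):
            index[token].add(doc_id)
    return index


def tokenize_v1(text):
    # Hypothetical first-generation tokenizer: runs of letters only,
    # so "text-mining" is split into "text" and "mining".
    return re.findall(r"[a-z]+", text.lower())


def tokenize_v2(text):
    # Hypothetical improved tokenizer: keeps hyphenated compounds whole.
    return re.findall(r"[a-z]+(?:-[a-z]+)*", text.lower())


# Invented stand-ins for stored OCR text files.
ocr_texts = {
    "vol1": "Text-mining is a copy-reliant technology.",
    "vol2": "Mining towns relied on the railroad.",
}

index_v1 = build_index(ocr_texts, tokenize_v1)

# The old index cannot tell the compound "text-mining" from the plain
# word "mining"; that distinction was discarded when it was built.
print(sorted(index_v1["mining"]))

# Rebuilding with the better tokenizer requires the original text.
index_v2 = build_index(ocr_texts, tokenize_v2)
print(sorted(index_v2["text-mining"]))
print(sorted(index_v2["mining"]))
```

Nothing in index_v1 records where the hyphens were, so index_v2 cannot be derived from it. If the text files had been deleted once the first index was built, the improvement would have been impossible.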

The Authors Guild Appeal Brief appears (to me) to be deliberately obtuse when it says “… even if this Court were to hold that HathiTrust in its current configuration satisfies these criteria, the Libraries still have not demonstrated their need to retain the digital image files in order to facilitate access to the print-disabled, as the assistive technology uses text files to convert the text from the book into speech.” (Authors Guild Ap. Br. page 38). Does the Authors Guild seriously intend that the print-disabled should be held hostage to the state of the art in OCR and text-to-speech as of 2013?

Any library digitization exercise should generate a handful of copies per book – you have to keep the original image and OCR files safe; you have to duplicate them so people can examine them; you have to store everything in multiple locations in case of flood, fire, terrorist attack or simple human error, and if scientists are regularly testing new equations against the original data you might need to mirror some of that data to increase the speed of the network. There is no reason why the universities should treat these digitized files any more cavalierly than Facebook treats the 267 photos of my dog I have posted to the social network.

April 30, 2013

The Authors Guild, orphan works and civil rights? (Authors Guild v Hathitrust pt. 3)

Introduction and Necessary Disclaimer

This is one of a series of posts concerning the Authors Guild v. Hathitrust case; specifically, these posts take the form of commentary on the Authors Guild Appeal Brief (February 25, 2013). Although I am one of the authors of the Digital Humanities and Law Scholars Amicus Brief, the views expressed on this site are purely my own. My comments on the Authors Guild’s Appeal Brief will not be comprehensive; rather, my aim is to review the aspects of the brief that I found interesting.

Today’s topic …

What is the Authors Guild really saying about orphan works?

In some ways, the Authors Guild is the victim of its own success. The Authors Guild was quick to discover some defects in the way that the University of Michigan was determining orphan works status when the project was first announced in 2011. Exposure of those issues led to the suspension of that project before a single work was distributed to the public as an orphan work. The orphan works project might come back in some form at some stage, but at the moment there is no way for the court to know what kind of orphan works project it was being asked to rule on or who it would affect.

In its appeal brief, the Guild responds to this predicament by arguing that the orphan works part of its case is ripe for adjudication because the details simply don’t matter – any orphan works project would be unlawful! See e.g.

“Any iteration of the OWP under which copyrighted works are made available for public view and download violates the Copyright Act. The pure legal question that was presented to the District Court is the same as it will always be: Is it ever lawful to take an entire copyright-protected book and make it widely available for display and download without permission?” (Authors Guild Ap. Br. page 13; see generally pages 13-14).

And later

“Plainly, existing copyright law does not permit the copying and distribution of the entirety of copyright-protected works to tens of thousands of users, irrespective of whether it might be difficult to locate the rights-holder.” (Authors Guild Ap. Br. page 17)

I don’t know how the defendants will respond to this argument and it is not an issue that fits within the scope of the Digital Humanities Amicus brief. Rather than diving into the legal arguments as to when and why the display of orphan works would be fair use, I thought it might be illuminating to consider an example.

Orphan works example: the Civil Rights Movement Veterans Website

On April 12, 2012, I attended the opening session of the Berkeley Law School’s “Orphan works and Mass Digitization” conference. The topic of the first panel was “Who wants to make use of orphan works and why.” In the course of that panel, Bruce Hartford, the webmaster of the Civil Rights Movement Veterans Website, told a story so fascinating it is worth setting out in full.

The Civil Rights Movement Veterans Website recounts the history of the civil rights movement:

“This website is created by Veterans of the Southern Freedom Movement (1951-1968). It is where we tell it like it was, the way we lived it, the way we saw it, the way we still see it. With a few minor exceptions, everything on this site was written, created, or spoken by Movement activists who were direct participants in the events they chronicle.” (http://www.crmvet.org)

Much of the material on the Civil Rights Movement Veterans website is used with permission or requires no permission because it is in the public domain. However, according to Hartford, that still leaves a significant proportion of material that he would classify as orphan works. When Hartford uses the term orphan works he means (i) material that was originally copyrighted by an organization which no longer exists and made no provision for its copyrights upon dissolution; (ii) material where the copyright owner cannot be found; or (iii) material where the identity of the copyright owner was always unknown.

The photo below shows James Forman (October 4, 1928 – January 10, 2005), an American Civil Rights leader active in the Student Nonviolent Coordinating Committee.

As Hartford described it:

“The camera was smuggled into the jail, given to an unknown prisoner who clicked the button and took the picture. Under copyright law, as I am told, the copyright to the picture is owned by the unknown prisoner who pressed the button on the camera, who then gave it back to whoever smuggled the camera into the prison, to smuggle it out of the prison.

Now I know this is off topic, but I am just going to say, some of us are a little annoyed about this stupid rule that the person who presses the button totally owns the rights and those of us who are risking our lives to do whatever it was that they were taking the picture of have no say so in whatever happens to that and they can make lots of money on it and we can look and weep.”

Take another look

Take another look at the photo of James Forman, consider what it means to the Civil Rights Movement Veterans Website and ask yourself, can it really be true, as the Authors Guild states in its brief, that “[p]lainly, existing copyright law does not permit the copying and distribution of the entirety of copyright-protected works to tens of thousands of users, irrespective of whether it might be difficult to locate the rights-holder.” (Authors Guild Ap. Br. page 17)?

April 26, 2013

Not everything is the same as everything else – Authors Guild v Hathitrust (pt. 2)

Introduction and Necessary Disclaimer

Today’s topic …

Not everything is the same as everything else

Legal argument is the art of analogizing and distinguishing, drawing out the implications of things already decided in ways that suggest a favorable outcome for matters still in dispute. Thus, in copyright cases it is quite common to read that x (new thing) is the same as/totally different from y (old thing). The Authors Guild’s brief engages in quite a bit of this kind of argument, but mostly without saying so explicitly. In particular, their brief contains three examples of false equivalence that simply don’t add up.

The Authors Guild implicitly suggests that the defendants’ orphan works project is the same as the Authors Guild’s own proposal to deal with orphan works in the Google Book Search Settlement. It isn’t.
The Authors Guild argues that the defendants’ orphan works project is a substitute for orphan works legislation. It isn’t.
The Authors Guild brief proceeds as though library digitization were the same as library photocopying. It isn’t.

The Universities’ Orphan Works Project v. the Google Book Search Settlement

Most of the Authors Guild’s ink is spilt on the universities’ proposed orphan works project (OWP). The idea behind the defendants’ OWP appears to be that out-of-print books published in the U.S. between 1923 and 1963 should be made available for educational use if the rights holders cannot reasonably be located. The University of Michigan proposed a method to automate the identification of orphan works for this purpose in 2011. However, the exact nature of this particular project is still yet to be determined because after the Authors Guild filed suit against the HathiTrust et al., the University of Michigan announced that the OWP would be temporarily suspended. The University of Michigan candidly admitted that the procedures used to identify orphan works had allowed some works to make their way onto the Orphan Works Lists in error.

The Authors Guild Appeal Brief contains the implicit suggestion that the defendants’ OWP is the same as the audacious exploitation of orphan works that the Authors Guild itself proposed under its Settlement Agreement with Google.

It is true that, as noted at page 10 of the Guild’s Appeal Brief, “a mechanism to help resolve the orphan works issue was one of the key aspects of the attempted settlement of the Google Books case”.

It is also undeniable that Judge Chin commented “the establishment of a mechanism for exploiting unclaimed books is a matter better suited for Congress than this Court”. (Authors Guild v. Google, Inc., 770 F. Supp. 2d 666 (S.D.N.Y. 2011))

But Judge Chin was evaluating the fairness of the private settlement between Google and the Authors Guild; he was not commenting on the question of whether the display of any orphan works under any circumstance could be fair use, nor was he reviewing anything remotely like the libraries’ much more limited orphan works program.

The Authors Guild proceeds as though the modest orphan works program announced by the university defendants is the same in substance as the universal bookstore rejected by Judge Chin in 2011. (See e.g., Authors Guild, page 10 “Unhappy with Judge Chin’s decision, [University of Michigan] decided to take the law into its own hands by unilaterally initiating its own program.”) This strikes me as false equivalence.

Under the default settings of the now defunct settlement (proposed 2008, amended 2009, rejected 2011) Google would have been allowed to display up to 20% of a non-fiction work to the entire world and to sell books through consumer purchases and institutional subscriptions. Funds from the sale of orphan works were to be held by a ‘book rights registry’ for safe keeping and eventual distribution to worthy causes. [Under the original Settlement Agreement, the revenues attributable to orphan or unclaimed works would have flowed in part to the ‘book rights registry’ and in part to registered authors and publishers.]

The details of the OWP that the defendants may or may not eventually undertake are unclear, but their public statements indicate that any such project would be grounded on non-commercial, limited, educational use. Moreover, the settlement would have treated all books whose copyright owners failed to notify the registry of their interests as orphan works, whereas the University of Michigan is working on a method to reliably determine a much smaller subset of true orphan works.

Whatever it turns out to be, the Universities’ orphan works project will not be the same as the Authors Guild’s own proposal to deal with orphan works in the Google Book Search Settlement.

The Universities’ Orphan Works Project v. Orphan Works Legislation

The Authors Guild Appeal Brief also conflates the universities’ OWP with various legislative solutions that have been proposed over the years in relation to the widely recognized orphan works problem. See for example Authors Guild Ap. Br. at page 15 “Despite clear indications by courts and the Copyright Office that the treatment of orphan works should be left to Congress, the Libraries insist that the OWP is legal.” (There is another example on page 10).

Does it really make sense that Congress’ failure to comprehensively or partially legislate a solution to the problem of orphan works means that the use of orphan works is never allowed under any circumstances, no matter how limited or irrespective of the reason? Congress could act to make out of print works universally available under terms similar to the Authors Guild’s proposal in the Google Book Search settlement, but so what? The mere fact that Congress could in theory set out a system that is broader than the limited scope for orphan works display that would be viable as fair use does not mean that there is no fair use.

Whatever it turns out to be, there is no basis to think that the university defendants’ orphan works project is a substitute for orphan works legislation.

Library Digitization v. Library Photocopying

If you proceed from the assumption that all unauthorized uses of a book are piracy then it makes sense that every new technology is just a new version of the photocopier. The Authors Guild Appeal Brief can certainly be read as adopting the latter view.

The brief argues that “[t]he mechanical conversion of printed books into digital form is not transformative because it does not add any ‘new information, new aesthetics, [or] new insights and understandings,’ to the books.” (citing Pierre Leval, Toward a Fair Use Standard, 103 Harv. L. Rev. 1105, 1111 (1990).) True, there is solid authority that photocopying and cable retransmission are not per se transformative (i.e., without looking at the reasons), but to suggest that library digitization offers no new insights is unsustainable.

Library digitization raises several different issues depending on the purpose behind that digitization and the uses that are subsequently made of the digitized texts. Library digitization could be motivated by any or all of the following:

1. to preserve existing volumes
2. to facilitate text-mining, data analysis and digital searching of the contents of books
3. to facilitate access to electronic versions of books

The legal issues relating to each of these genres must be considered separately, but the Authors Guild’s brief muddles them all together. Digitization does look a bit like other forms of copying if the motivating purpose is access or display of expressive works (i.e., #3 above). However, the argument in favor of a limited, non-commercial and education-focused orphan works project turns not on transformative use, but on other considerations such as the lack of market harm [See Jennifer M. Urban, How Fair Use Can Help Solve the Orphan Works Problem (June 18, 2012)].

Likewise, the argument in favor of library digitization to facilitate disabled access is much broader than the details of the underlying technology. Whether we use the label transformative or not, this is clearly a favored purpose under the first fair use factor. The provision of equal access to copyrighted information for print-disabled individuals is mandated by the Americans with Disabilities Act (ADA). The HathiTrust provides print-disabled individuals with access to millions of items within library collections, whereas in the past they merely had access to a few thousand at best. “Making a copy of a copyrighted work for the convenience of a blind person is expressly identified by the House Committee Report as an example of a fair use, with no suggestion that anything more than a purpose to entertain or to inform need motivate the copying.” (Sony Corp. of Am. v. Universal City Studios, Inc., 464 U.S. 417, 455 n.40 (1984)).

The claim that library digitization is just like photocopying and does not offer any new insights crumbles completely when one considers the non-expressive uses such digitization makes possible. Library digitization makes it possible to extract meta-data from books and to create a useful search engine. Search indexing, text-mining and other computational uses of text could not be more different from mere photocopying; the “new information” and “new aesthetics” they offer include:

Text-based searching
Research on the structure of language
Research on the use of language.

The database as a whole serves a different purpose than each of the constituent works that have been scanned and indexed. The individual works provide content to readers; they convey the authors’ original expression. The database as a whole provides a means of searching for and identifying books or analyzing the language within books.

Labels like transformative use and nonexpressive use can be helpful in grouping like cases together, but they can also be distracting. The issue of fair use is directly tied to a purposive reading of the Copyright Act and the purpose of copyright is clearly articulated in the U.S. Constitution—“[t]o promote the Progress of Science and useful Arts. . . .” As the Supreme Court stated in Campbell, the “central purpose” of the fair use investigation is to see, “whether the new work merely supersedes the objects of the original creation, or instead adds something new, with a further purpose or different character, altering the first with new expression, meaning, or message…”

The plaintiffs argue that library digitization is utterly untransformative, but in fact, digitization enabling book search and text-mining clearly leads to “new information, new aesthetics, new insights and understandings.”

For example, as we explained in the Digital Humanities Amicus Brief:

“Google’s “Ngram” tool provides another example of a nonexpressive use enabled by mass digitization—this time easily visualized. Figure 1, below, is an Ngram-generated chart that compares the frequency with which authors of texts in the Google Book Search database refer to the United States as a single entity (“is”) as opposed to a collection of individual states (“are”).

As the chart illustrates, it was only in the latter half of the Nineteenth Century that the conception of the United States as a single, indivisible entity was reflected in the way a majority of writers referred to the nation. This is a trend with obvious political and historical significance, of interest to a wide range of scholars and even to the public at large. But this type of comparison is meaningful only to the extent that it uses as raw data a digitized archive of significant size and scope. To be absolutely clear, 1) the data used to produce this visualization can only be collected by digitizing the entire contents of the relevant books, and 2) not a single sentence of the underlying books has been reproduced in the finished product. In other words, this type of nonexpressive use only adds to our collective knowledge and understanding, without in any way replacing, damaging the value of, or interfering with the market for, the original works.”
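
For readers who want to see what this kind of counting looks like in practice, here is a minimal Python sketch of the "is" versus "are" comparison. It is a toy under stated assumptions: the four dated passages are invented for illustration, the function name is hypothetical, and nothing here is drawn from how Google's Ngram tool is actually implemented. The point to notice is that the program must read the full text of every passage, yet the only thing it outputs is a table of counts per year.

```python
import re
from collections import Counter


def phrase_counts_by_year(corpus, phrases):
    """Count whole-word, case-insensitive phrase occurrences per year."""
    counts = {}
    for year, text in corpus:
        # Normalize to lowercase words separated by single spaces.
        normalized = " ".join(re.findall(r"[a-z]+", text.lower()))
        yearly = counts.setdefault(year, Counter())
        for phrase in phrases:
            pattern = r"\b" + re.escape(phrase) + r"\b"
            yearly[phrase] += len(re.findall(pattern, normalized))
    return counts


# Invented sample passages, tagged with a made-up publication year.
corpus = [
    (1820, "The United States are a confederation. "
           "The United States are bound by treaty."),
    (1820, "It is said the United States is young."),
    (1900, "The United States is a great power."),
    (1900, "The United States is at peace; "
           "the United States are seldom so described."),
]

counts = phrase_counts_by_year(
    corpus, ["united states is", "united states are"]
)

# Only aggregate numbers are reported; no sentence is reproduced.
for year in sorted(counts):
    print(year, dict(counts[year]))
```

A real study would run the same kind of count over millions of volumes, which is exactly why the full digitized text has to exist somewhere even though no reader ever sees it.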

Library digitization is not the same as library photocopying.

April 12, 2013

The digital humanities is alive and well in South Bend, Indiana

I will be at Notre Dame on Friday, April 12, to give a lunchtime talk to the Working Group on Computational Methods in the Humanities and Sciences on copyright, text analysis, and the legal issues involved in digital humanities research. I’ll be speaking at an event organized by Assistant Professor Matthew Wilkens who works on contemporary fiction, literary theory, digital humanities, and social studies of science.

Copyright law is based on a set of rules developed in the 18th century to regulate the printing press. Today’s copyright law still carries with it the legacy of print-era assumptions that have been profoundly disturbed by the digital economy. My talk will focus on the impact of successive waves of technology on copyright law and explain why the non-expressive use of copyrighted works by copy-reliant technologies presents a profoundly new issue for copyright law.

My interest in the digital humanities grew out of earlier work on Internet search engines and plagiarism detection software. Text mining software and other copy-reliant technologies do not read, understand, or enjoy copyrighted works, nor do they deliver these works directly to the public. They do, however, necessarily copy them in order to process them as grist for the mill, raw materials that feed various algorithms and indices.

Logistical details on the talk are available here and here.

November 19, 2012

The Authors Guild Does Not Speak for Academic Authors

Academic authors are being asked to stand by and watch as the Authors Guild litigates against their wishes and interests, but supposedly on their behalf.

This hubris is not exactly unprecedented. The plaintiffs in Hansberry v. Lee, 311 U.S. 32 (1940), sought to enforce a racially restrictive covenant on behalf of a broad class of landowners including African-Americans who would be harmed by enforcement and whites who simply objected. Like the landowners in Hansberry, many academic authors disagree with the Authors Guild’s crusade against book digitization. The Supreme Court did not allow the plaintiffs to hijack the class in Hansberry; hopefully the Second Circuit will not allow the Authors Guild to do so in Authors Guild v. Google.

Pamela Samuelson and David Hansen (both of the University of California, Berkeley – School of Law) have filed a very important amicus brief on behalf of over 150 academic authors* in the Second Circuit Court of Appeals in Authors Guild v. Google. (Available on SSRN)

The brief in support of defendant-appellant Google argues that class certification should have been denied by the District Court because the named plaintiffs don’t represent the interests of academic authors who comprise a large proportion of the class.

The Authors Guild cloaks its lawsuit in the mantle of authorship, yet in reality it represents only a small fraction of the class it has constructed. Most of the books that Google scanned from major research library collections were written by academics.

The basic problem is that the three individual plaintiffs who claim to be class representatives are not academics and do not share the commitment to broad access to knowledge that predominates among academics.

The plaintiffs’ request for an injunction to stop Google from making the Book Search corpus available would be harmful to academic author interests. The only way for the interests of academic authors to be vindicated in this litigation, given the positions that the plaintiffs have taken thus far, is for Google to prevail on its fair use defense and for the named plaintiffs to lose.

As we explained in the Digital Humanities Amicus Brief in the district court, “[m]ass digitization, like that employed by Google, is a key enabler of socially valuable computational and statistical research (often called “data mining” or “text mining”),” which allows researchers to discover and use the non-copyrightable facts and ideas that are contained within the collection of copyrighted works themselves.

The Authors Guild are bad representatives of the interests of academic authors because

Academic authors would generally prefer their books be findable using Google Book Search.
If the Authors Guild wins, academic authors will be deprived of a valuable resource, in the form of the Google Book Search Engine and the HathiTrust Digital Library.
If the Authors Guild wins, text mining — the most basic tool of the Digital Humanities — will have been declared to be prima facie illegal.

* I was one of the signatories.

October 20, 2012

HathiTrust and the Future of Orphan Works

The U.S. Copyright Office is taking another look at the problem of orphan works under U.S. copyright law.

As the Copyright Office notice explains, the Office is “interested in what has changed in the legal and business environments during the past few years that might be relevant to a resolution of the problem and what additional legislative, regulatory, or voluntary solutions deserve deliberation.” Comments are due by 5:00 p.m. EST on January 4, 2013. Reply comments are due by 5:00 p.m. EST on February 4, 2013.

Assuming it is not reversed by the Second Circuit, does the HathiTrust win on October 10, 2012 take some of the urgency out of the orphan works issue? After all, digitization for non-expressive use such as text mining and building a search engine has now been confirmed as fair use. In addition, digitization in the service of expanding access for the print-disabled is also now clearly fair use.

Or, does the HathiTrust win simply set the stage for addressing general purpose expressive access to orphan works? The district court in HathiTrust did not reach the merits of the copyright claims with respect to the universities’ Orphan Works Project and gave very little signal how it would decide such an issue.