Trove and the Australian National Library’s risk management approach to orphan works

On Tuesday I posted The Authors Guild, orphan works and civil rights? (Authors Guild v Hathitrust pt. 3) in which I addressed the arguments made on behalf of the plaintiffs on appeal in Authors Guild v. Hathitrust. The Authors Guild takes the rather extreme position that:

“Any iteration of the OWP [orphan works program] under which copyrighted works are made available for public view and download violates the Copyright Act.” (Authors Guild Ap. Br. page 13; see generally pages 13-14).

Their appeal brief poses the question:

“Is it ever lawful to take an entire copyright-protected book and make it widely available for display and download without permission?” (Authors Guild Ap. Br. page 13).

I believe that the answer to that question is YES. On Tuesday I gave an example of an individual orphan work made accessible on the Civil Rights Movement Veterans Website. Today I want to extend that discussion to an entire orphan works program.

Trove, The Australian National Library and Orphan Works

Trove is the Australian National Library’s primary vehicle to assist users to access digital content held by collecting institutions across Australia. Trove is used by tens of thousands of Australians every day.

In July 2008, Trove opened up Australian newspaper articles published from the 1800s to 1955 to full-text searching.

Trove goes beyond 1955 by agreement with newspaper publishers, but for anything prior to 1955 the NLA and the libraries in its network work on the assumption that there is no requirement to obtain permission. (See e.g., Selection Policy (http://www.nla.gov.au/content/selection-policy) “The newspapers must not have copyright restrictions i.e. anything before 1955 is suitable”).

An implicit orphan works policy

In selecting 1955 as the cut-off date, the NLA has adopted what they would call a sensible risk management policy and I would call an orphan works policy. Under Australian copyright law (Australian Copyright Act 1968) the date on which the copyright in a literary work expires depends on the date of publication and the date of the death of the author.

  • If a literary work was published in the lifetime of the author, and that author died before January 1, 1957, the work is out of copyright.
  • Any literary work published in the lifetime of an author who died on or after January 1, 1957 and before 2005, will be out of copyright 50 years after that author’s death.
  • If a work was first published anonymously and the identity of the author cannot be ascertained on reasonable inquiry, the period of copyright protection is measured from the year of publication and not the year of the author’s death. (See Section 34 of the Australian Copyright Act 1968).
  • The law relating to photographs in Australia is a little easier: any photograph taken before 1955 is in the public domain. (See Section 33 of the Australian Copyright Act 1968).

Newspapers contain works by many different authors. For each individual article in a newspaper, the period of copyright protection is measured from the death of the author, even if the author assigned the copyright to the publisher.

What does all this mean for library digitization?

In 2013 the odds are pretty good that anything published in, say, 1950 is in the public domain in Australia. But if a work was published in 1953 and the author died in 1973, then the copyright would not expire until 2023.
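The arithmetic can be written down as a minimal sketch. It encodes only the two author-death rules as summarized in the bullets above, at year-level granularity. The function names are hypothetical, and treating protection as running through the expiry year is my simplifying assumption, not a statement of the statute.

```python
from typing import Optional


def expiry_year(author_death_year: int) -> Optional[int]:
    """Year in which copyright ends under the two author-death rules above.

    Returns None when the author died before 1957, because the summary
    above treats those works as already out of copyright.
    """
    if author_death_year < 1957:
        return None
    if author_death_year < 2005:
        # Fifty years after the author's death, as summarized above.
        return author_death_year + 50
    # Deaths in 2005 or later are outside the rules summarized in this post.
    raise ValueError("not covered by the rules summarized here")


def in_copyright(author_death_year: int, as_of_year: int) -> bool:
    """Assumption: protection is treated as running through the expiry year."""
    end = expiry_year(author_death_year)
    return end is not None and as_of_year <= end


print(expiry_year(1973))         # 2023
print(in_copyright(1973, 2013))  # True
print(in_copyright(1950, 2013))  # False
```

The point of the sketch is that the calculation needs the author's year of death, which is exactly the fact a library cannot look up for tens of millions of newspaper articles.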

Before he retired in 2011 Warwick Cathro was the Assistant Director-General, Resource Sharing and Innovation at the NLA. Warwick was a pioneer in the delivery of innovative network services to the Australian library community and is considered the founder of Trove. I spoke to Warwick about the NLA’s approach to newspaper digitization and he said:

“The NLA thus took a “risk management” approach to copyright issues in its newspaper digitization program.

We did this because of the manifest public benefit in digitising this content. We never attempted to clear copyright in individual articles; how could we ever do this for tens of millions of articles?

To my knowledge, in the five years since this content has been made available online, not one copyright owner has objected. If any were to do so the NLA would discuss the purpose of its digitization program and seek permission to include the creator’s work in the newspaper database. If this could not be negotiated the NLA would take down the item or article in question.”

Of course, it is much easier to get your lawyers to sign off on this kind of sensible risk management approach that respects the wishes of authors and maximizes public access to knowledge in a jurisdiction without statutory damages.

Orphan works projects are not just a stalking horse for Silicon Valley internet companies, nor are they simply the whimsical playthings of obscure institutions happy to work in legal grey areas. Making orphan works available to the public should be one of the core missions of American libraries. Libraries could pursue this mission more easily if statutory damages were abolished and pragmatic risk management prevailed over the misinformed notion that the purpose of copyright is to prevent unauthorized use solely for the sake of prevention.

The Authors Guild, orphan works and civil rights? (Authors Guild v Hathitrust pt. 3)

Introduction and Necessary Disclaimer 

This is one of a series of posts concerning the Authors Guild v. Hathitrust case; specifically, these posts take the form of commentary on the Authors Guild Appeal Brief (February 25, 2013). Although I am one of the authors of the Digital Humanities and Law Scholars Amicus Brief, the views expressed on this site are purely my own. My comments on the Authors Guild’s Appeal Brief will not be comprehensive; rather, my aim is to review the aspects of the brief that I found interesting.

Today’s topic …

What is the Authors Guild really saying about orphan works?

In some ways, the Authors Guild is the victim of its own success. The Authors Guild was quick to discover some defects in the way that the University of Michigan was determining orphan works status when the project was first announced in 2011. Exposure of those issues led to the suspension of that project before a single work was distributed to the public as an orphan work. The orphan works project might come back in some form at some stage, but at the moment there is no way for the court to know what kind of orphan works project it was being asked to rule on or whom it would affect.

In its appeal brief, the Guild responds to this predicament by arguing that the orphan works part of its case is ripe for adjudication because the details simply don’t matter – any orphan works project would be unlawful! See e.g.

“Any iteration of the OWP under which copyrighted works are made available for public view and download violates the Copyright Act. The pure legal question that was presented to the District Court is the same as it will always be: Is it ever lawful to take an entire copyright-protected book and make it widely available for display and download without permission?” (Authors Guild Ap. Br. page 13; see generally pages 13-14).

And later

“Plainly, existing copyright law does not permit the copying and distribution of the entirety of copyright-protected works to tens of thousands of users, irrespective of whether it might be difficult to locate the rights-holder.” (Authors Guild Ap. Br. page 17)

I don’t know how the defendants will respond to this argument and it is not an issue that fits within the scope of the Digital Humanities Amicus brief. Rather than diving into the legal arguments as to when and why the display of orphan works would be fair use, I thought it might be illuminating to consider an example.

Orphan works example: the Civil Rights Movement Veterans Website

On April 12, 2012, I attended the opening session of the Berkeley Law School’s “Orphan works and Mass Digitization” conference. The topic of the first panel was “Who wants to make use of orphan works and why.” In the course of that panel, Bruce Hartford, the webmaster of the Civil Rights Movement Veterans Website, told a story so fascinating it is worth setting out in full.

The Civil Rights Movement Veterans Website recounts the history of the civil rights movement:

“This website is created by Veterans of the Southern Freedom Movement (1951-1968). It is where we tell it like it was, the way we lived it, the way we saw it, the way we still see it. With a few minor exceptions, everything on this site was written, created, or spoken by Movement activists who were direct participants in the events they chronicle.” (http://www.crmvet.org)

Much of the material on the Civil Rights Movement Veterans website is used with permission or requires no permission because it is in the public domain. However, according to Hartford, that still leaves a significant proportion of material that he would classify as orphan works. When Hartford uses the term orphan works he means (i) material that was originally copyrighted by an organization which no longer exists and made no provision for its copyrights upon dissolution; (ii) material where the copyright owner cannot be found; or (iii) material where the identity of the copyright owner was always unknown.

The photo below is of James Forman (October 4, 1928 – January 10, 2005), an American Civil Rights leader active in the Student Nonviolent Coordinating Committee.

[Photo: James Forman]

As Hartford described it:

“The camera was smuggled into the jail, given to an unknown prisoner who clicked the button and took the picture. Under copyright law, as I am told, the copyright to the picture is owned by the unknown prisoner who pressed the button on the camera, who then gave it back to whoever smuggled the camera into the prison, to smuggle it out of the prison.

Now I know this is off topic, but I am just going to say, some of us are a little annoyed about this stupid rule that the person who presses the button totally owns the rights and those of us who are risking our lives to do whatever it was that they were taking the picture of have no say so in whatever happens to that and they can make lots of money on it and we can look and weep.”

Take another look

Take another look at the photo of James Forman, consider what it means to the Civil Rights Movement Veterans Website and ask yourself, can it really be true, as the Authors Guild state in their brief, that “[p]lainly, existing copyright law does not permit the copying and distribution of the entirety of copyright-protected works to tens of thousands of users, irrespective of whether it might be difficult to locate the rights-holder.” (Authors Guild Ap. Br. page 17)?

Not everything is the same as everything else – Authors Guild v Hathitrust (pt. 2)

Introduction and Necessary Disclaimer 

This is one of a series of posts concerning the Authors Guild v. Hathitrust case; specifically, these posts take the form of commentary on the Authors Guild Appeal Brief (February 25, 2013). Although I am one of the authors of the Digital Humanities and Law Scholars Amicus Brief, the views expressed on this site are purely my own. My comments on the Authors Guild’s Appeal Brief will not be comprehensive; rather, my aim is to review the aspects of the brief that I found interesting.

Today’s topic …

Not everything is the same as everything else 

Legal argument is the art of analogizing and distinguishing, drawing out the implications of things already decided in ways that suggest a favorable outcome for matters still in dispute. Thus, in copyright cases it is quite common to read that x (new thing) is the same as/totally different from y (old thing). The Authors Guild’s brief engages in quite a bit of this kind of argument, but mostly without saying so explicitly. In particular, their brief contains three examples of false equivalence that simply don’t add up.

  1. The Authors Guild implicitly suggests that the defendants’ orphan works project is the same as the Authors Guild’s own proposal to deal with orphan works in the Google Book Search Settlement. It isn’t.
  2. The Authors Guild argues that the defendants’ orphan works project is a substitute for orphan works legislation. It isn’t.
  3. The Authors Guild brief proceeds as though library digitization were the same as library photocopying. It isn’t.

The Universities’ Orphan Works Project v. the Google Book Search Settlement

Most of the Authors Guild’s ink is spilt on the universities’ proposed orphan works project (OWP). The idea behind the defendants’ OWP appears to be that out-of-print books published in the U.S. between 1923 and 1963 should be made available for educational use if the rights holders cannot reasonably be located. The University of Michigan proposed a method to automate the identification of orphan works for this purpose in 2011. However, the exact nature of this particular project is yet to be determined because after the Authors Guild filed suit against the HathiTrust et al., the University of Michigan announced that the OWP would be temporarily suspended. The University of Michigan candidly admitted that the procedures used to identify orphan works had allowed some works to make their way onto the Orphan Works Lists in error.

The Authors Guild Appeal Brief contains the implicit suggestion that the defendants’ OWP is the same as the audacious exploitation of orphan works that the Authors Guild itself proposed under its Settlement Agreement with Google.

It is true that, as noted at page 10 of the Guild’s Appeal Brief, “a mechanism to help resolve the orphan works issue was one of the key aspects of the attempted settlement of the Google Books case”.

It is also undeniable that Judge Chin commented “the establishment of a mechanism for exploiting unclaimed books is a matter better suited for Congress than this Court”. (Authors Guild v. Google, Inc., 770 F. Supp. 2d 666 (S.D.N.Y. 2011))

But Judge Chin was evaluating the fairness of the private settlement between Google and the Authors Guild; he was not commenting on the question of whether the display of any orphan works under any circumstance could be fair use, nor was he reviewing anything remotely like the libraries’ much more limited orphan works program.

The Authors Guild proceeds as though the modest orphan works program announced by the university defendants is the same in substance as the universal bookstore rejected by Judge Chin in 2011. (See e.g., Authors Guild, page 10 “Unhappy with Judge Chin’s decision, [University of Michigan] decided to take the law into its own hands by unilaterally initiating its own program.”) This strikes me as false equivalence.

Under the default settings of the now defunct settlement (proposed 2008, amended 2009, rejected 2011) Google would have been allowed to display up to 20% of a non-fiction work to the entire world and to sell books through consumer purchases and institutional subscriptions. Funds from the sale of orphan works were to be held by a ‘book rights registry’ for safe keeping and eventual distribution to worthy causes. [Under the original Settlement Agreement, the revenues attributable to orphan or unclaimed works would have flowed in part to the ‘book rights registry’ and in part to registered authors and publishers.]

The details of the OWP that the defendants may or may not eventually undertake are unclear, but their public statements indicate that any such project would be grounded on non-commercial, limited, educational use. Moreover, the settlement would have treated all books whose copyright owners failed to notify the registry of their interests as orphan works, whereas the University of Michigan is working on a method to reliably determine a much smaller subset of true orphan works.

Whatever it turns out to be, the Universities’ orphan works project will not be the same as the Authors Guild’s own proposal to deal with orphan works in the Google Book Search Settlement.

The Universities’ Orphan Works Project v. Orphan Works Legislation

The Authors Guild Appeal Brief also conflates the universities’ OWP with various legislative solutions that have been proposed over the years in relation to the widely recognized orphan works problem. See for example Authors Guild Ap. Br. at page 15 “Despite clear indications by courts and the Copyright Office that the treatment of orphan works should be left to Congress, the Libraries insist that the OWP is legal.” (There is another example on page 10).

Does it really make sense that Congress’ failure to comprehensively or partially legislate a solution to the problem of orphan works means that the use of orphan works is never allowed under any circumstances, no matter how limited or irrespective of the reason? Congress could act to make out of print works universally available under terms similar to the Authors Guild’s proposal in the Google Book Search settlement, but so what? The mere fact that Congress could in theory set out a system that is broader than the limited scope for orphan works display that would be viable as fair use does not mean that there is no fair use.

Whatever it turns out to be, there is no basis to think that the university defendants’ orphan works project is a substitute for orphan works legislation.

Library Digitization v. Library Photocopying

If you proceed from the assumption that all unauthorized uses of a book are piracy then it makes sense that every new technology is just a new version of the photocopier. The Authors Guild Appeal Brief can certainly be read as adopting this view.

The brief argues that “[t]he mechanical conversion of printed books into digital form is not transformative because it does not add any ‘new information, new aesthetics, [or] new insights and understandings,’ to the books.” (citing Pierre Leval, Toward a Fair Use Standard, 103 Harv. L. Rev. 1105, 1111 (1990).) True, there is solid authority that photocopying and cable retransmission are not per se transformative (i.e., without looking at the reasons), but to suggest that library digitization offers no new insights is unsustainable.

Library digitization raises several different issues depending on the purpose behind that digitization and the uses that are subsequently made of the digitized texts. Library digitization could be motivated by any or all of the following:

  1. to preserve existing volumes
  2. to facilitate text-mining, data analysis and digital searching of the contents of books
  3. to facilitate access to electronic versions of books

The legal issues relating to each of these purposes must be considered separately, but the Authors Guild’s brief muddles them all together. Digitization does look a bit like other forms of copying if the motivating purpose is access or display of expressive works (i.e., #3 above). However, the argument in favor of a limited, non-commercial and education-focused orphan works project turns not on transformative use, but on other considerations such as the lack of market harm [See Jennifer M. Urban, How Fair Use Can Help Solve the Orphan Works Problem (June 18, 2012)].

Likewise, the argument in favor of library digitization to facilitate disabled access is much broader than the details of the underlying technology. Whether we use the label transformative or not, this is clearly a favored purpose under the first fair use factor. The provision of equal access to copyrighted information for print-disabled individuals is mandated by the Americans with Disabilities Act (ADA). The HathiTrust provides print-disabled individuals with access to millions of items within library collections, whereas in the past they merely had access to a few thousand at best. “Making a copy of a copyrighted work for the convenience of a blind person is expressly identified by the House Committee Report as an example of a fair use, with no suggestion that anything more than a purpose to entertain or to inform need motivate the copying.” (Sony Corp. of Am. v. Universal City Studios, Inc, 464 U.S. 417, 455 n.40 (1984)).

The claim that library digitization is just like photocopying and does not offer any new insights crumbles completely when one considers the non-expressive uses such digitization makes possible. Library digitization makes it possible to extract meta-data from books and to create a useful search engine. Search indexing, text-mining and other computational uses of text could not be more different from mere photocopying; the “new information” and “new aesthetics” they offer include:

  • Text-based searching
  • Research on the structure of language
  • Research on the use of language.

The database as a whole serves a different purpose than each of the constituent works that have been scanned and indexed. The individual works provide content to readers; they convey the author’s original expression. The database as a whole provides a means of searching for and identifying books or analyzing the language within books.

Labels like transformative use and nonexpressive use can be helpful in grouping like cases together, but they can also be distracting. The issue of fair use is directly tied to a purposive reading of the Copyright Act and the purpose of copyright is clearly articulated in the U.S. Constitution—“[t]o promote the Progress of Science and useful Arts. . . .”  As the Supreme Court stated in Campbell, the “central purpose” of the fair use investigation is to see, “whether the new work merely supersedes the objects of the original creation, or instead adds something new, with a further purpose or different character, altering the first with new expression, meaning, or message…”

The plaintiffs argue that library digitization is utterly untransformative, but in fact, digitization enabling book search and text-mining clearly leads to “new information, new aesthetics, new insights and understandings.”

For example, as we explained in the Digital Humanities Amicus Brief:

“Google’s “Ngram” tool provides another example of a nonexpressive use enabled by mass digitization—this time easily visualized. Figure 1, below, is an Ngram-generated chart that compares the frequency with which authors of texts in the Google Book Search database refer to the United States as a single entity (“is”) as opposed to a collection of individual states (“are”).

[Figure 1: Ngram chart comparing references to the United States as “is” and as “are”]

As the chart illustrates, it was only in the latter half of the Nineteenth Century that the conception of the United States as a single, indivisible entity was reflected in the way a majority of writers referred to the nation.  This is a trend with obvious political and historical significance, of interest to a wide range of scholars and even to the public at large.  But this type of comparison is meaningful only to the extent that it uses as raw data a digitized archive of significant size and scope. To be absolutely clear, 1) the data used to produce this visualization can only be collected by digitizing the entire contents of the relevant books, and 2) not a single sentence of the underlying books has been reproduced in the finished product. In other words, this type of nonexpressive use only adds to our collective knowledge and understanding, without in any way replacing, damaging the value of, or interfering with the market for, the original works.”
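To make the mechanics of this kind of nonexpressive use concrete, here is a minimal sketch of phrase counting by year. It is not Google's Ngram implementation. The function name is hypothetical and the three-entry corpus is invented purely for illustration. The thing to notice is that the whole text has to be read to produce the counts, but the output contains only numbers.

```python
import re
from collections import Counter, defaultdict


def phrase_counts_by_year(corpus, phrases):
    """Count how often each phrase appears in the texts dated to each year.

    corpus  -- iterable of (year, full_text) pairs
    phrases -- lowercase phrases to look for
    """
    counts = defaultdict(Counter)
    for year, text in corpus:
        # Normalize to lowercase words separated by single spaces.
        words = " ".join(re.findall(r"[a-z]+", text.lower()))
        for phrase in phrases:
            pattern = r"\b" + re.escape(phrase) + r"\b"
            counts[year][phrase] += len(re.findall(pattern, words))
    return counts


# Invented sample sentences, NOT drawn from any real digitized book.
TOY_CORPUS = [
    (1850, "The United States are bound by treaty. The United States are at peace."),
    (1850, "Some say the United States is one nation."),
    (1900, "The United States is a great power. The United States is growing."),
]
PHRASES = ["united states is", "united states are"]

counts = phrase_counts_by_year(TOY_CORPUS, PHRASES)
for year in sorted(counts):
    print(year, dict(counts[year]))
```

Scaled up from three invented entries to millions of scanned volumes, the same kind of tally is what sits behind a chart like Figure 1.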

Library digitization is not the same as library photocopying.

HathiTrust and the Future of Orphan Works

The U.S. Copyright Office is taking another look at the problem of orphan works under U.S. copyright law.

As the Copyright Office notice explains, the Office is “interested in what has changed in the legal and business environments during the past few years that might be relevant to a resolution of the problem and what additional legislative, regulatory, or voluntary solutions deserve deliberation.” Comments are due by 5:00 p.m. EST on January 4, 2013. Reply comments are due by 5:00 p.m. EST on February 4, 2013.

Assuming it is not reversed by the Second Circuit, does the HathiTrust win on October 10, 2012 take some of the urgency out of the orphan works issue? After all, digitization for non-expressive use such as text mining and building a search engine has now been confirmed as fair use. In addition, digitization in the service of expanding access for the print-disabled is also now clearly fair use.

Or, does the HathiTrust win simply set the stage for addressing general purpose expressive access to orphan works? The district court in HathiTrust did not reach the merits of the copyright claims with respect to the universities’ Orphan Works Project and gave very little signal how it would decide such an issue.

Great panel at IPSC on orphan works, library digitization and fair use

In “The Orphans, the Market, and the Copyright Dogma,” Ariel Katz notes that extended collective licensing (ECL) proposals will do nothing to solve the underlying orphan works problem. Like “Indulgences,” ECL solutions merely absolve the “sin” of using works without permission, but actually do nothing to pay the absent owners.

In “How Fair Use Can Help Solve the Orphan Works Problem,” Jennifer Urban does a great job of explaining how the rest of us have under-analyzed the second fair use factor in relation to library digitization. She points out that the Senate Report on the 1976 Copyright Act says directly that market availability is part of the nature of the work.

In my own paper “Orphan Works as Grist for the Data Mill” I explain why copyright does not stand in the way of nonexpressive uses. My argument is that, just as the distinction between expressive and nonexpressive works is well recognized, the same distinction should generally be made in relation to potential acts of infringement.

Copying for purely nonexpressive purposes, such as the automated extraction of data, should not be regarded as infringing. Automated reproduction for nonexpressive uses (such as search engines, plagiarism detection, and macro-literary analysis) does not communicate the author’s original expression to the public; there is no expressive substitution, and thus there is no infringement. For more on Copyright and Copy-Reliant Technology, read my 2009 article of the same name.