Brief of Digital Humanities and Law Scholars as Amici Curiae in Authors Guild v. HathiTrust

This amicus brief was filed in the United States Court of Appeals for the Second Circuit in the case of Authors Guild v. HathiTrust on June 4, 2013. The case is on appeal from the United States District Court for the Southern District of New York, No. 11 CV 6351 (Baer, J.).

Download from SSRN.com (http://ssrn.com/abstract=2274832)

A Brief Summary

Amici are more than 100 professors and scholars who teach, write, and research in computer science, the digital humanities, linguistics, or law, together with two associations that represent Digital Humanities scholars generally.

Mass digitization, especially by libraries, is a key enabler of socially valuable computational and statistical research (often called “data mining” or “text mining”). While data mining has been used for several decades in traditional scientific disciplines such as astrophysics and in social sciences like economics, it has only recently become technologically and economically feasible within the humanities. This has led to a revolution, dubbed “Digital Humanities,” ranging across subjects from literature and linguistics to history and philosophy. New scholarly endeavors enabled by Digital Humanities advancements are still in their infancy but have enormous potential to contribute to our collective understanding of the cultural, political, and economic relationships among various collections (or corpora) of works, including copyrighted works, and between those works and society.

The Court’s ruling in this case on the legality of mass digitization could dramatically affect the future of work in the Digital Humanities. Amici argue that the Court should affirm the decision of the district court below that library digitization for the purpose of text mining and similar non-expressive uses presents no legally cognizable conflict with the statutory rights or interests of the copyright holders. Where, as here, the output of a database (i.e., the data it produces and displays) is noninfringing, this Court should find that the creation and operation of the database itself is likewise noninfringing. The copying required to convert paper library books into a searchable digital database is properly considered a “non-expressive use” because the works are copied for reasons unrelated to their protectable expressive qualities: the copies are intermediate and, as far as is relevant here, unread.

The mass digitization of books for text-mining purposes is a form of incidental or “intermediate” copying that enables ultimately non-expressive, non-infringing, and socially beneficial uses without unduly treading on any expressive—i.e., legally cognizable—uses of the works. The Court should find such copying to be fair use.