Matthew Sag – Writes about Copyright, AI, Machine learning, and Empirical legal studies

April 26, 2013

Not everything is the same as everything else – Authors Guild v. HathiTrust (Part 2)

Introduction and Necessary Disclaimer

This is one of a series of posts concerning the Authors Guild v. HathiTrust case; specifically, these posts take the form of commentary on the Authors Guild Appeal Brief (February 25, 2013). Although I am one of the authors of the Digital Humanities and Law Scholars Amicus Brief, the views expressed on this site are purely my own. My comments on the Authors Guild’s Appeal Brief will not be comprehensive; rather, my aim is to review the aspects of the brief that I found interesting.

Today’s topic …

Not everything is the same as everything else

Legal argument is the art of analogizing and distinguishing, drawing out the implications of things already decided in ways that suggest a favorable outcome for matters still in dispute. Thus, in copyright cases it is quite common to read that x (new thing) is the same as/totally different from y (old thing). The Authors Guild’s brief engages in quite a bit of this kind of argument, but mostly without saying so explicitly. In particular, its brief contains three examples of false equivalence that simply don’t add up.

The Authors Guild implicitly suggests that the defendants’ orphan works project is the same as the Authors Guild’s own proposal to deal with orphan works in the Google Book Search Settlement. It isn’t.
The Authors Guild argues that the defendants’ orphan works project is a substitute for orphan works legislation. It isn’t.
The Authors Guild brief proceeds as though library digitization were the same as library photocopying. It isn’t.

The Universities’ Orphan Works Project v. the Google Book Search Settlement

Most of the Authors Guild’s ink is spilt on the universities’ proposed orphan works project (OWP). The idea behind the defendants’ OWP appears to be that out-of-print books published in the U.S. between 1923 and 1963 should be made available for educational use if the rights holders cannot reasonably be located. The University of Michigan proposed a method to automate the identification of orphan works for this purpose in 2011. However, the exact nature of this particular project has yet to be determined because, after the Authors Guild filed suit against the HathiTrust et al., the University of Michigan announced that the OWP would be temporarily suspended. The University of Michigan candidly admitted that the procedures used to identify orphan works had allowed some works to make their way onto the Orphan Works Lists in error.

The Authors Guild Appeal Brief contains the implicit suggestion that the defendants’ OWP is the same as the audacious exploitation of orphan works that the Authors Guild itself proposed under its Settlement Agreement with Google.

It is true that, as noted at page 10 of the Guild’s Appeal Brief, “a mechanism to help resolve the orphan works issue was one of the key aspects of the attempted settlement of the Google Books case”.

It is also undeniable that Judge Chin commented “the establishment of a mechanism for exploiting unclaimed books is a matter better suited for Congress than this Court”. (Authors Guild v. Google, Inc., 770 F. Supp. 2d 666 (S.D.N.Y. 2011))

But Judge Chin was evaluating the fairness of the private settlement between Google and the Authors Guild; he was not commenting on whether the display of any orphan works under any circumstances could be fair use, nor was he reviewing anything remotely like the libraries’ much more limited orphan works program.

The Authors Guild proceeds as though the modest orphan works program announced by the university defendants is the same in substance as the universal bookstore rejected by Judge Chin in 2011. (See, e.g., Authors Guild Appeal Brief at page 10: “Unhappy with Judge Chin’s decision, [University of Michigan] decided to take the law into its own hands by unilaterally initiating its own program.”) This strikes me as false equivalence.

Under the default settings of the now-defunct settlement (proposed 2008, amended 2009, rejected 2011), Google would have been allowed to display up to 20% of a non-fiction work to the entire world and to sell books through consumer purchases and institutional subscriptions. Funds from the sale of orphan works were to be held by a ‘book rights registry’ for safekeeping and eventual distribution to worthy causes. [Under the original Settlement Agreement, the revenues attributable to orphan or unclaimed works would have flowed in part to the ‘book rights registry’ and in part to registered authors and publishers.]

The details of the OWP that the defendants may or may not eventually undertake are unclear, but their public statements indicate that any such project would be grounded on non-commercial, limited, educational use. Moreover, whereas the settlement would have treated as orphan works all books whose copyright owners failed to notify the registry of their interests, the University of Michigan is working on a method to reliably identify a much smaller subset of true orphan works.

Whatever it turns out to be, the Universities’ orphan works project will not be the same as the Authors Guild’s own proposal to deal with orphan works in Google Book Search Settlement.

The Universities’ Orphan Works Project v. Orphan Works Legislation

The Authors Guild Appeal Brief also conflates the universities’ OWP with various legislative solutions that have been proposed over the years in relation to the widely recognized orphan works problem. See, for example, Authors Guild Appeal Brief at page 15: “Despite clear indications by courts and the Copyright Office that the treatment of orphan works should be left to Congress, the Libraries insist that the OWP is legal.” (There is another example on page 10.)

Does it really make sense that Congress’s failure to legislate a comprehensive or even partial solution to the problem of orphan works means that the use of orphan works is never allowed under any circumstances, no matter how limited the use or what its purpose? Congress could act to make out-of-print works universally available under terms similar to the Authors Guild’s proposal in the Google Book Search settlement, but so what? The mere fact that Congress could, in theory, establish a system broader than the limited display of orphan works that might qualify as fair use does not mean that there is no fair use.

Whatever it turns out to be, there is no basis to think that the university defendants’ orphan works project is a substitute for orphan works legislation.

Library Digitization v. Library Photocopying

If you proceed from the assumption that all unauthorized uses of a book are piracy, then it makes sense to treat every new technology as just a new version of the photocopier. The Authors Guild Appeal Brief can certainly be read as adopting this view.

The brief argues that “[t]he mechanical conversion of printed books into digital form is not transformative because it does not add any ‘new information, new aesthetics, [or] new insights and understandings,’ to the books.” (citing Pierre Leval, Toward a Fair Use Standard, 103 Harv. L. Rev. 1105, 1111 (1990).) True, there is solid authority that photocopying and cable retransmission are not transformative per se (that is, considered without regard to the purpose of the copying), but to suggest that library digitization offers no new insights is unsustainable.

Library digitization raises several different issues depending on the purpose behind that digitization and the uses that are subsequently made of the digitized texts. Library digitization could be motivated by any or all of the following:

to preserve existing volumes
to facilitate text-mining, data analysis and digital searching of the contents of books
to facilitate access to electronic versions of books

The legal issues relating to each of these purposes must be considered separately, but the Authors Guild’s brief muddles them together. Digitization does look a bit like other forms of copying if the motivating purpose is access to or display of expressive works (i.e., the third purpose listed above). However, the argument in favor of a limited, non-commercial, education-focused orphan works project turns not on transformative use, but on other considerations such as the lack of market harm [See Jennifer M. Urban, How Fair Use Can Help Solve the Orphan Works Problem (June 18, 2012)].

Likewise, the argument in favor of library digitization to facilitate access for the disabled does not depend on the details of the underlying technology. Whether or not we use the label transformative, this is clearly a favored purpose under the first fair use factor. The provision of equal access to copyrighted information for print-disabled individuals is mandated by the Americans with Disabilities Act (ADA). The HathiTrust provides print-disabled individuals with access to millions of items within library collections, whereas in the past they had access to a few thousand at best. “Making a copy of a copyrighted work for the convenience of a blind person is expressly identified by the House Committee Report as an example of a fair use, with no suggestion that anything more than a purpose to entertain or to inform need motivate the copying.” (Sony Corp. of Am. v. Universal City Studios, Inc., 464 U.S. 417, 455 n.40 (1984)).

The claim that library digitization is just like photocopying and does not offer any new insights crumbles completely when one considers the non-expressive uses such digitization makes possible. Library digitization makes it possible to extract metadata from books and to create a useful search engine. Search indexing, text-mining and other computational uses of text could not be more different from mere photocopying; the “new information” and “new aesthetics” they offer include:

Text-based searching
Research on the structure of language
Research on the use of language

The database as a whole serves a different purpose than each of the constituent works that have been scanned and indexed. The individual works provide content to readers; they convey the authors’ original expression. The database as a whole provides a means of searching for and identifying books or analyzing the language within books.
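The point about search can be made concrete with a toy inverted index. This is a minimal sketch under stated assumptions, not how HathiTrust or Google actually build their indexes: the corpus, the book IDs, and the function names are all hypothetical. It illustrates the asymmetry described above: building the index requires reading every word of every book, yet a query returns only book identifiers, not the books’ expression.

```python
import re
from collections import defaultdict

# Hypothetical toy "library": IDs and one-line texts invented for illustration.
SAMPLE_BOOKS = {
    "book-A": "Orphan works are books whose owners cannot be found.",
    "book-B": "Fair use permits some copying without permission.",
    "book-C": "Libraries digitize books to preserve them.",
}


def tokenize(text):
    """Lowercase the text and split it into alphabetic word tokens."""
    return re.findall(r"[a-z]+", text.lower())


def build_index(books):
    """Build an inverted index mapping each word to the IDs of books containing it.

    Every book's full text must be read (i.e. copied) to build the index,
    but the index itself stores only book identifiers, never passages.
    """
    index = defaultdict(set)
    for book_id, text in books.items():
        for word in tokenize(text):
            index[word].add(book_id)
    return index


def search(index, query):
    """Return the sorted IDs of books that contain every word in the query."""
    words = tokenize(query)
    if not words:
        return []
    matches = set.intersection(*(index.get(word, set()) for word in words))
    return sorted(matches)


if __name__ == "__main__":
    index = build_index(SAMPLE_BOOKS)
    print(search(index, "books"))
    print(search(index, "fair use"))
```

A real search engine would add ranking, phrase matching, and page-level locations, but the basic design choice is the same: the output of a query points the reader to a book rather than substituting for it.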

Labels like transformative use and nonexpressive use can be helpful in grouping like cases together, but they can also be distracting. The issue of fair use is directly tied to a purposive reading of the Copyright Act, and the purpose of copyright is clearly articulated in the U.S. Constitution: “[t]o promote the Progress of Science and useful Arts. . . .” As the Supreme Court stated in Campbell v. Acuff-Rose Music, Inc., 510 U.S. 569, 579 (1994), the “central purpose” of the fair use investigation is to see “whether the new work merely supersedes the objects of the original creation, or instead adds something new, with a further purpose or different character, altering the first with new expression, meaning, or message…”

The plaintiffs argue that library digitization is utterly untransformative, but in fact, digitization enabling book search and text-mining clearly leads to “new information, new aesthetics, new insights and understandings.”

For example, as we explained in the Digital Humanities Amicus Brief:

“Google’s ‘Ngram’ tool provides another example of a nonexpressive use enabled by mass digitization—this time easily visualized. Figure 1, below, is an Ngram-generated chart that compares the frequency with which authors of texts in the Google Book Search database refer to the United States as a single entity (‘is’) as opposed to a collection of individual states (‘are’).

As the chart illustrates, it was only in the latter half of the Nineteenth Century that the conception of the United States as a single, indivisible entity was reflected in the way a majority of writers referred to the nation. This is a trend with obvious political and historical significance, of interest to a wide range of scholars and even to the public at large. But this type of comparison is meaningful only to the extent that it uses as raw data a digitized archive of significant size and scope. To be absolutely clear, 1) the data used to produce this visualization can only be collected by digitizing the entire contents of the relevant books, and 2) not a single sentence of the underlying books has been reproduced in the finished product. In other words, this type of nonexpressive use only adds to our collective knowledge and understanding, without in any way replacing, damaging the value of, or interfering with the market for, the original works.”
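The counting behind a chart like the one described in the brief can be sketched in a few lines. This is a simplified stand-in, not Google’s Ngram method (which, as I understand it, plots each phrase’s frequency relative to all phrases of the same length in a given year); here the sketch simply computes, per year, the share of “United States is” among all “is”/“are” matches. The corpus, its years, and its sentences are invented for illustration and say nothing about actual historical usage. As the brief stresses, what comes out is a set of numbers from which none of the underlying text can be read back.

```python
import re
from collections import Counter, defaultdict

# Hypothetical toy corpus of (publication year, text) pairs, invented purely
# for illustration; these sentences and counts are not real historical data.
SAMPLE_CORPUS = [
    (1850, "The United States are bound by treaty. The United States are large."),
    (1850, "Some say the United States is a nation."),
    (1900, "The United States is a world power. The United States is growing."),
    (1900, "Critics insisted the United States are a union of states."),
]

# Matches "united states is" / "united states are" as whole words.
PATTERN = re.compile(r"\bunited states (is|are)\b")


def verb_counts_by_year(corpus):
    """Count 'United States is' vs. 'United States are' for each year.

    Only aggregate counts survive; no sentence from any text is stored.
    """
    counts = defaultdict(Counter)
    for year, text in corpus:
        for verb in PATTERN.findall(text.lower()):
            counts[year][verb] += 1
    return dict(counts)


def singular_share(counts):
    """Return, per year, the fraction of matches that use the singular 'is'."""
    shares = {}
    for year, c in sorted(counts.items()):
        total = c["is"] + c["are"]
        if total:
            shares[year] = c["is"] / total
    return shares


if __name__ == "__main__":
    shares = singular_share(verb_counts_by_year(SAMPLE_CORPUS))
    for year, share in shares.items():
        print(f"{year}: {share:.0%} singular")
```

The design choice worth noticing is that the whole text must be scanned to produce each count, which is why this kind of analysis depends on digitizing entire books.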

Library digitization is not the same as library photocopying.

April 26, 2013

Some observations on the Authors Guild’s Appeal Brief in Authors Guild v. HathiTrust (Part 1)

Introduction and Necessary Disclaimer

This is the first in a series of posts concerning the Authors Guild v. HathiTrust case. Most of the posts will be commentary on the Authors Guild Appeal Brief (February 25, 2013). Although I am one of the authors of the Digital Humanities and Law Scholars Amicus Brief, the views expressed on this site are purely my own. My comments on the Authors Guild Appeal Brief will not be comprehensive; rather, my aim is to review the aspects of the brief that I found interesting.

Authors Guild v. HathiTrust – Essential Background

Chances are that if you are reading this blog, you are well aware that Google has been mired in copyright litigation regarding its library digitization project. Google was sued by the Authors Guild (among others) in a class action on behalf of all authors in 2005. A controversial settlement of that class action, proposed in 2008, generated a maelstrom of objections. The settlement was revised in 2009 but ultimately rejected by Judge Denny Chin in the Southern District of New York in March 2011. Authors Guild v. Google is ongoing: Google is appealing the class certification, and if Google loses that appeal, the case goes back to Judge Chin in the Southern District of New York.

In September 2011, the Authors Guild (among others) filed claims for copyright infringement against the universities of Michigan, California, Wisconsin, and Indiana, and against Cornell University, for participating in the Google Book project. The Guild’s complaint with respect to the universities is, first, that they allowed Google to digitize their library collections; second, that the universities accepted corresponding digital files from Google and consolidated those files into a shared digital repository known as the HathiTrust digital library; and third, that the universities’ proposed orphan works project (OWP) amounts to copyright infringement.

This is speculation on my part, but the Authors Guild may have been banking on a favorable ruling from Judge Chin being handed down before their separate case against the universities went to judgment. If so, they miscalculated. (If not, I honestly can’t understand why they did not drop the suit against the HathiTrust – it is usually not a great idea to run the same legal argument against more sympathetic defendants when you have a choice. That said, I am sure that the plaintiffs were well advised and had sound reasons for their tactics – it is just hard to see from the outside what those reasons might have been.)

Authors Guild v. HathiTrust moved fairly quickly to the summary judgment phase. Oral argument was held on August 6, 2012 in the United States District Court for the Southern District of New York in front of Judge Baer. On October 10, 2012, Judge Baer ruled against the plaintiffs and held that two key aspects of the library digitization program and the HathiTrust were “transformative” as that term of art is used in copyright cases and, on balance, fair use.

Judge Baer approved library digitization

to fulfill the requirements of the Americans with Disabilities Act by making suitable versions of books available to the visually impaired and
to engage in non-expressive uses such as text-mining and building a search engine.

The Judge also held that the domestic ‘Associational Plaintiffs’ (e.g. the Authors Guild and similar organizations) did not have statutory standing under the Copyright Act and that the claims involving the Universities’ OWP were not ripe for adjudication.

Understandably, the Authors Guild and their fellow plaintiffs are now pursuing their appeal rights. The next post takes a deeper look at the Authors Guild Appeal Brief.

April 17, 2013

Some thoughts on the correct pronunciation of Sag

My Hungarian grandparents Nick and Lily fled Hungary in 1939. They traveled on foot with my infant father to a port in Italy. Nick made a dangerous side-trip to Paris to get money to bribe his way onto a ship bound for Australia and to pay the landing money the Australian government required of Jewish immigrants. I am proud of my grandparents and my extended family in Europe, the U.S. and Australia. Also, although I have never actually visited Hungary, I have a certain sentimental attachment to that country as well.

Nonetheless, I have decided to officially give up on the correct pronunciation of my family name. I don’t speak Hungarian, so I can’t actually pronounce my name with a Hungarian accent. My closest American relative assures me that it should be pronounced ‘Sag’ with a long ‘a’ (á as in father), or as you might imagine a British person saying the first syllable of ‘saga’.

After more than a decade of trying to toe the line on this, I have decided that the whole enterprise is futile and misguided. My attempts to get the world to adopt an Americanized Hungarian pronunciation have not been very successful. For example, I heard one of my friends massacre the “A” in Matt (it sounded like mARt) to make it the same as the “A” in Sag.

Feel free to try any pronunciation of Sag that you like, but from now on my official policy is that, just as Matt rhymes with cat, Sag rhymes with bag.

Other famous Sags include: the Sag gene, which encodes the S-arrestin protein in humans; the Saudi Arabian Government; various State Attorneys General; the SQL Access Group; and the Screen Actors Guild.

Sâg is also a village in Sălaj County, Romania. I have no idea how they say it.

April 12, 2013

The digital humanities is alive and well in South Bend, Indiana

I will be at Notre Dame on Friday, April 12, to give a lunchtime talk to the Working Group on Computational Methods in the Humanities and Sciences on copyright, text analysis, and the legal issues involved in digital humanities research. I’ll be speaking at an event organized by Assistant Professor Matthew Wilkens who works on contemporary fiction, literary theory, digital humanities, and social studies of science.

Copyright law is based on a set of rules developed in the 18th century to regulate the printing press. Today’s copyright law still carries with it the legacy of print-era assumptions that have been profoundly disturbed by the digital economy. My talk will focus on the impact of successive waves of technology on copyright law and explain why the non-expressive use of copyrighted works by copy-reliant technologies presents a profoundly new issue for copyright law.

My interest in the digital humanities grew out of earlier work on Internet search engines and plagiarism detection software. Text mining software and other copy-reliant technologies do not read, understand, or enjoy copyrighted works, nor do they deliver these works directly to the public. They do, however, necessarily copy them in order to process them as grist for the mill, raw materials that feed various algorithms and indices.

Logistical details on the talk are available here and here.

April 2, 2013

Richard Stallman will be joining us at Loyola Chicago to discuss Patents, Innovation and the Freedom to Use Ideas. Should be interesting.

The Loyola Law Journal has organized another great conference.

This one-day conference will provide a forum for nationally recognized scholars and judges to discuss the trade-off between two interests of the public: the interest in development of new ideas and the interest in freedom to use ideas. The patent system is intended to serve the former, but imposes a cost on the latter. More specifically, the Conference will explore whether the added innovation achieved by the patent system justifies its cost to society, whether it operates within the Constitution’s requirements, whether improvements can be made, and whether a different system or no system at all might be preferred.

Richard Stallman will be giving a special address on “Questioning the Assumptions of the Patent System.”

April 11, 2013.

More details are available at http://www.luc.edu/law/student/opportunities/law_journal_conference.html

April 2, 2013

Symposium on Copyright Law and Gray Market Goods, John Wiley & Sons v. Kirtsaeng

The DePaul Journal of Art, Technology & Intellectual Property Law is sponsoring a symposium on Copyright Law and Gray Market Goods, John Wiley & Sons v. Kirtsaeng, on April 8, 2013 (12–3 p.m.).

In John Wiley & Sons, Inc. v. Kirtsaeng, 654 F.3d 210 (2d Cir. 2011), a publishing company brought an action against a defendant who was importing and selling textbooks within the United States. The defendant had relatives in Thailand purchase foreign editions of textbooks that were legally printed abroad. The relatives would send the textbooks to the defendant and the defendant would sell them for a profit. On appeal, the defendant argued that he should have been allowed to put forth a first sale defense.

The Second Circuit affirmed the district court’s rejection of a first sale defense based on a plain language interpretation of 17 U.S.C. § 602(a) and 17 U.S.C. § 109(a) and some dicta in Quality King Distributors, Inc. v. L’anza Research International, Inc., 523 U.S. 135 (1998). (Quality King involved goods that were manufactured within the United States, sold abroad, and then re-imported.) The Supreme Court granted certiorari. Oral arguments were heard on Oct. 29, 2012.

On March 19, 2013, Justice Breyer, writing for a majority of six, emphatically rejected the publisher’s claim to control the importation of legally manufactured “gray-market” products. The Court held that the “first sale” doctrine, which allows the owner of a lawfully made copy of a copyrighted work to sell or otherwise dispose of that copy as he wishes, applies to copies of a copyrighted work lawfully made abroad. Justice Kagan filed a concurring opinion in which Justice Alito joined. Justice Ginsburg filed a dissenting opinion in which Justice Kennedy joined, and in which Justice Scalia joined except as to Parts III and V–B–1.

The slip opinion is available here.

Speakers

Professor Tyler Ochoa, Santa Clara University College of Law

Kevin Tottis, Principal, Law Offices of Kevin Tottis

Professor Matthew Sag, Loyola University School of Law

Robert Paul, Director of Business Operations, Compass Lexecon

Registration
For registration pricing and event details, please visit: jatipsymposium2013.eventbrite.com

March 26, 2013

I am at the University of Technology Sydney today to make friends with the robots and talk about copyright

I am a guest this morning at the University of Technology Sydney’s “Innovation and Technology Research Laboratory”, better known within UTS as The Magic Lab. The Magic Lab has a broad spectrum of research interests including robot soccer, humanoid robotics, belief revision, virtual worlds, cognitive marketing, collaboration, risk management, commonsense reasoning and technology-driven innovation in addition to strategic, social and legal aspects of innovation.

As part of my visit today, I will be presenting to the Engineering and Information Technology department in their Leadership in Innovation Seminar series. My presentation will address the interaction of copyright law and digital technology.

March 8, 2013

Your TV is watching you – Where does it end?

News reports (ExtremeTech) indicate that Microsoft has filed for a patent whereby

“The users consuming the content on a display device are monitored so that if the number of user-views licensed is exceeded, remedial action may be taken.”

No doubt George Orwell’s Telescreen (from the novel 1984) will be cited as prior art in opposition. The Telescreen allowed the Party to keep its subjects under constant surveillance, thereby encouraging a climate of self-surveillance. Replace ‘the Party’ with the MPAA and you pretty much have it.

HT: Francis K – who brought this story to my attention.

March 5, 2013

Melbourne Law School Faculty Research Seminar Series

I will be presenting my paper, Predicting Fair Use, at the Melbourne Law School Faculty Research Seminar Series on Monday, 11 March 2013. The citation for the paper is Predicting Fair Use, 73 Ohio State Law Journal 47–91 (2012) (available at http://ssrn.com/abstract=1769130).

February 28, 2013

Slides for my Australian Digital Alliance Keynote address later today

You can now download the slides for my keynote presentation at the Australian Digital Alliance 2013 copyright forum, ‘Embracing the Digital Economy: creative copyright for a creative nation’.

Click here: ADA Keynote (Online Display)