IPSC 2018 Slides from my talk on Legal Infrastructure for Text Data Mining

I presented my working paper on the legal infrastructure for text data mining at IPSC yesterday and I promised to post my slides. Here they are: Public, Matthew Sag, Legal Infrastructure for TDM (IPSC August 2018). I won’t be posting a draft online for a while because I want to get more feedback from people actually working in this area. But if you would like an advance draft, please email me.

What is TDM?

Neglect, but only of my website

I have not posted here in a long time, but I am still alive. Partly I have been busy with some long-term projects and some things that don’t fit the copyright and tech focus of this website. My work on the copyright implications of text data mining has led to a series of projects actually doing text data mining. This has been fun and has led to new insights about the copyright issues that have dominated a lot of work for the last decade.

Check out my new website devoted to empirical analysis of Supreme Court oral arguments: ScotusOA.com.

The DMCA Safe Harbors With Brief Annotations of Important Cases

I made an annotated version of Section 512 of the Copyright Act — the DMCA Internet Safe Harbors — for my Copyright Law class and I thought that others might find it useful. My thanks to Annemarie Bridy (University of Idaho College of Law) for her helpful suggestions and additions.

Please note that this document is an aid to understanding the DMCA safe harbors; it is not comprehensive, nor is it guaranteed to be free from error. Draft date: April 26, 2017.

Download link: The DMCA Safe Harbors With Brief Annotations of Important Cases



Defense Against the Dark Arts of Copyright Trolling is now forthcoming in the Iowa Law Review.

Defense Against the Dark Arts of Copyright Trolling

Jake Haskell and I have accepted an offer of publication at the Iowa Law Review. Iowa published my empirical study of copyright trolling in 2015, so it seems right to place a follow-up piece there as well.

Defense Against the Dark Arts of Copyright Trolling is available now on SSRN (http://ssrn.com/abstract=2933200).

This will be my 4th publication in Iowa since 2015, the others being:

  • IP Litigation in US District Courts: 1994 to 2014, 101 IOWA LAW REVIEW 1065–1112 (2016)
  • Promoting Innovation, 100 IOWA LAW REVIEW 2223–2247 (2015) (with Spencer Weber Waller)
  • Copyright Trolling, An Empirical Study, 100 IOWA LAW REVIEW 1105–1146 (2015)

Is software expressive? Yes, but who cares? #IPSC16

My brief response to some comments at IPSC today re software functionality and expression.

Writing software obviously involves considerable human ingenuity; however, no one buys software to appreciate the expressive attributes of its source code. The difference between software and other forms of written communication can be demonstrated by asking the question, “what makes it good?”

For most works of authorship, there really is no consensus. However, computer scientists and software engineers will inevitably respond that good code is simple, readable, efficient, and well structured. No one says that software should be expressive, moving, that it should speak to the human condition or have emotional resonance. Software is primarily functional and good software is good because it functions well and does things that people want done.

#DMCA512 Comments and Keywords

I downloaded a selection of the long-form comments re DMCA Section 512 and ran some basic word searches to try to organize the material. It would be great to see someone do this on a more systematic basis! Even better if they tried some topic modeling.
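For anyone who wants to try something similar, here is a minimal sketch of a basic keyword search across a set of comments. It is not the spreadsheet workflow used for this post; the function name `keyword_counts`, the sample comment text, and the search terms are all hypothetical, and real comments would be read from the downloaded files rather than typed in.

```python
import re
from collections import Counter


def keyword_counts(comments, keywords):
    """Count whole-word, case-insensitive keyword hits in each comment.

    comments: dict mapping a comment's name to its full text.
    keywords: list of search terms (the ones used below are illustrative).
    """
    patterns = {
        kw: re.compile(r"\b" + re.escape(kw) + r"\b", re.IGNORECASE)
        for kw in keywords
    }
    return {
        name: Counter({kw: len(p.findall(text)) for kw, p in patterns.items()})
        for name, text in comments.items()
    }


# Toy stand-ins for downloaded comments (hypothetical text).
comments = {
    "comment_a": "Notice and takedown works. Takedown abuse is rare.",
    "comment_b": "The safe harbor needs a notice and staydown rule.",
}
counts = keyword_counts(comments, ["takedown", "staydown", "safe harbor"])
for name, hits in counts.items():
    print(name, dict(hits))
```

The per-comment counts can then be sorted or filtered to decide which comments to read first for a given issue.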

At the end of the day, I am going to have to read most of these, but I am glad I don’t have to read them all!

Comments and key words (2) in Excel. 


My 2015 IP Litigation data is now available for replication

Most IP academics agree that their data should be available for replication, but we don’t always follow through in a timely fashion (or at all!). Yesterday I published some analysis of the latest filing data on copyright, patent and trademark litigation in US district courts. Today I published the underlying data on my website (under the publications+/data sets tab). There are two datasets, one derived from Pacer (excel) (stata) and one derived from Bloomberg (excel)(stata).

This data is an UPDATE to the data in my forthcoming article, IP Litigation in United States District Courts: 1994 to 2014 (Iowa Law Review, forthcoming 2016). The updated analysis is summarized in Matthew Sag, IP Litigation in United States District Courts 2015 Update (January 5, 2016). Available at SSRN: http://ssrn.com/abstract=2711326.

Copyright Scholarship Roundtable


The First Copyright Scholarship Roundtable will begin tomorrow at the University of Pennsylvania Law School. The event is sponsored by U. Penn’s Center for Technology, Innovation & Competition and was organized by Shyam Balganesh and me (but mostly by Shyam).

I am the assigned commentator for two of the excellent papers: Ben Depoorter’s Enforcing Against Norms and Kristelia Garcia’s Facilitating Competition by Remedial Regulation. My slides for Enforcing Against Norms are posted here.

Some cool graphs from my paper on IP litigation in US district courts

I have just revised my article, IP Litigation in US District Courts: 1994 to 2014, which will be published in Volume 101 of the Iowa Law Review next year. (You can download the article from SSRN now.) This post does not attempt to summarize the full article; it focuses instead on explaining some of the more interesting graphs and data visualizations in the article.

Copyright, Patent and Trademark Filings as a percentage of all IP 1994-2014

This data is presented as a 12-month moving average.
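As a rough illustration of what a moving average does to monthly filing counts, here is a minimal sketch. It assumes a trailing window (each point averages the current month and the months before it); the article may have used a different alignment. The function name `moving_average` and the sample numbers are hypothetical, and the demo uses a 3-month window only to keep the example short.

```python
def moving_average(values, window=12):
    """Trailing moving average over a list of monthly counts.

    Positions with fewer than `window` observations so far are None.
    """
    smoothed = []
    running = 0.0
    for i, value in enumerate(values):
        running += value
        if i >= window:
            # Drop the observation that just fell out of the window.
            running -= values[i - window]
        smoothed.append(running / window if i >= window - 1 else None)
    return smoothed


# Made-up monthly filing counts, smoothed with a short window for readability.
monthly_filings = [10, 12, 14, 16, 18, 20]
smoothed = moving_average(monthly_filings, window=3)
print(smoothed)
```

With real data, calling `moving_average(monthly_filings)` with the default window of 12 smooths out seasonal spikes so that the longer-run trend is easier to see.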

Copyright, Patent and Trademark Filings 1994—2014 (Percent)


Copyright, Patent and Trademark Filings (number of cases) 1994—2014

Again, this data is presented as a 12-month moving average. The difference between the dashed red line and the solid red line clearly shows the impact of lawsuits against anonymous internet file sharers.

Copyright, Patent and Trademark Filings 1994—2014 (Cases)


Copyright Cases 1994—2014, RIAA End-User Litigation, BitTorrent Monetization and Copyright Trolling

The impact of the current wave of copyright trolling is pretty clear.

Copyright Cases Filed in U.S. District Courts (1994—2014)


9 out of 10 ‘copyright trolling’ cases are about pornography

As you can see from the table, the number of John Does per suit has declined because courts have been far more skeptical of mass joinder, but that has just led to more suits being filed.



One pornography company accounts for 80% of Copyright John Doe lawsuits filed in 2014 #CopyrightTrolling

In fact, the pornography producer Malibu Media is such a prolific litigant that in 2014 it was the plaintiff in over 41.5% of all copyright suits nationwide. John Doe litigation is not a general response to Internet piracy; it is a niche entrepreneurial activity in and of itself.

[Edited at 4:17pm. The missing * for AF Holdings has been added]



1/2 The patent litigation explosion is not exactly as it appears, compare suits filed to #defendants.

At first glance it looks like the annual volume of patent litigation in the United States doubled in the 16 years from 1994 until 2010. In the three years from 2010 to 2013 it doubled again.

US Patent Litigation Filings, 1994–2014


2/2 The patent litigation explosion is not exactly as it appears, compare suits filed to #defendants.

The real trend in patent litigation over the past two decades can be seen in the number of defendants filed against. The bar chart at the bottom of the next figure shows the same filing data as in the figure above. The scatter plot in the figure below shows the estimated number of defendants. Although it appears that the number of patent cases filed exploded after 2010, looking at the estimated number of defendants, it becomes clear that the period from 2010 to 2013 was more or less a continuation of the existing trend.

Patent Cases Filed and Estimated Number of Defendants, 1994—2014

There is something wrong with the ED of Texas. Average Number of Patent Defendants per Filing 1994—2014

This figure shows the estimated number of defendants per suit for the nine most popular federal districts from 1994 to 2014 and also for an aggregation of all other districts. The vertical dashed line is set to 2011 to mark the passage of the America Invents Act. It is starkly apparent that the trend toward more defendants is greatest in the Eastern District of Texas. The estimated number of defendants per suit in the Eastern District of Texas climbs steeply from 1.66 in 1994 to 12.37 in 2010 and then drops precipitously to 1.99 in 2014.
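The arithmetic behind a figure like this is simple: for each district and year, divide the total number of defendants by the number of suits filed. Here is a minimal sketch of that calculation. It does not reproduce how the article estimates defendants from the docket data; the function name `defendants_per_filing` and the case records are made up for illustration.

```python
from collections import defaultdict


def defendants_per_filing(cases):
    """Average defendants per suit for each (district, year) pair.

    cases: iterable of (district, year, number_of_defendants) tuples,
    one tuple per suit filed.
    """
    totals = defaultdict(lambda: [0, 0])  # key -> [defendants, suits]
    for district, year, n_defendants in cases:
        entry = totals[(district, year)]
        entry[0] += n_defendants
        entry[1] += 1
    return {key: d / suits for key, (d, suits) in totals.items()}


# Made-up records, not the article's data.
cases = [
    ("E.D. Tex.", 2010, 20),
    ("E.D. Tex.", 2010, 4),
    ("E.D. Tex.", 2014, 2),
    ("D. Del.", 2010, 3),
]
averages = defendants_per_filing(cases)
print(averages[("E.D. Tex.", 2010)])
```

Multiplying the same averages back by the number of suits gives the total defendant count, which is the quantity that stays comparable when joinder rules change.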

Average Number of Patent Defendants per Filing 1994—2014


What does all this mean? To me, it suggests that there was not exactly a “Troll Fueled Patent Litigation Explosion” between 2010 and 2012. Once you take into account the procedural changes brought into effect in 2011 by the AIA and focus on the number of defendants rather than the number of suits, it seems that there was a significant troll-fueled increase in the rate of patent litigation; it is just that this increase started earlier and proceeded more smoothly than the simple case filing data suggests. I refer to this revised narrative as the Troll Fueled Patent Litigation Inflation.

District Rankings, Copyright Compared to Trademark (2010-2014)

This figure focuses your attention on the outliers, but the general story is that copyright and trademark litigation are highly correlated at a district court level.
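One way to put a number on "highly correlated" is a rank correlation between the two sets of district rankings. The sketch below computes Spearman's rank correlation using the simple no-ties formula; it is an illustration of the idea, not the method used in the article. The district labels and filing counts are invented, and `ranks` and `spearman` are hypothetical helper names.

```python
def ranks(counts):
    """Map each district to its rank (1 = most filings). Assumes no ties."""
    ordered = sorted(counts, key=counts.get, reverse=True)
    return {district: i + 1 for i, district in enumerate(ordered)}


def spearman(x_counts, y_counts):
    """Spearman rank correlation for two dicts with the same keys, no ties."""
    rx, ry = ranks(x_counts), ranks(y_counts)
    n = len(rx)
    d_squared = sum((rx[d] - ry[d]) ** 2 for d in rx)
    return 1 - 6 * d_squared / (n * (n * n - 1))


# Made-up filing counts for four hypothetical districts.
copyright_filings = {"A": 500, "B": 300, "C": 200, "D": 50}
trademark_filings = {"A": 400, "B": 150, "C": 250, "D": 40}
rho = spearman(copyright_filings, trademark_filings)
print(round(rho, 3))
```

A value near 1 means districts rank about the same on both measures; the outliers are the districts that contribute most to the squared rank differences.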

District Rankings, Copyright Compared to Trademark (2010-2014)

Regional Variation in Patent Litigation – Evidence of Forum Selling

The popularity of the Eastern District of Texas as a forum for patent litigation is a well-known phenomenon. However, the data and analysis presented in this study provide a new way of looking at the astonishing ascendancy of this district and the problem of forum shopping in patent law more generally. The extent of forum shopping in patent law can be seen by comparing the geographic distribution of patent litigation to that of copyright and trademark. This figure illustrates district rank in terms of patent versus a combined copyright and trademark ranking for cases filed between 2010 and 2014.

District Rank in terms of Patent versus Copyright and Trademark Combined (2010-2014)

District Court Ranks for Patent Litigation 1994-2014

This is crazy!

My paper explains how we got here and summarizes the excellent work of Jonas Anderson in a new paper titled ‘Court Competition for Patent Cases’, and Daniel Klerman and Greg Reilly in ‘Forum Selling’, each of which goes into even more detail.

District Court Ranks for Patent Litigation 1994-2014


The first thing to note about this figure is that, but for the Eastern District of Texas and Delaware, the geographic distribution of patent litigation over the past two decades would look remarkably stable. For most of the last 21 years, the Central District of California was the most important venue for patent litigation, followed by the Northern District of California. The Northern District of Illinois has also ranked consistently somewhere between second and sixth over the same period. This relative stability contrasts markedly with the steady gains made by Delaware and the remarkable ascendancy of the Eastern District of Texas between 1994 and 2014. Notice that, were it not for the Eastern District of Texas, the scale on Figure 11 would range from 10 to 1, rather than 50 to 1. Framed accordingly, the steady ascent of Delaware from 9th in 1994 to 2nd from 2011 to the present day would be more noteworthy. However, the rise of the Eastern District of Texas from literal obscurity (it saw only 8 patent cases in 1994) to preeminence over the same period dwarfs all other changes.

Slides for my presentation on empirical studies of copyright litigation

Empirical Studies of Copyright Litigation. This presentation was part of the conference for the forthcoming Research Handbook – Economics of Intellectual Property Rights – Volume II Empirical Studies. Northwestern University (August 5, 2015).

I hope to have a draft chapter posted to SSRN soon.

The literature surveyed is summarized below. As far as I know, this is all there is.

  • Barnes, Jeffrey Edward. “Comment: Attorney’s Fee Awards in Federal Copyright Litigation after Fogerty v. Fantasy: Defendants Are Winning Fees More Often, but the New Standard Still Favors Prevailing Plaintiffs.” 47 UCLA L. Rev. 1381, 2000.
  • Beebe, Barton. “An Empirical Study of the Multifactor Tests for Trademark Infringement”, 94 CALIF. L. REV. 1581, 2006.
  • Beebe, Barton. “An Empirical Study of U.S. Copyright Fair Use Opinions, 1978-2005”, 156 U. Penn. L. Rev. 549, 2008.
  • Cotropia, Christopher A. and James Gibson. “Copyright’s Topography: An Empirical Study of Copyright Litigation.” 92 Texas Law Review 1981, 2014.
  • Ford, William K. “Judging Expertise in Copyright Law.” 14 J. Intell. Prop. L. 1, 2006.
  • Gerhardt, Deborah R. “Copyright Publication: An Empirical Study.” 87 Notre Dame L. Rev. 135, 2011.
  • Rogers, Eric. “Substantially Unfair: An Empirical Examination of Copyright Substantial Similarity Analysis among the Federal Circuits.” 2013 Mich. St. L. Rev. 893, 2013.
  • Landes, William M. “An Empirical Analysis of Intellectual Property Litigation: Some Preliminary Results.” 41 HOUS. L. REV. 749, 2004.
  • Lippman, Katherine. “The Beginning of the End: Preliminary Results of an Empirical Study of Copyright Substantial Similarity Opinions in the U.S. Circuit Courts”, 2013 Mich. St. L. Rev. 513, 2013.
  • Liu, Jiarui. “Copyright Injunctions After Ebay: An Empirical Study.” 16 Lewis & Clark L. Rev. 215, 2012.
  • Netanel, Neil Weinstock. “Making Sense of Fair Use.” 15 Lewis & Clark L. Rev. 715, 2011.
  • Nimmer, David. “Fairest of Them All and Other Fairy Tales of Fair Use.” 66 LAW & CONTEMP. PROBS. 263, 2003.
  • Priest, George L. & Benjamin Klein. “The Selection of Disputes for Litigation.” 13 J. LEGAL STUD. 1, 1984.
  • Sag, Matthew. “Predicting Fair Use”, 73 Ohio St. L.J. 47, 2012.
  • Sag, Matthew. “Empirical Studies of Copyright Litigation: Nature of Suit Coding 7.” Loyola Univ. Chi. Sch. of Law Pub. Law & Legal Theory, Research Paper No. 2013-017, available at http://ssrn.com/abstract=2330256, 2013.
  • Sag, Matthew. “Copyright Trolling, An Empirical Study.” 100 Iowa L. Rev. 1105, 2015.
  • Sag, Matthew. “IP Litigation in United States District Courts: 1994 to 2014” Iowa Law Review, Forthcoming. Available at SSRN: http://ssrn.com/abstract=2570803, 2016.
  • Samuelson, Pamela. “Unbundling Fair Uses”, 77 Fordham L. Rev. 2537, 2009.