Matthew Sag

April 14, 2026

Emory Law’s New AI & Law Concentration

Training Lawyers for a New Era

Starting in academic year 2026–27, Emory Law will offer a formal concentration in Artificial Intelligence and the Law — a structured academic pathway for J.D. students who want to develop real expertise in one of the most consequential areas of modern legal practice.

See this announcement for an overview.

How does the concentration work?

The concentration requires students to complete at least 12 credits drawn from three core areas: foundational courses at the intersection of law and AI, privacy and technology law, and intellectual property. Students can also supplement those courses with approved electives, internships, and externships. There’s no competitive application process — students simply need to satisfy the requirements and indicate their interest in their final semester. Those who do will have “Artificial Intelligence and the Law Concentration” listed directly on their transcript.

Which courses are currently available?

For more authoritative information on the concentration and how it works, go to the Emory website. I wanted to highlight the relevant courses offered in the 2026-2027 academic year.

The list below is subject to change, but to the best of my knowledge Emory Law students will be able to take the following courses relevant to the concentration:

AI Fundamentals

Current Issues in Law and AI (Fall, Prof. Matthew Sag)
AI and Legal Writing (offered in Fall and Spring)
Law and Film – AI and Law (Spring, Prof. Ifeoma Ajunwa)

AI and Intellectual Property

Copyright Law (Fall, Prof. Matthew Sag)
Intellectual Property (Spring, Prof. Margo Bagley)

AI Privacy and Health Law

Privacy Law and AI (Fall, Prof. Ifeoma Ajunwa)
Genetics & the Law (Spring, Prof. Jessica Roberts)
Privacy Law (Spring, Adjunct Prof. Will Bracker)

Other

Fundamentals of Innovation I (Fall, Prof. Nicole Morris)
Fundamentals of Innovation II (Spring, Prof. Nicole Morris)

What should you take?

My advice is to think about what kind of lawyer you want to be and what kinds of clients that lawyer will work with, then work backward from there to figure out what competencies and knowledge to build. For example, if you see yourself becoming in-house counsel at a technology company, you will want a strong background in corporate law and antitrust, in addition to AI and IP courses. It’s also very helpful to know a little something about labor law and secured transactions.

I can’t really tell you what courses you should take, but I can tell you what I would take given my interests and the slightly unrealistic assumption that I was trying to meet all the concentration requirements in one academic year. I would take “Current Issues in Law and AI” (2 credits) and Copyright Law (3 credits) with me in the Fall; Genetics & the Law (3 credits) with Professor Roberts, and Privacy Law (3 credits) with Will Bracker in the Spring, plus AI and Legal Writing (2 credits).

Important note

This blog post is not the final authoritative word on the requirements of the concentration or the courses that will be available in the next academic year. The information presented here is meant to be helpful but not authoritative. It is definitely subject to change.

April 8, 2026

2026 Legal Scholars Roundtable on Artificial Intelligence

The 5th Annual Legal Scholars Roundtable on Artificial Intelligence starts at Emory University School of Law tomorrow. The Roundtable features a phenomenal lineup of authors, commentators and participants, including: Katrina Geddes, Andres Sawicki, Gabriel Weil, Jonathan Iwry, Nathan Reitinger, Bryan Choi, Jacob Noti-Victor, Annemarie Bridy, Charlotte Tschider, Michael Froomkin, David Rubenstein, Yonathan Arbel, Yinn-Ching Lu, Nikola Datzov, Christina Lee, Deven Desai, Chinmayi Sharma, Jessica Roberts, Jillian Grennan, Lawrence Nodine, Salwa Hoque, and Andrew Miller.

Papers will address deepfakes, strict liability for frontier AI, robots.txt and web scraping regulation, copyright litigation after generative AI, privacy law’s first principles in the AI age, AI culture wars and federalism, LLM courts, and AI agents’ shadow principals.

As always, this conference is made possible by Emory Law and Emory University’s AI.Humanity.

The roundtable is an invitation-only event, but if you missed out this year, we encourage you to apply next year.

March 23, 2026

David Kemp’s AI Policy Builder for Legal Education

David Kemp has just released a policy builder designed to help people who are struggling to design an AI policy that is relevant to their specific course.

According to the website, the policy content is “grounded in Sag, AI Policies for Law Schools (2025); Bliss, Teaching Law in the Age of Generative AI (2024); Perkins, AI Now: The Duty to Integrate AI Education in Law Schools (2024); Moppett, Preparing Students for the AI Era (2025); ABA Formal Opinion 512; and state bar guidance. All content is dedicated to the public domain under CC0 1.0 Universal.”

I have not tested this, but it seems like a fabulous idea. If you want to know more about my thoughts on building AI policies for law school classes, I have a paper on this topic on SSRN, the upshot of which is that effective AI policies must be course-specific, enforceable, and focused on teaching students to use AI responsibly as future legal professionals.

March 11, 2026

The Fallacy of Compression

This post is a very lightly edited extract from my forthcoming article in the Duke Law Journal, Copyright’s Jagged Frontier (https://papers.ssrn.com/sol3/papers.cfm?abstract_id=6319379)

What does AI memorization prove?

Some argue that any evidence of memorization necessarily negates the claim that AI models are transformative. They advance this claim by injecting the term “compression” into the conversation in a way that suggests that AI models like GPT, Claude, and Gemini are compressed representations of their training data in the same way that an MP3 music file is a compressed version of music from a compact disc.

“[model training is] similar to what’s called lossy compression, which one way to describe it is if you have a giant file and you compress it into a ZIP file, you lose some of the contents of the work, but effectively you’re just actually compressing the file. … it’s actually taking the expressive content of the training data and compressing it down into a model. And that confirms that there’s no actual transformative use going on here … what the model is doing is actually just repeating over and over the training data over and over again.”

— Bartz v. Anthropic, Transcript of Motion for Summary Judgment Oral Argument, May 22, 2025, pp. 44-45 (explaining Plaintiff’s expert’s view)

Alex Reisner (AI’s Memorization Crisis, The Atlantic), for example, draws on the Cooper and Ahmed studies, and argues that the evidence of memorization undermines the learning metaphor and reveals generative AI training for what it really is: “compression.” The upshot is, “Large language models don’t ‘learn’—they copy[.]” See also Ted Chiang’s famous essay: ChatGPT Is a Blurry JPEG of the Web.

Technically accurate but thoroughly misleading

Associating AI training with compression is technically accurate if you understand the term the way computer scientists do; but it is also thoroughly misleading if you associate compression with MP3s, JPEGs, and Zip files, as most of us do.

AI models learn compact internal representations of their training data that capture whatever patterns enable more accurate predictions. It is equally valid to label this process as “abstraction”, “learning”, “dimension reduction”, or “compression”; but the compression label invites analogy to familiar media formats such as MP3s and JPEGs.

These formats store approximations of original works that can later be reconstructed in forms that closely resemble their sources and are usually regarded as functionally indistinguishable. Other than hipsters with a taste for vinyl records, consumers interact with ZIP files, JPEGs, and MP3s as functionally equivalent to their uncompressed originals; whatever information is discarded is socially normalized as imperceptible. Side note, I highly recommend Jonathan Sterne, MP3: The Meaning of a Format (2012).

Calling it compression tells you nothing

Training an AI model is nothing like ripping music into an MP3 format. Calling that process “compression” tells you nothing about the level of detail of what is learned or the significance of the information discarded. The compression metaphor is further misleading because it implies uniformity and predictability. In conventional audio or image compression, the same categories of information are discarded from every file according to stable and transparent criteria that reflect advance judgments about what matters and what does not. By contrast, memorization in large language models is uneven, incidental, and difficult to anticipate. We know that memorization is more likely when a model is exposed to multiple copies of the same work, and that the timing of exposure during training can matter. Beyond such generalities, however, it is not possible to predict in advance which works will be retained verbatim or to what degree.
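To make the "stable and transparent criteria" of conventional compression concrete, here is a minimal Python sketch. It uses uniform quantization as a toy stand-in for lossy compression; real codecs such as MP3 and JPEG are far more elaborate. The function names, sample values, and step size are illustrative assumptions only, not drawn from any real codec or from the article.

```python
def quantize(samples, step):
    """Toy lossy compression: snap every sample to the nearest multiple of `step`.

    The same rule is applied to every sample of every signal, so what gets
    discarded is known in advance: any detail finer than the step size.
    """
    return [round(s / step) * step for s in samples]


def max_error(original, compressed):
    """Largest absolute difference between an original and its compressed copy."""
    return max(abs(a - b) for a, b in zip(original, compressed))


# Two unrelated (hypothetical) signals compressed under the same rule.
signal_a = [0.12, 0.57, 0.91, 0.33]
signal_b = [0.05, 0.49, 0.76, 0.98]
step = 0.25

for name, signal in (("a", signal_a), ("b", signal_b)):
    compressed = quantize(signal, step)
    # The loss is bounded by step / 2 for every signal, by construction.
    print(name, compressed, max_error(signal, compressed) <= step / 2)
```

In this sketch the amount of information lost is bounded and identical in kind for both inputs, and it can be stated before either signal is seen. Nothing comparable can be said in advance about which training works a language model will memorize, which is the contrast the paragraph above draws.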

The rhetoric of compression is really just an effort to sidestep a difficult empirical question, rather than to answer it. The fact that one thing is memorized to a degree that seems relevant under copyright law doesn’t prove that everything is memorized to a similar degree.

Evaluating whether memorization actually has significance under copyright law requires some kind of qualitative and quantitative assessment of the nature and extent of memorization. But even that statement is overbroad. As I explain in Copyright’s Jagged Frontier, what actually matters in terms of a fair use analysis is not memorization in the abstract, but memorization that finds its way into production.

March 11, 2026

Copyright’s Jagged Frontier

Why the Line Between Legal and Infringing AI Won’t Be a Line at All

By Matthew James Sag

Everyone wants to know whether training AI on copyrighted works is legal. The real answer is: it depends—and the boundary between what’s permissible and what isn’t will be far messier than anyone expects.

In my forthcoming article in the Duke Law Journal, I argue that the copyright boundary for generative AI will be jagged rather than smooth. Not a clean bright line, but an irregular, context-dependent frontier shaped by the interaction of varying memorization rates across different AI models, divergent legal standards of similarity across different creative media, and the interplay of three distinct bodies of copyright doctrine (substantial similarity, fair use and secondary liability).

Understanding that jaggedness turns out to be essential—not just for predicting litigation outcomes, but for seeing the opportunities that lie on the other side.

The phrase “jagged frontier” will be familiar to many. It comes from the influential 2023 study by Fabrizio Dell’Acqua, Ethan Mollick, and colleagues, who used it to describe the uneven capability landscape of AI itself. It’s a useful concept because it captures the way that AI can be astonishingly good at some tasks while failing at others that seem equally difficult.

I borrow the metaphor deliberately, because copyright law presents generative AI with an analogous problem. The legal boundary between permissible and infringing AI conduct is similarly jagged: not because AI’s capabilities are uneven (though they are), but because the legal standards that determine infringement are themselves uneven across different creative domains. It seems likely that an AI system can cross the line into copyright infringement far more easily when generating music or images of recognizable characters than when generating prose—even when the underlying technology is essentially the same.

Explaining how and why the intersection of copyright and AI leads to a jagged frontier accounts for the first third of the article.

That jagged frontier is only the beginning of the story. Drawing on Ronald Coase’s insight that legal rules are starting points for adaptation and negotiation rather than final allocations, the Article argues that the extensive literature on AI and copyright has focused almost exclusively on fair use while ignoring what comes next. I might have something to say about that in a future post.

Matthew James Sag is the Jonas Robitscher Professor of Law in Artificial Intelligence, Machine Learning, and Data Science at Emory University School of Law. His article “Copyright’s Jagged Frontier” is forthcoming in the Duke Law Journal.

January 22, 2026

Legal Scholars Roundtable on Artificial Intelligence 2026 Call for Papers

Roundtable

Emory Law is proud to host the fifth annual Legal Scholars Roundtable on Artificial Intelligence. The Roundtable will take place on April 9-10, 2026, at Emory University in Atlanta, Georgia. The Legal Scholars Roundtable on Artificial Intelligence (AI) is designed to be a forum for the discussion of current legal scholarship on AI, covering a range of methodologies, topics, perspectives, and legal intersections.

Format
Participation at the Roundtable will be limited and invitation-only. Participants are expected to read all the papers in advance and be prepared to offer substantive comments. We will try to accommodate a limited number of Zoom-based participants in exceptional circumstances, but in person attendance is strongly preferred.

Applications to present, comment, or participate
We invite applications to participate, to comment, and/or to present from academics working on any topic relating to legal issues in AI. To request to present, you need to submit a substantially complete draft paper.

The deadline for submission is February 15, 2026, and decisions on participation will be made shortly thereafter, ideally, by March 1, 2026. If selected, final manuscripts are due April 1, 2026, to permit all participants an opportunity to read the papers prior to the conference.

To apply to participate, comment, or present, please fill out the Google form (https://forms.gle/h5Vqgj6xpDNSjfFDA).

What to expect from the Legal Scholars Roundtable on Artificial Intelligence
The Legal Scholars Roundtable on Artificial Intelligence is a forum for the discussion of current legal scholarship on AI, spanning a range of methodologies, topics, perspectives, and legal intersections. Authors who present at the Roundtable will be selected from a competitive application process, and commentators are assigned based on their expertise. Participants will have an opportunity to provide direct feedback in paper sessions and will have access to draft papers but will be asked not to post papers publicly or share without author permission. Robust sessions involve energetic feedback from other paper authors, commentators, and participants. Our goal is to ensure all authors have the full participation of all workshop participants in each author’s session.

Space is limited and we expect people to stay for the entire conference.

Essential logistics
The Roundtable will be held in person on the Emory campus in Atlanta, Georgia. The conference will begin on Thursday morning and run until 1 PM on Friday. You can expect to be at the Atlanta airport by 1:45 PM, in time for a 2:30 PM flight or later on Friday. We will pay for your reasonable (economy) travel and accommodation expenses within the U.S. At the roundtable you will be well fed and caffeinated.

Organizers
Matthew Sag, Jonas Robitscher Professor of Law in Artificial Intelligence, Machine Learning, and Data Science at Emory University Law School (msag@emory.edu)
Charlotte Tschider, Professor of Law at Loyola Law Chicago (ctschider@luc.edu)

For more about the Legal Scholars Roundtable on Artificial Intelligence

December 11, 2025

The Mouse and the Model: the Disney-OpenAI Deal

The other shoe has finally dropped.

Today, December 11, 2025, OpenAI and Disney announced a partnership that essentially signals a marriage between generative AI and legacy media. Although some kind of deal was inevitable, the range and scope of this one are striking. Disney is sinking $1 billion into OpenAI for an equity stake and warrants, while simultaneously inking a three-year licensing deal.

The immediate result? OpenAI’s Sora and ChatGPT will legally ingest over 200 marquee characters from the Disney, Marvel, Pixar, and Star Wars vaults. We’ll see AI-generated Disney content on Disney+, and Disney employees will get enterprise-grade access to OpenAI’s tools. Notably, actor likenesses are off the table—a nod to the sensitivities of the recent labor strikes—but the direction of travel is clear. For more reporting, see the Verge.

Why it matters

Addressing the “Snoopy Problem”

AI companies and copyright industries are beginning to understand, and become reconciled to, the fact that neither side is going to score an absolute victory when it comes to the fair use issue for AI training. AI training that results in a model that learns from, but does not reproduce, the training data looks very likely to be upheld as fair use. Two recent cases held as much on summary judgment, and this aligns with a line of “nonexpressive use” precedents that predate generative AI.

However, it’s becoming increasingly clear that it’s hard to train generative AI models to be really useful without some degree of memorization of the training data along the way. This is particularly problematic when it comes to copyrightable characters, because copyright protects characters more abstractly than most things. This is the well-known Snoopy problem (a term I coined in 2023).

Faced with this increasingly clear reality, it makes sense for consumer-facing AI companies and entertainment giants like Disney to think about licensing arrangements.

This deal signals a retreat from the fair use absolutism of early AI development. OpenAI and Disney have effectively priced the risk of memorization. Instead of spending the next decade in discovery arguing over pixel similarities, they are moving to a licensing regime. Disney gets paid and retains control; OpenAI gets legal certainty and the ability to serve the entertainment industry without looking over its shoulder.

Capital Crunch?

With competitors like Anthropic eyeing public listings, OpenAI’s decision to take strategic capital from a corporate giant like Disney may be telling. It suggests we are hitting a saturation point for traditional venture capital at the scale these foundation models require. It also hints that OpenAI sees more value in “smart money” than in the volatility of the public markets. Disney isn’t just a piggy bank; it’s a hedge. By entangling itself with the world’s premier IP holder, OpenAI makes itself indispensable to the very industry that threatened to sue it out of existence. Or at least I’m sure that’s the theory; whether it pans out that way remains to be seen.

The End of the Scaling Era?

Finally, this move also adds to the “Data Scarcity” thesis. The era of simply scraping the open web to make models smarter (2017–2025) might be over. The low-hanging fruit of the public internet has been picked, processed, recycled into synthetic data, and processed again, every which way you can imagine. To get better, and to stay ahead of open source rivals, companies like OpenAI are going to need access to data that no one else has. Google has YouTube; OpenAI now has the Magic Kingdom.

The Bottom Line

This is the template for the future. We are moving away from total war between AI and Content, toward a negotiated partition of the world. The tech companies provide the engine; the media giants provide the fuel. And for now, at least, both sides seem to think that’s a better outcome than leaving it up to a judge.

I wrote this blog post the morning the deal was announced, because it fits surprisingly well with a law review article I am writing, “The Snoopy Solution: How Fair Use and Licensing for Generative AI Can Coexist,” based on a talk I gave at Yale last month.

November 21, 2025

A handful of cherries does not make a sundae

Why content licensing cannot solve AI’s training-data problem

I have just published an article as part of the ProMarket symposium at the University of Chicago Booth School of Business: “The False Hope of Content Licensing at Internet Scale.”

Although the article is not very long, I thought I would summarize my point even more briefly here.

AI developers have been on a shopping spree. Since mid-2023, OpenAI, Google, Anthropic and Meta have collectively spent hundreds of millions of dollars striking deals with publishers. OpenAI alone has inked agreements with everyone from the Associated Press to Condé Nast, gaining access to archives from The New Yorker, Vogue, The Wall Street Journal and dozens of other publications.

To many watching from the sidelines, these deals offer tantalizing proof that AI companies can—and should—pay for the content they consume.

However, the agreements grabbing headlines represent a tiny fraction of the data needed to train cutting-edge language models. Modern AI systems require trillions of diverse tokens scraped from across the internet—a scale and diversity that traditional licensing simply cannot reach.

To see why, read the full article: The False Hope of Content Licensing at Internet Scale

November 6, 2025

Copyright Winter is Coming (to Wikipedia?)

Judge Stein’s Order Denying OpenAI’s Motion to Dismiss in Authors Guild v. OpenAI, Inc., No. 25-md-3143 (SHS) (OTW) (S.D.N.Y. Oct. 27, 2025)

A new ruling in Authors Guild v. OpenAI has major implications for copyright law, well beyond artificial intelligence. On October 27, 2025, Judge Sidney Stein of the Southern District of New York denied OpenAI’s motion to dismiss claims that ChatGPT outputs infringed the rights of authors such as George R.R. Martin and David Baldacci. The opinion suggests that short summaries of popular works of fiction are very likely infringing (unless fair use comes to the rescue).

This is a fundamental assault on the idea-expression distinction as applied to works of fiction. It places thousands of Wikipedia entries in the copyright crosshairs and suggests that any kind of summary or analysis of a work of fiction is presumptively infringing.

A white walker in a desolate field reading Wikipedia (an AI Image by Gemini)

Copyright and derivative works

In Penguin Random House LLC v. Colting, the Southern District of New York found that defendant’s “The Kinderguide” series, which condensed classic works of literature into children’s books, infringed the copyrights in the original works despite being marketed as educational tools for parents to introduce literature to young children.

Every year, I ask students in my copyright class why the children’s versions of classic novels in Colting were found to be infringing but a Wikipedia summary of the plots of those same books probably wouldn’t be. A recent ruling in the consolidated copyright cases against OpenAI means I might have to reconsider.

The ruling

On October 27, 2025, Judge Stein of the Southern District of New York denied OpenAI’s motion to dismiss the output-based copyright infringement claims brought by a class of authors including David Baldacci, George R.R. Martin, and others.

OpenAI had argued, reasonably enough, that the authors’ complaint failed to plausibly allege substantial similarity between any of their works and any of ChatGPT’s outputs. It is standard practice in copyright litigation to attach a copy of the plaintiff’s work and the allegedly infringing work, but the court held that “the outputs plaintiffs submitted along with their opposition to OpenAI’s motion were incorporated into the Consolidated Class Action Complaint by reference” and that it was enough that their Complaint repeatedly made “clear, definite and substantial references” to the outputs. Losing that civil procedure skirmish was probably a bad sign for OpenAI. A bit like the menacing prologue in A Game of Thrones, you sense that Copyright Winter is Coming.

Judge Stein then went on to evaluate one of the more detailed ChatGPT-generated summaries relating to A Game of Thrones, the 694-page novel by George R. R. Martin which eventually became the famous HBO series of the same name. Even though this was only a motion to dismiss, where the cards are stacked against the defendant, I was surprised by how easily the judge could conclude that:

“A more discerning observer could easily conclude that this detailed summary is substantially similar to Martin’s original work, including because the summary conveys the overall tone and feel of the original work by parroting the plot, characters, and themes of the original.”

The judge described the ChatGPT summaries as:

“most certainly attempts at abridgment or condensation of some of the central copyrightable elements of the original works such as setting, plot, and characters”

He saw them as:

“conceptually similar to—although admittedly less detailed than—the plot summaries in Twin Peaks and in Penguin Random House LLC v. Colting, where the district court found that works that summarized in detail the plot, characters, and themes of original works were substantially similar to the original works.” (emphasis added).

To say that the less than 580-word GPT summary of A Game of Thrones is “less detailed” than the 128-page Welcome to Twin Peaks Guide in the Twin Peaks case, or the various children’s books based on famous works of literature in the Colting case, is a bit of an understatement.

The Wikipedia comparison

To see why the latest OpenAI ruling is so surprising, it helps to compare the ChatGPT summary of A Game of Thrones to the equivalent Wikipedia plot summary. I read them both so you don’t have to.

The ChatGPT summary of A Game of Thrones is about 580 words long and captures the essential narrative arc of the novel. It covers all three major storylines: the political intrigue in King’s Landing culminating in Ned Stark’s execution (spoiler alert), Jon Snow’s journey with the Night’s Watch at the Wall, and Daenerys Targaryen’s transformation from fearful bride (more on this shortly) to dragon mother across the Narrow Sea. In this regard, it is very much like the 800-word Wikipedia plot summary. Each summary presents the central conflict between the Starks and Lannisters, the revelation of Cersei and Jaime’s incestuous relationship, and the key plot points that set the larger series in motion.

I could say more about their similarities, but I’m concerned that if I explored the summaries in any greater detail, the Authors Guild might think that I am also infringing George R. R. Martin’s copyright, so I’ll move on to the minor differences.

The key difference between the Wikipedia summary and the GPT summary is structural. The Wikipedia summary takes a geographic approach, dividing the narrative into three distinct sections based on location: “In the Seven Kingdoms,” “On the Wall,” and “Across the Narrow Sea.” This structure mirrors the way the novel follows different characters in different locations, to the point where you begin to wonder whether these characters will ever meet. In contrast, the GPT summary follows a more analytical structure, beginning with contextual information about the setting and the series as a whole, then proceeding through sections that follow a roughly chronological progression through the major plot points.

There are some minor differences. The Wikipedia summary provides more granular plot details and clearer causal chains between events. It explains, for instance, how Catelyn’s arrest of Tyrion leads to Tywin’s retaliatory raids on the Riverlands, which in turn necessitates Robb’s strategic alliance with House Frey to secure a crucial bridge crossing. The Wikipedia summary also includes more secondary characters and subplots, such as Tyrion’s recruitment of Bronn as his champion in trial by combat, and Jon’s protection of Samwell Tarly.

The Wikipedia summary probably assumes a greater familiarity with the fantasy genre, whereas the GPT summary might be more helpful to the uninitiated. The GPT summary explains the significance of the long summer and impending winter and explicitly sets out the novel’s major themes.

In broad strokes, however, there is very little daylight between these two summaries. They are remarkably similar in what they include and in what they leave out. Most notably, both summaries sanitize Daenerys’s storyline by omitting the sexual violence that is fundamental to her character arc. This is particularly striking because sexual violence is central to Martin’s narrative in so many places and to the narrative arc of several of the main characters.

If GPT is substantially similar, so is Wikipedia

I don’t see how the ChatGPT summary could infringe the copyright in George R. R. Martin’s novel, if the Wikipedia summary doesn’t. A chilling prospect indeed, but I don’t think that either one is infringing.

It’s absolutely true that you can infringe the copyright in a novel by merely borrowing some of the key characters, plot points and settings, and spinning out a sequel or a prequel. In copyright, we call this a derivative work. But just because sequels and children’s versions of novels are often infringing, doesn’t mean that a dry and concise analytical summary of a novel is infringing.

Why not? It’s actually the act of taking those key structural elements, the skeleton of the novel if you like, and adding new flesh to them to create a new fully realized work that makes an unauthorized sequel infringing.

What’s at stake

Judge Stein’s order doesn’t resolve the authors’ claims, not by a long shot. And he was careful to point out that he was only considering the plausibility of the infringement allegation and not any potential fair use defenses. Nonetheless, I think this is a troubling decision that sets the bar on substantial similarity far too low.

The fact that “[w]hen prompted, ChatGPT can generate accurate summaries of books authored by plaintiffs and generate outlines for potential sequels to plaintiffs’ books” falls well short of demonstrating that such outputs by themselves would be regarded by the ordinary observer as substantially similar to a fully realized novel.

November 4, 2025

Do law schools need Harvey.AI?

Harvey.AI is following the playbook of Westlaw and Lexis by trying to establish itself as the go-to AI tool for lawyers before they even become lawyers. I asked my university library to organize a Harvey demo so that we could think about joining the ranks of Stanford, UCLA, NYU, Notre Dame, WashU, Penn, UChicago, Boston University, Fordham, BYU, UGA, Villanova, Baylor, SMU, and Vanderbilt (as reported by Above the Law: https://abovethelaw.com/2025/10/harvey-snags-even-more-seats-in-the-t14).

This post is primarily based on a one-hour product demonstration given to us by a Harvey representative. To have a really well-informed view on the product, I would want more hands-on experience, but there is surprisingly little information online about what Harvey is actually offering beyond the company’s own press releases. So, I thought my colleagues at other universities might find this assessment interesting.

TLDR

Meh, it’s OK, but law schools probably don’t need it and are probably only jumping on the bandwagon so that they can be part of the press release.

What is Harvey?

Harvey.AI is a legal-tech and professional services AI company whose flagship product is a generative AI assistant designed specifically for legal workflows used by law firms, in-house legal teams, and other professional services organizations. On its website, Harvey characterizes itself as “Professional Class AI” for leading professional service firms, emphasizing that its technology is domain-specific. In other words, it’s an AI system fine-tuned and optimized for legal and related professional work.

Use Cases and Contraindications

The first thing to understand about Harvey is that it is categorically not a legal research tool. Harvey essentially offers its clients a way of integrating generative AI into some routine drafting and analytical tasks that are quite common in legal practice.

Here are some common use scenarios:

If you have already identified the relevant case law and have a memo template to hand, Harvey AI can help you draft a legal research memo in double-quick time.

Alternatively, Harvey can help you review the key terms of a lengthy contract or almost any other synthesis or summarization task you could imagine.

Another good use case for the Harvey AI platform would be drafting an agreement or marking up the other side’s agreement in light of your own preferred templates. Harvey’s process for drafting from scratch seems directly analogous to vibe coding in software, but with a nice Microsoft Word integration.

You can also use Harvey for analysis and ideation (i.e., brainstorming). I can imagine coming to the end of a 3-month trial, throwing all the relevant documents into Harvey, and then launching into a discussion about closing argument strategy. Or, uploading a motion for summary judgment and the other side’s response, and then trying to anticipate the kinds of questions you might get from the bench.

Harvey’s Value Proposition

You can already do almost all of this with ChatGPT, Gemini, Claude, and the like, subject to volume limitations on how many documents you upload. So, the natural question is, what value add does Harvey AI offer?

Fine tuning and model switching

One of the advantages claimed by Harvey is that rather than using foundation models like GPT directly, you would be engaging with custom versions of those models, fine-tuned on training data relevant to law and legal analysis. I could imagine that in some fields this would be a significant advantage, but I wonder how much of an advantage it is in the legal field, given that most of that fine-tuning data is going to be public domain legal texts that are already well represented in the foundation models.

Another thing Harvey sees as a benefit is that they are not tied to any one model. They currently use three different fine-tuned foundation models, GPT, Gemini, and Claude, and they allocate tasks according to comparative advantage.

Security and confidentiality

By default, prompts and documents transmitted to a company like OpenAI may be used in training, will definitely be stored on OpenAI’s servers (at least for a while), and thus might be subject to discovery through appropriate legal processes. OpenAI has a setting that lets users opt out of training and specifies that their data will be retained for only 30 days. This is probably good enough for many casual uses and even some mildly sensitive uses, but it’s obviously not enough for material that is subject to attorney-client privilege.

Accordingly, one of the key differentiators offered by Harvey AI is that the documents you upload and the prompts you write will not be accessible to Harvey or any third party, and that all of the information processing takes place in a secure Microsoft Azure environment with end-to-end encryption. This is probably the absolute minimum necessary to use LLMs for legal work. A large law firm could go one step further and actually host its own model in-house rather than relying on Microsoft. That extra layer of security might be required by some especially restrictive protective orders in litigation or by some especially sensitive clients. That sounds great, but I’m pretty sure I already get all that from Microsoft Copilot (although I would have to do a deep dive into the terms and conditions Microsoft offers my university to be sure).

Another nice feature of Harvey is that the client administrator can set permissions for individual users and for particular teams of users. This is critical in a corporate law environment where access to sensitive documents needs to be compartmentalized. It’s also critical if Harvey is being made available to students in a law school environment because students taking courses such as foundational Legal Writing and Research classes should probably not have access to Harvey AI.

Document Review (Retrieval-Augmented Generation)

Harvey AI has a good user interface for analyzing large volumes of documents. That is essentially an implementation of retrieval-augmented generation (RAG).

What’s RAG?

In very simple terms, RAG is an alternative to just answering a question through next-token prediction, relying on bulk context and whatever knowledge and understanding is latent in a foundation model. In a RAG process, the user query is translated into a document query. The document query identifies sections of documents that seem relevant to the query. Those sections are then collated and fed back into a general model which attempts to answer the question based on the specifically retrieved chunks of text. Platforms like ChatGPT are using a process like this any time you see them searching the web and providing links back to particular documents.
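For readers who prefer to see the steps laid out, here is a minimal sketch of that retrieve-then-answer loop in Python. It is a toy illustration under stated assumptions, not a description of how Harvey or ChatGPT actually implement retrieval: the function names, the sample documents, and the simple word-overlap scoring are all hypothetical, and a real system would more likely use embeddings or a search index and would then send the assembled prompt to a model.

```python
import re
from collections import Counter


def tokenize(text):
    # Lowercase the text and split it into alphanumeric terms.
    return re.findall(r"[a-z0-9]+", text.lower())


def split_into_chunks(doc_id, text, size=40):
    # Break a document into chunks of roughly `size` words each.
    words = text.split()
    return [
        (doc_id, n, " ".join(words[start:start + size]))
        for n, start in enumerate(range(0, len(words), size))
    ]


def overlap_score(query, chunk_text):
    # Hypothetical matching method: count occurrences of query terms in the chunk.
    counts = Counter(tokenize(chunk_text))
    return sum(counts[term] for term in set(tokenize(query)))


def retrieve(query, chunks, k=2):
    # Rank chunks by score and keep the top k that match at all.
    ranked = sorted(chunks, key=lambda c: overlap_score(query, c[2]), reverse=True)
    return [c for c in ranked[:k] if overlap_score(query, c[2]) > 0]


def build_prompt(query, retrieved):
    # Collate the retrieved chunks, with labels, into the text a model would see.
    sources = "\n".join(f"[{doc_id}#{n}] {text}" for doc_id, n, text in retrieved)
    return (
        "Answer using only the sources below and cite them by label.\n\n"
        f"{sources}\n\nQuestion: {query}"
    )


# Made-up sample documents, for illustration only.
documents = {
    "license.txt": "This agreement terminates upon a change of control of the licensee.",
    "lease.txt": "The tenant shall pay rent on the first day of each month.",
}
chunks = [
    chunk
    for doc_id, text in documents.items()
    for chunk in split_into_chunks(doc_id, text)
]

question = "What happens on a change of control?"
top = retrieve(question, chunks, k=1)
prompt = build_prompt(question, top)
print(prompt)
```

Note that the model only ever sees what `retrieve` hands it. If the matching step misses a relevant chunk, nothing downstream can recover it.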

Harvey does RAG pretty well

RAG sounds like a great idea in theory. But whether it works in practice depends on how good the matching method is, which can vary a lot from context to context. In any RAG process, you will never know what relevant chunks of text were overlooked, and you won’t know whether the interpretive part of the model has drawn the appropriate inferences from the chunks it has retrieved unless you go back and check the original sources. One of the things I liked about the Harvey UX is that it made it easy to inspect the original document fragments and it had a clear process for checking off that these had actually been interrogated.

Example use cases would be looking for change-of-control provisions in licensing agreements as part of merger due diligence, or document review for litigation. The Harvey representative we spoke to candidly admitted that the system performed really well in establishing a chronology, except in relation to emails. This makes sense, because an email thread contains lots of different dates all jumbled in together, but it is clearly a major limitation.

Prompting and training

Another value-add our representative stressed was prompting. Our representative seemed to be saying not only that Harvey would be running some thoughtfully crafted prompts in the background, essentially running interference between user instructions and the models, but also that individual clients could do this for themselves. I can see why this might be an appealing feature to some people, but I’m not entirely convinced that making the steps in an analytical process opaque to the user is a good idea.

My Assessment

Generative AI as legal technology

Before we get into the specific pros and cons of Harvey, we need to consider the appropriate uses of generative AI as a legal technology more generally.

Many key deliverables in the legal field are in the form of text. But it’s relatively rare that the value of that text is entirely contained within the document itself. When a lawyer explains something to a client, they aren’t just helping their client understand something. They are also making a set of representations about the thought, diligence, and analysis that has gone into formulating that advice. Clients don’t just want text for its own sake, they want text you stand behind.

Accordingly, the most significant uses of generative AI in the legal field will be ones that accelerate a drafting-review or document-analysis process, as opposed to merely substituting for the underlying analysis.

Responsible use of generative AI in the legal field must be accompanied by either:

strong validation mechanisms (such as a process for clicking through the footnotes to confirm that the document in question really says what the model represented),
a knowing and well-informed acceptance of certain risks, or
the kind of external validation that a lawyer who is already familiar with the underlying materials intrinsically provides.
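The first of these, click-through validation, can be partly mechanized. The sketch below is a hypothetical illustration in Python, not a feature of Harvey or any other product: it assumes the model's answer has already been broken down into (quoted passage, source document) pairs, and it simply checks whether each quoted passage really appears in the document it is attributed to. The function names and sample text are made up.

```python
import re


def normalize(text):
    # Unify curly quotes, collapse whitespace, and lowercase, so that line
    # breaks or typography in the source don't cause false mismatches.
    text = text.replace("\u2019", "'").replace("\u201c", '"').replace("\u201d", '"')
    return re.sub(r"\s+", " ", text).strip().lower()


def verify_quotes(claims, sources):
    # claims: list of (quote, doc_id) pairs taken from a model's answer.
    # Returns (quote, doc_id, found) for each claim.
    results = []
    for quote, doc_id in claims:
        source_text = normalize(sources.get(doc_id, ""))
        found = bool(quote.strip()) and normalize(quote) in source_text
        results.append((quote, doc_id, found))
    return results


# Made-up sample data, for illustration only.
sources = {"msj.txt": "The court  finds no genuine\ndispute of material fact."}
claims = [
    ("no genuine dispute of material fact", "msj.txt"),
    ("plaintiff conceded liability", "msj.txt"),
]

results = verify_quotes(claims, sources)
for quote, doc_id, found in results:
    print(f"{'OK     ' if found else 'MISSING'} {doc_id}: {quote}")
```

A check like this only catches invented quotations. It says nothing about whether the model drew the right inference from a passage that really exists, which is why a human still has to read the source.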

The validity questions that need to be answered before deploying generative AI as a legal technology are not limited to the problem of hallucinations in the narrow sense of invented cases, citations, and quotations.

Harvey claims to do very well in dealing with hallucinations, but it’s important to situate this in the context that Harvey is not a legal research tool. The kinds of tasks that Harvey says its product should be used for are exactly the kinds of tasks where one would expect a much lower incidence of hallucinations. Why? Because they are mostly summary or translation tasks where the model has specific documents or templates to draw from. Even so, I’m a bit skeptical that the rate of hallucinations is really as low as Harvey claims.

The value proposition for law firms

Depending on the cost, I can see that Harvey would be a very attractive proposition for law firms of all sizes. Most of what Harvey offers can be replicated through an enterprise agreement with one of the main AI providers, but Harvey offers a turnkey solution and a good user interface. You can think of it as ChatGPT in a black turtleneck, but that’s no bad thing.

Is it worth it? That depends on the cost, and the cost of the alternatives.

The value proposition for law schools

There is no doubt that most of our students are already using generative AI. It seems appropriate that we begin training them to do so properly and responsibly at the earliest opportunity. That said, the availability of generative AI to students taking specific skills courses could easily undermine the development of those skills. Rather than simply making Harvey available to all students, it makes sense to exclude first-year students and perhaps some upper-level skills courses. But obviously, we would want students in our Advanced Legal Writing course (where we are teaching AI skills) to have access to this tool.

If we decide that we don’t want students in our clinics using generative AI, then one of the major selling points of Harvey disappears. Our students don’t need the robust confidentiality protection that Harvey offers.

If Harvey is offering commercially reasonable terms, I still think it is an attractive proposition. But its value in legal education seems to me to be really quite limited. Our students are not conducting massive document review exercises or working with in-house templates. Most of the things students would find compelling about using Harvey, they can already do with Microsoft Copilot, ChatGPT, Gemini, and Claude.