Citation: Tonja Jacobi & Matthew Sag, We are the AI Problem, 74 Emory L. J. Online 1 (2024)
In a nutshell, We are the AI Problem argues that the biased and sometimes jarring outputs of AI image generators reflect deeper societal problems rather than technological failures, with AI serving as an uncomfortable mirror that exposes our historical inequalities and conflicting aspirations about diversity and representation.
A collection of images produced by Google Gemini in February 2024

We are the AI Problem examines what the authors term “the Black Nazi Problem” – the phenomenon where AI image generators produce historically incongruous or offensive results, such as depicting Black Nazi soldiers or female popes, as a result of overzealous attempts to correct for bias in training data.
Jacobi and Sag demonstrate that these problematic outputs stem from four interconnected issues: historical biases embedded in training data that reflect centuries of white male dominance, ongoing structural inequalities in contemporary society, the inherent difficulty of balancing historical accuracy with aspirational diversity, and the tendency of AI systems to gravitate toward statistical means that may not align with evolving social values. Rather than treating these as purely technical problems, the authors argue that AI outputs reveal uncomfortable truths about societal biases and the challenge of operationalizing competing values, such as inclusion and historical fidelity, in algorithmic systems.
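To make the "statistical mean" tendency concrete, here is a minimal illustrative sketch (ours, not the authors'; the training data and generator are entirely hypothetical). A generator that simply matches the statistics of a historically skewed dataset will, by default, reproduce that skew in its outputs:

```python
import random
from collections import Counter

# Hypothetical, simplified "training data": captions of historical photos of CEOs.
# The skew mirrors historical patterns, not any desired future state.
training_captions = ["white male CEO"] * 90 + ["woman CEO"] * 7 + ["Black CEO"] * 3

def naive_generate(data, n_samples=10, seed=0):
    """Sample outputs in proportion to the training distribution.

    A model that simply matches the statistics of its training data
    regenerates whatever historical skew is embedded in that data.
    """
    rng = random.Random(seed)
    return [rng.choice(data) for _ in range(n_samples)]

print(Counter(naive_generate(training_captions, n_samples=1000)))
# Roughly 90% "white male CEO": the output mirrors the past,
# not aspirations about the present.
```

Correcting for this skew is exactly where the "Black Nazi Problem" arises: pushing outputs away from the historical distribution can produce results that are aspirationally diverse but historically incongruous.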
Why read this article?
We are the AI Problem is a very short paper by law review standards, but it provides valuable context on how generative AI models are trained and why they tend to reproduce societal biases, offering clear explanations of concepts like Reinforcement Learning from Human Feedback (RLHF) and Constitutional AI. Jacobi and Sag give readers a clear picture of the socio-technical challenges in AI development, particularly around content moderation and bias mitigation. Additionally, the article offers useful insights into the broader cultural and political debates surrounding "woke AI" and the practical difficulties of implementing fairness in machine learning systems.
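For readers unfamiliar with RLHF, the core intuition can be sketched in a few lines. The toy reward function below is purely illustrative (the real technique fine-tunes the model against a learned reward model using reinforcement learning); it shows only the basic loop of generating candidates and preferring whichever one human feedback would score higher, which is also how over-correction can creep in:

```python
# Toy illustration of the RLHF idea: generate candidates, score them with a
# reward signal trained on human preferences, and keep the preferred output.
# The reward function here is hypothetical and vastly simplified.

def toy_reward(caption: str) -> float:
    """Stand-in for a learned reward model: raters penalized outputs they
    judged offensive and rewarded ones they judged diverse, so the scores
    reflect those preferences rather than historical accuracy."""
    score = 1.0
    if "offensive" in caption:
        score -= 0.9
    if "diverse" in caption:
        score += 0.5
    return score

candidates = [
    "a historically accurate but offensive depiction",
    "a diverse, anachronistic depiction",
    "a bland, generic depiction",
]

# Pick the candidate the reward model prefers; repeated over many updates,
# this feedback reshapes the model itself.
best = max(candidates, key=toy_reward)
print(best)  # -> "a diverse, anachronistic depiction"
```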
Further Reading
Emily M. Bender et al., On the Dangers of Stochastic Parrots: Can Language Models Be Too Big?, FAccT '21: PROC. OF THE 2021 ACM CONF. ON FAIRNESS, ACCOUNTABILITY, AND TRANSPARENCY 610 (Mar. 3, 2021) – This influential paper critiques the practice of training large language models on massive web datasets, arguing that treating such data as “representative” of all humanity amplifies historical power imbalances. The authors also raise fundamental questions about the environmental and social costs of ever-larger AI models.
Sorelle A. Friedler, Carlos Scheidegger, & Suresh Venkatasubramanian, The (Im)possibility of Fairness: Different Value Systems Require Different Mechanisms for Fair Decision Making, 64 COMMC’NS OF THE ACM 136 (2021) – This article explores the mathematical impossibility of satisfying multiple definitions of fairness simultaneously in algorithmic systems. The authors demonstrate how different conceptions of equity and justice lead to inherently contradictory requirements for AI systems.
Joy Buolamwini & Timnit Gebru, Gender Shades: Intersectional Accuracy Disparities in Commercial Gender Classification, 81 PMLR 77 (2018) – This groundbreaking study revealed significant accuracy disparities in commercial gender classification systems, showing that darker-skinned women face the highest error rates. The research demonstrated how unrepresentative training datasets lead to biased outcomes.
Amanda Levendowski, How Copyright Law Can Fix Artificial Intelligence’s Implicit Bias Problem, 93 WASH. L. REV. 579 (2018) – This article examines how copyright restrictions on training data availability could exacerbate bias problems in AI systems by limiting access to diverse datasets. Levendowski argues that broader fair use protections for AI training could actually promote more equitable outcomes by enabling access to more representative data.
