Fair Use in the Age of AI: When Training Isn’t Copying, and Licensing Isn’t the Law


In the rapidly evolving legal landscape around artificial intelligence and copyright, two district court opinions, Bartz v. Anthropic and Kadrey v. Meta Platforms, now serve as early landmarks. Both ask the same question: can AI companies use copyrighted books to train large language models without permission from authors or publishers?

One judge said yes. Another said probably not, but still ruled for the AI company due to a weak evidentiary record.

Both decisions arise from the Northern District of California, both apply the same four-factor fair use test from Section 107 of the Copyright Act, and both tackle similar factual contexts. But they come to conflicting conclusions about the legality, and legitimacy, of training AI systems on copyrighted works.

Behind that divergence is a deeper tension about the purpose of copyright law itself: is it meant to incentivize creativity by protecting markets, or has it evolved into a tool for extracting payment from any party wealthy enough to use language at scale?

Before we get to the doctrinal rift, though, it’s worth underscoring one of the most consistent features across these early AI cases: plaintiffs have yet to show that AI models are reproducing anything truly close to a copyrighted work.

The First Missing Piece: No Copy, No Harm

In Bartz v. Anthropic, a group of authors alleged that the AI company Anthropic infringed their copyrights by using entire books to train its large language model, Claude. In Kadrey v. Meta, authors made nearly identical claims against Meta's Llama models, arguing that ingesting full texts without a license violated their rights. Both lawsuits arose in the Northern District of California, involved expressive literary works, and challenged the same core practice: copying entire books, without permission, to train large language models.

In both Bartz and Kadrey, the courts observed that the plaintiffs had failed to show any infringing output. There were no verbatim re-creations of books, no summary knockoffs, and no substantial similarities between what came out of the models and the works plaintiffs had written.

In Bartz, Judge William Alsup emphasized that Claude’s outputs were not copies or even close approximations. “Claude created no exact copy, nor any substantial knock-off,” he wrote. “Nothing traceable to Authors’ works.” Because Claude filtered its outputs and did not reproduce identifiable material, there was no public-facing infringement to assess.

Judge Vince Chhabria, in Kadrey, came to a similar factual finding. Though his ruling is more skeptical toward AI developers, he conceded that Meta’s models were not reproducing protected text. “Llama is not capable of generating enough text from the plaintiffs’ books to matter,” he wrote. Even using adversarial prompting techniques, the most plaintiffs could extract was 50 words, and not consistently.

Chhabria noted, pointedly, that “the plaintiffs barely give this issue lip service, and they present no evidence about how the current or expected outputs from Meta’s models would dilute the market for their own works.”

That evidentiary vacuum shaped both judges’ application of the fair use doctrine. And it remains a critical threshold in any copyright suit involving generative AI.

The Fair Use Analysis: Four Factors, Two Interpretations

Fair use is assessed through a four-factor test codified in Section 107 of the Copyright Act. While Bartz and Kadrey apply the same framework, their interpretations reveal two distinct visions of what fair use should protect—and what it should prevent.

1. Purpose and Character of the Use

This factor has become the gravitational center of modern fair use jurisprudence since the Supreme Court's decision in Campbell v. Acuff-Rose. (If you want to see the comic book I wrote about Campbell with Jackie Roche, it is available at https://www.jrocheworkshop.com/3018548-2-live-crew-fair-use.)

A use is favored when it is “transformative,” that is, when it adds something new, with a different purpose or character, rather than merely repackaging the original.

Judge Alsup applied this principle in Bartz with confidence. Drawing analogies to human learning, he held that Anthropic’s LLMs ingested books the way students do, to learn from them, not to recite them:

“Like any reader aspiring to be a writer,” Alsup explained, “Anthropic’s LLMs trained upon works not to race ahead and replicate or supplant them—but to turn a hard corner and create something different.”

He cited Google Books and HathiTrust, both Second Circuit decisions upholding the digitization of books for non-consumptive purposes such as search indexing and accessibility. He also relied on A.V. v. iParadigms, where storing student papers in a plagiarism-detection database was upheld as fair use because it served a distinct, socially valuable function.

Judge Chhabria agreed that training LLMs is transformative in the Campbell sense, and accepted that Meta’s models were designed for “a wide range of functions” far afield from the purpose of reading novels or memoirs. But he took a more cautious tone, warning that courts should not allow the first factor to swallow the fair use inquiry. Transformativeness, he emphasized, does not always guarantee legality. “There is certainly no rule that when your use of a protected work is ‘transformative,’ this automatically inoculates you.”

Despite this caution, both judges found the training uses at issue transformative. The difference lay in how much weight they gave that transformation in the overall analysis.

2. Nature of the Copyrighted Work

This factor was straightforward in both opinions. The plaintiffs’ books—fiction, memoir, nonfiction—are highly expressive works. Chhabria rightly noted that even factual works receive copyright protection for the “manner of expressing” those facts, citing Google Books and Harper & Row. Alsup, too, recognized the expressive nature of the books but deemed it outweighed by other factors, particularly the highly transformative use and lack of market substitution.

Neither judge treated this factor as dispositive, in keeping with longstanding precedent that gives this element the least weight.

3. Amount and Substantiality of the Portion Used

Both Meta and Anthropic copied the plaintiffs' books in full. On paper, that would normally weigh against fair use. But both judges found the full copying justified given the purpose.

Alsup found full copying necessary for the training process, invoking Google Books, Sega v. Accolade, and Sony v. Connectix, cases in which entire works (software code, books) were copied for the functional purpose of learning, analysis, or transformation. What mattered was that the copied material was not used to create competing expressive works but instead enabled new functionality.

Chhabria agreed, though in a more cautious tone. He acknowledged that “feeding a whole book to an LLM does more to train it than would feeding it only half.” Because LLMs require large volumes of coherent, high-quality text to learn generalizable patterns of language, he found the use of full books reasonable.

Both courts accepted that copying an entire work may still qualify as fair use when the resulting application is non-substitutive and transformative. That line draws directly on precedents such as Campbell, Google v. Oracle, and HathiTrust.

4. Effect of the Use on the Market

Image: “Campbell v. Acuff-Rose Music” comic by Jackie Roche and Kyle K. Courtney (https://www.jrocheworkshop.com/3018548-2-live-crew-fair-use)

This is where the two decisions finally break.

Judge Alsup took the traditional approach: for this factor to weigh against fair use, the use must threaten to substitute for the original. “Claude created no exact copy, nor any substantial knock-off,” he repeated. He cited Perfect 10 v. Amazon, where thumbnail images were not infringing because they didn't substitute for full-resolution photographs, and Google Books, where the scanning and indexing of millions of books was upheld in part because users could not read the books online. Alsup concluded that the authors' markets remained intact because no competing product, and no infringing output, was displacing the books in question.

Chhabria, however, viewed the potential for indirect substitution as a serious threat. He imagined a world in which LLMs could produce books in the same genre or style as the plaintiffs' works, saturating the market with alternatives that, while not infringing, could crowd out demand. “The market for the typical human-created romance or spy novel could be diminished substantially,” he wrote. Chhabria went so far as to say that a flood of these machine-made alternatives “diminishes the incentive” for future authors to write romance and spy novels, a harm he believes the Copyright Act aims to prevent.

What Chhabria fails to engage with fully is that speculative, unproven dilution is not, on its own, a sufficient basis for denying fair use. If it were, nearly every transformative use could be barred on the chance that it might influence markets over time. Campbell warned against equating “any adverse effect” with market substitution. Oracle required proof that a use actually diminished the value of the original work.

Chhabria rightly rejected two weaker theories of market harm: that plaintiffs had lost a chance to license their works for AI training (a non-cognizable harm under Bill Graham Archives and Tresóna), and that the models reproduced protected content (which they didn't). But by elevating the speculative risk of future market dilution, absent any concrete evidence, he turned market harm from a question of evidence into an exercise in hypotheticals.

This approach misreads Campbell, which warned explicitly that not all market harm is actionable. As the Court explained in Campbell, even a “lethal parody” might kill demand for the original, but that kind of loss is not “cognizable under the Copyright Act” because it arises from lawful commentary or transformation, not substitution.

Chhabria’s opinion also overlooks the logic of Oracle, where the Court made clear that the analysis must account not only for potential losses, but also for the source of the loss and the public benefits the use produces. As Justice Breyer wrote in Oracle, courts must ask whether the use contributes to the creative production of new expression and balance that against any lost revenue, particularly when that loss is speculative or grounded in licensing expectations the law does not guarantee.

Chhabria ultimately ruled for Meta due to the plaintiffs' evidentiary failures. But his dicta invite future plaintiffs to resurrect the licensing-harm theory under a different label: framed as “dilution,” but still rooted in the idea that any unlicensed use is presumptively unfair. That framing, if adopted by other courts, would flip fair use on its head, transforming it from a constitutional safety valve into a default permission regime.

This is precisely what fair use doctrine was designed to prevent. As the Supreme Court warned in Campbell, courts must not allow rightsholders to weaponize potential markets to “strike the balance” away from free expression and technological progress.

Fair Use Is Not a Permission Regime

These cases underscore a fundamental truth: fair use is not a loophole. It is a vital safeguard. It protects the right to use copyrighted material in ways that do not compete with or substitute for the original. It ensures that learning, criticism, research, and technological innovation are not locked behind a perpetual paywall.

Judge Alsup understood this. His opinion was bold but doctrinally rooted. He saw that licensing is not the answer to every novel use of copyrighted works, especially those that are internal, transformative, and do not result in redistribution.

Judge Chhabria's opinion, though more cautious, ultimately acknowledged that the plaintiffs had failed to make the right argument. But in elevating this “hypothetical dilution” into a near-fatal concern, his reasoning risks tipping fair use doctrine toward a kind of de facto licensing regime, one that is grounded not in market harm but in the fear of market disruption.

And that’s a critical difference. Because if copyright becomes a system where every transformative use must be paid for, regardless of whether it harms or replaces the original, then fair use disappears not with a bang, but with a paywall.

Libraries, Fair Use, and the Case for Ethically Trained AI

If fair use can potentially shield commercial AI firms like Anthropic and Meta for training models on massive amounts of copyrighted works, the case is even stronger, and far more compelling, for libraries doing the same work in service of research, education, and public access.

Let's return to first principles. Section 107 of the Copyright Act doesn't just permit fair use; it actively encourages it for “purposes such as criticism, comment, news reporting, teaching, scholarship, or research.” Those are not throwaway examples; they're the statutory heart of the doctrine. And unlike corporate AI developers, libraries exist precisely to advance these ends.

Imagine a university or public library system with a robust collection of legally acquired books, print and digital, used to train a large language model for scholarly research, classroom instruction, or public inquiry. This model wouldn’t be sold, wouldn’t generate profit, and wouldn’t be exposed to the same market forces driving Anthropic or Meta. It would be a tool, like a search index, a digital card catalog, or a recommendation system, built on the same foundation libraries have always upheld: preserving knowledge, expanding access, and enabling inquiry.

The case law already provides strong footing. In HathiTrust, the Second Circuit held that full-text scanning of millions of library books for purposes of search, indexing, and accessibility was a lawful fair use. The court emphasized that the use was “non-expressive,” “transformative,” and most importantly, that it did not harm the market for the original works. In Google Books, the court reaffirmed that digitization and full-text indexing for scholarly and informational purposes, even when performed by a commercial actor in partnership with libraries, was protected.

Now transpose that logic to the training of AI. A library-based model trained solely on its own legally acquired collections, without public redistribution of any copyrighted content, used only for research, internal education, or discovery, is not merely a fair use; it is a model fair use.

Judge Alsup’s opinion in Bartz is especially instructive here. He repeatedly analogized LLM training to human learning, emphasizing that reading, memorizing, and rearticulating ideas is at the core of how writing and thinking function. “Everyone reads texts, too,” he wrote, “then writes new texts.” A library LLM would do exactly that: it reads, learns, digests. It doesn’t publish, it doesn’t compete, and it doesn’t sell.

Judge Chhabria, in Kadrey, was rightly concerned with commercial substitution. But the good news is that libraries are not pirates. They are stewards. The very harms that animated Chhabria’s caution (loss of revenue, market flooding, algorithmic displacement) don’t apply in the nonprofit, mission-driven context of libraries. There is no loss of incentive to create when a book is used for educational AI research in a library basement.

Moreover, the commercial vs. nonprofit distinction is baked into the statute itself. The first fair use factor expressly directs courts to consider “whether such use is of a commercial nature or is for nonprofit educational purposes.” That language has teeth. Courts have long treated nonprofit library use as favorably situated for fair use purposes (see Sony v. Universal, Campbell, and Authors Guild v. HathiTrust). A library's model, developed not to compete in the market but to deepen it, fits squarely within that protected zone.

And let's remember the constitutional dimension. The goal of copyright, per Article I, Section 8, is “to promote the progress of science and useful arts.” A nonprofit LLM trained on library holdings is not a workaround to copyright; it is a realization of that very goal. It promotes progress, enhances learning, and democratizes access to information in new ways. That's not a threat. That's the point.

So as we watch commercial AI models scramble to license books, hedge market risk, or parse which licenses they might retroactively need, let’s not lose sight of what fair use protects. It protects readers. It protects researchers. It protects libraries. And it protects the right to use knowledge not to profit from it, but to advance it.

Libraries, in this context, are not just fair users. They continue to be model citizens of fair use.

And perhaps, in the world we're building, libraries might become the most ethical and legally sound trainers of artificial intelligence.

Editor’s Note: This article is republished with permission of the author with first publication on his new Substack – Copyright Fight Club

