Two Authors Accuse Apple of Illegally Training AI Models with Pirated Books

Apple is now facing legal action from authors who allege that the tech giant used their copyrighted books, without consent or compensation, to train its artificial intelligence systems.

Kylo B

9/7/2025 · 2 min read


The Lawsuit Unveiled

Authors Grady Hendrix and Jennifer Roberson filed a proposed class-action lawsuit in the U.S. District Court for Northern California, claiming that Apple illegally used their works to train its OpenELM large language model. The lawsuit asserts that Apple sourced the content from a known collection of pirated e-books, including the Books3 dataset, which originated from “shadow library” sites like Bibliotik and was linked to the RedPajama data collection. The authors contend that Apple neither credited nor compensated them despite using their works in what could become a highly profitable venture. (Reuters, iThinkDifferent, AppleInsider)

What the Plaintiffs Want

The authors are seeking:

  • Statutory and compensatory damages

  • Restitution of the unfair profits Apple derived

  • Attorneys’ fees

  • Possible destruction of AI models (such as Apple Intelligence or OpenELM) that were trained on the disputed content

Additionally, they request a jury trial and class-action certification to include other affected authors. (iThinkDifferent, AppleInsider)

Apple's Claim of Ethical Training vs. Data Origins

Apple has emphasized its commitment to ethical AI development, pointing to licensing agreements with publishers and to its AppleBot crawler, which respects robots.txt directives. Yet the lawsuit argues that despite these efforts, Apple still relied on data indirectly sourced from pirated materials, a contradiction that lies at the heart of the legal challenge. (AppleInsider, THE DECODER, iThinkDifferent)

Part of a Larger Legal Wave

Apple's case joins a growing number of lawsuits targeting AI companies for training models with copyrighted texts. These cases illustrate the rising tension between AI innovation and the protection of creative works.

The Fair Use Debate

The core legal battleground centers on whether AI training qualifies as fair use, a defense some companies like Apple aim to leverage. However, courts are increasingly skeptical, particularly when pirated or unauthorized content is involved.

In Anthropic's case, while some uses were deemed transformative, the court found that knowingly retaining pirated works was not protected under fair use. (Financial Times, Wikipedia, The Washington Post, Vanity Fair)

If Apple's use of Books3, and its downstream incorporation into OpenELM, is confirmed, it could similarly undermine the company's fair-use defense.

Why It Matters

  • Authors & Publishers: A potential turning point for compensation and control over AI-used content.

  • AI Industry: Pressure to adopt transparent, licensed datasets and to rethink training models.

  • Consumers & Regulators: Sets precedents in intellectual property norms that will shape future AI ethics.

The lawsuit filed by Hendrix and Roberson against Apple underscores a growing concern: that some AI models are built upon creatively generated works without permission or credit. As the suit progresses, its outcome could define new boundaries around AI training, fair use, and the rights of creators, particularly when pirated content is involved. The implications may reverberate far beyond Apple, influencing how AI technologies evolve and respect intellectual property.