Zuckerberg was reportedly aware that Meta trained its AI model on pirated works

The core revelation: Meta CEO Mark Zuckerberg approved the use of Library Genesis (LibGen), a known pirated content repository, to train the company’s Llama 3 AI model, according to newly unsealed court documents.

Key details of the disclosure: Internal communications revealed through a class-action lawsuit show Meta executives discussing the company’s deliberate use of unauthorized copyrighted material.

  • Sony Theakanath, Meta’s director of product management, confirmed in an email that Zuckerberg approved LibGen’s use for AI training
  • The company explicitly planned to keep its use of LibGen confidential
  • Meta employees discussed methods to remove copyright indicators from the pirated content
  • Internal discussions revealed concerns about downloading pirated content from corporate devices

Legal context: A class-action lawsuit filed by authors Christopher Golden and Richard Kadrey and comedian Sarah Silverman alleges that Meta used their copyrighted works without authorization.

  • The documents were unsealed by Judge Vince Chhabria of the U.S. District Court for the Northern District of California
  • Meta’s legal team had previously argued that its use of text for AI training fell under fair use provisions
  • Zuckerberg reportedly acknowledged in a deposition that such piracy would raise “lots of red flags”

Corporate strategy and risk assessment: Meta executives weighed the benefits against potential backlash while implementing this controversial decision.

  • Internal communications cited performance benchmarks as justification for using LibGen
  • Documents referenced rumors that competitors like OpenAI and Mistral AI were also using the library
  • Executives acknowledged potential legislative risks, particularly in the US and EU
  • The company developed specific “mitigations” to address potential fallout

Industry implications: This revelation comes at a critical time for AI development and copyright law.

  • Meta announced a 5% workforce reduction targeting “lowest performers” (approximately 3,600 workers)
  • The case could set important precedents for numerous other AI-related copyright lawsuits
  • The controversy highlights the tension between rapid AI development and intellectual property rights

Analyzing the deeper impact: This controversy exposes a fundamental contradiction in the AI industry’s approach to training data. Companies need vast amounts of high-quality content to develop effective AI models, but their methods of obtaining it often conflict with established intellectual property rights, potentially setting up a long-term conflict between content creators and AI developers.

Zuckerberg Appeared to Know Meta Trained AI on Pirated Library
