Zuckerberg was reportedly aware that Meta trained its AI model on pirated works

The core revelation: Meta CEO Mark Zuckerberg approved the use of Library Genesis (LibGen), a known pirated content repository, to train the company’s Llama 3 AI model, according to newly unsealed court documents.

Key details of the disclosure: Internal communications revealed through a class-action lawsuit show Meta executives discussing the company’s deliberate use of unauthorized copyrighted material.

Sony Theakanath, Meta’s director of product management, confirmed in an email that Zuckerberg approved LibGen’s use for AI training
The company explicitly planned to keep its use of LibGen confidential
Meta employees discussed methods to remove copyright indicators from the pirated content
Internal discussions revealed concerns about downloading pirated content from corporate devices

Legal context: A class-action lawsuit filed by authors Christopher Golden and Richard Kadrey and comedian Sarah Silverman alleges unauthorized use of their copyrighted works.

The documents were unsealed by Judge Vince Chhabria of the U.S. District Court for the Northern District of California
Meta’s legal team had previously argued that the company’s use of text for AI training fell under fair use
Zuckerberg reportedly acknowledged in a deposition that such piracy would raise “lots of red flags”

Corporate strategy and risk assessment: Meta executives weighed the benefits against potential backlash while implementing this controversial decision.

Internal communications cited performance benchmarks as justification for using LibGen
Documents referenced rumors that competitors like OpenAI and Mistral AI were also using the library
Executives acknowledged potential legislative risks, particularly in the US and EU
The company developed specific “mitigations” to address potential fallout

Industry implications: This revelation comes at a critical time for AI development and copyright law.

Separately, Meta announced a 5% workforce reduction targeting “lowest performers” (approximately 3,600 workers)
The case could set important precedents for numerous other AI-related copyright lawsuits
The controversy highlights the tension between rapid AI development and intellectual property rights

Analyzing the deeper impact: The controversy exposes a fundamental contradiction in the AI industry’s approach to training data. Companies need vast amounts of high-quality content to build effective models, but the ways they obtain that content often conflict with established intellectual property rights, which could set up a long-term conflict between content creators and AI developers.
