Deepseek, Meta and OpenAI
Copyright issues surrounding AI training data remain as complex as ever. Recent developments in the AI space concerning Deepseek, OpenAI and Meta have reignited concerns over how generative AI models are trained and whether their methods infringe existing copyright protections.
OpenAI currently faces multiple legal challenges over its alleged use of copyrighted materials to train the models behind ChatGPT. Deepseek, meanwhile, has come under scrutiny for its approach to data acquisition and usage, ironically from OpenAI itself.
Most recently, Meta (formerly Facebook) has drawn attention over allegations that it deliberately drew on pirated libraries and other sources to train its models, without proper attribution or licensing and with little regard for copyright law.
These developments raise significant legal and ethical concerns. From a legal standpoint, they continue to muddy the waters on liability and enforcement in relation to AI training data, making it difficult to identify which works were used and, in turn, complicating the enforcement of copyright protections. The complexity surrounding training data exacerbates these concerns, as companies often claim that their datasets are proprietary or were obtained through ambiguous means. This lack of transparency not only hinders copyright holders from asserting their rights but also raises broader questions about fair compensation for original creators.
The Lack of Transparency in AI Training Data
One of the most pressing challenges is the lack of transparency in how AI models acquire and utilise data, which makes it difficult for rights holders to identify infringements or assert their claims. The EU AI Act offers one possible solution by imposing specific requirements on training data, particularly for high-risk AI systems. Under the Act, AI developers must ensure that the datasets used for training are legally obtained, traceable, and examined for possible bias, with clear documentation of their sources.
This, however, is not the universal approach. The United States has notably taken a different stance, favouring lighter regulation to encourage AI innovation and competitiveness. Recent comments by U.S. Vice President JD Vance, warning Europe against "excessive" regulation of the AI industry, reinforce this divide. Vance cautioned that stringent regulatory frameworks like the EU AI Act could stifle technological progress, deter investment, and place European AI developers at a disadvantage compared to their U.S. and Chinese counterparts.
Given this perspective, it is highly unlikely that similar provisions, such as strict training data transparency requirements or punitive fines, will be incorporated into U.S. AI policy in the near future.
The Need for Legal and Policy Direction
The growing body of litigation surrounding AI training data signals an urgent need for judicial clarity and legislative reform. Should these approaches continue to diverge, the way creative works are published and distributed will change drastically.
This may take the form of increasingly complex Technological Protection Measures (TPMs), raising the cost of works and making them less accessible, or simply a wider adoption of closed-access policies, further reducing the accessibility of information that is currently freely available.
These continuing differences in approach raise important questions about the way works are currently published and distributed. Will publishing develop into a closed, tightly controlled ecosystem where content is locked behind proprietary barriers, or will it embrace a more open, yet legally accountable, model that balances innovation with copyright protection?
The decisions made by relevant role-players in the coming years will have drastic consequences for the accessibility, ownership and control of digital works, ultimately shaping the future of creative expression and knowledge dissemination.
The content of this article is intended to provide a general guide to the subject matter. Specialist advice should be sought about your specific circumstances.