A series of closed-door forums hosted by Senate Majority Leader Chuck Schumer has been examining various discrete topics in artificial intelligence. One session, held on November 30, focused on transparency, explainability, intellectual property, and copyright issues surrounding AI. While the full session was not public, the opening and closing remarks, as well as each participant's statement, have been posted.

Below are extracts of the session participants' comments relating to copyright issues:

Senator Schumer: Generative AI is already capable of producing new images, texts, and audio in response to a user's prompts. However, these programs are trained to generate these outputs by being exposed to large quantities of data, including copyright-protected material, like existing songs, writings, photos, paintings, and other artwork. This creates a copyright issue on two fronts: the input, or using copyrighted works to train these systems, and the output, or generating work that is in a gray area in terms of legal usage and ownership. So, I believe there is a role for Congress to play to promote transparency in AI systems and protect the rights of creators. At the same time, I believe there is a role for Congress to play in enforcing strong intellectual property standards for AI internationally, all while protecting the need to continue innovating around AI, our north star in this endeavor. Today's AI systems represent some of the most important U.S. intellectual property of our lifetime. Forcing companies to reveal unnecessary amounts of their IP is harmful: it could stifle innovation and would empower our adversaries to use that IP for ill. But while strong IP standards can protect innovation in AI, the true potential of this technology can only be reached if it is understandable to its users. That's where explainability comes in.

Stability AI:

  • It is more important than ever that we can scrutinize these systems before the next wave of digital services and digital ventures are built on "black box" technology operated by a small cluster of Big Tech firms . . . Open models promote transparency. Researchers and authorities can "look under the hood" of an open model to verify performance, identify risks or vulnerabilities, study interpretability techniques, develop new mitigations, and correct for bias. By comparison, closed models may not disclose how they are developed or how they operate. Closed models may be comparatively opaque, and risk management may depend on trust in the developer.
  • We believe that AI development is an acceptable, transformative, and socially beneficial use of existing content that is protected by fair use and furthers the objectives of copyright law, including to "promote the progress of science and useful arts." Through training, generative AI models learn the unprotectable ideas, facts, and structures within a visual or language system, and that process does not interfere with the use and enjoyment of the original works. Free learning of these facts about our world is essential to recent developments in AI, and it is doubtful that these groundbreaking technologies would be possible without it. The United States has established global leadership in AI due, in part, to a robust, adaptable, and principles-based fair use doctrine that balances creative rights with open innovation.
  • We believe that existing legal frameworks effectively govern AI outputs, ranging from the replication of a specific work, to the use of protected likeness, to permissible experimentation with style. Likewise, existing frameworks can resolve questions of authorship. In principle, we acknowledge a threshold of authorship below which an AI-generated work with negligible human input may not qualify for registration. That said, we are concerned that recent U.S. Copyright Office (USCO) guidance and decisions may not account for the many ways in which human input can rise above that threshold. A user with clear expressive intent, who has demonstrated that they directed the AI system, should be able to register their work. We welcome further clarification on this issue. Overly discretionary guidance means that creators may be unfairly disadvantaged by their use of AI tools within a wider creative workflow.
  • We recognize the concern among some creators about the development and deployment of these systems. We are actively working to address these concerns through technology, standards, and good practices. These efforts, including opt-outs, labeling, training, and data access, are detailed in our recent submission to the USCO.

News Media Alliance:

  • GAI developers have built large language model (LLM) systems by copying massive amounts of the creative content of media publishers without consent, credit, or compensation. These systems do not actually "learn" facts; rather, they produce sequences of words that mimic human speech. And they can be deployed for harmful uses in a variety of ways: they may make up answers, paraphrase, or alter photographs to deceive . . . The pervasive, unauthorized use of publisher content to produce outputs that include inaccuracies and other harmful attributes then devalues publisher brands by muddying the source of the original content and by misattributing information, or attributing false information, to unrelated publishers or journalists.
  • This rampant copying infringes on the exclusive rights protected by copyright and far exceeds the bounds of fair use. Two key points separate LLM technology from other copying that has been found to be a fair use.
    • First, LLMs ingest valuable media content to copy "expression for expression's sake," targeting the very aspects protected by copyright. To the extent they are ingesting content so that published words can be analyzed "in relation to all the other words in a sentence," or their sequences of words identified, that analysis and identification is intended to copy and retain the very expression that copyright protects. It is inaccurate and dangerous to anthropomorphize GAI models as "learning" unprotectable facts—these technologies are not "learning" as humans do, but memorizing to regurgitate and mimic copyright-protected expression without ever absorbing any underlying concepts.
    • Second, the outputs of GAI models directly compete with the protected content that was copied and used to train them. The rise of chatbots that provide detailed narrative answers to prompts goes far beyond prior judicial holdings that the carefully articulated purpose of helping users navigate to original sources could justify the wholesale copying of online content. In fact, leading developers boast that users no longer need to access or review original sources. Worse yet, an increasing number of GAI products are designed to fetch fresh news stories to "ground" generative AI output and better summarize publisher content, through a process known as "retrieval augmented generation" or "RAG". Both archived content and breaking news will be combined to compete with the delivery of news in every form. This will become unsustainable if the source of original content can no longer be funded, and the models have nothing of quality left to feed on.
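For readers unfamiliar with the "retrieval augmented generation" process the Alliance describes, the following is a minimal, hypothetical sketch of the technique: retrieve source passages relevant to a user's question, then condition the model's answer on that retrieved text. The `search_news` and `generate` functions here are illustrative placeholders, not any vendor's actual API.

```python
# Minimal, hypothetical sketch of retrieval augmented generation (RAG) as
# described above: fetch relevant source passages, then prepend them to the
# user's question so the generated answer is "grounded" in that content.
# `search_news` and `generate` are illustrative placeholders only.

def search_news(query: str, top_k: int = 3) -> list[str]:
    # Placeholder retrieval step; a real system would query a news index.
    return [f"[retrieved passage {i + 1} relevant to: {query}]"
            for i in range(top_k)]

def generate(prompt: str) -> str:
    # Placeholder generation step; a real system would call a language model.
    return f"[model answer conditioned on {len(prompt)} prompt characters]"

def answer_with_rag(question: str) -> str:
    # 1. Retrieve fresh or archived passages relevant to the question.
    passages = search_news(question)
    # 2. Ground the model by including the retrieved text in the prompt.
    context = "\n\n".join(passages)
    prompt = (
        "Answer the question using only the sources below.\n\n"
        f"Sources:\n{context}\n\n"
        f"Question: {question}\nAnswer:"
    )
    # 3. The resulting summary can stand in for reading the originals,
    #    which is the competitive harm the Alliance alleges.
    return generate(prompt)

if __name__ == "__main__":
    print(answer_with_rag("What happened in today's top story?"))
```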

Allen Institute for AI:

  • [W]e share the widely held belief that the use of copyrighted materials in training datasets constitutes, at minimum, fair use. We believe an infringement and fair use analysis should focus only on the outputs of an AI system rather than the training material of the model. The value of training data is not in its creative expression, but rather in its volume and diversity. The most capable AI models require billions of items of material to be effectively trained. Such large-scale datasets can only be assembled through large-scale data collection efforts, such as scraping of web content and digitization of media. After training material is acquired, several steps are required to make it suitable for training [a sketch of such steps follows this list]. Because of these numerous filtering and transformation steps, training datasets do not serve as a competitive substitute for human consumption of the original works.
  • We acknowledge the possibility that training on copyrighted data may result in infringing output, and here are some observations: Open models provide the foundation to study relationships between input and output in a scientific way; however, the law as it currently stands disincentivizes openness and transparency, because there are no safe harbor or negligence standards, and opening up the AI development lifecycle can create the appearance of a violation (e.g., someone may see that their content has been inadvertently used as training data).
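As context for the "filtering and transformation steps" the Institute references, below is a minimal, hypothetical sketch of that kind of data-curation pipeline: raw scraped documents are deduplicated, quality-filtered, and split into fixed-length training chunks. The thresholds and heuristics are illustrative assumptions, not any particular lab's actual recipe.

```python
# Hypothetical sketch of a training-data curation pipeline of the kind the
# Allen Institute describes: raw scraped documents are deduplicated,
# quality-filtered, and chunked before training. All heuristics and
# thresholds here are illustrative assumptions, not any real lab's recipe.

import hashlib

def curate(raw_documents: list[str], min_words: int = 50,
           chunk_words: int = 512) -> list[str]:
    seen_hashes: set[str] = set()
    chunks: list[str] = []
    for doc in raw_documents:
        # 1. Exact deduplication: drop documents already seen verbatim.
        digest = hashlib.sha256(doc.encode("utf-8")).hexdigest()
        if digest in seen_hashes:
            continue
        seen_hashes.add(digest)

        # 2. Quality filtering: drop very short documents.
        words = doc.split()
        if len(words) < min_words:
            continue

        # 3. Transformation: split into fixed-length chunks for training.
        #    After these steps, the dataset no longer reads as a substitute
        #    for the original publications (the Institute's point above).
        for i in range(0, len(words), chunk_words):
            chunks.append(" ".join(words[i:i + chunk_words]))
    return chunks
```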

International Alliance of Theatrical Stage Employees (IATSE): In our view, the most pressing area of focus for policymaking and regulatory action by the United States Congress is maintaining strong copyright and intellectual property laws. AI is the next frontier of large-scale online piracy. While IATSE members do not own the copyrights to the works we help create, our livelihoods depend on collectively bargained contractual residuals paid to our health and pension plans when the copyrights for those audiovisual works are licensed to others over the life of a work. Congress must ensure entertainment workers are fairly compensated when their work is used to train, develop, or generate new works by AI systems. AI developers cannot be allowed to circumvent established U.S. copyright law and commit intellectual property theft by scraping the internet for copyrighted works to train their models without permission from rightsholders. The theft of copyrighted works, domestically and internationally, threatens our hard-won health care benefits and retirement security.

Sony:

  • Based on recent Copyright Office filings, it is clear that the technology industry and speculative financial investors would like governments to believe in a very distorted view of copyright: one in which music is considered fair use for training purposes, and in which certain companies are permitted to appropriate the entire value produced by the creative sector without permission and to build huge businesses based on it without paying anything to the creators concerned. In that view, artist name and likeness rights should be extremely limited, and consumers should be held personally liable for content generated by the AI platforms, absolving the platforms of any liability or obligation to police for bad behavior. They also put forward that recordkeeping and transparency within generative AI technologies are impossible or too burdensome. They suggest that the enforcement of copyright stifles innovation. It is ironic, and important to note, that the technology industry itself will most likely have the greatest influence on the future competitive environment in AI, including whether new start-ups will be able to effectively compete.
  • Even worse are those who argue that copyrighted content should automatically be considered fair use, so that protected works are never compensated for their usage and creators have no say in the products or business models that are developed around them and their work. Congress should assure, and agencies should presume, that reproducing music to train AI models is not, in itself, a fair use.

Motion Picture Association:

  • The debate about whether reproduction of copyrighted works to "train" AI models constitutes copyright infringement, or is permitted by the fair use defense, has become highly polarized, with many participants staking out "all or nothing" positions on this issue. But sweeping generalizations that training is always, or is never, fair use are not appropriate. As the Supreme Court instructed in Campbell v. Acuff-Rose Music, Inc., 510 U.S. 569, 577 (1994), "The task [of determining whether a use is fair] is not to be simplified with bright-line rules, for the statute, like the doctrine it recognizes, calls for case-by-case analysis."
  • More than a dozen lawsuits raising the issue of AI training and fair use have been filed over the past year, and we expect courts to begin issuing substantive rulings in 2024. As mandated by Section 107 of the Copyright Act and the case law interpreting it, courts will apply the four fair use factors to the facts before them and reach decisions in each case. They will consider factors including whether the AI company is engaged in non-commercial or for-profit activities, 17 U.S.C. § 107(1), and whether the particular use of the plaintiffs' works harms "the potential market for or value of" those works, id. § 107(4). If courts reach different conclusions in these cases based on the different facts before them, that is an inherent feature of fair use, which is "an equitable rule of reason," under which "each case raising the question must be decided on its own facts." Harper & Row, Publishers, Inc. v. Nation Enters., 471 U.S. 539, 560 (1985) (quoting H.R. Rep. No. 94-1476, at 65 (1976)). As of now, there is no cause to believe that the courts are not up to the task of applying existing copyright law to new technology, as they have been doing for over a century, and thus MPA sees no reason for Congress to preemptively intervene by amending the Copyright Act to resolve these fair use issues.
