Mitigate Copyright Infringement Liability Exposure for Generative AI Output
In the past year, generative artificial intelligence (“generative AI”), capable of producing material one could describe as “creative” and “expressive”, such as text, images, audio, and video, has revolutionized the way businesses operate and caused a stir in the creative industries. Trained on countless literary works, pieces of software, artworks, audio, video, photographs, and other content, generative AI “learned” to create expressive output based on descriptive prompts from human users.
Training AI models requires analyzing vast amounts of data to learn patterns relevant to the AI system’s purpose. Although there has been no clear guidance from Congress or the courts on this matter yet, using data in generative AI systems almost certainly implicates the legal regimes protecting rights to such data, such as copyright, data privacy law, and the right of publicity. Although the doctrine of fair use may shield AI developers from infringement liability, as described in our article, it will be a matter for the court to decide in each particular case.
Besides matters specific to machine learning, lawsuits filed in the past year against large AI companies have raised the issue of generative AI potentially implicating copyright law at the output stage if the AI’s product is similar to a piece of training data. Cited examples included GitHub Copilot allegedly reproducing recognizable portions of copyrighted computer code and Stability AI producing an image of soccer players with an uncanny resemblance to a Getty Images photograph.
So, how much similarity is enough to constitute copyright infringement? Who is liable for such infringement: AI companies or end users? How can AI developers mitigate their legal exposure through legal and technical tools? Read on to learn more.
What Constitutes Infringement in General
Under U.S. law, copyright infringement takes place if:
- There is a valid copyright in the original work; and
- There was unauthorized copying of the original work (meaning that at least one of the exclusive rights under copyright was violated).
The copying component of the copyright infringement test is proven if there is either (1) evidence of factual copying or (2) a “substantial similarity” between the original and the infringing work.
Factual copying can be proven by direct evidence (rarely available) or by circumstantial evidence. Circumstantial evidence may include proof of the AI’s access to the copyrighted work and a “probative similarity” between the original work and the AI output that goes beyond what independent creation would explain. A claimant in a copyright infringement case could obtain evidence that their copyrighted work was included in the training dataset. Such evidence may be readily available (for example, there is a website that checks whether LAION-5B, a popular text-image pair training dataset used by Stable Diffusion, contains a given image) or could potentially be obtained through court-ordered discovery. Absent evidence of access to the copyrighted work, a “striking similarity” is enough to prove the copying.
The degree of similarity is a question of fact and is determined by the jury based on the evidence in the case (which may include expert evidence). In assessing the degree of similarity between the works, courts consider whether the similar elements are unique, intricate, or unexpected; whether the two works contain the same errors; and whether it appears that there were blatant attempts to cover up the similarities. An AI output containing something that closely resembles the claimant’s artist signature or a company’s watermark could potentially be an example of such evidence. Courts can use other criteria, such as “the total concept and feel,” which combines “objective” extrinsic and “subjective” intrinsic tests. All in all, the examination is factual and case-specific.
But Who is to Blame?
In general, under the doctrine of direct infringement, the actor committing copyright infringement is the one most proximately positioned to the cause of the infringing event.
Secondary infringement occurs when there is a direct infringer, but a second party induces, contributes to, encourages, or profits from the infringement. Secondary infringement is rooted in case law and takes the forms of contributory and vicarious infringement. Contributory infringement occurs when someone knows of the direct infringement and encourages, induces, causes, or materially contributes to it. Vicarious liability arises when someone has the authority and ability to control the direct infringer and directly profits from the infringement.
With most generative AI systems, end users may not make expressive choices but rather provide abstract verbal prompts, leaving the “creative” work to the AI. In such a case, it appears the end user is unlikely to be the direct copyright infringer if the output is infringing.
Usually, the verbal prompts will take the form of ideas not subject to copyright protection (“Create a pop-art portrait of a blond actress”). On the other hand, users may either input requests that contain copyrightable material on which the output will be based (“use a copyrighted painting by Yayoi Kusama I uploaded”) or otherwise intentionally target a copyrighted work (“summarize Martin Luther King Jr.’s ‘I Have a Dream’ speech”). So, when the output turns out to be substantially similar to a copyrighted work or otherwise passes the copyright infringement threshold, the end user may or may not have caused the infringement and, accordingly, may or may not be liable.
Where the end user is directly liable, the AI company may be secondarily liable if it (1) provided a product that is capable of producing infringing work and (2) benefits from the infringing activity (for example, if the service is subscription-based).
Sometimes, however, AI may return infringing outputs even when not intended by the end user. In such a case, the AI company may be the closest actor to being capable of exercising control over the AI system since it conducted the machine learning process and chose/built the datasets. Consequently, the AI company may be the direct infringer.
Contractual exculpatory clauses that purport to absolve an AI company of liability may not always be enforceable in court, however. As a public policy consideration, most states will not enforce a clause absolving a party of the results of its own gross negligence or willful misconduct. Consequently, AI companies are advised to implement procedures that mitigate the risk of producing an infringing output. Although generative AI is a black box technology that is not always or fully predictable, developers may look into installing filters that would limit the “weight” of a piece of training data (a single painting, for example) to a certain pre-determined percentage, making it highly unlikely that the system will produce an output that is substantially similar to that piece of training data.
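To make the idea of a percentage-based filter more concrete, here is a minimal sketch in Python. It is an illustration only and rests on several assumptions: it checks text at the output stage rather than limiting training weights, it measures overlap by shared five-word sequences, and the function names and the 20% threshold are hypothetical. Word overlap of this kind is a technical proxy, not the legal test for substantial similarity.

```python
def word_ngrams(text, n=5):
    """Return the set of lowercase word n-grams found in the text."""
    words = text.lower().split()
    return {tuple(words[i:i + n]) for i in range(len(words) - n + 1)}


def overlap_share(output, reference, n=5):
    """Fraction of the output's n-grams that also appear in the reference."""
    out_grams = word_ngrams(output, n)
    if not out_grams:
        return 0.0  # output too short to contain any n-gram
    return len(out_grams & word_ngrams(reference, n)) / len(out_grams)


def passes_filter(output, references, max_share=0.2, n=5):
    """True if no single reference accounts for more than max_share of the output.

    max_share is a hypothetical, pre-determined percentage chosen for
    illustration; it is not a legal threshold.
    """
    return all(overlap_share(output, ref, n) <= max_share for ref in references)


# Hypothetical stand-in for a protected work in the training data.
protected = "the quick brown fox jumps over the lazy dog near the quiet river bank"
verbatim_output = protected
unrelated_output = (
    "a completely different sentence about contract law "
    "and software licensing terms today"
)

print(passes_filter(verbatim_output, [protected]))   # copied output is blocked
print(passes_filter(unrelated_output, [protected]))  # unrelated output is allowed
```

A production system would need a comparison suited to the medium (images, audio, and code call for different similarity measures) and a threshold set with legal input.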
Sometimes, the fair use doctrine may apply to instances of an infringing output. In a dispute, the court will evaluate the fair use factors outlined in our article on machine learning and fair use in each particular case. It is possible that non-commercial use of an output by the end user may constitute fair use.
Generative AI is a powerful tool helping human creators cut down on content-generating costs, save time for more complex work, and get new ideas. If AI creates something closely resembling a copyrighted piece of data it trained on, absent fair use, the copyright holder may have a case against the persons or entities who caused the infringing act.
Certain factors come into play when determining who, if anyone, is liable when AI output infringes a copyrighted work. End users may be liable if they caused the AI output to resemble a copyrighted work (for example, through their prompts). But AI companies may also be liable because they are often the best positioned to design systems with specific quality controls that determine whether or not an AI can infringe. Implementing similarity percentage “filters” could be a way for AI companies to mitigate copyright infringement risks.
The intersections of generative AI and copyright are exciting new domains with a potential for policymaking. In the meantime, one must proceed cautiously and mitigate legal exposure risks. Contact us at Marashlian & Donahue, The CommLaw Group for assistance with AI legal risk management!
The CommLaw Group Can Help!
Whether you need to assess your legal exposure or are considering investments in AI models or startups and require due diligence analysis on associated legal risks, our proficient and versatile team is here to assist. Reach out to us for comprehensive, client-focused solutions!
Jonathan Marashlian – Tel: 703-714-1313 / E-mail: jsm@CommLawGroup.com
Michael Donahue – Tel: 703-714-1319 / E-mail: mpd@CommLawGroup.com
Linda McReynolds – Tel: 703-714-1318 / E-mail: email@example.com
Ronald Quirk – Tel: 703-714-1305 / E-mail: req@CommLawGroup.com
Diana Bikbaeva – Tel: 703-663-6757 / E-mail: firstname.lastname@example.org