Last week, we reported that the European Parliament is pushing forward with the adoption of the Artificial Intelligence Act (AI Act). Among the amendments to the European Commission’s original text are new disclosure obligations that require providers of AI models (such as Generative AI systems) to disclose any copyright materials used for training those models.
As drafted, the disclosure obligations appear far-reaching but are not very clear. On one reading, they could make it easier for copyright owners to identify whether their works have been unlawfully included in training datasets, and to request their removal or seek compensation. If so, the new text could increase the litigation risk for providers of foundation models and Generative AI systems. It could also slow the development of AI tools if the pool of materials accessible to those systems is reduced. In addition, the new provisions may expose Generative AI system providers to new liabilities if insufficient processes for documenting training datasets are in place – failure to comply with these disclosure obligations could lead to fines of up to €10 million or 2% of annual turnover, whichever is higher.
In this post, we analyse the copyright provisions of the new text and review recent developments on AI and copyright in the UK.
Background and analysis of the consolidated draft AI Act
The question of whether the use of datasets to train AI models infringes copyright laws has recently come under the spotlight (see news reports here and here). Under EU copyright law, the use of copyright works for the purposes of text and data mining (TDM) for commercial purposes is permitted, provided those works are lawfully accessible and the copyright owner has not expressly reserved their rights (by machine-readable means or otherwise).
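One candidate for a "machine-readable" reservation is the draft W3C TDM Reservation Protocol (TDMRep), under which a publisher can signal an opt-out via a `tdm-reservation` HTTP response header. As an illustrative sketch only – the protocol is not yet settled law, and opt-out mechanisms in the wild vary – a crawler might check that signal before mining a page:

```python
# Hypothetical sketch: checking a machine-readable TDM rights reservation.
# The header name and values follow the draft W3C TDM Reservation Protocol
# (TDMRep); this is an assumption for illustration, not a legal requirement.

def is_tdm_reserved(headers: dict) -> bool:
    """Return True if the response headers signal a TDM rights reservation."""
    # Normalise header names for a case-insensitive lookup.
    normalised = {k.lower(): v.strip() for k, v in headers.items()}
    # Under TDMRep, "tdm-reservation: 1" means rights are reserved,
    # "0" means mining is not reserved; absence means no signal at all.
    return normalised.get("tdm-reservation") == "1"

# A compliant crawler would skip mining content served with a reservation.
print(is_tdm_reserved({"TDM-Reservation": "1"}))       # True
print(is_tdm_reserved({"Content-Type": "text/html"}))  # False
```

Note that this addresses only one of the creative industry's complaints (the opt-out signal itself); it does nothing about enforcement or about the same content being mined from an alternative source.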
This text and data mining exception is fairly recent: it was introduced by the Directive on Copyright and related rights in the Digital Single Market (DSM Directive) in 2019, and member states had two years to implement it. As such, it is not entirely surprising that copyright didn't feature in the Commission's 2021 draft of the AI Act.
However, in light of the rapid developments underpinning the world of AI, some have criticised the AI Act for failing to adequately protect the (human) creative industry. The creative industry has also highlighted shortcomings relating to the existing text and data mining exception in the DSM Directive, specifically: (i) the lack of clarity regarding opt-out processes; (ii) the enforceability of opt-outs; (iii) the possibility to circumvent opt-outs by obtaining materials from alternative databases; and (iv) the absence of fair remuneration. Similar difficulties were reported in the Commission’s independent 2022 study on ‘Copyright and New Technologies’.
Revisions to the text and data mining exception in the DSM Directive seem unlikely for now, as evidenced by the European Commission’s answer to a question on the interplay between AI and copyright: “… the Commission is not planning to revise this directive [the DSM Directive – ed.]. Having said that, the Commission will keep following closely the issues raised by the development of AI systems, their impact on the cultural and creative sectors and the interplay with the legal framework”.
Consolidated draft AI Act
Against this backdrop, the new (consolidated) draft AI Act imposes disclosure obligations on providers of foundation models and of Generative AI systems.
Article 28b of the consolidated draft AI Act provides that:
4. Providers of foundation models used in AI systems specifically intended to generate, with varying levels of autonomy, content such as complex text, images, audio, or video (“generative AI”) and providers who specialise a foundation model into a generative AI system, shall:
(c) without prejudice to national or Union legislation on copyright, document and make publicly available a sufficiently detailed summary of the use of training data protected under copyright law.
The obligation to disclose information relating to training datasets is repeated in other places throughout the Act.
As we noted in last week’s post, under the new text, fines for foundation model providers breaching the AI rules are hefty: they may be up to €10 million or 2% of annual turnover, whichever is higher.
The reference to “training data protected under copyright law” is more complex than it seems: there is no copyright register in the EU (unlike in the US), and member state copyright laws differ, so whether content is protected is a matter for legal analysis and, in some cases, litigation. In addition, it is hard to know what constitutes a “sufficiently detailed summary of the use of training data” and how often that summary needs to be updated. This uncertainty could lead to both under- and over-inclusion. Since the purpose of this disclosure obligation is to enable rightsholders to take action against unlawful use, an unclear disclosure scope could lead to an increased risk of unjustified claims and a lack of transparency for rightsholders. Finally, it remains to be seen how such a broad disclosure obligation can be balanced with the legitimate interest of AI providers to refrain from compromising their own IP or trade secrets by making the respective information available to the public.
Aside from this, the new provisions in the AI Act also contain a somewhat ambiguous compliance obligation in Art 28b para 4(b), under which providers shall “train, and where applicable, design and develop the foundation model in such a way as to ensure ‘adequate safeguards’ against the generation of content in breach of Union law in line with the generally acknowledged state of the art, and without prejudice to fundamental rights, including the freedom of expression”. It is not clear what standard of diligence “adequate safeguards” entails, especially in relation to potential breaches of copyright laws.
More guidance on these aspects in the final agreed text of the AI Act would be welcome, particularly as the fines for non-compliance are potentially significant. For now, AI providers may want to start preparing their approach to these disclosure obligations, e.g. by tracking and documenting their training data.
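As a purely illustrative sketch of what such documentation might look like – the record fields and summary format below are hypothetical assumptions, not anything prescribed by the draft AI Act – a provider could keep a structured provenance record per training source and generate a summary covering the copyright-protected ones:

```python
# Illustrative sketch only: a minimal provenance record an AI provider might
# keep per training source to support a "sufficiently detailed summary" of
# copyright-protected training data. Field names are hypothetical.
from dataclasses import dataclass, asdict
import json

@dataclass
class TrainingSourceRecord:
    name: str                  # human-readable dataset or source name
    origin: str                # where the data was obtained (URL, vendor, ...)
    legal_basis: str           # licence or exception relied on
    copyright_protected: bool  # whether the source contains protected works
    collected_on: str          # ISO date of collection

def summarise(records: list) -> str:
    """Produce a JSON summary covering only copyright-protected sources."""
    protected = [asdict(r) for r in records if r.copyright_protected]
    return json.dumps({"protected_sources": protected}, indent=2)

records = [
    TrainingSourceRecord("news-corpus", "https://example.com/news",
                         "EU TDM exception (Art. 4 DSM Directive)",
                         True, "2023-04-01"),
    TrainingSourceRecord("public-domain-books", "https://example.com/pd",
                         "public domain", False, "2023-03-15"),
]
print(summarise(records))  # lists only the "news-corpus" source
```

Even a lightweight register of this kind would leave open the hard questions flagged above: whether a given source is in fact "protected under copyright law", and how much detail the summary must contain.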
Position in the UK
The EU is not alone in struggling to promote AI innovation while balancing the interests of rightsholders. In 2022, the UK’s Intellectual Property Office (IPO) planned to introduce a “new copyright and database exception which allows TDM for any purpose”, subject to safeguards such as lawful access. That exception would have gone further than the EU TDM exception: rightsholders would no longer have been able to charge for UK licences for TDM, nor to contract out of or opt out of the exception. The change was intended to support the Government’s National AI Strategy.
In response to criticism from the creative industry, the introduction of the new TDM exception appears to have been shelved. In its place, the Government, responding to Sir Patrick Vallance’s Pro-Innovation Regulation of Technologies Review, announced that the UK IPO would produce, by the summer, a code of practice providing “guidance to support AI firms to access copyrighted work as an input to their models”. The Government further specified that “an AI firm which commits to the code of practice can expect to be able to have a reasonable licence offered by a rights holder in return”. Much as with the revised version of Europe’s AI Act, clarity will be needed as to what constitutes a reasonable licence. Perhaps anticipating practical difficulties with this FRAND-like licensing regime, the Government has warned that the code “may be followed up with legislation, if the code of practice is not adopted or agreement is not reached”.