In an earlier blog (here) we dissected the copyright framework for text and data mining under the EU Copyright Directive (Directive (EU) 2019/790) and its applicability to AI training data. By way of the EU AI Act, these rules might now become relevant for providers of general-purpose AI models even where they train such models only outside of the European Union.
Following the principle of territoriality of copyright (copyright consists of a “bundle” of national rights rather than one uniform “global” right), the admissibility of AI training under copyright law – as far as copyrighted material is used – is traditionally determined by the law of the country in which such training takes place (or rather, where the relevant copyrighted material is used for such purposes). The AI Act, whose initial drafts would have had only a limited impact on copyright compliance (mainly through transparency requirements, see our blog here), seems to change this principle: the text as recently adopted by the European Parliament contains additional provisions aimed at creating a level playing field regarding AI training data. In essence, they call for (global) compliance with EU copyright law, in particular the rules on text and data mining, regardless of where the AI model in question was trained.
These rules could affect the training of AI models in the future and raise a number of practical and challenging questions for rightsholders and developers of AI systems.
Recap: What is text and data mining and why is it relevant for AI development?
At first glance, text and data mining may sound like a niche topic for tech enthusiasts. Nothing could be further from the truth: text and data mining means the automated analysis of one or more digital or digitised works for the purpose of gathering information, in particular regarding patterns, trends and correlations. Or – in bolder language – the mining of information as the “gold of the digital age”.
Text and data mining can be particularly important for AI development as such systems, in particular foundation models used for generative AI applications, require training. Conducting such training requires large quantities of data and content which may include copyrighted materials such as texts and images. We explained here that the admissibility of using copyright protected material for AI training is a highly contested issue globally.
In fact, an increasing number of legal disputes revolving around the use of copyright-protected AI input data have arisen since (see here for example). The question of whether, under which circumstances and to what extent text and data mining is permissible is at the heart of these lawsuits, and their outcomes will affect AI developers and rightsholders alike.
The AI Act: A call for global compliance with EU copyright law?
When the AI Act was first negotiated, copyright was not at the centre of the debate, and there seemed to be limited political appetite to re-open copyright discussions so soon after the adoption of the hotly contested EU Copyright Directive (Directive (EU) 2019/790). One of the few explicitly copyright-related draft provisions in the AI Act was a transparency obligation to disclose which data was used for AI training, aimed at helping rightsholders to determine whether their works were used (see more on this here).
The AI Act did not, however, contain explicit substantive provisions on copyright and the applicable law. Since copyright law is governed by the principle of territoriality (i.e. there is no uniform, global copyright but a “bundle” of national copyrights), the place where a use or an infringement takes place would usually determine the applicable law. Though this place is often not easy to determine in the digital sphere (see our blog post here), the principle as such is rather straightforward: if training of an AI model takes place in a specific country, the admissibility of such training is determined by the (copyright) laws of that jurisdiction. Insofar as the trained AI model does not further use or replicate training data, this might open up a possibility for forum shopping: theoretically, an AI model could be trained and developed in jurisdictions with fewer restrictions on the use of copyrighted material for training purposes and then rolled out globally (even in jurisdictions with stricter requirements).
Such a scenario was apparently on the EU legislator’s mind during the trilogue discussions of the AI Act. These discussions led to a number of copyright-related additions in the recitals and the main body of the AI Act, apparently aimed at preventing a “race to the bottom” regarding AI training. Recital 106 of the AI Act reads (emphasis added):
Providers that place general-purpose AI models on the Union market should ensure compliance with the relevant obligations in this Regulation. To that end, providers of general-purpose AI models should put in place a policy to comply with Union law on copyright and related rights, in particular to identify and comply with the reservations of rights expressed by rightsholders pursuant to Article 4(3) of Directive (EU) 2019/790. Any provider placing a general-purpose AI model on the Union market should comply with this obligation, regardless of the jurisdiction in which the copyright-relevant acts underpinning the training of those general-purpose AI models take place. This is necessary to ensure a level playing field among providers of general-purpose AI models where no provider should be able to gain a competitive advantage in the Union market by applying lower copyright standards than those provided in the Union.
The obligation for providers of general purpose AI models to implement a policy to comply with EU copyright law is now also enshrined in Art. 53 draft AI Act:
1. Providers of general purpose AI models shall:
(a) […]
(b) […]
(c) put in place a policy to comply with Union copyright law in particular to identify and respect, including through state of the art technologies, the reservations of rights expressed pursuant to Article 4(3) of Directive (EU) 2019/790;
(d) draw up and make publicly available a sufficiently detailed summary about the content used for training of the general-purpose AI model, according to a template provided by the AI Office;
These obligations apply to general purpose AI models if these are placed on the market in the EU. They also apply to general purpose AI models that are released under a free and open source license (see Recital 104 of the AI Act).
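To make the reference to identifying reservations of rights "through state of the art technologies" more tangible, the following is a minimal sketch of how a training-data crawler might check for a machine-readable opt-out before collecting a page. It rests on an assumption that is not taken from the AI Act or the Directive: that a robots.txt disallow rule addressed to the crawler is treated as one possible signal of a reservation under Article 4(3). Whether such a signal is legally sufficient, or the only relevant one, is an open question. The crawler name `ExampleTrainingBot`, the helper `is_reserved` and the sample file are hypothetical.

```python
from urllib.robotparser import RobotFileParser

# Hypothetical robots.txt of a rightsholder's website: the (made-up) crawler
# "ExampleTrainingBot" is excluded everywhere, all other crawlers are allowed.
ROBOTS_TXT = """\
User-agent: ExampleTrainingBot
Disallow: /

User-agent: *
Allow: /
"""


def is_reserved(robots_txt: str, crawler_name: str, url: str) -> bool:
    """Return True if robots.txt disallows `crawler_name` from fetching `url`.

    Assumption for illustration only: a disallow rule is read as a
    machine-readable reservation of rights. This is not a legal test.
    """
    parser = RobotFileParser()
    parser.parse(robots_txt.splitlines())
    return not parser.can_fetch(crawler_name, url)


if __name__ == "__main__":
    page = "https://example.com/articles/1"
    for bot in ("ExampleTrainingBot", "SomeOtherBot"):
        status = "reserved, skip" if is_reserved(ROBOTS_TXT, bot, page) else "no reservation found"
        print(f"{bot}: {status}")
```

In practice, a copyright policy would likely have to cover further signals (for example terms of use or metadata) and keep a record of the checks performed; the sketch only shows where such a check could sit in a data collection pipeline.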
What does it mean for businesses?
While the underlying purpose of the newly added provisions might seem clear, many aspects regarding their scope and application remain open to discussion. In any case, the copyright-related elements of the AI Act could have significant implications for the design and compliance processes of general-purpose AI models placed on the EU market.
A few key takeaways:
- Global application of EU copyright? Since the explicit goal of the new obligations is to ensure a “level playing field” (see Recital 106), Art. 53 of the AI Act seems to prescribe that EU copyright law must be adhered to globally if and insofar as a general-purpose AI model is placed on the market within the EU. This could mean, in particular, that an “opt-out” from text and data mining would have to be respected globally (unless the general-purpose AI model is not available in the EU), regardless of where the training takes place. As a result, the providers of these models would have to consider these requirements both in their design and in their lifetime compliance processes – which will be particularly challenging for already released models, despite the additional 24-month grace period granted in such cases (see below). Rightsholders – on the other hand – should be aware of the possibility of an “opt-out” and consider corresponding declarations, if applicable.
- Timeline for compliance? As further set out here, we expect the AI Act to enter into force by the end of Q2 or in Q3 2024. Generally, the AI Act will apply two years after the date of entry into force, i.e. we expect it to apply by the end of Q2 or in Q3 2026. However, the obligations for providers of general-purpose AI models – and thus the copyright-related obligations – will already apply 12 months after the date of entry into force. Thus, compliance with the copyright-related obligations will probably be required by the end of Q2 or in Q3 2025 at the latest.
Providers of general-purpose AI models that were placed on the market before that 12-month date have more time to ensure compliance, namely 36 months from the date of entry into force of the AI Act, i.e. compliance for such models will probably be required by the end of Q2 or in Q3 2027 at the latest. Even so, in light of the limited time available, providers are well advised to start implementing the relevant processes in good time.
- Sanctions for violations? It is not entirely clear whether non-compliance with the obligations under the AI Act to create a policy to comply with EU copyright law and to document the training data is “only” a violation of the AI Act or also an infringement of copyright law, separately sanctionable under the relevant laws. The latter would mean that the AI Act effectively overrides (or at least cuts through) the established conflict-of-laws rule in the copyright space, the lex loci protectionis (enshrined in the EU primarily in the Rome II Regulation), insofar as rightsholders could claim infringement under EU copyright laws (and not only a violation of the AI Act) even in cases where the relevant acts (i.e. scraping of training data) have occurred outside of the EU. Such a broad interpretation would have implications for international copyright law going far beyond the AI Act.
Non-compliance with these obligations would, however, in any event constitute a violation of the AI Act itself. Generally, the AI Act provides for significant fines for certain infringements. Interestingly, Art. 53 AI Act is not directly referenced in the provisions of the AI Act that stipulate very high administrative fines (up to 7% of the total worldwide annual turnover). However, Art. 99 (1) of the AI Act requires the Member States to lay down not only penalties but also other enforcement measures, which may include warnings and non-monetary measures, applicable to infringements of the Regulation by operators. It remains to be seen how the Member States will take this into account when they implement their national rules on penalties and other enforcement measures.
In addition, non-compliance with the new copyright-centred transparency and compliance provisions might give rise to risks of unfair competition claims by competitors.
- Application to all copyrighted material? The legislative materials of the AI Act do not distinguish between different categories of rights. While it would be conceivable that the obligation to “comply with Union copyright law” applies only to copyright-protected material from the EU, the text of the AI Act does not draw such a distinction. This may mean that all rightsholders can benefit from the possibility to “opt out” of the use of their material for the purposes of text and data mining if and to the extent the relevant general-purpose AI model is placed on the market in the EU. This would place a further burden on providers of general-purpose AI models, who would also have to cater for “non-EU opt-outs” in their policies when training their models.
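Returning to the compliance timeline above, the offsets of 12, 24 and 36 months from entry into force can be turned into concrete dates with a few lines of code. The sketch below is illustrative only: the entry-into-force date is a hypothetical placeholder (at the time of writing we only expect a date by the end of Q2 or in Q3 2024), the helper names `add_months` and `compliance_milestones` are our own, and simple calendar arithmetic is used, which may deviate by a day from the application dates eventually fixed in the final text of the AI Act.

```python
import calendar
from datetime import date


def add_months(start: date, months: int) -> date:
    """Add whole calendar months, clamping to the last day of the target month."""
    total = start.year * 12 + (start.month - 1) + months
    year, month_index = divmod(total, 12)
    month = month_index + 1
    last_day = calendar.monthrange(year, month)[1]
    return date(year, month, min(start.day, last_day))


def compliance_milestones(entry_into_force: date) -> dict[str, date]:
    """Offsets as described in the timeline above (12, 24 and 36 months)."""
    return {
        # Obligations for providers of general-purpose AI models, incl. copyright policy
        "gpai_obligations_apply": add_months(entry_into_force, 12),
        # General application of the AI Act
        "general_application": add_months(entry_into_force, 24),
        # Models already placed on the market before the 12-month date
        "legacy_gpai_models": add_months(entry_into_force, 36),
    }


# Hypothetical placeholder, NOT the actual entry-into-force date.
HYPOTHETICAL_ENTRY_INTO_FORCE = date(2024, 7, 1)

if __name__ == "__main__":
    for label, deadline in compliance_milestones(HYPOTHETICAL_ENTRY_INTO_FORCE).items():
        print(f"{label}: {deadline.isoformat()}")
```

Replacing the placeholder with the actual date of entry into force, once it is known, yields the corresponding milestones; the final application dates should in any case be verified against the published text.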