
AI Training & Copyright Part 2: Text-and-data mining under court scrutiny

Recent caselaw by the Higher Regional Court of Hamburg 

(“Kneschke vs. LAION e.V.”)

The high-stakes question of whether – and to what extent – copyright-protected material may be used for AI training is increasingly coming under court scrutiny, including in Germany. 

A recent first-instance decision by the Regional Court of Munich in GEMA vs. OpenAI has attracted widespread media attention as well as criticism for using a short-cut to establish OpenAI’s liability (see our corresponding blog here).

This part of our blog series focuses on a second-instance decision by the Higher Regional Court of Hamburg (HRC Hamburg): The court found that the use of a photograph for the creation of an AI training dataset by the non-profit organization LAION was justified based on the text and data mining (TDM) exception – in both its variants for scientific and general uses – and that the photographer’s opt-out lacked the necessary machine readability.

While the HRC’s decision is not final because a further appeal to the German Federal Court of Justice (BGH) was admitted, it provides valuable insights regarding the application of the TDM exception. These are relevant for AI developers throughout the EU. Based on the upcoming compliance obligations in the AI Act, they will also become relevant globally (see our blog here).

Background: Kneschke vs. LAION

The case decided by the HRC Hamburg was brought by photographer Robert Kneschke against non-profit organization LAION e.V.

In 2021, LAION downloaded one of the plaintiff’s photographs from an agency website and analysed it by comparing the image to its description (i.e. checking whether the description and the image “matched”). LAION then included the metadata in a publicly available, free dataset. This dataset can be used for AI training and contains links to almost six billion images which are available on the internet as well as corresponding descriptions. The agency website contained a “RESTRICTIONS” section prohibiting the use of automated programs to download content (but it did not address TDM explicitly). The photographer claimed that the reproduction during the analysis infringed his copyright and that no exception was applicable to the use. LAION argued that downloading the image was admissible for at least three separate reasons: as a temporary act of reproduction, as TDM, or because of the photographer’s (implied) consent.

Both the Regional Court of Hamburg (file number 310 O 227/23) and the HRC Hamburg (file number 5 U 104/24) dismissed Kneschke’s claims since LAION’s use constituted admissible TDM.

In 2024, the Regional Court of Hamburg ruled that although LAION had reproduced the image in question, this use was justified by the TDM exception for the purposes of scientific research by research organisations (Sec. 60d German Copyright Act, Art. 3 Digital Single Market Directive (DSM-Directive)). 

In December 2025, the HRC Hamburg confirmed this ruling. Unlike the Regional Court (which had left this question open), the HRC Hamburg explicitly confirmed that the reproduction was admissible based on both the exception for scientific (Art. 3 DSM-Directive, Sec. 60d German Copyright Act) and general (Art. 4 DSM-Directive, Sec. 44b German Copyright Act) TDM. The download of the image for the creation of an AI training dataset constituted a copyright-relevant act but was justified under both TDM exceptions. Furthermore, the rights reservation declared by the image agency was irrelevant because the declaration was not machine-readable. 

The following observations from the judgment are noteworthy:

Reproduction for AI training purposes is TDM

The HRC confirms that downloading the photograph constitutes a copyright-relevant use (reproduction), but that this reproduction was admissible based on the TDM exception: TDM means any automated analytical technique aimed at analysing text and data in digital form in order to generate information which includes but is not limited to patterns, trends and correlations (Art. 2 No. 2 DSM-Directive). The court argues that determining whether an image and its description match constitutes a correlation and that, in any case, the connection between image and text represents (other) information. 

The HRC’s confirmation that AI training falls under TDM is hardly surprising. Despite some individual voices in the literature challenging that notion, the applicability of the TDM exception in the context of AI training is a common assumption and was further confirmed by the EU legislator through an express reference in the EU AI Act. 

Creation of an AI training dataset for the purposes of “scientific research” 

The HRC confirms the Regional Court’s assessment that LAION’s use of the photograph was permissible based on the TDM exception for scientific purposes because the creation of an AI training dataset – in itself – constitutes scientific research. The court held that the creation of a dataset is a methodical, verifiable procedure aimed at gaining knowledge, which can be classified as applied research. In an important additional note, the HRC found that the subsequent use of the dataset for AI training purposes is scientific research as well.

LAION also qualifies as an “other institution conducting scientific research”: It does not pursue commercial purposes and there was no evidence that a private company exercised decisive influence or received privileged access to the dataset results. 

Requirements for valid TDM opt-outs

The HRC’s analysis could have ended there because the TDM exception for scientific purposes does not provide for a rightsholder opt-out. However, the court added a second layer of reasoning, apparently in order to make the judgment less vulnerable on appeal: It confirmed that the rightsholder’s opt-out was invalid. This opened the door for the application of the general TDM exception: If the use of works had been expressly reserved by the rightsholder in an appropriate manner, the general TDM exception would not apply (Sec. 44b(3) German Copyright Act, Art. 4(3) DSM-Directive). For content made publicly available online, the reservation must be machine-readable. 

The HRC discusses “who” can declare an opt-out and “how” a declaration must be made in order to be “machine-readable”:

  • “Who”: Holders of simple usage rights (such as the image agency in this case) can issue valid “opt-out” declarations if this is done with the author’s consent. While this reduces foreseeability (the author’s consent cannot be derived from the opt-out declaration), it ensures the effectiveness of declarations throughout the value and license chain.
  • “How”: A rights reservation need not explicitly mention “TDM” as long as TDM is covered based on an interpretation of the declaration. Since the possibility of using copyright-protected material for AI training is becoming more commonly known, it is conceivable that courts will apply a stricter standard in the future. 

    The hotly contested question as to what is “machine-readable” is not conclusively answered for future cases: The HRC found that “machine-readable” is a technology-neutral term and that a declaration is machine-readable if it can be both perceived and interpreted by a machine. The court proceeded on the basis that the plaintiff had not demonstrated that opt-out declarations in natural language were machine-readable in that sense at the time of LAION’s download (back in 2021). It remains to be seen how the understanding of “machine-readability” will evolve over time.

Outlook

The HRC’s decision is an important piece in the evolving mosaic of caselaw on AI training and copyright:

  • The court’s assumption that AI training is – in principle – covered by the TDM exception is shared broadly by courts and legal scholars.
  • The clarification that research institutions can benefit from TDM even if the results of their research can be used for commercial purposes by third parties is equally important and provides such institutions with (albeit limited) legal certainty.
  • The court’s assumption that “machine-readable” requires a declaration to be both perceivable and interpretable by a machine (and its subsequent assumption that opt-out declarations in natural language were not machine-readable in 2021) provides some guidance for future cases. This reasoning suggests that even if a machine could technically ‘perceive’ natural language, its ambiguity or lack of clarity could lead to it failing the ‘interpretation’ requirement. Rightsholders who want to opt out should do so expressly and in a truly machine-readable format (such as the robots.txt protocol).
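
To illustrate the technical side of a declaration that a machine can both perceive and interpret, the following minimal Python sketch evaluates a robots.txt file with the standard library's urllib.robotparser, much as a well-behaved crawler might. The crawler name ExampleAIBot, the example.com URL and the helper may_fetch are hypothetical and chosen purely for illustration. Whether a robots.txt entry satisfies the legal requirements for a reservation under Art. 4(3) DSM-Directive was not decided by the HRC; the sketch makes no claim in that respect.

```python
from urllib.robotparser import RobotFileParser

# Hypothetical robots.txt content. "ExampleAIBot" is an invented crawler
# name used only for this illustration, not a real user agent.
ROBOTS_TXT = """\
User-agent: ExampleAIBot
Disallow: /

User-agent: *
Allow: /
"""


def may_fetch(robots_txt: str, user_agent: str, url: str) -> bool:
    """Return True if the given robots.txt rules allow user_agent to fetch url."""
    parser = RobotFileParser()
    # parse() accepts the file content as an iterable of lines,
    # so no network access is needed for this sketch.
    parser.parse(robots_txt.splitlines())
    return parser.can_fetch(user_agent, url)


if __name__ == "__main__":
    url = "https://example.com/images/photo.jpg"
    print(may_fetch(ROBOTS_TXT, "ExampleAIBot", url))  # False: crawler is excluded
    print(may_fetch(ROBOTS_TXT, "OtherBot", url))      # True: falls under the "*" rule
```

Unlike a “RESTRICTIONS” section written in natural language, a rule in this form yields an unambiguous yes-or-no answer for a given crawler and URL without any interpretation of free text.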

None of these findings is set in stone: The HRC has allowed a second appeal, and further courts throughout the EU will continue to weigh in. The first TDM-related referral is already pending before the CJEU (Like Company vs. Google).

Rightsholders, AI developers (and users) are well-advised to closely monitor the evolving regulatory and litigation landscape. We will continue to monitor the discussion in the EU and beyond and will provide regular updates.
