
Lightning talk

Training AI systems for research and non-research purposes: surveying the stock images and scientific publishing sectors

  • 26 September 2023
  • 16:00
  • Session 1
  • Main Auditorium

Today, most of the trending news on AI concerns generative systems such as OpenAI's ChatGPT and, more recently, those developed by Stability AI. Much has already been said about these systems: their performance in field-specific exams and the potential impacts on education, the fact that they can generate outputs that, prima facie, could be considered protectable subject matter if they had been created by humans, and the legal and philosophical debates over their authorship and ownership. More recently, the hype around AI-generated outputs has been giving way to skepticism and concerns about the underlying technology and its potential effects on communities. Examples include the recent tweet by OpenAI's CEO stating that we should not "be relying on it for anything important right now", and the recent bans on AI-generated content by platforms such as GettyImages and other art communities, for legal and social reasons.

Among the legal issues surrounding these systems, one that has often been overshadowed by the AI-authorship headlines relates to an important step in the training of some AI systems, commonly referred to as text and data mining (TDM), which may involve the use of copyrighted works. Recently, the lawsuits filed by GettyImages shed light on whether it is lawful to use third-party images to train generative AI systems. While the scope of what counts as TDM may vary according to the legal definition, part of the literature claims that mining copyrighted works to train AI systems does not involve expressive uses and therefore should not raise any copyright issues. On the other hand, some steps in the process may involve temporary and permanent copies, which has led several countries to amend their laws in order to provide greater legal certainty for those interested in carrying out TDM-related research and business.

The international legal framework for TDM is not harmonized, and jurisdictions impose different degrees of restriction (on uses, users, purposes and kinds of works). Within it, some norms in force, such as Article 4 of Directive (EU) 2019/790, exclude from the scope of the exception uses that rightholders have reserved. This may be the case for companies such as GettyImages and academic publishers, which have made licensing their databases for TDM purposes part of their business. But what if TDM is carried out for research purposes? Are these licenses enforceable? Can a copyright exception be overridden by contract?

Building upon the existing literature on research exceptions in comparative copyright, this lightning talk discusses a work in progress co-authored by Thomas Margoni, Sean Flynn and Luca Schirru. The project analyzes the Terms of Use of major players in (i) stock image licensing and (ii) academic publishing, and compares how restrictive and how legally enforceable their licenses are with respect to text and data mining for research and non-research purposes.


Luca Schirru

Luca Schirru is a postdoctoral researcher at the Centre for IT & IP Law (CiTiP – KU Leuven) and executive director of the Brazilian Copyright Institute. Luca is a member of the Global Expert Network on Copyright User Rights and was awarded the Arcadia Fellowship in International Copyright (21-22). Luca is also a guest professor at the Federal University of Rio de Janeiro (Graduate Program on Public Policy, Development and Strategies (PPED/IE)) and teaches in the Specialization Program on Intellectual Property Law at the Pontifical Catholic University of Rio de Janeiro.