A new way for researchers to deposit files in HAL, the French Open Archive
- 26 September 2023
- 16:00
- Session 3
- Sala Nouvel - Reina Sofia Museum
HAL (https://hal.science/) is the French national open repository for publications. Part of the Ministry’s research infrastructure roadmap and of the National Plan for Open Science, HAL is the multidisciplinary open archive chosen by the whole French scientific and university community for the dissemination of knowledge. HAL hosts several types of documents, such as articles, preprints, conference papers and theses. It provides access to more than 1 million scientific documents and has more than 80,000 active users. HAL also provides a network of more than 130 institutional portals for universities and research institutions.
The first (2018) and the second (2021-2024) French plans for Open Science highlighted the need for HAL to simplify the deposit process for researchers, even when they publish on other open access platforms around the world. Thanks to funding from the French national research agency (HALiance - ANR 21-ESRE-0047), CCSD (https://www.ccsd.cnrs.fr/) is addressing this need with a new way of feeding HAL: an integrated self-archiving service with automated, targeted harvesting of publications that have appeared on other open access platforms or repositories. The new service is intended to simplify deposit for researchers and to increase the volume of open access documents in HAL.
This new service for researchers has been implemented in partnership with INIST (https://www.inist.fr), another unit of the CNRS, and involves several steps:
Identification of the French scientific output. We identified French scientific publications that are missing from the HAL archive. This step was carried out using OpenAlex (https://openalex.org/).
Automatic download of publication metadata using the APIs of the different data sources.
Deduplication and creation of unified records of publications based on business rules.
Metadata enrichment: A series of processing steps was applied to these data to enrich the metadata. Automatic mechanisms classified the publications by scientific theme and added affiliation information using the French laboratory registry RNSR (Répertoire national des structures de recherche, the national directory of research structures).
Alignment with Unpaywall. The Unpaywall service (https://unpaywall.org/) was used to determine whether the researcher could be offered automatic retrieval of the full text for transfer to HAL.
Transfer to HAL: Once this database of publications had been enriched, automatic suggestion mechanisms were set up to offer researchers the possibility of automatically submitting the full text to HAL. The SWORD protocol (an interoperability standard) was used to transfer the files into HAL.
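To make the deduplication and Unpaywall alignment steps more concrete, here is a minimal Python sketch. It is an illustration under stated assumptions, not the CCSD/INIST implementation: the `Record` shape, the merge rules (match on normalized DOI, keep the longest title, union the affiliations) and the requirement that an open access copy carry a license are all hypothetical. The keys read in `fulltext_candidate` (`is_oa`, `best_oa_location`, `license`, `url_for_pdf`) follow the Unpaywall response format as we understand it, and no network call is made.

```python
from dataclasses import dataclass, field
from typing import Optional


@dataclass
class Record:
    """One harvested metadata record (hypothetical shape, not the real schema)."""
    source: str
    doi: Optional[str]
    title: str
    affiliations: list[str] = field(default_factory=list)


def normalize_doi(doi: Optional[str]) -> Optional[str]:
    """Lower-case a DOI and strip common resolver prefixes."""
    if not doi:
        return None
    doi = doi.strip().lower()
    for prefix in ("https://doi.org/", "http://doi.org/", "doi:"):
        if doi.startswith(prefix):
            doi = doi[len(prefix):]
    return doi or None


def deduplicate(records: list[Record]) -> list[Record]:
    """Merge records that share a DOI; records without a DOI are kept as-is."""
    merged: dict[str, Record] = {}
    unmatched: list[Record] = []
    for rec in records:
        key = normalize_doi(rec.doi)
        if key is None:
            unmatched.append(rec)
            continue
        if key not in merged:
            merged[key] = Record(rec.source, key, rec.title, list(rec.affiliations))
            continue
        unified = merged[key]
        # Assumed business rule: keep the longest title, union the affiliations.
        if len(rec.title) > len(unified.title):
            unified.title = rec.title
        for aff in rec.affiliations:
            if aff not in unified.affiliations:
                unified.affiliations.append(aff)
    return list(merged.values()) + unmatched


def fulltext_candidate(unpaywall: dict) -> Optional[str]:
    """Return a PDF URL if an Unpaywall-style record offers a licensed OA copy."""
    if not unpaywall.get("is_oa"):
        return None
    location = unpaywall.get("best_oa_location") or {}
    # Assumed policy: only suggest copies that carry an explicit license.
    if not location.get("license"):
        return None
    return location.get("url_for_pdf")
```

In a pipeline of this kind, `deduplicate` would run on the metadata downloaded from the different sources, and `fulltext_candidate` would decide whether a deposit suggestion can include an automatically retrieved file.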
This new service is being tested by a panel of researchers and will be open before the end of 2023.
The presentation will showcase the work that has been done, highlight how we addressed some of the technical issues in building the new service, and outline the steps we still have to take and the problems we are facing.