Matilda: a bibliographic/metric tool for open science
The Open Access movement has long insisted on the availability and reusability of academic texts as a means of disseminating knowledge, without paying specific attention to the question of metadata. The fact that the main OA declarations (Budapest, Berlin, Bethesda) made no reference to metadata has led to a paradoxical situation: the more accessible and reusable publications became, the more research communities searched and found their content through privately owned and often costly bibliographic/metric tools.
In recent years, the I4OC and I4OA coalitions have advocated for the opening of metadata, and databases such as OpenAlex and OpenCitations have enabled the sharing of metadata. However, these achievements do not currently meet the needs of researchers, for two reasons. On the one hand, they are databases to be used via APIs or dumps, not tools that untrained users can easily adopt. On the other hand, their update frequency does not support services for those who wish to follow the evolution of the literature on a day-to-day basis.
To fill this gap, Matilda, which is based on open source software and multiple open data sources, aims to provide an open science infrastructure for all research domains that do not currently have well-designed, community-based search and alert services.
It currently provides at least four services: an easy-to-use multi-criteria search; citation tracking by text, by author or by multiple criteria; the creation of associated alerts via RSS feeds; and, of course, DOI, HTML and PDF links that let researchers read texts of interest outside the platform. Matilda is currently based on four documentary sources (arXiv, Crossref, PubMed Central, RePEc), which continuously feed deduplication/aggregation operations, followed by metadata enrichment through ORCID and the production of reference links, notably through GROBID. As of September 1st, 2023, it displays 112 million “works” (aggregations of “identical” documents), 200 million documents and more than 9.2 million authors, with a mean daily update of around 200,000 documents.
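As an illustration of the deduplication/aggregation step mentioned above, the following is a minimal sketch of how documents from several sources might be grouped into “works”. It is not Matilda's actual method: the grouping key (normalized title plus first author's family name), the record fields and the function names are all assumptions made for the example, and a real pipeline would need far more robust matching.

```python
import re
from collections import defaultdict


def work_key(doc):
    """Hypothetical grouping key: normalized title plus first author's family name."""
    # Lowercase the title and collapse anything that is not a letter or digit.
    title = re.sub(r"[^a-z0-9]+", " ", doc["title"].lower()).strip()
    author = doc["first_author"].strip().lower()
    return f"{title}|{author}"


def aggregate(documents):
    """Group documents sharing the same key into one 'work' (illustrative only)."""
    works = defaultdict(list)
    for doc in documents:
        works[work_key(doc)].append(doc)
    return dict(works)


# Made-up records standing in for entries harvested from different sources.
docs = [
    {"source": "arXiv", "title": "Open Metadata for All!", "first_author": "Doe"},
    {"source": "Crossref", "title": "Open metadata for all", "first_author": "doe"},
    {"source": "RePEc", "title": "A Different Paper", "first_author": "Roe"},
]

works = aggregate(docs)
for key, members in works.items():
    print(key, [d["source"] for d in members])
```

In this toy input, the arXiv and Crossref records collapse into a single work while the RePEc record stays on its own, which mirrors the distinction between the “works” and “documents” counts given above.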
The current version relies only on metadata, including abstracts, reconstructed references and author identification. Ongoing development will enable full-text search by the end of 2023, making Matilda a real alternative to Google Scholar and commercial databases, especially for citation-tracking services. It will be fully available to researchers from autumn onwards, both to gather feedback and to better understand how researchers use these search tools, as there is almost no literature on these uses.