Exploring trends and impact of scientific publications based on open access journals: an application in the archaeological research domain

26 September 2023 | 16:00 | Session 1 | Main Auditorium

Traditionally, analyses of scientific publications have followed a top-down approach, in which existing taxonomies, derived from journal and bibliometric classifications, are imposed on the data. This approach struggles to capture interdisciplinary and emerging fields of study. We believe that, by leveraging new AI techniques, we can gain insights into impacts and trends in a much richer, semantically driven way. Another limitation of traditional approaches is their reliance on closed, proprietary data repositories. Today, with the availability of new open databases (OpenAlex, OpenAIRE, etc.) and the growing number of open access journals, analysing scientific publications using open resources has become far more feasible. This talk aims to demonstrate both kinds of innovation through a specific application in the archaeological domain.

Our use case is an analysis of articles published in Archeologia e Calcolatori (A&C), an international open access journal specialized in computer applications in archaeology, whose repository acts as a provider for OpenAIRE and Europeana. The end result is a knowledge map that gives nuanced, meaningful access to the journal's scientific content and shows the impact and specialisation of its publications relative to similar journals.

We use two main data sources, both covering the last 10 years: the A&C publications and the proceedings of the Computer Applications and Quantitative Methods in Archaeology (CAA) conferences, a comparable collection. The latter serves as a benchmark against which to compare A&C.
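The abstract does not say which metric is used to compare A&C with the CAA benchmark. One common option in bibliometrics is a specialisation (or activity) index: a topic's share of papers in the journal divided by its share in the benchmark. The sketch below assumes each paper has already been given a single topic label; the function names and the toy labels are hypothetical illustrations, not the authors' pipeline.

```python
from collections import Counter


def topic_shares(labels):
    """Return each topic's share of the papers in one corpus."""
    counts = Counter(labels)
    total = sum(counts.values())
    return {topic: n / total for topic, n in counts.items()}


def specialisation_index(journal_labels, benchmark_labels):
    """Ratio of a topic's share in the journal to its share in the benchmark.

    Values above 1 suggest the journal over-represents the topic relative to
    the benchmark; values below 1 suggest under-representation. Topics that
    never appear in the benchmark are skipped to avoid division by zero.
    """
    journal = topic_shares(journal_labels)
    benchmark = topic_shares(benchmark_labels)
    return {
        topic: journal.get(topic, 0.0) / share
        for topic, share in benchmark.items()
    }
```

For example, with hypothetical labels `["GIS", "GIS", "3D", "databases"]` for the journal and `["GIS", "3D", "3D", "3D"]` for the benchmark, GIS gets an index of 2.0 (twice as prominent in the journal), 3D gets about 0.33, and "databases" is omitted because the benchmark has no papers on it.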

For high-level categorization, we applied both top-down and bottom-up techniques. First, we trained a multi-label classifier based on the Association for Computing Machinery (ACM) taxonomy. Next, we performed topic modelling on our dataset by clustering similar titles and abstracts. In both cases, we used the pre-trained model Specter2 [1], a language model specialized in scientific literature, to vectorize the texts.
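The abstract states that topics were found by clustering Specter2 vectors of titles and abstracts, but it does not name the clustering algorithm. As an assumption, the sketch below uses spherical k-means (k-means with cosine similarity), a common choice for text embeddings. The three-dimensional vectors are tiny hand-made stand-ins for real embeddings, which are far higher-dimensional, and `spherical_kmeans` is a hypothetical helper rather than the authors' implementation.

```python
import math


def normalise(vector):
    """Scale a vector to unit length so dot products equal cosine similarity."""
    norm = math.sqrt(sum(x * x for x in vector))
    return [x / norm for x in vector]


def dot(a, b):
    return sum(x * y for x, y in zip(a, b))


def spherical_kmeans(vectors, k, iterations=20):
    """Cluster document vectors by cosine similarity (spherical k-means).

    Returns one cluster label per input vector. Centroids are seeded with the
    first k vectors so the sketch is deterministic; a real pipeline would
    rather use k-means++ seeding or several random restarts.
    Assumes no vector (and no cluster mean) is all zeros.
    """
    points = [normalise(v) for v in vectors]
    centroids = [list(p) for p in points[:k]]
    labels = None
    for _ in range(iterations):
        # Assignment step: each document joins its most similar centroid.
        new_labels = [
            max(range(k), key=lambda c: dot(p, centroids[c])) for p in points
        ]
        if new_labels == labels:
            break  # converged: assignments did not change
        labels = new_labels
        # Update step: each centroid becomes the normalised mean of its members.
        for c in range(k):
            members = [p for p, label in zip(points, labels) if label == c]
            if members:  # keep the old centroid if a cluster empties out
                mean = [sum(dim) / len(members) for dim in zip(*members)]
                centroids[c] = normalise(mean)
    return labels
```

Calling `spherical_kmeans([[1.0, 0.1, 0.0], [0.0, 1.0, 0.1], [0.9, 0.2, 0.0], [0.1, 0.9, 0.0]], 2)` groups the first and third vectors together and the second and fourth together. Naming each resulting cluster as a topic (for instance from its most frequent title terms) is a separate step not shown here.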

Both strategies offer a coarse-grained view of the data, but extracting useful insights often requires finer granularity. To this end, automatically identifying important entities in a text (e.g., places, artefacts) can be useful. In our study, we used a pre-trained named entity recognition model for archaeologically relevant entities. Finally, in the field of computer applications in archaeology, technologies (e.g., LiDAR sensors, virtual reality) are a particularly important type of entity. To link publications to technologies, we queried Wikidata to obtain a comprehensive list of technologies and then fuzzy-matched that list against our texts.
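The abstract does not specify the fuzzy-matching method or threshold. A minimal sketch, assuming the technology labels have already been retrieved from Wikidata (they are hard-coded here), compares each label with every run of text tokens of the same length using the similarity ratio from Python's standard `difflib.SequenceMatcher`. The helper `match_technologies` and the 0.85 threshold are hypothetical choices for illustration.

```python
import re
from difflib import SequenceMatcher


def tokenize(text):
    """Lower-case the text and split it into alphanumeric tokens."""
    return re.findall(r"[a-z0-9]+", text.lower())


def match_technologies(text, technology_labels, threshold=0.85):
    """Map each technology label to its best fuzzy-match score in the text.

    A label is compared with every window of consecutive text tokens that has
    the same number of tokens as the label; only labels whose best difflib
    similarity ratio reaches the threshold are returned.
    """
    tokens = tokenize(text)
    matches = {}
    for label in technology_labels:
        label_tokens = tokenize(label)
        if not label_tokens:
            continue  # skip empty or punctuation-only labels
        target = " ".join(label_tokens)
        size = len(label_tokens)
        best = 0.0
        for start in range(len(tokens) - size + 1):
            window = " ".join(tokens[start:start + size])
            best = max(best, SequenceMatcher(None, target, window).ratio())
        if best >= threshold:
            matches[label] = best
    return matches
```

On the sentence "We acquired LIDAR data and a photogrametry survey, then built a virtual-reality reconstruction of the site.", this sketch links "LiDAR" and "virtual reality" exactly and still catches the misspelt "photogrametry" for "photogrammetry", while an unrelated label such as "radiocarbon dating" is not matched. Short acronyms are prone to false positives under fuzzy matching, so a real pipeline might also draw on the alternative labels (aliases) that Wikidata stores for each item.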

By the end of the talk, the audience will have seen an end-to-end example of semantically driven mapping of scientific publications, which supports much richer navigation of content than is usually available. With this, we hope to demonstrate the feasibility of using open data for science mapping and to showcase tools and strategies that, being domain-agnostic, can be applied well beyond any particular field.

Presenters

Berta Grimau

Philosopher by education (University of Barcelona), I specialised early on in formal logic and semantics, the topics of my master’s (Ludwig-Maximilians-Universität München) and my PhD thesis (University of Glasgow). After my PhD, I was a postdoc at the Institute of Information Theory and Automation of the Czech Academy of Sciences. I currently work as a data scientist for SIRIS Academic, a higher education and research consultancy based in Barcelona.

Nicolau Duran-Silva

Computer scientist specialised in language technologies. Currently a researcher and data scientist at SIRIS Academic and coordinator of the SIRIS Lab, the R&D division of SIRIS Academic. Currently pursuing an Industrial PhD at Pompeu Fabra University in Barcelona, in collaboration with SIRIS Academic, on automatic information extraction from multi-source documents and the textual simplification of heterogeneous technical documents.

Paola Moscati

Archaeologist, research director and, since 2023, senior associate of the CNR Institute of Heritage Science (ISPC), she has focused her research on topographical studies and the application of computer methods in archaeology. As head of the Archeology and Information Society research line, she directed the international open access journal “Archeologia e Calcolatori” from 1990 to 2022 and now coordinates its Scientific Committee. She is responsible for the project The Virtual Museum of Archaeological Computing, created in collaboration with the Accademia Nazionale dei Lincei, and for the Operational Unit of the Rome branch of the ISPC in the PNRR H2IOSC (Humanities and Cultural Heritage Italian Open Science Cloud) project.

Alessandra Caravale

Archaeologist and researcher at the Institute of Heritage Science of the Italian CNR. Her research line, “Archaeological computing and e-publishing”, addresses archaeological computing in its historical evolution and current landscape, with particular regard to the automated cataloguing of archaeological heritage, databases, digital resources for archaeology and open access publishing. Editor-in-chief of the international journal “Archeologia e Calcolatori”.

Bernardo Rondelli

With a PhD in Archaeology (University of Bologna), he coordinated international projects in archaeology and heritage science for over 10 years, first at the University of Milano-Bicocca, then at the University of Barcelona and finally at the Institute of Humanities of the Spanish National Research Council (CSIC) in Barcelona. In 2010, he founded SIRIS Academic SL, a European consultancy firm specialising in higher education, research and innovation policies. Since then, he has coordinated numerous projects in more than 10 countries, working with universities and research institutes, governments, agencies and philanthropic organisations. His work focuses on supporting complex decision-making processes by promoting evidence-based approaches. Since March 2023, he has been President of the SIRIS Foundation, a private not-for-profit foundation that promotes open science and open government in order to disseminate knowledge in a broad, inclusive and cross-cutting way and to support the use of scientific evidence in decision-making and public investment.