The goal of this working group (WG) is to define a Research Publishing framework that simplifies the adoption of Open Science publishing practices by enabling the services of research infrastructures (RIs) to seamlessly integrate repository deposition workflows in the context of the EOSC.
Such an interoperability framework will consist of APIs and formats that allow the implementation of end-to-end research workflows with an on-demand, (semi-)automatic publishing step that ensures the FAIRness of research outputs obtained thanks to the RI. After the first six months of activity, the WG plans to open the protocol specification for public consultation and suggest its inclusion in the EOSC Interoperability Framework.
The WG will contribute to the areas of EOSC Exchange for the definition of the EOSC Interoperability Framework on data publishing and open data, User Experience - Resource Sharing and Discovery, and User Experience - Resource Composability. The activities align with the mission of EOSC Future and with the foreseen collaboration activities with INFRAEOSC-07 projects and Science Projects.
For more information please refer to the WG charter.
Planned Activities
M1 - November 2021
# | Activities | Output | Due Month (indicative) |
---|---|---|---|
1 | Landscape study on (semi-)automatic publishing workflows/integration between RI and repository services | Report (Input for activities 2 and 3) | M3 |
2 | Identification of real-case scenarios and integration patterns among the services involved in the WG | Report (Input for activity 4) | M3 |
3 | Analysis of existing repository deposition frameworks from the functional and non-functional perspectives (identification of common patterns, common problems, etc.) | Report (Input for activity 4) | M5 |
4 | Protocol recommendation v1.0 open for consultation and refinement | Recommendation | M6 |
How to join?
Send an email to alessia.bardi[at]isti.cnr.it with subject "EOSC FUTURE WG subscription" to be added to the mailing list of the group and be included in the activities.
WG Summary
The WG has organised 7 online meetings since November 2021, involving members of INFRAEOSC projects, thematic and horizontal research infrastructures.
The WG studied the status quo of (semi-)automatic publishing workflows by reviewing existing experiences, at any maturity level, about the design and implementation of (semi-)automatic deposition steps of research assets produced in a research infrastructure to a repository service. The study revealed 15 implemented use cases (some prototypal, some beta, some operational in production environments) and 4 generic scenarios that could benefit from the existence of an interoperability framework for research product publishing. The existing protocols and frameworks for deposition were scanned and a survey was conducted among the members to reach an agreement on a proposal to the EOSC IF.
Two options were identified as relevant and are proposed for inclusion into the EOSC IF: SWORD protocol v3 and a combination of COAR Notify and Signposting.
In addition, Signposting (in particular the part on “Publication Boundaries”) is also proposed for inclusion to support a 5th scenario (proposed by the group working on the EOSC cross-infrastructure use cases).
The recommendation for the EOSC Interoperability Framework is currently under preparation and will be open to suggestions, comments and feedback from EOSC partners and research communities about the proposed frameworks and their known limitations (e.g. complexity of SWORD v3, low adoption, partial implementation).
Proposed guidelines (as of 28 June 2023 under review):
- Bardi, Alessia, et al. (2023). EOSC-IF / Interoperability Guideline: Research Product Deposition (1.0). Zenodo. https://doi.org/10.5281/zenodo.8091897
- Bardi, Alessia, et al. (2023). EOSC IF Interoperability Guideline: Access to content via PID (1.0). Zenodo. https://doi.org/10.5281/zenodo.8091103
Participants
Chairs:
- Alessia Bardi, CNR, OpenAIRE Nexus WP7 leader
- Jose Benito Gonzalez Lopez, CERN, Zenodo.org technical manager
- Paolo Manghi, OpenAIRE, OpenAIRE Nexus coordinator, EOSC Future TCB member
Members
- Chris Ariyo, EUDAT B2SHARE service owner
- Andreas Czerniak, Bielefeld University Library/OpenAIRE-Nexus
- Paul Gondim van Dongen, SURF
- Georgios Kakaletris, NEANIAS, Project Technical Manager
- Raul Palma, PSNC, RELIANCE coordinator
- Silvio Peroni, University of Bologna, Director of OpenCitations
- Hans van Piggelen, SURF
- Mark van de Sanden, Technical Coordinator EUDAT, EOSC Future TCB member
- Diego Scardaci, EGI, Technical Solution Team Lead, EOSC Future TCB member
- Jochen Schirrwagen, project coordinator at Bielefeld University Library, OpenAIRE-Nexus
- Debora Testi, CINECA, DICE project coordinator
- Raphaël Tournoy, CNRS, Episciences Project Manager
- Irena Vipavc, Social Science Data Archive, University of Ljubljana
- Deborah Grbac, Library of Università Cattolica del Sacro Cuore di Milano
- Carl-Fredrik Enell, EISCAT Scientific Association
- Guido Aben, CS3MESH4EOSC
- Ivan Heibi, OpenCitations
- Jorik van Kemenade, SURF, Manager of the SURF data repository
Meetings
Agenda and notes are available in the rolling minutes document at https://docs.google.com/document/d/16_aAcuBjc_fO0sm50UjS2J0UOpMPrfsUXsYnIan6aMk/edit#
Meeting #7 12 October 2022
Meeting #6 19 September 2022 14-15 CET
Meeting #5 8 June 2022 10-11 CET
Meeting #4 16 February 2022 11-12 CET
- Link to join: https://meet.google.com/cxi-kmbn-vbs
Meeting #3 25 January 2022 15:00-16:00 CET
- Link to join: https://global.gotomeeting.com/join/139603853
Meeting #2 17 December 2021 10:00 CET
- Link to join: https://global.gotomeeting.com/join/411692069
Kick Off Meeting 22 November 2021 - 15:00 CET
- Link to join: https://global.gotomeeting.com/join/692084805
- Notes and presentations: https://docs.google.com/document/d/16_aAcuBjc_fO0sm50UjS2J0UOpMPrfsUXsYnIan6aMk/edit#
-------------
EOSC Future WP3 - T3.3
WG Documented Outputs Template
WP3 Task 3.3 Information
Link to Public Wiki webpage (this page)
Working Group Information
Chair:
Alessia Bardi, CNR, OpenAIRE Nexus WP7 leader
Jose Benito Gonzalez Lopez, CERN, Zenodo.org technical manager
Paolo Manghi, OpenAIRE, OpenAIRE Nexus coordinator, EOSC Future TCB member
Members:
Chris Ariyo, EUDAT B2SHARE service owner
Andreas Czerniak, Bielefeld University Library/OpenAIRE-Nexus
Paul Gondim van Dongen, SURF
Georgios Kakaletris, NEANIAS, Project Technical Manager
Raul Palma, PSNC, RELIANCE coordinator
Silvio Peroni, University of Bologna, Director of OpenCitations
Hans van Piggelen, SURF
Mark van de Sanden, Technical Coordinator EUDAT, EOSC Future TCB member
Diego Scardaci, EGI, Technical Solution Team Lead, EOSC Future TCB member
Jochen Schirrwagen, project coordinator at Bielefeld University Library, OpenAIRE-Nexus
Debora Testi, CINECA, DICE project coordinator
Raphaël Tournoy, CNRS, Episciences Project Manager
Irena Vipavc, Social Science Data Archive, University of Ljubljana
Deborah Grbac, Library of Università Cattolica del Sacro Cuore di Milano
Carl-Fredrik Enell, EISCAT Scientific Association
Guido Aben, CS3MESH4EOSC
Ivan Heibi, OpenCitations
Jorik van Kemenade, SURF, Manager of the SURF data repository
Start Date: November 2021
Finish Date: March 2023
Working Group Links
Link to Public Wiki webpage (this page)
WG Background (Pre-operational phase)
General description of the need for the Working Group - specifically for EOSC
Open Science calls for researchers to publish any type of research product as soon as possible, in such a way that their research activity can be transparently assessed, reviewed, reproduced, and rewarded in all its aspects.
However, the publishing process has increasingly become a burden for scientists, who often must spend time publishing their articles, data, software, and other products in the many institutional or thematic repositories of reference. Scenarios include first-time publishing of new research products or double-publishing of research products to satisfy institutional mandates and community practices. Such tedious work is often incomplete, with some products ending up unpublished and others showing incomplete or imprecise metadata.
For the EOSC to act as an enabler for Open Science practices, its Interoperability Framework should guide services of research infrastructures and clusters of the EOSC on how to implement (semi-)automated workflows for the deposition and consumption of research products. As a consequence, the EOSC will:
- Unburden scientists, who can focus on their experiments and skip the hurdle of the manual publishing process or of downloading existing data for re-use;
- Ensure publishing of science in a structured, complete, FAIR, reproducible, and community-driven manner;
- Ensure high-quality metadata and fully-fledged monitoring/accounting of science, by systematically interlinking research products with one another and with the services they are related to;
- Increase repository visibility and adoption.
Problems the Working Group was to address
Some communities have investigated and realized the integration of their research performing services, from research infrastructures and clusters, with repositories for research product deposition. The integration ensures that the outcomes of such services are deposited automatically, subject to prior authorization by the users, into a given repository, giving life to an end-to-end scientific workflow, from experimentation to publishing.
The limit of existing approaches is that they are bound to a specific repository API and format. Introducing multiple repositories as potential deposition targets for the service multiplies the problem, as a bilateral interaction must be established with each repository API. For example, the Zenodo deposition API and the B2SHARE API are similar but differ in many ways; a service willing to automate publishing into either repository would need to implement and maintain two different workflows.
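The maintenance cost described above can be illustrated with a minimal sketch. The two adapters below target hypothetical repositories ("A" and "B"); their payload shapes are assumptions made for illustration and are not the actual Zenodo or B2SHARE request formats. The point is only that each additional repository forces the service to carry one more mapping, which a shared deposition protocol would remove.

```python
from dataclasses import dataclass
from typing import Protocol


@dataclass
class ResearchProduct:
    """Repository-neutral description of the product to deposit."""
    title: str
    creators: list[str]
    description: str


class DepositAdapter(Protocol):
    """What the service must implement once per target repository."""
    def build_payload(self, product: ResearchProduct) -> dict: ...


class RepositoryAAdapter:
    """Hypothetical repository A: metadata nested under a 'metadata' key."""
    def build_payload(self, product: ResearchProduct) -> dict:
        return {
            "metadata": {
                "title": product.title,
                "creators": [{"name": name} for name in product.creators],
                "description": product.description,
            }
        }


class RepositoryBAdapter:
    """Hypothetical repository B: flat, list-valued fields."""
    def build_payload(self, product: ResearchProduct) -> dict:
        return {
            "titles": [{"title": product.title}],
            "creators": [{"creator_name": name} for name in product.creators],
            "descriptions": [{"description": product.description}],
        }


def payloads_for_all(product: ResearchProduct,
                     adapters: dict[str, DepositAdapter]) -> dict[str, dict]:
    """Build one request body per repository; N repositories need N adapters."""
    return {name: adapter.build_payload(product)
            for name, adapter in adapters.items()}
```

With a common protocol and metadata format, the `adapters` dictionary would collapse to a single entry regardless of how many compliant repositories the service targets.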
Another important aspect of Open Science is the possibility to re-use existing research products (e.g. research data) deposited in repositories and accessible via their persistent identifiers (e.g. Handle, DOI, ARK). However, there is no standard way for a service to access the actual content behind persistent identifiers, as these typically resolve to the landing pages of the research products.
The lack of a standard for accessing the actual content identified by persistent identifiers makes the automatic consumption of research products hard to implement and, when possible, limited to the persistent identifiers issued by a specific repository (e.g. the first prototype of the Data Transfer Service integrated in the EOSC EXPLORE portal supports only DOIs from Zenodo).
Solutions the Working Group intended to output/provide recommendations on
The working group intends to recommend protocols for research product deposition and content consumption for inclusion in the EOSC Interoperability Framework, so as to enable EOSC services to seamlessly integrate with any compliant repository.
WG Summary of Actions (During the operational phase)
Working Group - operational activity
The WG has organised 7 online meetings since November 2021, involving members of INFRAEOSC projects, thematic and horizontal research infrastructures.
A mailing list has been set up by OpenAIRE for the communication among WG members.
A working document on Google Docs was set up by the chairs to collect contributions from the WG members.
Working Group - work undertaken during the operational phase
The WG studied the status quo of (semi-)automatic publishing workflows by reviewing existing experiences about the design and implementation of (semi-)automatic deposition workflows of research assets produced in a research infrastructure to a repository service. The landscape study revealed 15 implementations, which were then classified according to the following axes:
- Maturity level (design, prototype, beta, production)
- Type of submitter/receiver (repository to repository, analysis/research tool to repository, scholarly service to repository, publishing platform to repository)
- Implemented API/protocol (Zenodo API, B2SHARE API, SWORD)
The WG went through a brainstorming session for the identification of scenarios, even beyond those implemented by the use cases reported for the landscape study, that would benefit from an interoperability framework for research product deposition. A final set of five scenarios has been defined thanks to that session and the collaboration for the EOSC Future cross-infrastructure use cases on the Data Transfer Service.
The existing protocols and frameworks for deposition were scanned and a survey was conducted among the members to reach an agreement on a proposal to the EOSC IF.
Two options were identified as relevant and are proposed for inclusion into the EOSC IF: SWORD protocol v3 and a combination of COAR Notify and Signposting.
The group also agreed to suggest the EOSC guidelines for research product onboarding (i.e. the OpenAIRE guidelines) as metadata exchange format.
In addition, Signposting (in particular the part on “Publication Boundaries”) is also proposed for inclusion to support the 5th scenario about direct access to the actual content behind a persistent identifier (proposed by the group working on the EOSC cross-infrastructure use cases).
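As a rough illustration of the fifth scenario: Signposting exposes typed links (for example `cite-as`, `item`, `describedby`) in the HTTP `Link` header of a landing page, so a client that resolves a PID can discover the content files without scraping HTML. The sketch below parses such a header using only the standard library. The header value, the URLs and the DOI are made up for the example, and the parser is deliberately simplified (it assumes no `<` character inside parameter values).

```python
import re

# One link: <target> followed by its ";name=value" parameters.
_LINK = re.compile(r'<([^>]+)>([^<]*)')
# One parameter: quoted or bare value.
_PARAM = re.compile(r';\s*([\w-]+)\s*=\s*(?:"([^"]*)"|([^\s;,]+))')


def parse_link_header(value: str) -> list[dict]:
    """Split an HTTP Link header value into dicts with 'target' plus parameters."""
    links = []
    for target, raw_params in _LINK.findall(value):
        params = {name.lower(): quoted or bare
                  for name, quoted, bare in _PARAM.findall(raw_params)}
        links.append({"target": target, **params})
    return links


def content_links(value: str) -> list[str]:
    """Return the targets whose relation includes 'item' (the content files)."""
    return [link["target"] for link in parse_link_header(value)
            if "item" in link.get("rel", "").split()]


# Hypothetical Link header, as a landing page might return it.
example_header = (
    '<https://doi.org/10.1234/example>; rel="cite-as", '
    '<https://repo.example.org/record/1/files/data.csv>; rel="item"; type="text/csv", '
    '<https://repo.example.org/record/1/export/json>; rel="describedby"; '
    'type="application/json"'
)

print(content_links(example_header))
```

In a real client the header would come from a `HEAD` or `GET` request on the URL that the PID resolves to; which relations a repository actually exposes should be checked against the Signposting specification and the recommendation document.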
Working Group - challenges
The boundaries of the topic to be discussed differed depending on the points of view and areas of expertise of the participants. Some wanted to focus on protocols for file transfers, some on protocols for metadata exchange, others on domain-agnostic or domain-specific metadata formats.
The WG eventually agreed not to recommend domain-specific formats or formats for specific types of products (e.g. Jupyter Notebooks), but to focus on general-purpose formats and leave community-specific decisions to communities.
Few WG members had direct hands-on experience in implementing the suggested protocols, so the WG needed extended time to involve other colleagues with direct experience and to share a more comprehensive understanding of the potential and limitations of the protocols.
WG Highlighted Outputs/Recommendations (Post-operational phase)
Working Group summary of outputs and recommendations
- Landscape study and scenarios
  - The document provides an overview of existing experiences, at any maturity level, about the design and implementation of (semi-)automatic deposition workflows of research assets and describes five generic scenarios that would benefit from an EOSC Interoperability Framework for research product publishing.
  - Link: https://docs.google.com/document/d/1dyfyJLDjyZiHVeRq1aIUMc9p9bw85zJ5P7eVmUW68lU/edit?usp=sharing
- Recommended protocols for research product deposition in push mode:
  - SWORD v3
- Recommended protocols for research product deposition in pull mode:
  - COAR Notify + Signposting: the service can inform the repository that something new is available at a given accessible location (with COAR Notify), and the repository can then use the Signposting protocol (implemented by the service) to know where to get the content and metadata for the deposition.
- Recommended protocols for accessing the content behind a PID:
  - Signposting (publication boundary)
- Recommended metadata exchange formats:
  - Latest versions of the OpenAIRE guidelines (EOSC guidelines for onboarding of research products)
  - Other community-specific formats may be adopted in addition to them
Additional details and examples available in the Recommendation document.
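To make the pull-mode flow more concrete, the sketch below builds the kind of JSON-LD notification a service could send to a repository inbox. It is only an approximation under stated assumptions: the property names (`origin`, `target`, `object`, `inbox`) follow the general ActivityStreams 2.0 / Linked Data Notifications style that COAR Notify builds on, while the choice of the `Offer` activity type, the second `@context` URL and all identifiers and URLs are assumptions for illustration. The exact notification patterns should be taken from the COAR Notify specification and the recommendation document.

```python
import json
import uuid


def build_notification(service_id: str, service_inbox: str,
                       repository_id: str, repository_inbox: str,
                       resource_url: str) -> dict:
    """Assemble a notification telling a repository that a resource is available.

    All field values are supplied by the caller; nothing is sent over the network.
    """
    return {
        "@context": [
            "https://www.w3.org/ns/activitystreams",
            "https://purl.org/coar/notify",  # assumed COAR Notify context URL
        ],
        "id": f"urn:uuid:{uuid.uuid4()}",
        "type": "Offer",  # assumption: the service offers the resource for deposition
        "origin": {"id": service_id, "inbox": service_inbox, "type": "Service"},
        "target": {"id": repository_id, "inbox": repository_inbox, "type": "Service"},
        # Location where the repository can later read Signposting links.
        "object": {"id": resource_url},
    }


# Hypothetical service, repository and resource.
notification = build_notification(
    service_id="https://service.example.org",
    service_inbox="https://service.example.org/inbox",
    repository_id="https://repo.example.org",
    repository_inbox="https://repo.example.org/inbox",
    resource_url="https://service.example.org/outputs/42",
)
print(json.dumps(notification, indent=2))
```

After receiving such a message, the repository would request `object.id` and follow the Signposting links exposed there to locate the content and metadata to ingest.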
WG Summary Paragraph + Further work needed
The WG has organised 7 online meetings since November 2021, involving members of INFRAEOSC projects, thematic and horizontal research infrastructures. The WG studied the status quo of (semi-)automatic publishing workflows by reviewing existing experiences, at any maturity level, about the design and implementation of (semi-)automatic deposition steps of research assets produced in a research infrastructure to a repository service. The study revealed 15 implemented use cases (some prototypal, some beta, some operational in production environments) and 4 generic scenarios that could benefit from the existence of an interoperability framework for research product publishing. The existing protocols and frameworks for deposition were scanned and a survey was conducted among the members to reach an agreement on a proposal to the EOSC IF.
Two options were identified as relevant and are proposed for inclusion into the EOSC IF: SWORD protocol v3 and a combination of COAR Notify and Signposting. The EOSC guidelines for research product onboarding (i.e. the OpenAIRE guidelines) are suggested as metadata exchange format.
In addition, Signposting (in particular the part on “Publication Boundaries”) is also proposed for inclusion to support a 5th scenario about direct access to the actual content behind a persistent identifier (proposed by the group working on the EOSC cross-infrastructure use cases).
The WG is keen to gather suggestions, comments and feedback from EOSC partners and research communities about the proposed frameworks and their known limitations (e.g. complexity of SWORD v3, low adoption, partial implementation).
Further work is needed from the implementation point of view.
OpenAIRE is including the implementation of the framework in its roadmap to improve the metadata integration with Zenodo and other repositories.
HAL and Episciences implemented the COAR Notify and Signposting protocols in production, while other use cases, including the one between HAL and Peer Community In, are planned to be implemented by Summer 2023.
Episciences and Software Heritage are working on the implementation of COAR Notify in the context of the FAIRCORE4EOSC EC project (first version by Dec 2023).