Open Science Infrastructure


Open science infrastructure is information infrastructure that supports the open sharing of scientific productions such as publications, datasets, metadata or code. The UNESCO Recommendation on Open Science, approved in November 2021, describes it as "shared research infrastructures that are needed to support open science and serve the needs of different communities".
Open science infrastructures are a form of scientific infrastructure that supports the production of open knowledge. Beyond the management of common resources, they are frequently structured as community-led initiatives with a set of collective norms and governance regulations, which also makes them a form of knowledge commons. The definition of open science infrastructures usually excludes privately owned scientific infrastructures run by leading commercial publishers. Conversely, it may include actors not always characterized as scientific infrastructures that play a critical role in the ecosystem of open science, such as open access publishing platforms.
Computing infrastructures and online services have played a key role in the production and diffusion of scientific knowledge since the 1960s. While these early scientific infrastructures were initially envisioned as community initiatives, they could not be openly used due to the lack of interconnectivity and the cost of network connection. The creation of the World Wide Web made it possible to share data and publications on a large scale. The sustainability of online research projects and services then became a critical policy issue and led to the development of major infrastructures in the 2000s.
The concept of open science infrastructure emerged after 2015, following a scientific policy debate over the expansion of commercial and privately owned infrastructures in numerous research activities and the publication of the Principles for Open Scholarly Infrastructures. Since the 2010s, large ecosystems of interconnected scientific infrastructures have emerged in Europe and in South and North America through the development of new open science projects and the conversion of legacy infrastructures to open science principles.

Definitions and terminology

Open science infrastructure is a form of knowledge infrastructure that makes it possible to create, publish and maintain open scientific outputs such as publications, data or software.
A UNESCO recommendation on open science approved in November 2021 defines open science infrastructures as "shared research infrastructures that are needed to support open science and serve the needs of different communities". A SPARC report on European open science infrastructure includes the following activities within the range of open science infrastructures: "We define Open Access & Open Science Infrastructure as sets of services, protocols, standards and software contributing to the research lifecycle – from collaboration and experimentation through data collection and storage, data organization, data analysis and computation, authorship, submission, review and annotation, copyediting, publishing, archiving, citation, discovery and more".

Infrastructure

The use of the term "infrastructure" is an explicit reference to physical infrastructures and networks such as power grids, road networks or telecommunications that made it possible to run complex economic and social systems after the industrial revolution: "The term infrastructure has been used since the 1920s to refer collectively to the roads, power grids, telephone systems, bridges, rail lines, and similar public works that are required for an industrial economy to function. If infrastructure is required for an industrial economy, then we could say that cyberinfrastructure is required for a knowledge economy". The concept of infrastructure was notably extended in 1996 to forms of computer-mediated knowledge production by Susan Leigh Star and Karen Ruhleder, through an empirical observation of an early form of open science infrastructure, the Worm Community System. This definition remained influential over the next two decades in science and technology studies and has affected the policy debate over the building of scientific infrastructure since the early 2000s.
Open science infrastructures have specific properties that distinguish them from other forms of open science projects or initiatives:
  • Open science infrastructures are not simply a technical product but embed a set of tools, institutions and social norms. Consequently, infrastructures are not always visible, as they can be largely hidden under the routine of normal activities. The resilience and tacitness of the infrastructures make it especially difficult to identify the real contributions and "labour cost" of open science work, as it remains "invisible in the university system". This also makes it difficult to allocate funding effectively, as critical infrastructure may remain undetected by funding bodies.
  • Open science infrastructures are durable and resilient. They are expected to run on a long-term basis, and multiple research programs rely on them. To some extent, infrastructures are successful when they are forgotten and become an integral part of routine research activities: "Infrastructure at its best is invisible. We tend to only notice it when it fails."
  • Open science infrastructures can be shared and used by different actors and communities. They must be sufficiently consistent to remain coordinated, and yet they have to welcome a diverse array of local uses: "an infrastructure occurs when the tension between local and global is resolved". Prior agreement among all stakeholders on the scope and the governance of the infrastructure is a critical step.

Openness and the commons

Open science infrastructures are open, which differentiates them from other scientific and knowledge infrastructures and, more specifically, from subscription-based commercial infrastructures. Openness is both a core value and a directing principle that affects the aims, the governance and the management of the infrastructure. Open science infrastructures face issues similar to those met by other open institutions such as open data repositories or large-scale collaborative projects such as Wikipedia: "When we study contemporary knowledge infrastructures we find values of openness often embedded there, but translating the values of openness into the design of infrastructures and the practices of infrastructuring is a complex and contingent process".
The conceptual definition of open science infrastructures has been largely influenced by the analysis of Elinor Ostrom on the commons and more specifically on the knowledge commons. In accordance with Ostrom, Cameron Neylon underlines that open infrastructures are not only characterized by the management of a pool of common resources but also by the elaboration of common governance and norms. The economic theory of the commons makes it possible to expand beyond the limited scope of scholarly associations toward large-scale community-led initiatives: "Ostrom's work provides a template to make the transition from a local club to a community-wide infrastructure." Open science infrastructures tend to favor a not-for-profit, publicly funded model with strong involvement from scientific communities, which dissociates them from privately owned closed infrastructures: "open infrastructures are often scholar-led and run by non-profit organisations, making them mission-driven instead of profit-driven." This status aims to ensure the autonomy of the infrastructures and prevent their incorporation into commercial infrastructure. It has wide-ranging implications for the way the organization is managed: "the differences between commercial services and non-profit services permeated almost every aspect of their responses to their environment".
Open science infrastructures are not only a more specific subset of scientific infrastructures and cyberinfrastructures but may also include actors that would not fall into this definition. "Open access publication platforms" such as SciELO, OpenEdition or the Open Library of Humanities are considered an integral part of open science infrastructures in the UNESCO definition and in several literature reviews and policy reports, whereas they were usually considered separate entities in the policy debate on cyberinfrastructure and e-infrastructures. In the 2010 report of the European Commission on e-infrastructure, scientific publishing platforms are "not e-Infrastructures but closely related to it".
Open science infrastructures may also incorporate additional values and ethical principles. Samuel Moore has theorized a form of care-full scholarly commons that does not exist yet but would incorporate latent forms of open science infrastructure and communities: "In addition to sharing resources with other projects, commoning also requires commoners to adopt an outwardly-focused, generous attitude to other commons projects, redirecting their labour away from proprietary." In 2018, Okune et al. introduced a similar concept of "inclusive knowledge infrastructures" that "deliberately allow for multiple forms of participation amongst a diverse set of actors and seek to redress power relations within a given context."

Principles for open science infrastructures

In 2015, the Principles for Open Scholarly Infrastructures laid out an influential prescriptive definition of open science infrastructures. Subsequent definitions and terminologies of open science infrastructures have largely been elaborated on this basis. The text has also influenced the definition of open science infrastructure retained by UNESCO in November 2021.
The Principles attempt to hybridize the framework of infrastructure studies with the analysis of the commons initiated by Elinor Ostrom. They develop a series of recommendations in three areas critical to the success of open infrastructures:
  • Governance: the governance of the infrastructure should be open and accountable to the scientific communities it aims to serve. Specific measures should ensure that the management of the organization is transparent and diverse.
  • Sustainability: the core activities of the organization should be covered by recurring funds. Short-term subventions should be limited to short-term projects. While the organization could charge for services, this should not extend to the data, which should remain "a community property".
  • Insurance: the technical infrastructure and the output of the organization are open. This ensures that the infrastructure can be recreated if necessary.
The text ends by mentioning several potential consequences of the principles. The authors advocate for a responsible centralization that embodies a different model from that of large commercial web platforms like Google and Facebook while still maintaining the important benefits of centralized infrastructures: "we will be able to build accountable and trusted organisations that manage this centralization responsibly". Existing examples of large open infrastructures include ORCID, the Wikimedia Foundation or CERN.
A more critical reception has focused on the underlying political philosophy of the Principles. While the scientific community is a key part of the governance of open science infrastructure, Samuel Moore underlines that it is never precisely defined, which raises potential issues of under-representation of minority groups: