A Pressing Challenge for Global Science
The major global scientific and human challenges of the 21st century (including climate change, sustainable development, and disaster risk reduction) can only be addressed through cross-domain research that seeks to understand complex systems through machine-assisted analysis at scale. Our capacity for such analysis is currently constrained by the limitations in our ability to access and combine heterogeneous data within and across domains. Sub-optimal data practices are a major and costly limiting factor on research: it is estimated that 80% of research expenditures are used to prepare inconsistent data for use.
Solutions to complex and difficult problems require data to be assessable and actionable by machines using big data in combination with the most advanced hardware and software technologies. Data must be richly described with metadata, well-documented, transparent and ultimately humanly comprehensible to facilitate extraction of meaning from complexity. The fundamental enabler of data-driven science is an ecosystem of resources that enable data to be FAIR (Findable, Accessible, Interoperable and Re-usable) for humans and machines. This ecosystem must include effective, maximally automated stewardship of data, and effective terminologies and metadata specifications.
An International Consensus-Building Initiative for Core Interoperability
Achieving this goal requires a major consensus-building effort, in particular to gain agreement on the core technologies and semantic solutions that will allow data to be combined for cross-domain research. CODATA, with the support of and on behalf of the International Science Council (ISC), proposes a major, global, decadal programme, ‘Making data work for cross-domain grand challenges’, to address these challenges in the period 2020–2030.
What Approach will the Decadal Programme Take?
From 2017 to 2019, a CODATA-led pilot project developed, tested and refined methods for aligning metadata specifications, taxonomies and ontologies to address these problems in a consensual fashion. Applying this approach, the Decadal Programme will work with other international data organisations and their groups, with cross-domain programmes and research initiatives, with the organisations and communities that create metadata specifications and terminologies, and with other stakeholders to enable an ecosystem for FAIR data for cross-domain research to be developed and implemented. This ecosystem will include a) maximally automated stewardship of data; b) easily applied terminologies and metadata specifications to assist the combination of data across domain boundaries; and c) facilitation of the more effective application of data-intensive, machine-learning and visualisation techniques for discovery and analysis.
The programme takes a three-pronged approach. Engagement and cross-fertilisation between these areas of activity are essential:
Theme 1: Enabling Technologies and Good Practice for Data-Intensive Science: working with domain and technology experts, the programme will identify and define enabling technologies and good practices for data-intensive scientific discovery that are applicable across disciplines.
Theme 2: Mobilising Domains and Breaking Down Silos: the programme will proactively engage international scientific unions and associations, and other stakeholders, in programmes of work designed to promote interoperability of data and related services.
Theme 3: Advancing Interoperability Through Cross-Domain Case Studies: the programme will work with a number of cross-domain case studies (including, but not limited to, the Sustainable Development Goals, disaster risk reduction and reporting, urban health and resilience, and infectious diseases).
What Impact will the Decadal Programme Have?
The impact of the programme will be to assist the scientific and innovation communities to accelerate scientific understanding through a step change in the application of interdisciplinary data-intensive methodologies and thereby enable more efficient and transparent science to address global challenges.
The programme will achieve this by helping reduce the proportion of effort dedicated to data cleaning, manipulation and harmonisation. It will maximise the amount of machine-actionable FAIR data available for analysis and linking.
The programme will play a catalysing role, prioritising activity that enables and coordinates the development and adoption of standard technical specifications, vocabularies and associated technology approaches. It will work with partners to ensure that the necessary capacity for their effective use exists and that this work has a tangible and growing impact on the data available.
Launching the Decadal Programme
Under the oversight of CODATA (and the programme governance), a coordinating programme office and a cohort of experts will play a facilitating role to build consensus and convergence. The development and implementation of technical and semantic solutions, where needed, will be done by programme nodes and partners.
CODATA is now charged with putting in place the pilot activities, governance, core funding, capacity and partnerships for the Decadal Programme. We plan a formal launch at International Data Week in Seoul, Korea, in June 2022.
As part of this process of partnership building, CODATA invites discussions with a range of institutions, cross-domain research projects, standards organisations and groups with expertise in data interoperability.
For further information, please contact Simon Hodson, CODATA Executive Director [simon(at)codata.org].