This joint IUSSP/CODATA Working Group aims to help make data “Findable, Accessible, Interoperable, and Reusable” (FAIR) in the area of population research. Developed in collaboration with Simon Hodson (Executive Director of the Committee on Data (CODATA) of the International Science Council), the Working Group targets the development of machine-actionable vocabularies, which will greatly simplify the task of merging or combining information across data sets, i.e., make them easily interoperable.
Rationale and work envisioned
A growing movement aims to make data “Findable, Accessible, Interoperable, and Reusable” (FAIR). Population research is an empirically focussed field with a long tradition of widely shared, easily accessible data collections. The FAIR Principles point to ways that this tradition can be enhanced by taking advantage of emerging standards and technologies. This Working Group will focus on the development of FAIR Vocabularies for demographic data, which is an essential step in making data reusable and interoperable.
FAIR vocabularies yield benefits when data from different sources must be combined. Consider the most basic variable in demographic analysis: age. The OECD has a list of 643 age categories, and the UN Population Division copes with more than 1,100 age groups. If the meanings of variables in a dataset are only available through human-readable documentation, like a PDF, harmonizing data from two providers will remain a tedious manual process. However, if the age categories are linked to persistent identifiers in machine-actionable metadata, age groupings can be harmonized by software. If these operations are performed across dozens of variables in hundreds of data sources, enormous amounts of human time will be saved. As a consequence, combining information across data sets becomes significantly more feasible, greatly enhancing their comparability and reuse.
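As a rough illustration of what harmonization by software could look like, the Python sketch below resolves each provider's local age labels to shared persistent identifiers and then sums counts into a common set of target groups. Everything in it is an assumption made for illustration: the `https://example.org/age/...` identifiers, the `VOCAB` table of age bounds, the provider labels, and the counts are all hypothetical and are not taken from OECD, UN, or SDMX vocabularies.

```python
from collections import defaultdict

# Hypothetical machine-actionable vocabulary:
# persistent identifier -> inclusive (lower, upper) age bounds.
VOCAB = {
    "https://example.org/age/Y0-4": (0, 4),
    "https://example.org/age/Y5-9": (5, 9),
    "https://example.org/age/Y10-14": (10, 14),
    "https://example.org/age/Y15-19": (15, 19),
    "https://example.org/age/Y0-9": (0, 9),
    "https://example.org/age/Y10-19": (10, 19),
}

# Each (hypothetical) provider publishes a mapping from its own labels
# to the shared identifiers, instead of describing them only in a PDF.
PROVIDER_A_CODES = {
    "0-4": "https://example.org/age/Y0-4",
    "5-9": "https://example.org/age/Y5-9",
    "10-14": "https://example.org/age/Y10-14",
    "15-19": "https://example.org/age/Y15-19",
}
PROVIDER_B_CODES = {
    "Under 10": "https://example.org/age/Y0-9",
    "10 to 19": "https://example.org/age/Y10-19",
}

# Common grouping that both sources can be expressed in.
TARGET_GROUPS = [
    "https://example.org/age/Y0-9",
    "https://example.org/age/Y10-19",
]


def harmonize(records, code_to_uri, target_uris):
    """Sum counts into the target groups that fully contain each source group."""
    totals = defaultdict(int)
    for code, count in records:
        low, high = VOCAB[code_to_uri[code]]
        for target in target_uris:
            t_low, t_high = VOCAB[target]
            if t_low <= low and high <= t_high:
                totals[target] += count
                break
        else:
            # A source group that straddles two targets cannot be assigned safely.
            raise ValueError(f"No target group contains {code!r}")
    return dict(totals)


# Made-up counts, purely for illustration.
provider_a = [("0-4", 120), ("5-9", 130), ("10-14", 110), ("15-19", 100)]
provider_b = [("Under 10", 240), ("10 to 19", 215)]

harmonized_a = harmonize(provider_a, PROVIDER_A_CODES, TARGET_GROUPS)
harmonized_b = harmonize(provider_b, PROVIDER_B_CODES, TARGET_GROUPS)
```

After this step both `harmonized_a` and `harmonized_b` are keyed by the same identifiers, so they can be compared or merged directly. The sketch deliberately refuses to split a source group that spans two target groups; a real workflow would need an explicit rule for such cases.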
The joint IUSSP-CODATA Working Group will build upon the work of the FAIR Vocabularies Group, which recently released “Ten Simple Rules for making a vocabulary FAIR” (https://arxiv.org/abs/2012.02325). Most of its guidance is straightforward, like “Determine the governance arrangements and custodian responsible for the legacy vocabulary.” But some steps require specialized expertise in standards like the Simple Knowledge Organization System (SKOS) or the Web Ontology Language (OWL). In the longer term, FAIR vocabularies also need to be maintained, which requires sustainable institutions with the capacity to support the necessary technologies. The Working Group will seek advice from members of the FAIR Vocabularies Group and experts from other scientific domains to evaluate alternative strategies (e.g. centralized versus federated) and software.
The operational goal will be to work with three to five partners in international organizations and academia to bring their existing vocabularies into line with the FAIR principles. The group will give special attention to coordinating with existing initiatives, like the terminology repository supported by Statistical Data and Metadata eXchange (SDMX).
The ultimate goal of this initiative is to make demographic data more interoperable by publishing controlled vocabularies that can be found and acted upon by software. Doing so could greatly reduce the cost of merging data from multiple sources for researchers who use population data. The Working Group will learn where additional technical development is needed and when community involvement through IUSSP and other organizations is beneficial.
The Working Group will begin meeting in May 2021, and it is expected to complete its work in two years.
Additional members will be added to the Working Group in March and April 2021. Colleagues interested in learning more about this new initiative or participating in the work of this Working Group should contact George Alter (FAIRvocab@iussp.org).