Terms of Reference:
This joint IUSSP/CODATA Working Group aimed to contribute to making data “Findable, Accessible, Interoperable, and Reusable” (FAIR) in the area of population research. Developed by the International Union for the Scientific Study of Population in collaboration with Simon Hodson, Executive Director of the Committee on Data (CODATA) of the International Science Council, the group targeted the development of machine-actionable vocabularies, which vastly simplify the task of merging or combining information across data sets, i.e., making them easily interoperable.
Programme:
Rationale and work envisioned
A growing movement aims to make data “Findable, Accessible, Interoperable, and Reusable” (FAIR). Population research is an empirically focussed field with a long tradition of widely shared, easily accessible data collections. The FAIR Principles point to ways that this tradition can be enhanced by taking advantage of emerging standards and technologies. This Working Group focused on the development of FAIR Vocabularies for demographic data, which is an essential step in making data reusable and interoperable.
FAIR vocabularies yield benefits when data from different sources must be combined. Consider the most basic variable in demographic analysis: age. The OECD has a list of 643 age categories, and the UN Population Division copes with more than 1100 age groups. If the meanings of variables in a dataset are only available through human-readable documentation, like a PDF, harmonizing data from two providers will remain a tedious manual process. However, if the age categories are linked to persistent identifiers in machine-actionable metadata, age groupings can be harmonized by software. If these operations are performed across dozens of variables in hundreds of data sources, enormous amounts of human time will be saved. As a consequence, combining information across data sets becomes significantly more feasible, greatly enhancing their comparability and reuse.
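The idea of harmonizing age groupings through persistent identifiers can be illustrated with a minimal Python sketch. Everything in it is an assumption made for illustration: the `https://example.org/vocab/age/` identifiers, the provider code lists, and the data values are invented placeholders, not terms or figures from any published vocabulary or dataset.

```python
# Hypothetical persistent identifiers for age groups. These URIs are
# placeholders for illustration, not real published vocabulary terms.
BASE = "https://example.org/vocab/age/"

# Machine-actionable metadata: each provider maps its own local codes
# to the shared identifiers (both code lists are invented examples).
PROVIDER_A_CODES = {"Y0T4": BASE + "0-4", "Y5T9": BASE + "5-9"}
PROVIDER_B_CODES = {"0-4 years": BASE + "0-4", "5-9 years": BASE + "5-9"}


def to_shared_ids(rows, code_map):
    """Re-key (local_code, value) rows by the shared identifier."""
    return {code_map[code]: value for code, value in rows}


def merge_by_identifier(a_rows, b_rows):
    """Pair up values from two providers wherever identifiers match."""
    a = to_shared_ids(a_rows, PROVIDER_A_CODES)
    b = to_shared_ids(b_rows, PROVIDER_B_CODES)
    return {uri: (a[uri], b[uri]) for uri in a.keys() & b.keys()}


# Invented sample values: the providers label and order age groups
# differently, yet the rows line up without manual recoding.
merged = merge_by_identifier(
    [("Y0T4", 120), ("Y5T9", 115)],
    [("5-9 years", 0.08), ("0-4 years", 0.09)],
)
```

Because the join key is the shared identifier rather than a provider-specific label, adding a third data source in this sketch would only require its own code-to-identifier mapping.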
The joint IUSSP-CODATA Working Group built upon the work of the FAIR Vocabularies Group, which recently released “Ten Simple Rules for making a Vocabulary FAIR” (https://arxiv.org/abs/2012.02325). Most of its guidance is straightforward, like “Determine the governance arrangements and custodian responsible for the legacy vocabulary.” But some steps require specialized expertise in standards like the Simple Knowledge Organization System (SKOS) or the Web Ontology Language (OWL). In the longer term, FAIR vocabularies also need to be maintained, which requires sustainable institutions with the capacity to support the necessary technologies. The Working Group sought advice from members of the FAIR Vocabularies Group and experts from other scientific domains to evaluate alternative strategies (e.g. centralized versus federated) and software.
The operational goal was to work with three to five partners in international organizations and academia to bring their existing vocabularies into line with the FAIR principles. The group gave special attention to coordinating with existing initiatives, like the terminology repository supported by Statistical Data and Metadata eXchange (SDMX).
The ultimate goal of this initiative was to make demographic data more interoperable by publishing controlled vocabularies that can be found and acted upon by software. This has the potential to vastly reduce the costs of merging data from multiple sources for researchers seeking to use population data. The Working Group learned where additional technical development is needed and when community involvement through IUSSP and other organizations is beneficial.
The Working Group began meeting in May 2021 and completed its work in 2023. The final report is available here: https://doi.org/10.5281/zenodo.7818157
Colleagues interested in learning more about this initiative should contact George Alter (FAIRvocab@iussp.org).