CDIF Next Steps

With the completion of the WorldFAIR project in the summer of 2024, the initial draft of CDIF was released. The initial report made clear that several areas remained to be addressed, and these will form the focus of further development. Under the banner of WorldFAIR+, a federated set of projects and case studies is now being assembled to help continue the work on the CDIF guidelines. Importantly, a recent round of recruitment has strengthened the team of experts in the CDIF Working and Advisory Groups, so that it is better able to develop the recommendations further.

The plans for future development cover the following areas:

  1. Extended Access
  2. Data Description / Mappings (for integration)
  3. Context / Provenance and Process
  4. Packaging and Formats
  5. AI Readiness
  6. Validation
  7. Communications and Advocacy

Extended Access

In the area of Access, there is a dependency on shared vocabularies for describing rights and access conditions. While some existing work in this area is very useful, notably the Data Privacy Vocabulary (DPV) and the Data Use Ontology (DUO), it is not sufficient for full interoperability across a wide range of domain and institutional settings. More work in this area will be conducted so that more specific recommendations can be made.

Data Description / Mappings (for integration)

For data description, the preparation of data for use in integration (making data “integration-ready”) has been covered, but the more complete description of data integration itself, including semantic mappings and transformations, has not been fully described. This work is ongoing. Standards such as the Simple Standard for Sharing Ontological Mappings (SSSOM) are being investigated, and alignment with the RDA group on FAIR Mappings is a stated goal.
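As an illustration of what a machine-readable mapping might look like, the following is a minimal sketch in Python, assuming a small SSSOM-style tab-separated mapping set. The column names (`subject_id`, `predicate_id`, `object_id`, `mapping_justification`) are SSSOM slots; the `localvar:` and `example:` identifiers, the sample record, and the helper functions are hypothetical and are not part of any CDIF recommendation.

```python
import csv

# Hypothetical SSSOM-style mapping set (tab-separated).
# The identifiers under "localvar:" and "example:" are made up for illustration.
SSSOM_TSV = (
    "subject_id\tpredicate_id\tobject_id\tmapping_justification\n"
    "localvar:temp_c\tskos:exactMatch\texample:AirTemperature\tsemapv:ManualMappingCuration\n"
    "localvar:rh_pct\tskos:closeMatch\texample:RelativeHumidity\tsemapv:ManualMappingCuration\n"
)


def load_mappings(tsv_text, predicates=("skos:exactMatch",)):
    """Return {subject_id: object_id} for rows whose predicate is accepted."""
    # SSSOM files may start with '#'-prefixed metadata lines; skip them here.
    lines = [ln for ln in tsv_text.splitlines() if ln and not ln.startswith("#")]
    reader = csv.DictReader(lines, delimiter="\t")
    return {
        row["subject_id"]: row["object_id"]
        for row in reader
        if row["predicate_id"] in predicates
    }


def relabel(record, mappings, prefix="localvar:"):
    """Rename the keys of a record that have a mapping; leave the others as-is."""
    return {mappings.get(prefix + key, key): value for key, value in record.items()}


exact = load_mappings(SSSOM_TSV)
harmonized = relabel({"temp_c": 21.5, "rh_pct": 40}, exact)
```

Only the exact match is applied by default, so `rh_pct` keeps its local name; whether a close match is good enough for integration is a decision the sketch leaves to the caller through the `predicates` argument.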

Context / Provenance and Process

Provenance and data context have always formed a significant part of the work in CDIF, but to date no guidance on them has been provided. Some initial recommendations were recently drafted at the face-to-face workshop at Schloss Dagstuhl in October, but more work remains to be done. This topic covers not only the kinds of provenance described by the familiar PROV Ontology, but also extends to such topics as the description of experiments and studies, and the dependencies between variables within data sets as modelled in RDA’s I-ADOPT. Several groups have explored different aspects of provenance and context, and this work will be a major focus moving forward. It is expected that a simple provenance recommendation will be forthcoming soon, with a more advanced set of recommendations to follow in the longer term.
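To make the PROV-style part of this concrete, here is a minimal sketch in Python of a provenance chain held as JSON-LD-like nodes, with a helper that walks back from a data set to its sources. The property names (`prov:used`, `prov:wasGeneratedBy`, `prov:wasAssociatedWith`) come from the PROV Ontology; the `ex:` identifiers and the `upstream_entities` helper are hypothetical, and nothing here should be read as the forthcoming CDIF provenance recommendation.

```python
# Hypothetical provenance graph: only the prov: terms come from PROV-O,
# every "ex:" identifier is invented for this example.
PROV_GRAPH = [
    {"@id": "ex:raw-survey", "@type": "prov:Entity"},
    {"@id": "ex:data-team", "@type": "prov:Agent"},
    {
        "@id": "ex:cleaning-run",
        "@type": "prov:Activity",
        "prov:used": ["ex:raw-survey"],
        "prov:wasAssociatedWith": ["ex:data-team"],
    },
    {
        "@id": "ex:clean-survey",
        "@type": "prov:Entity",
        "prov:wasGeneratedBy": "ex:cleaning-run",
    },
    {
        "@id": "ex:aggregation-run",
        "@type": "prov:Activity",
        "prov:used": ["ex:clean-survey"],
    },
    {
        "@id": "ex:summary-table",
        "@type": "prov:Entity",
        "prov:wasGeneratedBy": "ex:aggregation-run",
    },
]


def upstream_entities(graph, entity_id):
    """Return the entities that entity_id was produced from, nearest first."""
    nodes = {node["@id"]: node for node in graph}
    lineage, queue, seen = [], [entity_id], {entity_id}
    while queue:
        current = nodes[queue.pop(0)]
        activity_id = current.get("prov:wasGeneratedBy")
        if activity_id is None:
            continue  # a source entity: nothing recorded upstream of it
        for used in nodes[activity_id].get("prov:used", []):
            if used not in seen:
                seen.add(used)
                lineage.append(used)
                queue.append(used)
    return lineage


lineage = upstream_entities(PROV_GRAPH, "ex:summary-table")
```

The sketch covers only the entity-activity chain; descriptions of experiments, studies, and variable dependencies would need additional vocabulary.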

Packaging and Formats

Some new areas will also be coming into focus. One of these is the creation of packages (“FAIR Digital Objects”) which can be used to organize related sets of FAIR resources together. These are needed for various dissemination and archival functions. RO-Crate is one of the approaches which will be considered here, as its use is becoming common. Another aspect of this work is the description of non-text-based data formats, such as NetCDF, Parquet, and HDF5. The metadata in these formats needs to be surfaced in a FAIR way so that the data can be better utilized in cross-domain scenarios.
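As a rough picture of what such a package description involves, the sketch below builds a minimal RO-Crate 1.1 style metadata document in Python. The overall shape (an `@graph` holding a metadata descriptor, a root `Dataset`, and `File` entities) follows the RO-Crate 1.1 specification; the file names, the media type strings, and the `build_crate_metadata` helper are assumptions made for illustration, and a real crate would carry further properties on the root data set, such as a license.

```python
import json


def build_crate_metadata(name, description, date_published, files):
    """Build an RO-Crate 1.1 style metadata document as a plain dict.

    files is a list of (path, encoding_format) tuples.
    """
    # Descriptor entity: identifies this JSON file and points at the root data set.
    descriptor = {
        "@id": "ro-crate-metadata.json",
        "@type": "CreativeWork",
        "conformsTo": {"@id": "https://w3id.org/ro/crate/1.1"},
        "about": {"@id": "./"},
    }
    # Root data set: the package itself, listing its parts by reference.
    root = {
        "@id": "./",
        "@type": "Dataset",
        "name": name,
        "description": description,
        "datePublished": date_published,
        "hasPart": [{"@id": path} for path, _ in files],
    }
    # One File entity per packaged file, carrying its format.
    file_entities = [
        {"@id": path, "@type": "File", "encodingFormat": fmt} for path, fmt in files
    ]
    return {
        "@context": "https://w3id.org/ro/crate/1.1/context",
        "@graph": [descriptor, root, *file_entities],
    }


crate = build_crate_metadata(
    name="Example observations",
    description="Hypothetical package of two binary data files.",
    date_published="2024-01-01",
    files=[
        ("observations.parquet", "application/vnd.apache.parquet"),
        ("grid.nc", "application/x-netcdf"),  # assumed media type for NetCDF
    ],
)
crate_json = json.dumps(crate, indent=2)  # content for ro-crate-metadata.json
```

The package-level metadata here says nothing about the variables inside the Parquet or NetCDF files; surfacing that internal metadata is the separate problem the paragraph above describes.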

AI Readiness

Another new area of interest is the intersection of FAIR metadata and the use of data for training LLMs and generative AI. Work in conjunction with the MLCommons Croissant Working Group is being organized, and other aspects of this topic, such as “AI readiness” and “responsible AI” for metadata enrichment, are being considered.

Validation

More specific guidance on the use of JSON-LD in exposing CDIF-compliant metadata is being drafted. This will be supported by the development of validation tools using SHACL, and an open-source development effort is being organized to help developers engage with the recommendations.
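The kind of check such tools perform can be approximated with a few lines of Python. The sketch below is a plain required-property check over a JSON-LD record, used here as a stand-in for what a SHACL shape with minimum-count constraints would express; it is not SHACL and not a CDIF validator. The list of required properties, the sample records, and the `check_record` helper are all assumptions made for illustration.

```python
import json

# Assumed set of required properties; the actual CDIF profile may differ.
REQUIRED_PROPERTIES = ("@id", "@type", "name", "description")


def check_record(record):
    """Return a list of problems found in a record; an empty list means it passed."""
    problems = []
    if "@context" not in record:
        problems.append("missing @context")
    for prop in REQUIRED_PROPERTIES:
        value = record.get(prop)
        if value is None or value == "" or value == []:
            problems.append(f"missing or empty property: {prop}")
    return problems


# Hypothetical metadata record using schema.org terms.
good = json.loads(
    """
    {
      "@context": "https://schema.org/",
      "@id": "https://example.org/dataset/1",
      "@type": "Dataset",
      "name": "Example observations",
      "description": "Hypothetical record used to exercise the check."
    }
    """
)
bad = {"@context": "https://schema.org/", "@type": "Dataset", "name": ""}

good_problems = check_record(good)
bad_problems = check_record(bad)
```

A SHACL-based tool would instead operate on the RDF graph obtained from the JSON-LD, which also catches problems this key-level check cannot see, such as a property attached under the wrong namespace.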

Communications and Advocacy

A further group will be formed to look at how we can communicate with the broader audience for CDIF, and to track and showcase implementations. Several new projects under WorldFAIR+ and elsewhere are looking to employ the CDIF recommendations, and as these efforts mature they will contribute to the ongoing developments.