Mission and objectives
The mission of the “Research Data Quality Management Across the Data Lifecycle” Task Group is to establish a conceptual and practical model for data quality and to provide recommendations for assessing it across scientific domains, ensuring trustworthy traceability throughout the entire data lifecycle. The group, composed of domain experts, contributes to CODATA’s Strategic Priorities by elaborating domain-specific standards to address cross-domain challenges, by extending the FAIR principles to include Data Quality Dimensions such as accuracy, reliability and representativeness in order to advance data policy, and by developing frameworks that improve data quality processes in the context of AI and ML automation.
Significance
Data quality is an ‘underestimated pillar of research integrity’ that must be safeguarded as the global research ecosystem becomes more automated, data-driven and cross-disciplinary. Although individual scientific communities often develop assessment approaches tailored to their own needs, the absence of shared validation frameworks and of cross-disciplinary alignment between standards hinders scalable, broadly applicable data quality assurance. Closing this gap can prevent misleading outcomes, particularly when AI systems are trained on inaccurate, corrupted, outdated or biased data. This Task Group aims to enable more coherent, collaborative and higher-quality research across the entire data ecosystem.
Impact
Over the next two years, the Task Group aims to ensure that data quality is consistently understood, managed and expressed in a formalized manner across disciplines. By delivering a conceptual model and a framework document based on existing standards and domain agreements, the group will drive alignment and provide actionable recommendations for data producers and policymakers. Key achievements will include a landscape analysis of definitions, guidance on communicating quality metrics, and a piloting phase demonstrating how these metrics can be embedded in provenance information, a prerequisite for accelerating data-driven value chains and improving data validation practices.
Planned (and later on actual) activities and outputs for 2025-2027
- Core Report and Framework: a landscape analysis of data quality definitions and standards, followed by a comparative analysis of the relevant standards. These will culminate in a framework document and a conceptual model describing how to express data quality based on existing standards.
- Data Quality “Recipes”: actionable recommendations for data creators, publishers and users, focusing specifically on incorporating data quality metrics into metadata and providing guidance on how to communicate them.
- Case Study as Piloting Phase: a pilot is planned to demonstrate cross-domain data quality approaches as proofs of concept, countering the arbitrariness of what is labelled “good quality data”.
- Events and Community Engagement: the Task Group will organise regular working meetings and set up community consultations to harmonise domain-specific standards. One face-to-face meeting, held as an expert workshop, is planned and considered necessary.
- Collaboration with International Union and National CODATA Representatives via the CODATA Executive Director and the Secretariat, and participation in regular CODATA Members’ Meetings.
- Issuing the final white paper as a peer-reviewed publication that will serve as a reference point for future studies and reports on data quality management.
Contacts
Co-chairs:
- Chris Schubert, TU Wien, Austria, chris.schubert@tuwien.ac.at
- Kamil Dziubek, Universität Wien, Austria, kamil.dziubek@univie.ac.at