Open reproducible raw diffraction data for access in pandemics
Introduction
This case study covers raw diffraction data and, specifically, links to the X-ray Diffraction raw data archive ‘XRDa’ at the Institute for Protein Research, Osaka University, Japan. XRDa is currently tailored specifically towards Asian depositors.
The scope is protein crystal structure diffraction experiments and analyses.
The quest to find medical treatments based on 3D structural data derived from protein crystal structure analysis has a long history. In the current COVID-19 pandemic, vaccination can be supplemented by drug treatment, or the two can of course be used together. The search for lead compounds for drugs requires protein molecular models that are as precise as possible, followed by crystal structures of the protein with bound ligands. Reproducibility of such data sets is paramount. Ideally, with the approach followed here, there would be a single point of contact for definitive molecular models. Our approach thereby seeks to avoid multiple versions of a protein model derived from a single raw diffraction data set.
Significance of the Case Study
Drug discovery using protein crystallography has a long history, which can be traced to acclaimed researchers such as Dr Max Perutz, Nobel Prize in Chemistry 1962, whose book Protein Structure: New Approaches to Disease and Therapy summarises the way forward.
This case study would provide:
- a point of contact for volunteer data reviewers
- coordination of community discussion to agree metrics of ‘definitive reusability’ for these raw diffraction data sets and for the single, definitive protein models derived from them
- a direct channel of communication for researchers worldwide with the Japanese XRDa (X-ray Diffraction Archive) initiative at the Institute for Protein Research, Osaka University
Research Challenges and Requirements for GOSC
An open science cloud with global reach would help harness best practice and provide inter-community support, including addressing any bottlenecks or problematic situations that may arise. For instance, deep-learning language translation is one case where artificial intelligence could be brought to bear to facilitate communication between participants with different first languages.
Engagement with the GOSC initiative
This case study would lead to single, definitive protein models derived from their raw diffraction data sets. These are important to the crystallographic community and to the broader research community engaged in drug discovery during pandemic crises such as the one COVID-19 poses for society. The considerable experience of the crystallographic community and our direct interaction with other CODATA GOSC volunteers would be brought to the fore; likewise, best practice from other communities could be directly shared and compared, and all parties would no doubt benefit.
Possible deliverables
Reproducibility of data sets is paramount. With the approach followed here, there would be a single point of contact for definitive molecular models.
Our approach seeks to avoid dispersed, multiple versions of a protein model derived from a single raw diffraction data set. A controlled versioning procedure for PDB entries should therefore be tightly linked to the underlying raw diffraction data.
A critical deliverable would be metrics of ‘definitive reusability’, which would then be applicable to the individual diffraction data sets held in XRDa, the X-ray Diffraction Data Archive based at the Institute for Protein Research, Osaka University, Japan. These metrics and the definitive diffraction data files would lead to policy interoperability and greatly assist platform and semantic interoperability. The overall reproducibility of the diffraction data and their linked molecular model would be the overarching guide. The scope of this challenge, in general terms, can be judged by the fact that the FAIR movement did not include data quality in its criteria. In the spirit of scientific reproducibility, we introduce a term lying between reusability and reproducibility, namely definitive reusability.
Case Study Co-chairs
John R Helliwell, University of Manchester, UK
Genji Kurisu, Osaka University, Japan
Secretariat Contact
Hana Pergl, CODATA
Additional participants are invited; please sign up if you are interested in joining the case study.
GOSC Open Reproducible Raw Diffraction Data for Access in Pandemics Case Study Poster