At the CODATA General Assembly of 27 and 28 October 2023, online and in Salzburg, the membership of CODATA elected Mercè Crosas as our new President. Mercè’s message to the CODATA community will appear here soon. In the meantime, here is her blog post as presidential candidate.
If elected President, I would bring to CODATA my three decades of experience in the management, sharing, and analysis of scientific data, gained across scientific disciplines (astrophysics, astronomy, genetics, biomedicine, social sciences) and across sectors (academia, government, and industry). As CODATA works across all domains and seeks to address interdisciplinary data challenges, this exceptionally wide experience will be valuable to the organization.
I have had a close relationship with scientific data from the early parts of my career until now, from a variety of perspectives. First, as a researcher and scientific software engineer in astrophysics and as a leader of software development for information and data management systems in biotechnologies. Later, in academia, co-leading the Dataverse.org open-source platform for sharing research data at Harvard’s Institute for Quantitative Social Science, among other projects on data analysis and data privacy (OpenDP.org), leading research data management across Harvard University, and co-authoring the FAIR principles and the Data Citation principles. I also served in the government of Catalonia as Secretary of Open Government, responsible for open data, transparency, and civic participation. I am currently Head of Computational Social Sciences at the Barcelona Supercomputing Center.
CODATA has a unique position as a research data organization directly connected to the International Science Council (ISC) to provide directives and best practices for working with data across sciences and to influence data policy across nations. This unique position must be used responsibly and effectively. This is why, as a candidate for the presidency of CODATA, I want to emphasize being pragmatic, being collaborative, and being rigorous. What will this mean? I summarize below how each one of these values would translate to CODATA activities in the next four years:
Being Pragmatic
In the last decade, the field of research data sharing and management has contributed to three main advances: 1) an expansion in the number of data repositories across practically all scientific fields, 2) a higher fraction of journals and funding organizations that encourage or require data sharing associated with the research results, and 3) a wide endorsement of the FAIR principles (for Findable, Accessible, Interoperable, and Reusable data), with an emphasis on cross-domain metadata for interoperability and reuse. Despite these advances, there are not yet many examples of cross-disciplinary data sharing, merging, and reuse that could advance scientific knowledge or help address societal challenges. I propose a pragmatic approach driven less by tools, policies, or standards and more by the research problems at hand. That is, an approach that would identify relevant scientific and/or societal problems that need to be solved, followed by the construction of new datasets that combine existing data from multiple fields, harmonizing them as new, rich research resources. Furthermore, the same pragmatism should be applied to make those datasets easily usable by software tools and algorithms, an important aspect of the FAIR principles. In this case, the approach should focus on building datasets that integrate automatically with at least two tools from two different scientific fields that are working on the same problem.
Being Collaborative
Science is increasingly collaborative (albeit competitive). The term ‘team science’ is now used to describe the widely cross-disciplinary teams that are being created to solve scientific problems as data and computation increase. These teams usually include subject experts, data scientists, computational scientists, and data curators, among others. CODATA should be ready to foster this cross-disciplinary collaboration for more comprehensive, efficient, and higher-quality research, but should go even further in being collaborative with other organizations and sectors. These are five ways in which I would increase collaboration from CODATA:
- Build strong ties with other research data organizations, mainly with the GO FAIR initiative, the Research Data Alliance, and the World Data System. This would mean meeting on a regular basis and continuing to define common projects to which each organization can bring its unique strengths, so that the organizations complement each other and do not duplicate efforts.
- Be mindful of cross-disciplinary approaches when we define data sharing and reuse projects to advance the use of data in research.
- Bring data centers and archives to work more closely with research computing and supercomputing centers. For many research projects, data need to be close to the computing, and the computing to the data. Even in cases in which there is a federation of data resources, there is often no user-friendly integration between the data resources and the national computing centers.
- Collaborate across sectors. Research data are no longer (or perhaps have never been) created only for research and by researchers. There are vast amounts of data from industry and governments that can be very useful for research. CODATA has an influential, neutral, and broad position that can help to improve data sharing among these three sectors: academia and/or research organizations, industry, and governments. In addition, it can foster the engagement of citizens to contribute to data sharing for research.
- Continue working across borders and continents, with an emphasis on collaborating with areas with which CODATA has not had an opportunity to work closely, such as Latin America and other parts of the global south.
Being Rigorous
Science is the pursuit of truth. To remain so, it must continue aiming to be rigorous, unbiased, not driven by ideologies, open, and verifiable. For this, research data must be accessible and of high quality, and the analysis must be aware of the biases, errors, and uncertainties that might arise from incomplete or unrepresentative data. At the same time, data must be accessed and used responsibly, following rigorous approaches, especially when data are sensitive or private. CODATA can help in these areas by: 1) exploring the standardization of levels and requirements of access to data, from completely open to increasingly restrictive, to facilitate collaboration across groups and regions on private or sensitive data, 2) providing guidance for infrastructure as well as tools that help work with private or sensitive data responsibly but without losing data utility, 3) promoting transparency of AI algorithms and of the scientific and statistical tools that process and analyze data so they can be validated by others, and 4) improving access to training and education in statistics, well-designed research, and applied qualitative, quantitative, and computational methods to improve the way we teach science.