This is the seventeenth in the series of short statements from candidates in the coming CODATA Elections at the General Assembly to be held on 17-18 October 2025. Mark Musen is a candidate for the CODATA Executive Committee as an Ordinary Member. He was nominated by the USA.
The essence of science is data, and “data science” is naturally central to science. Most scientists, however, view data science as the analysis of data, without consideration of where the data come from, how they are managed, and how they are communicated. CODATA consequently faces challenges in educating the scientific community about the full spectrum of data science and about the enormously important role that an international organization such as CODATA can play in enhancing data infrastructure at a global scale.
I am honored to be nominated for a position on the CODATA Executive Committee, and I am excited about the opportunities to which I hope to contribute. I am a senior faculty member at Stanford University, where I serve as Director of the Stanford Center for Biomedical Informatics Research. I am an M.D. who has a deep understanding of clinical data management. I am a Ph.D. who has considerable experience in the management of laboratory data. My work is well respected. I am a member of the U.S. National Academy of Medicine, and I have received two honorary doctoral degrees from European universities. I have served as a member of the U.S. National Committee for CODATA since 2021.
My entire career has focused on data. Early on, I studied how AI could be used to aid the design and execution of research protocols, improving the reproducibility of research and the completeness of data collection. I then led the development of Protégé, which after nearly four decades is still the most widely used open-source technology for creating standard terminologies and scientific ontologies for data annotation, and of BioPortal, the most widely used open technology for archiving and disseminating such resources. BioPortal has become the foundation of a growing international consortium, the OntoPortal Alliance, that creates federated, discipline-specific repositories for terminological standards. My team has also created the CEDAR Workbench, which is increasingly used to author standards-adherent, descriptive metadata to ensure that datasets are FAIR.
Thus, although I am an academic, I am not satisfied with merely teaching classes and publishing papers. I believe that it is essential to build tools and other infrastructure that people can actually use. Similarly, I believe that CODATA needs to do much more than educate the global community about data science. CODATA needs to stimulate the development of new technologies and data standards that can enhance data stewardship and data sharing on a global basis, and thus enhance scholarship of all kinds in very pragmatic ways.
Although the creation of technology to ease the development and application of data and metadata standards is central to my professional work, I am sensitive to the notion that different communities have different requirements. Indeed, I believe that CODATA should play a role in working with a wide range of constituencies to help them fashion their own discipline-specific approaches and standards. For example, I have worked with the VODAN project for FAIR data management in Africa, and I was asked by the National Institutes of Health to guide its Tribal Data Repository initiative to study data-governance requirements among Indigenous peoples in the United States. I have thus come to appreciate first-hand many of the challenges of encouraging data sharing while ensuring appropriate data sovereignty and attention to the CARE principles.
CODATA is not just about “data.” CODATA touches nearly every aspect of research and scholarship, with the ability to influence best practices for data acquisition, data stewardship, data management, and data dissemination through training, standards, and technology. I have experience in all these areas, and I would enjoy the opportunity to build bridges across different scholarly communities, helping CODATA to advance research practices internationally through increasing attention to “data science” in the broadest sense.