Evolving Roles for Data Scientists in the Age of Intelligent Automation

Reflections on Data Science and the role of the CODATA Data Science Journal from IDW and SciDataCon 2025, Brisbane, 15 October 2025, by Gita Yadav, DSJ Editorial Board, Matthew Mayernik, DSJ Editor-in-Chief, and Mark Parsons, past DSJ Editor-in-Chief.

What is the evolving role of data scientists in ensuring that automation and AI serve the long-term goals of science, integrity, and openness? 

This question shaped a lively and deeply reflective session at the 2025 International Data Week (IDW) in Brisbane, Australia, organised by members of the CODATA Data Science Journal (DSJ) editorial board together with partners from CODATA, the World Data System (WDS), and the International Science Council (ISC).

What made this session particularly rewarding was the high level of engagement from both speakers and audience participants, who actively examined practical pathways to support a new generation of research-aware, ethically grounded, and infrastructure-integrated data scientists. Their discussions demonstrated that the evolution of data science is not only a technical or institutional shift, but also a shared community project; one that redefines how knowledge, responsibility, and innovation intersect in the age of intelligent automation.

The session explored how data scientists must adapt to ethical, infrastructural, and community-centric expectations, and how organisations such as CODATA, WDS, ISC, and the Research Data Alliance (RDA) can collectively guide this evolution. Speakers reflected on the historical roots of data science while identifying forward-looking strategies for building capacity, governance, and trust in the expanding data ecosystem.

Background: Data Science as a Field in Motion!

The Data Science Journal was launched by CODATA in 2002 as a peer-reviewed venue to publish, share, and preserve knowledge on data-focused topics. As far as can be determined, it was the first scholarly journal to include the term “data science” in its title. In a retrospective essay, founding editor F. Jack Smith reflected that the title was initially contentious, with some members of the CODATA Publications Committee concerned that “data science” might be misunderstood. Ultimately, the committee agreed that “it was up to CODATA to ensure that it became understood” (Smith, 2023).

Over two decades later, that mission remains both prophetic and relevant. Data science has evolved dramatically, becoming central to academic research, policy, and industrial practice. The term itself has multiplied in meaning, encompassing everything from machine learning and analytics to data curation, visualization, and ethics. Today, as the volume, variety, and velocity of data continue to expand, the field faces another inflection point, while data veracity has become a critical fourth “V” in the landscape.

In an era of intelligent automation, large language models (LLMs) and ubiquitous sensing, data science has become not only a technical discipline but also a pillar of policy, infrastructure, and ethics. Yet, as the IDW2025 meeting made clear, the scientific community places distinct demands on data science, emphasizing provenance, traceability, transparency, and FAIR principles that are not always prioritized in commercial or governmental applications.
This tension raises an important question: What does it now mean to be a data scientist working in and for science, rather than for profit or production?

Mark Parsons’ talk elucidated this tension very effectively. Figure 1 is a screenshot of his half-century plot depicting how the data ecosystem has continuously evolved, from the establishment of the World Data Centres (1952) to the launch of the Data Science Journal (2002) and the creation of the Research Data Alliance (2013). Alongside these milestones, Mark also plotted linguistic trends that reveal the rise of “data science” and “data management” and the gradual decline of “information science”, signalling a profound conceptual and cultural shift.

Figure 1: Tracing the Arc of Data Science (1952–2022). This historical trajectory, presented by Mark Parsons in his opening talk at the session, illustrates how scientific data work has grown from information handling to knowledge stewardship. The Data Science Journal emerged as both a marker and a driver of this shift, grounding new technical vocabularies in the ethics and infrastructures of open science.

Insights from the Session

Evolution of Roles

Over the past three decades, data work has evolved from isolated roles (data managers, analysts, curators) into a more integrated and interdisciplinary practice. In the 1990s, data management was often seen as a technical support activity. In the 2000s, as open data initiatives gained momentum, the need for stewardship became clearer: data were of little use unless structured, documented, and discoverable.

The 2010s witnessed large-scale infrastructure investments for sharing and integrating heterogeneous data. Now, in the 2020s, this infrastructure is being reconfigured in light of new AI/ML capabilities, emerging data governance policies, and community-driven data movements. Gitanjali Yadav highlighted this evolution in her talk, emphasizing that scientific data work is inseparable from issues of responsibility and provenance (Figure 2). The emerging concept of semantic data science extends this arc, emphasizing multilingual, context-aware, and ethically informed practices that keep the human in the loop even as automation advances.

Across all these phases, a recurring insight stands out: Automation can process data, but only humans can interpret meaning.

Figure 2: The Expanding Role of the Data Scientist, by Gitanjali Yadav, reflecting on how the data scientist’s identity has evolved from analyst (focused on computation and inference), to architect (building systems and workflows), to steward of integrity (ensuring ethics, transparency, and trust), making the present-day data scientist a custodian of integrity and a community bridge.

Preservation and Provenance

A major theme during the session was preservation, not just of data, but of the expertise and communities that sustain it. Many repositories worldwide, especially smaller or domain-specific ones, face an existential crisis as funding cycles end or institutional support wanes. The panel emphasized that ownership and stewardship for such repositories should be distributed, balancing control with access, and ensuring that knowledge of the systems themselves is not lost.

Scientific data work depends on formal trust: verification through unambiguous reference to persistently accessible datasets. Provenance graphs, which represent relationships between datasets, people, organizations, instruments, and software, are central to this trust. They enable reproducibility, attribution, and understanding of how knowledge is constructed. Examples such as the #SemanticClimate initiative were cited (see references), showing how semantic annotation and linked data frameworks can connect disparate data sources and narratives in ethically transparent ways. Figure 3, presented on behalf of Dr. Debasisa Mohanty, member of the CODATA National Committee in India, is a case study from India demonstrating how large-scale, ethically governed data ecosystems can transform biomedical science. This example underscored the dual role of data scientists as technical innovators and ethical stewards of sensitive information.

Figure 3: AI-Guided Genomic Discovery in India. Screenshot of the Genome India project slide presented by Gitanjali Yadav on behalf of Debasisa Mohanty, depicting AI/ML-guided exploration of genomic diversity across India, discovering over 27 million rare variants, identifying population-specific drug-response markers, and constructing a genome-wide imputation panel that improves precision for Indian genotypes.
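For readers less familiar with the provenance graphs mentioned in this section, the idea can be illustrated with a minimal sketch. This is an assumption-laden toy example rather than anything shown at the session: the class `ProvenanceGraph`, every identifier such as `dataset:cleaned`, and the small example graph are hypothetical, and the relation names are only loosely borrowed from the W3C PROV vocabulary.

```python
# Toy provenance graph: all identifiers and relations below are hypothetical.
from collections import defaultdict


class ProvenanceGraph:
    """Stores (subject, relation, object) links between datasets, people,
    organizations, instruments, and software."""

    def __init__(self):
        self.edges = defaultdict(list)  # subject -> [(relation, object), ...]

    def add(self, subject, relation, obj):
        self.edges[subject].append((relation, obj))

    def lineage(self, node):
        """Return, sorted, everything that `node` directly or indirectly depends on."""
        seen = set()
        stack = [node]
        while stack:
            current = stack.pop()
            for _relation, target in self.edges.get(current, []):
                if target not in seen:
                    seen.add(target)
                    stack.append(target)
        return sorted(seen)


graph = ProvenanceGraph()
graph.add("dataset:cleaned", "wasDerivedFrom", "dataset:raw")
graph.add("dataset:cleaned", "wasGeneratedBy", "software:qc-script")
graph.add("dataset:cleaned", "wasAttributedTo", "person:curator")
graph.add("dataset:raw", "wasGeneratedBy", "instrument:sensor-01")
graph.add("person:curator", "actedOnBehalfOf", "org:repository")

# Everything a reader would need to know about in order to trust the cleaned dataset.
for entity in graph.lineage("dataset:cleaned"):
    print(entity)
```

Even in this toy form, the traversal surfaces the curator and the repository alongside the instrument and the software, which is the sense in which provenance supports attribution as well as reproducibility.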

Ethics, Equity, and Access

Data derived from people raises enduring tensions between the ideal of openness and the necessity of privacy. Participants discussed methods such as anonymization and statistical obfuscation, which allow analysis without reidentification. A broader consensus emerged that equity precedes ethics: before one can speak of fairness or accountability in AI, data access itself must be equitable. Otherwise, ethical deliberations risk reinforcing existing asymmetries in who gets to produce, own, and use data.

Invisible Labour and Recognition

Several participants highlighted the invisible labour that underpins open science, particularly curation, cleaning, and metadata preparation. This work remains undervalued in traditional reward systems that prioritize publications and citations. As one discussant noted, “Everyone celebrates analysis, but few remember who made the data analyzable.”

This invisibility also shapes the status of data professionals. Data scientists and stewards are often told, “you’re just the data person,” despite their central role in making science interoperable and reproducible. The Data Science Journal and CODATA recognise that much of the world’s data ecosystem is sustained by invisible or undervalued labour, from data curation, annotation, and metadata stewardship to community-driven capacity building and open infrastructure maintenance. These contributions are essential to the integrity and longevity of scientific data but often fall outside traditional academic reward systems.

Moving forward, the discussion turned to how the Data Science Journal can address these issues, for instance by:

  • Encouraging submissions that explicitly acknowledge data stewardship, curation, and technical maintenance as forms of scholarly contribution.

  • Promoting author contribution statements and citation practices that make invisible labour visible.

  • Partnering with CODATA and WDS initiatives to explore frameworks for credit attribution and recognition of data professionals and infrastructure teams.

  • Supporting cross-sector discussions on how open data infrastructures can better align with equity, inclusion, and fair recognition principles.

By doing so, the Data Science Journal aims to model a more just and transparent data culture, one that values all contributors to the data lifecycle, not only those visible in authorship lines or algorithmic outputs.

Rethinking What “Data Science” Means

There was broad agreement that the term data science has become so expansive as to risk losing coherence. It now covers technical, social, and managerial dimensions. The panel argued for a renewed synthesis: data stewardship needs to become more data sciency, integrating computational and AI techniques, while data science must become more stewardship-aware, incorporating principles of provenance, ethics, and transparency.

Participants also noted the value of philosophical perspectives — as raised by a reference to Naomi Oreskes’ Why Trust Science? — in understanding how we know what we know. This epistemological reflection is increasingly vital as AI systems are trained on limited, often opaque, corpora of data.

Training as ‘Reciprocal Learning’

The conversation repeatedly returned to the challenge of training and upskilling. Many organizations are hiring data stewards from disciplinary backgrounds rather than data or information science programmes, and vice versa. The result is uneven literacy across both communities.

Some argued it is easier to train scientists in data management than to train data managers in domain science; others disagreed. What was clear is that curricula for both data science and data stewardship could benefit from closer integration, potentially as a future CODATA initiative, and that going forwards we should use the term ‘reciprocal learning’ instead of ‘capacity building’. Embedding data stewards within research teams was highlighted as a successful model: it strengthens stewardship within science and builds new skills within the stewardship community itself.

Looking forward, the Data Science Journal editors and the CODATA executive team in the room discussed with the audience how we could commit to advancing visibility and equity in data science by championing reciprocal learning as a guiding principle for global data collaboration, moving beyond the traditional notion of “capacity building,” which can imply a one-sided transfer of knowledge. Instead, we emphasise mutual exchange, contextual expertise, and co-creation between diverse data communities. We can also encourage publications and dialogues that highlight the human and infrastructural labour underpinning scientific data.

By embracing both the recognition of invisible labour and the ethos of reciprocal learning, the Data Science Journal seeks to nurture a more just, inclusive, and reflexive data culture, one that recognises data science as a community of practice rather than a hierarchy of expertise.

 

Global Parallels and Technological Impacts

While automation and AI are disrupting employment patterns worldwide, the panel noted that the challenges and opportunities faced by data professionals are strikingly similar across countries. Early-career researchers tend to adopt AI tools faster, while mid-career professionals face greater displacement risks. As history shows, new technologies often reinforce existing power structures, making it even more important to foreground inclusivity and human judgment in the automation age.

Data Scientists as Connectors

A unifying theme across the session was that data scientists serve as connectors, linking disciplines, infrastructures, and communities. They mediate between automated systems and human interpretation, ensuring that AI augments rather than replaces expertise.

This connective role is both technical and social: it requires understanding algorithms, policies, and people. As automation accelerates, these skills, including empathy, interdisciplinarity, and ethical awareness, will define the next generation of scientific data professionals.

Concluding Reflections

The panel concluded that the evolution of data science cannot be understood purely through the lens of tools or technologies. It must also be seen as a cultural and institutional transformation in how science organizes, values, and preserves its knowledge.

As automation, AI, and semantic infrastructures reshape the research landscape, the task of data scientists, and the mission of the CODATA Data Science Journal, remains clear: we aim to uphold transparency, equity, and openness in the systems that increasingly govern knowledge itself.

References

About the Session

Session Title: Evolving Roles for Data Scientists in the Age of Intelligent Automation
Event: International Data Week (IDW) 2025 / SciDataCon 2025, Brisbane, Australia
Date: 24–27 October 2025
Organizers: Data Science Journal Editorial Board (CODATA, ISC, WDS)
Moderator: Dr. Gita Yadav (National Institute of Plant Genome Research, India)
Speakers:

  • Matt Mayernik – NSF National Center for Atmospheric Research, USA
  • Mark Parsons – Research Data Alliance / Arctic Data Community
  • Debasisa Mohanty – Director, National Institute of Immunology, India (presented by Gitanjali Yadav)
  • Gitanjali Yadav – National Institute of Plant Genome Research (NIPGR), India, and #semanticClimate

Affiliated Organisations: CODATA, World Data System (WDS), International Science Council (ISC), Research Data Alliance (RDA)
Session Link: https://scidatacon.org/event/9/contributions/40/

Reimagining Data Futures – Reflections on Brisbane IDW: The CODATA International Data Policy Committee

Blog Post by Gitanjali Yadav, India

The hum of conversation this week at the sprawling and impressive Brisbane Convention & Exhibition Centre carried a familiar cadence: the sound of the global data community converging once again under the banner of International Data Week! There’s something quietly transformative about being in a room where the world’s leading minds in data gather, not just to talk about datasets and infrastructures, but about what responsibility means in the age of digital abundance. That’s what it felt like at IDW 2025: a coming together of scientists, policymakers, technologists, and communities, each holding a different piece of the world’s data puzzle. This year’s event drew 807 participants, with more than a hundred joining online from across continents, a testament to how inclusive the global data community has become.

Across plenaries and working sessions, one thread remained unbroken: how we, as stewards of data, can embed responsibility, equity, and interoperability at the heart of global science :)

As a member of the CODATA International Data Policy Committee (IDPC), I have been in many such rooms before. But this week was somehow more grounded and more self-aware. Maybe it was the unmistakable presence of the First Nations voices that opened our discussions, or maybe it was the realization that data policy is no longer a backroom dialogue but center stage in how we imagine a fair and sustainable scientific future.

Beginning with CARE, and the Human Face of Data

The opening plenary, “CAREful Indigenous Data Governance,” set a powerful tone. Alfred Lin’s exploration of Taiwan’s Indigenous Peoples’ data practices, Niklas Lábba’s Sámi stories, and Marcia Langton’s reflections on Australian Indigenous governance frameworks underscored how data justice is becoming a core tenet of global policy.

For those of us in global data policy, this wasn’t just an inspiring beginning; it was a call to listen more carefully: to communities, to context, to the histories embedded in our datasets. The conversations that followed about the CARE and FAIR principles weren’t merely academic; they were about dignity, reciprocity, and trust, and above all about placing humanity back into the architecture of data governance.

From the CODATA and IDPC perspectives, these discussions reinforce our ongoing work on developing policy frameworks that operationalize CARE Principles alongside FAIR, ensuring inclusivity without compromising scientific rigor. The need to balance cultural sovereignty with open science ideals is no longer a conceptual debate; it’s a governance imperative.

From FAIR Maturity to Data Stewardship

The technical sessions that followed, covering metadata frameworks, persistent identifiers, and FAIR maturity models, jointly revealed how far the data world has come since these terms first entered our vocabulary. But beyond the acronyms and schemas, what struck us most was a subtle shift in language: from compliance to stewardship. The National PID Strategy case studies and the FAIR Data Maturity Model WG session both illustrated the policy dimension behind technical progress. They asked the hard questions, ranging from how national frameworks align with global infrastructures to who holds accountability for cross-border interoperability.

As policy actors, we must ensure that these models translate into equitable access and sustained trust, not just technological alignment. Stewardship, after all, is not a technical act; it’s an ethical one that materialises when researchers, repositories, and policymakers see themselves as part of a living ecosystem of data care.

When AI Walks into the Room! 

Artificial intelligence hovered like a quiet undercurrent throughout the week, not as hype, but as a reckoning. The second plenary, on “Rigorous, Responsible, and Reproducible Science in the Era of FAIR Data and AI”, captured this policy challenge: governing machine learning and generative AI systems through the lens of openness and ethics. Presentations on FAIR for Machine Learning, AI-Ready Data Workflows, and Documenting LLM Interactions in Research showed a welcome move toward transparency. The session by the Artificial Intelligence and Data Visitation (AIDV) Working Group went further, connecting AI governance to international policy harmonization.

This is a theme directly tied to CODATA’s strategic focus on AI policy interoperability!  We didn’t just celebrate what AI can do; we questioned what it should do, and how we might govern it.  CODATA’s role, and that of the IDPC, is to ensure that this understanding is shared, equitable, and globally informed.

Policy as Practice, Not Paper

By the time we reached Day 3, it was clear that “policy” is no longer a distant set of guidelines, but the pulse that runs through collaboration. The third day’s sessions, from “Policy and Practice of Data in Research” to “From Guidance to Practice: Implementing Open Science Policies in Crisis Situations”, revealed the increasing role of research data policy as a bridge between disciplines, sectors, and nations. Hearing examples from health data infrastructures, crisis response networks, and university governance frameworks reminded me that policy, at its best, is lived and negotiated in real time, across borders and disciplines.

It also reminded us of how delicate this work is: good policy isn’t just written, but trusted, and that trust, once built, must be constantly renewed.

As the IDPC, we have long argued that policy coherence is essential to avoid fragmentation between AI regulation, Indigenous data principles, and open science standards. Seeing these conversations converge across RDA and CODATA sessions was both validating and urgent. The question is no longer whether data policy matters, but how fast we can make it responsive to evolving technologies and societal needs.

Towards a Global Commons for Data Policy: From Brisbane to the World

Across the plenaries and working group sessions, I kept returning to one idea: the vision of a policy commons for data. This vision recurred unmistakably across sessions, a tangible thread connecting FAIR infrastructures, CARE governance, AI ethics, and SDG-aligned data frameworks into a coherent ecosystem. Initiatives like the Global Open Research Commons (GORC) and cross-continental collaborations on metadata standards reflect what such a commons could look like in practice: a shared space where ethical frameworks, AI governance, Indigenous sovereignty, and open science standards can coexist without hierarchy.

It’s an ambitious dream, but one that feels closer than ever.

A Subtle Shift

In the corridors between plenaries and coffee breaks, I sensed a subtle shift: from technical to human, from compliance to care. Maybe that’s what this week was about: realizing that our modern data systems, no matter how complex, are still part of a long continuum of knowledge care! Policy, at its best, is about tending to that continuum and ensuring that as we build the future of data, we don’t forget its human roots.

The conversations were about more than open data; they were about open hearts and about the humility it takes to govern wisely in an interconnected world. Leaving the southern hemisphere, I carry with me a renewed sense of purpose, that data policy, when done right, is not about control, but about connection. 

I am also reminded that policy work in data is as much about listening as legislating, about bringing voices from the periphery to the center, ensuring that our data futures are inclusive, ethical, and sustainable. That, perhaps, is the real legacy of International Data Week 2025!

Looking Ahead: Our shared Data Futures!

As the week closed, there was a sense of momentum, a sense that the conversations begun in Brisbane would ripple outward. The Research Data Alliance (RDA) announced that its next plenary will be hosted at the Oval Cricket Ground in London, with the call for sessions opening in March 2026 and registrations from April 2026.

In the closing plenary, Mercè Crosas, President of CODATA, shared an evocative AI-generated “Data Word Cloud”, a collective snapshot of the ideas that shaped International Data Week 2025. Created from the titles of every talk presented during the week, the cloud distills the essence of our shared conversations, in which Indigenous data governance, open science, FAIR principles, community empowerment, metadata standards, and responsible research practices emerged as themes, painting a portrait of what global data stewardship truly means today. This visual tapestry represents the collective language of the week, where words like Earth, CAREful, AI, Indigenous, FAIR, responsible, and Community pulse among others. To me, it captures something more than any summary or report: the living vocabulary of a global data movement in motion!

AI-generated Data Word Cloud presented by Mercè Crosas, President of CODATA, at the IDW 2025 Closing Plenary. Based on the titles of all conference talks.

Going forwards, the community is already looking to Cape Town, South Africa, where the next International Data Week will take place in September 2027. The symbolism of moving from Brisbane to the African continent is powerful; a reminder that the future of data must be as global, diverse, and connected as the challenges it seeks to address. 

I think back to the week’s first words; the acknowledgment of the traditional custodians of this land, their knowledge systems, and their deep relationship to data in its most ancient form: “Telling a Story!”

Dr. Gitanjali Yadav is a senior scientist at the National Institute of Plant Genome Research (NIPGR), India and the cofounder of #semanticClimate, a global citizen science movement for climate action. She also serves as co-chair of the International Data Policy Committee (IDPC) of CODATA and strongly advocates FAIR principles and Open Access. As an editor of CODATA’s Data Science Journal (DSJ), she chaired the IDW2025 session on “An evolving role of data scientists in the age of intelligent automation”. With CODATA India, Gita is working towards shaping the next generation of open, interoperable, and globally FAIR research assessment tools, and welcomes participation from researchers, data scientists, policy experts, infrastructure developers, and early-career professionals interested in AI-ready, federated data access (data visitation) through metadata, ontology, and semantic enrichment, as well as in reciprocal learning for equitable data ecosystems, especially across the Global South. Gita can be found on LinkedIn and contacted via email (gy@nipgr.ac.in). She will present her reflections on ‘Who Owns Our Knowledge’ at the New York Tech Libraries (NYIT) Open Access Week on October 24, 2025.

When CARE Meets Reality (Reflections on Indigenous Data Governance at IDW 2025)

Blog Post by Gitanjali Yadav, India

The opening plenary on CAREful Indigenous Data Governance at the International Data Week 2025 in Brisbane promised to be one of the most important conversations of the week-long biennial data conference, and it delivered! The session opened with a respectful acknowledgement of the Traditional Custodians of the land on which we met: the Thurrbal and Yaagera peoples. For me, on my first visit to the southern hemisphere, this was a particularly heartwarming moment. Prof. Tony Haymet, Australia’s Chief Scientist, and Dr. Margaret Sheil, Vice Chancellor of the Queensland University of Technology, paid their respects to Elders past and present and to all Aboriginal and Islander peoples, honouring their enduring connection to knowledge and to storytelling, the world’s oldest continuous data system, and recognising how their wisdom continues to guide how we gather, share, and care for knowledge.

This opening plenary brought together global efforts to empower Indigenous communities on the foundation of the CARE principles. For me, this session resonated very well with my own continuing reflections on “Who owns our knowledge?”, the theme of this year’s global ‘Open Access Week’, not only as a call to action but also as an inspiration for important conversations and collaborations around the globe. The answer can be found in the systems we build, the policies we uphold, and the communities we empower. Chaired with characteristic grace by Rosie Hicks, the session brought together three very distinct voices: Marcia Langton (Australia), Niklas Lábba (Sápmi, Sweden), and Alfred J. P. Lin (Taiwan), each addressing how data, culture, and governance intersect in profoundly different ways. It was, in many ways, a masterclass in contrast!

When Alfred J. P. Lin opened his talk with a background on Taiwan’s Indigenous Peoples (TIP), expectations were high. His talk, “How Scientific Computing, Data Science, and Open Science Revitalize and Empower Hard-to-Reach Populations”, promised a 20-year perspective on digital empowerment. Yet to me, his presentation felt curiously detached from the communities it claimed to serve. The TIP data science framework was impressive in scope but seemed to prioritize system architecture over sovereignty. My concern is this: even as we strive to foster self-determination, promote equitable partnerships, and address historical injustices, can scientific computing and open data ever truly empower Indigenous communities, or are we merely empowering institutions that have the infrastructure to process and publish those data? When I asked who owns the Indigenous knowledge generated in his projects (his research institute, the Taiwanese government, or the TIP communities themselves), there was no clear answer. It was an unsettling moment. The question went to the heart of the session’s theme: CAREful governance requires clarity of authority and accountability. Without that, “empowerment” risks becoming just another research deliverable.

Still, this moment of discomfort was revealing. It reminded us that CARE (Collective Benefit, Authority to Control, Responsibility, and Ethics) cannot be retrofitted onto systems that were never co-designed with communities in the first place. This point was brought home beautifully by Niklas Lábba, whose talk “Knowledge, Culture and Business Data” was a revelation to me and easily among the most memorable talks of the entire week. Speaking as both a Sámi scholar and practitioner, he illustrated how cultural and business data are not opposing domains but co-dependent systems. His articulation of “data as kinship”, the idea that information can maintain or disrupt relationships, deeply resonated with me. Niklas didn’t just speak about data; he embodied a worldview in which data governance is a living, relational practice. His message was practical yet poetic: data governance that doesn’t nurture relationships will eventually fail.

The final talk of the session, by Marcia Langton, was delivered with huge intellectual clarity and unmistakable authority, as she spoke from the lived realities of Australian Indigenous communities and the long journey toward data sovereignty. I loved Marcia and her unapologetically political framing! She emphasized (and the audience agreed) that governance begins with recognition of power: who holds it, who enforces it, and who benefits from the flow of data. She reminded us that Indigenous data governance is not a technical framework but a political project, one that demands trust, time, and truth-telling. Her clarity on the stakes of sovereignty, that no principle of FAIR or CARE can substitute for Indigenous control, set a high bar for the entire conference.

Taken together, the #IDW2025 opening plenary offered a vivid panorama of where we stand in the global conversation on Indigenous data:

  • Langton grounded us in the politics of definition: that “governance” itself must be Indigenous-led.
  • Lábba showed us the poetry and pragmatism of data sovereignty lived every day.
  • Lin exposed, perhaps unintentionally, how far we still need to go when scientific ambition outpaces ethical reflection.

As a member of the CODATA International Data Policy Committee, I left the room convinced that our policy work must continue to challenge not just how data is shared, but who decides what sharing means. This session and its conversations reminded us all that policy isn’t about perfect frameworks; it’s about courage. The courage to ask uncomfortable questions, to hold silence when answers don’t come, and to keep insisting that “open” must never mean “unaccountable.” The future of data governance won’t be built in servers or standards: it will be built in trust.

Dr. Gitanjali Yadav is a senior scientist at the National Institute of Plant Genome Research (NIPGR), India, and the cofounder of #semanticClimate, a global citizen science movement for climate action. She also serves as co-chair of the International Data Policy Committee (IDPC) of CODATA and strongly advocates FAIR principles and Open Access. As an editor of CODATA’s Data Science Journal (DSJ), she chaired the IDW2025 session on “An evolving role of data scientists in the age of intelligent automation”. With CODATA India, Gita is working towards shaping the next generation of open, interoperable, and globally FAIR research assessment tools. She welcomes participation from researchers, data scientists, policy experts, infrastructure developers, and early-career professionals interested in AI-ready, federated data access (data visitation) through metadata, ontology, and semantic enrichment, as well as in reciprocal learning for equitable data ecosystems, especially across the Global South. Gita can be found on LinkedIn and contacted via email (gy@nipgr.ac.in). She will present her reflections on ‘Who Owns Our Knowledge’ at the New York Tech Libraries (NYIT) Open Access Week on October 24, 2025.

Pankaj Kumar: Candidacy for CODATA Executive Committee Ordinary Member

This is the twenty-first in the series of short statements from candidates in the coming CODATA Elections at the General Assembly to be held on 17-18 October 2025. Pankaj Kumar is a candidate for the CODATA Executive Committee as an Ordinary Member. He was nominated by the International Geographical Union.

Dear colleagues, 

I have the honor to present my candidature as an Ordinary Member to the CODATA executive committee. I am Dr. Pankaj Kumar, Associate Professor in the Department of Geography at the Delhi School of Economics, University of Delhi, India, and currently serving as Assistant Secretary General of the International Geographical Union (IGU). Previously, I served as Secretary of the IGU Commission on Biogeography and Biodiversity (2016–2020) and the Commission on Mountain Studies (2020–2024). Guided by the IGU Vision and Strategy 2024–2028, which emphasizes Open Science, I have been deeply engaged in advancing this global mission. I also contributed to the drafting of the IGU Statement on Academic Freedom and Ethics (IGU-SAFE) and led the IGU Executive Committee’s establishment of the GeoAI Task Force, reflecting my commitment to the ethical and innovative use of artificial intelligence in geospatial research.

My research focuses on climate vulnerability, disaster risk reduction, environmental sustainability, and mountain livelihoods. My work integrates open-access Earth Observation (EO) data, open-source software, cloud computing, and GeoAI for monitoring, mapping, and Prediction in Ungauged Basins (PUB). These initiatives directly contribute to Open Science and the Sustainable Development Goals (SDGs). In recognition of these contributions, I was honoured with the National Geospatial Faculty Fellow Award 2024 from IIT Bombay under the Ministry of Education’s FOSSEE initiative.

My association with CODATA has been long-standing and productive. Since 2021, I have served as IGU’s liaison to CODATA and, since 2023, as a member of the International Data Policy Committee (IDPC). My research aligns with the IDPC Topics for Action 2023–2025, particularly in ethics, AI policy, and data governance. Building on this collaboration, an IGU–CODATA joint panel titled “Operationalizing GeoAI for Sustainability: Data Governance, Data Visitation, and Spatial Decision Support Systems (SDSS)” has been proposed for the IGU Regional Conference 2026 in Istanbul, Türkiye.

Earlier, I co-organized the ICSU CODATA PASTD–IGU International Training Workshop on “Big Data for Science and Sustainability in Developing Countries” (Hyderabad, 2017) and participated in CODATA’s Beijing Workshop on “Scientific Big Data Sharing and Publication for Developing Countries.” I have also represented IGU and CODATA in multiple global forums, including the UN Geospatial Knowledge and Innovation Week 2024 (China), the 35th International Geographical Congress 2024 (Ireland), and the DBAR–CODATA Big Earth Data Session 2025 (China), International Symposium on Open Science Cloud (ISOSC), Suzhou, China—where I spoke on GeoAI, geoprivacy, and academic freedom in IGU.

These engagements reaffirm my belief that Open Science becomes meaningful only when coupled with capacity development and equitable access. I am dedicated to expanding CODATA’s outreach to social scientists, educators, and practitioners through mentorship and skills-building initiatives. IGU, under my coordination, is eager to strengthen collaboration with CODATA on data quality, reliability, ethics, and AI/GeoAI policy frameworks.

My candidacy is grounded in the conviction that data are more than technical resources: they are tools for empowerment. The Data–Information–Knowledge–Wisdom (DIKW) framework plays a crucial role in understanding and addressing complex challenges. It highlights how raw data evolve into information, information into knowledge, and knowledge into wisdom, enabling informed decisions.

My active and long-term engagement in IGU commissions and experience as Assistant Secretary General of the International Geographical Union (IGU) have placed me in a position to shoulder new responsibilities as an Executive Committee Ordinary Member of CODATA. I am very keen to contribute to the mission of CODATA to connect data and people at international, national, regional and local levels to advance data policy and open science, and to effectively contribute to global and local challenges of today’s increasingly digital societies.

Yasuyuki Minamiyama: Candidacy for CODATA Executive Committee Ordinary Member

This is the twenty-first in the series of short statements from candidates in the coming CODATA Elections at the General Assembly to be held on 17-18 October 2025. Yasuyuki Minamiyama is a candidate for the CODATA Executive Committee as an Ordinary Member. He was nominated by Japan.

My expertise is in interdisciplinary data curation methodologies, and I have conducted research on the sharing and reuse of research data. From 2019 to 2025, I participated in the Japanese national project for developing a national research data infrastructure and led research on the functional development of research data curation across various fields. I am currently working on upgrading the social survey data archives at the Institute of Social Science, University of Tokyo, Japan.

My involvement in CODATA-related data activities began in 2014 at the National Institute of Polar Research, one of the WDCs. In the context of my expertise, I led the launch of the data journal “Polar Data Journal” in collaboration with the WDS Sub-Committee of Science Council Japan. This is the first research institution-led data journal initiative in Japan and has published a total of 58 data papers as of June 2025. I was also involved in CODATA activities as a member of the RDA/CODATA Legal Interoperability IG, which was established in 2013. I joined the IG in late 2016, and I launched a WG to discuss legal aspects of research data in RDUF (Research Data Utilization Forum) in Japan. RDUF is a potential counterpart of the RDA in Japan. I led the RDUF WG to discuss the possibility of its implementation in Japan and conducted research for localization. Finally, the WG developed a set of guidelines for practitioners, incorporating the essence of this guideline and reflecting Japanese legal practices.

In Japan, I worked with researchers handling data in the humanities and social sciences to develop guidelines for the proper handling of data in the JSPS-led ‘Programme for Constructing Data Infrastructure for the Humanities and Social Sciences’ in 2021. I was also involved in writing the policy recommendations on open science issued by the Science Council of Japan in 2022. Since 2024, as chair of the RDUF’s planning committee, I have been working with data curators from various fields (earth sciences, life sciences, materials science, humanities and social sciences) to form a cross-disciplinary community.

I am also actively involved in international networking, such as the Research Data Alliance (RDA) and the Confederation of Open Access Repositories (COAR). My recent activities include being a co-proponent of a session on Research data stewardship in the Asia Pacific at SciDataCon 2025.

Drawing on these experiences, I would like to further strengthen the relationship by intermediating CODATA’s initiatives with common Japanese data-related activities. I believe Japan has the potential to significantly expand its contribution to CODATA’s diverse and wide-ranging initiatives. I intend to participate in CODATA’s initiatives from the perspective of my expertise in data curation, as well as actively connecting Japanese stakeholders with CODATA’s initiatives and helping to expand the community.

 

Audrey Masizana: Candidacy for CODATA Executive Committee Ordinary Member

This is the twentieth in the series of short statements from candidates in the coming CODATA Elections at the General Assembly to be held on 17-18 October 2025. Audrey Masizana is a candidate for the CODATA Executive Committee as an Ordinary Member. She was nominated by Botswana.

Affiliation:

  • Institution: University of Botswana 
  • City: Gaborone
  • Country: Botswana
  • Role: Senior Scholar, University of Botswana

Nationality of Candidate: Botswana

I, Audrey Masizana, have been a member of the CODATA Executive Committee since 2021. I hold a PhD in Computer Science from the University of Manchester, UK (2004). I have been a Fellow of the Botswana Academy of Sciences (FBAS) since 2021, and in 2025 I was elected Vice President of the Academy. I also serve as a Global Health International Scholar of the University of Pennsylvania, USA (2023-2026).

I am a very passionate advocate of Open Data and Open Science policies and practices, locally and across the continent. To that effect, I chaired the development of the Botswana National Open Data Policy, established by the Botswana Presidential Task Force (Smart Bots) in March 2021, which is now awaiting approval by the Botswana Cabinet.

I have over the years gained enormous experience in spearheading academic networking platforms, including chairing conferences such as Information Technology for Development (IASTED Africa 2014, 2016) and the International Conference in Cyber-Security and Information Systems series (ICICIS 2016). I also chaired the Local Organising Committees of International Data Week (IDW 2018) in Botswana and of the Viz Africa conferences, delivered in collaboration with CODATA, the World Data System (WDS) and the Research Data Alliance (RDA).

I also continue to support and contribute to the implementation of the African Open Science Platform (AOSP), having been a member of its technical advisory committee while it was being established in 2017. I strongly believe its delivery will bring improvements on the policy front across the continent.

I currently serve as a Project Investigator on behalf of University of Botswana on several interdisciplinary projects aligned to my research area in Scientific Application of Data.  These include 

My contribution to CODATA as an Executive Committee member has largely been advocacy across the African continent for its mission and implementation strategy. This advocacy has led to the establishment of the newly formed Botswana CODATA National Committee, which I am currently leading in developing its own strategy. I therefore wish to continue to serve in order to see this strategy through and to explore how it can be reused by other countries across the continent that have yet to establish their national committees.

Lauren Maxwell: Candidacy for CODATA Executive Committee Ordinary Member

This is the nineteenth in the series of short statements from candidates in the coming CODATA Elections at the General Assembly to be held on 17-18 October 2025. Lauren Maxwell is a candidate for the CODATA Executive Committee as an Ordinary Member. She was nominated by the Research Data Alliance.

My name is Lauren Maxwell, and I am an epidemiologist and mixed methods researcher. I am very thankful to have been nominated alongside such a fantastic group of data reuse advocates and innovators.

My work focuses on understanding and enabling data and sample reuse in the research response to emerging pathogens through investments in training and tooling for interoperability of data and metadata, understanding and addressing ethical, legal, social, and institutional (ELSI) barriers to data and sample reuse, and building better systems metadata to improve implementation decisions and enable federated, multimodal data reuse for health research and care data and across health data commons.

I lead the FAIR and equitable data and sample reuse research group at Universitätsklinikum Heidelberg in Germany and work packages or tasks in the EU-funded CONTAGIO, CoMeCT, and BE READY Pandemic Preparedness Consortia on behalf of Universitätsklinikum Heidelberg and Ecraid Foundation in the Netherlands. I serve as an Ethics Review Committee member for the Human Cell Atlas and as a member of the EOSC Health Data Interoperability Task Force. During the COVID-19 pandemic, I led a series of workshops on behalf of the European Commission to build FAIR data for COVID-19 and supported work by the GloPID-R and CERCLE data reuse working groups. I have been engaged with CODATA since 2020, when I initiated the CODATA Health Data Working Group. I have continued that work with RDA as the co-chair of the joint CODATA-RDA Health Data Commons Working Group, where we are working to extend the Global Open Research Commons metadata model into the sensitive biomedical data domain to build the metadata needed to support a living map of implementation decisions across health data commons and between health data and commons in other domains. I also co-chair the Technical Repository Service Providers Working Group with RDA, where we are working to address barriers to repository certification to help researchers make sense of the rapidly evolving and consequential research data repository landscape. I was a co-author of the CODATA Cross Domain Interoperability Framework (CDIF) and am committed to supporting its implementation in the biomedical data space.

I am excited to have been nominated to the CODATA Executive Committee because of the need to link CODATA’s actions to support federated, cross-domain reuse of multimodal data and to build the linkages between humans, data, and humans and data to improvements in the pandemic preparedness landscape.  We don’t have time to clean up our act during a pandemic.  We need to do that beforehand, and we need to learn from domains outside of health, such as oceans, biodiversity, deep earth, and chemistry, as well as within health, including cancer, cardiology, rare diseases, and brain health, to get there.

We need to improve data and sample reuse for more effective detection and response to epidemics.  COVID-19 was both a failure and a triumph of data reuse. Fear around the consequences of data sharing was a key cause of the epidemic. Rapid global sharing of viral sequence data was a key enabler of resolving the pandemic. When we look at the production of research evidence in the wake of COVID-19, we can see how prior investments in the interoperability of health system data led to rapid, high-quality evidence production.  My goal is to work with CODATA to produce cross-domain, cross-infrastructure metadata to demonstrate how different approaches to data reuse enable or prevent value production for the different stakeholders in the funding-to-impact data lifecycle so we can drive informed data policies and investments for epidemic detection and response.

We need data sovereignty and value-driven approaches to incentivising investment in data quality and reuse. With full interoperability and high-quality data and metadata, including a functional PID system, we can reuse data where it sits through federated learning. As a privacy-by-design approach, federated data reuse empowers data holders by placing decision-making power in their hands, addressing some ELSI barriers to data reuse, and reducing both time and carbon footprint compared to centralised approaches. Data reuse has to drive value for data producers to be sustainable. We need to support efforts to build end-to-end value chains that enable local communities and nations to derive value from the data they produce to support federated reuse of high-quality, interoperable data at scale. My goal is to use the CODATA network to support efforts to connect individuals, data, and infrastructure investments to build transformative value for data producers through data interoperability and quality.

We need federated, actionable metadata to demonstrate and improve implementation decisions by the digital public infrastructures that support data reuse. Understanding how infrastructures and approaches, like interoperability-as-infrastructure, drive costs, cost savings, benefits and harms is central for improving cross-domain data reuse.  I want to work with the CODATA community to build the actionable, transparent, trusted metadata we need to create a learning system for understanding how data and sample reuse translates into impact and for whom to inform strategies for building equity and inclusion in data reuse.

We need cross-domain data reuse to address our shared, global challenges. Human health data is linked to data from every other domain we can think of. Barriers to reusing climate data prevent us from optimising malaria treatment and prevention efforts; a lack of data on plant health leads to missed opportunities to address the effects of aflatoxin exposure in agricultural workers. I hope to support CDIF’s implementation as a building block for Whole of (Global) Society systems metadata-driven approaches to understanding and addressing the cross-domain challenges that drive human health, including climate change, environmental exposures, and One Health.

Lastly, we need actionable tooling and training to support interoperability and inclusive, equitable approaches to biomedical and cross-domain data and sample reuse. I am committed to supporting CODATA’s efforts to train the next generation of data production, management, policymaking, and data governance experts and to develop the tooling they need to make their jobs easier.

Francisca Oladipo: Candidacy for CODATA Executive Committee Ordinary Member

This is the eighteenth in the series of short statements from candidates in the coming CODATA Elections at the General Assembly to be held on 17-18 October 2025. Francisca Oladipo is a candidate for the CODATA Executive Committee as an Ordinary Member. She was nominated by the GO FAIR Foundation.

Professor Francisca Oladipo is Vice-Chancellor and Chief Executive Officer of Thomas Adewumi University, Kwara State, Nigeria, and Professor of Computer Science at Federal University Lokoja. She serves as Secretary-General of the Consortium of Universities in Kwara State (KU8+) and Vice-President and Board Secretary of VODAN (Value-driven Ownership of Data and Accessibility Network), a leading international network spanning Africa, Europe, and Asia dedicated to developing Afrocentric systems that ensure data sovereignty and ownership in residence.

Bridging the Global Data Divide: An African Perspective

I am honoured to stand for election to the CODATA Executive Committee at a pivotal moment for global data science. As CODATA’s mission emphasizes connecting data and people to advance science and improve our world, my candidacy represents a critical opportunity to ensure that Africa’s voice and the perspectives of the Global South are not merely present, but influential in shaping international data policy and practice.

The data revolution cannot truly be global if it does not include the majority of the world’s population. Africa is poised to host 60% of the world’s youth population by 2050. Hosting some of the most dynamic health, climate, and demographic datasets, the continent must move from being a data source to being a data leader. My work demonstrates that when we design data systems for African contexts, we create solutions the world adopts.

Proven Leadership in FAIR Data Implementation

Between 2020 and 2024, I was the Executive Coordinator of the African Implementation Network of the Virus Outbreak Data Network (VODAN-Africa), which is one of the joint activities carried out by CODATA, RDA, WDS, and GO FAIR, where I led what became the world’s first and only successful implementation of machine-actionable FAIR Data Points in the context of the COVID-19 pandemic. This was not theoretical work: my team actually demonstrated data visiting between Africa and Europe (Leiden) in 2020. Post-COVID, we deployed functional FAIR infrastructure across 88 health facilities in 8 African countries (Ethiopia, Kenya, Nigeria, Tunisia, Uganda, Zimbabwe, Somalia, Liberia) and the Netherlands (https://aun.mu.edu.et/vodan/). In 2024, VODAN Africa rebranded to Value-driven Ownership of Data and Accessibility Network with my appointment as Vice-President and Secretary of the Board, and in 2025, I was announced as CEO, with further expansion into Tanzania, Somalia, Liberia, Burkina Faso, Namibia, Rwanda, China, Indonesia, and Kazakhstan.

This initiative, recognized in UNESCO’s 2021 Engineering Report and awarded “Most Inspiring Initiative” at Leiden Science Week in May 2022, demonstrates three critical points relevant to CODATA’s mission:

  1. FAIR principles can be successfully implemented in resource-constrained environments when designed with local contexts in mind
  2. African leadership in data infrastructure development produces globally relevant solutions
  3. Cross-border, cross-continental data collaboration is achievable when built on principles of equity and data sovereignty

Further Proven Work in FAIR Data Leadership

From 22 to 26 January 2024, at the Lorentz Center@Oort at Leiden University, the Netherlands, Barend Mons, Erik Schultes and I organised “The Road to FAIR and Equitable Science” workshop. We brought together 55 experts and stakeholders to have a broad, international expert discussion regarding the impact of the FAIR principles (Lorentz, 2014) during the first decade of implementation and to collectively design a roadmap for the next decade. Between 18 and 27 August of the same year, we brought over 30 African scientists and researchers to Leiden University Medical Centre for the 2024 LUMC FAIR Data Training.

These achievements directly align with CODATA’s Strategic Plan 2023-2027 priorities, particularly “Making Data Work for Cross-Domain Grand Challenges” and advancing FAIR data practices for trustworthy, equitable, and transparent science.

Transformative Institutional Leadership

My leadership at Thomas Adewumi University, Nigeria demonstrates the transformative power of data-driven decision making in higher education. Between 2022 and 2025, we achieved:

  • 2,136% enrolment growth (from 66 to 2,676 students)
  • Dramatic improvement in global rankings (from #252 to #47 in Nigeria on Webometrics)
  • Over $500,000 in international research grants from Google, Dutch Ministry of Foreign Affairs, TETFund, and Philips Foundation
  • Establishment of 14 specialized research centers including AI-integrated learning ecosystems and FAIR data infrastructure

This transformation was built on the same principles that guide CODATA: making data work for institutional improvement, building data literacy and skills, and creating sustainable data ecosystems that serve broader societal goals.

Building Data Capacity Across Africa

Capacity building for Open Science and FAIR data is at the heart of CODATA’s mission. My work in this area spans multiple dimensions:

Curriculum Development: I have led the development of data stewardship curricula adopted across African institutions, directly contributing to CODATA’s priority of building capacity for trustworthy, equitable, and transparent science through improved data skills and education.

International Collaboration: As PhD advisor at Tilburg University’s Network for Globalization, Accessibility, Innovation and Care (GAIC), I promote partnerships between the Africa University Network for FAIR Open Science and European universities, fostering the South-South and South-North collaborations essential to CODATA’s global mission.

Training and Mentorship: Through my roles facilitating workshops, including hosting the Deep Learning IndabaX Africa, the ExploreCSR Series, The HERtificial Intelligence Bootcamps, Women in AI Sessions, the Pan-Africa Center for AI Ethics Summer School, and organizing multiple international conferences, I have directly trained hundreds of early-career researchers in data science and FAIR principles.

Global Recognition and Networks

My international profile and networks position me to effectively represent diverse perspectives on the CODATA Executive Committee:

  • Heidelberg Laureate Fellow (Germany, 2017)
  • US Department of State TechWomen Emerging Leader (Google, California, 2016)
  • 3× ACM Fairness, Accountability and Transparency Fellow (2019, 2020, 2024)
  • MIT PostDoctoral Fellow (Massachusetts Institute of Technology, 2014)
  • Multiple-time recipient of the Emerging Scholar Award (only African recipient): Universitat Politècnica de València, Spain (2024); Universidad Abierta Interamericana, Buenos Aires (2024); National Changhua University of Education, Taiwan (2025)
  • Faculty Scholar, Grace Hopper Celebration of Women in Computing
  • Grantee: Women in AI, Black in AI, Women of Colour in Computing, Widening Natural Language Processing
  • Fellow of Pan-African Scientific Research Council, African Scientists Institute, British Computer Society, and Nigeria Computer Society

These fellowships and my participation in global forums like IETF, Grace Hopper Celebration, Machine Learning Summer Schools, and UNESCO workshops have given me a deep understanding of international data science ecosystems and the critical importance of inclusive global data governance.

Research Excellence and Publication Record

With over 100 peer-reviewed publications and service on 10+ international journal editorial boards, my research contributions span the full spectrum of CODATA’s interests:

  • 25+ papers on FAIR data management and FAIR principles published in Data Intelligence and leading conferences
  • Pioneering work on machine learning, AI ethics, natural language processing for African languages
  • Studies on data quality, interoperability, and ethical AI that directly align with CODATA’s Data Ethics Task Group priorities
  • Research on curriculum development for data science education in emerging economies

What I Bring to CODATA

If elected to the Executive Committee, I will contribute:

  1. Authentic Global South Perspective

Not as a token voice, but as a proven leader who has successfully implemented international data initiatives in African contexts. I understand both the challenges and the immense opportunities of extending CODATA’s reach across diverse economic and technological landscapes.

  2. Practical Implementation Experience

Beyond policy documents and strategic plans, I have hands-on experience deploying FAIR infrastructure, building data literacy programs, and creating sustainable data ecosystems in resource-constrained environments. This practical knowledge is invaluable for CODATA’s mission to connect data and people.

  3. Bridge-Building Capacity

My roles spanning African, European, and North American institutions position me to facilitate the South-North and South-South collaborations essential to CODATA’s global mission. I can help ensure that CODATA’s strategic priorities resonate across diverse contexts.

  4. Focus on Equity and Inclusion

My work consistently emphasizes that Open Science and FAIR data must be truly open, not just technically accessible, but equitable, culturally appropriate, and respectful of data sovereignty. This aligns with CODATA’s commitment to trustworthy, equitable, and transparent science.

  5. Youth and Innovation Champion

As Vice-Chancellor of a rapidly growing university and mentor to hundreds of early-career researchers, I bring insights into how we can better engage the next generation of data scientists and ensure CODATA remains relevant to emerging leaders.

Vision for CODATA’s Future

CODATA stands at a critical juncture. The data challenges of our time (from pandemic response to climate action, from AI ethics to digital sovereignty) demand that we move beyond traditional power structures and genuinely globalize data governance.

I envision a CODATA that:

  • Actively works to decolonize data science by ensuring African and Global South innovations inform global standards, not just adopt them
  • Prioritizes implementation support for FAIR principles in diverse contexts, not just advocacy
  • Strengthens connections between CODATA’s strategic priorities and the UN Sustainable Development Goals
  • Builds robust South-South collaboration networks that complement North-South partnerships
  • Champions data sovereignty alongside data sharing, recognizing these as complementary rather than contradictory goals

Commitment to CODATA’s Mission

I am deeply committed to CODATA’s mission of connecting data and people to advance science and improve our world. My track record demonstrates that I do not just articulate a vision; I deliver results. From the more than 88 health facilities running FAIR Data Points across Africa, to the 2,676 students now studying at Thomas Adewumi University with AI-integrated curricula, to the hundreds of early-career researchers I have mentored, my work has consistently transformed aspiration into achievement.

The challenges facing global science require diverse perspectives, practical expertise, and proven leadership. I offer all three, grounded in a commitment to equity, excellence, and the transformative power of open, FAIR data.

I respectfully ask for your support in this election, not as a favour to African representation, but as an investment in CODATA’s future relevance and impact across all regions of our interconnected world.

Mark Musen: Candidacy for CODATA Executive Committee Ordinary Member

This is the seventeenth in the series of short statements from candidates in the coming CODATA Elections at the General Assembly to be held on 17-18 October 2025. Mark Musen is a candidate for the CODATA Executive Committee as an Ordinary Member. He was nominated by the USA.

The essence of science is data, and “data science” is naturally central to science.  Most scientists, however, view data science as the analysis of data—without consideration of where the data come from, how they are managed, and how they are communicated.  CODATA consequently faces challenges educating the scientific community about the full spectrum of data science, and about the enormously important role that such an international organization can play in enhancing data infrastructure at a global scale.  

I am honored to be nominated for a position on the CODATA executive committee, and I am excited about the opportunities to which I hope to be able to contribute.  I am a senior faculty member at Stanford University, where I serve as Director of the Stanford Center for Biomedical Informatics Research.  I am an M.D. who has a deep understanding of clinical data management.  I am a Ph.D. who has considerable experience in the management of laboratory data.  My work is well respected.  I am a member of the U.S. National Academy of Medicine and I have received two honorary doctoral degrees from European universities.  I have served as a member of the U.S. National Committee for CODATA since 2021.

My entire career has focused on data.  Early on, I studied how AI could be used to aid the design and execution of research protocols, improving the reproducibility of research and the completeness of data collection.  I then led the development of what, after nearly four decades, is still the most widely used open-source technology for creating standard terminologies and scientific ontologies for data annotation (Protégé), and the most widely used open technology for archiving and disseminating such resources (BioPortal).  BioPortal has become the foundation of a growing international consortium to create federated, discipline-specific repositories for terminological standards (the OntoPortal Alliance).  My team has also created the CEDAR Workbench, which is increasingly used to author standards-adherent, descriptive metadata to ensure that datasets are FAIR.

Thus, although I am an academic, I am not satisfied teaching classes and publishing papers. I believe that it is essential to build tools and other infrastructure that people actually can use.  Similarly, I believe that CODATA needs to do much more than to educate the global community about data science.  CODATA needs to stimulate the development of new technologies and data standards that can enhance data stewardship and data sharing on a global basis—and thus enhance scholarship of all kinds in very pragmatic ways.

Although the creation of technology to ease the development and application of data and metadata standards is central to my professional work, I am sensitive to the notion that different communities have different requirements.  Indeed, I believe that CODATA should play a role in working with a wide range of constituencies to help them to fashion their own discipline-specific approaches and standards.  For example, I have worked with the VODAN project for FAIR data management in Africa and I was asked by the National Institutes of Health to guide its Tribal Data Repository initiative to study data-governance requirements among Indigenous peoples in the United States.  I’ve thus come to appreciate first-hand many of the challenges of encouraging data sharing while ensuring appropriate data sovereignty and attention to the CARE principles. 

CODATA is not just about “data.”  CODATA touches nearly every aspect of research and scholarship, with the ability to influence best practices for data acquisition, data stewardship, data management, and data dissemination through training, standards, and technology.  I have experience in all these areas, and I would enjoy the opportunity to build bridges across different scholarly communities, helping CODATA to advance research practices internationally through increasing attention to “data science” in the broadest sense.

Rodrigo Roa: Candidacy for CODATA Executive Committee Ordinary Member

This is the sixteenth in the series of short statements from candidates in the coming CODATA Elections at the General Assembly to be held on 17-18 October 2025. Rodrigo Roa is a candidate for the CODATA Executive Committee as an Ordinary Member. He was nominated by Chile.

My professional path bridges law, science policy, and data governance. As Executive Director of the Data Observatory, a public–private foundation co-founded by the Government of Chile, Amazon Web Services (AWS), and Adolfo Ibáñez University, I lead initiatives that transform large and complex datasets into public value.

My work is grounded in building infrastructures that make data usable, trusted, and interoperable. Under my direction, the Data Observatory recently developed Chile’s National FAIR Data Policy, approved by the National Agency for Research and Development (ANID). To put these principles into practice, we created the SURDATA Alliance, a Latin American network for interoperability and data governance that connects public agencies, universities, and research centers.

I also serve on the Strategic Committee of LatamGPT, a large-language-model initiative led by the National Center for Artificial Intelligence (CENIA). There, we are developing regional datasets and computing infrastructure to ensure that artificial intelligence in Latin America reflects our languages, contexts, and values.

This year, I had the privilege of convening the first CODATA Committee in Latin America, a step that symbolizes our commitment to making global data collaboration multilingual and inclusive. Science and data know no borders, and language should never be a barrier. My goal is to help emerging economies implement FAIR principles, and to connect CODATA more closely with governments, academia, and industry across the region.

Beyond data policy, I’m also a professional drummer—a lifelong rock musician who believes rhythm is another form of connection. Music, like data, is a universal language that brings people together. I hope to bring a bit of that Latin American rhythm and collaborative spirit to CODATA’s global mission.

It would be an honour to serve on the Executive Committee, to represent Latin America’s growing data community, and to help CODATA’s work resonate in Santiago and beyond.