Disaster Risk Reduction and Open Data Newsletter: November and December 2025 Edition

Glaciers to Farms (G2F) Regional Program: Advancing Climate Resilience & Sustainable Development in Central and West Asia

Glacier systems in the Pamir and Tien Shan ranges are retreating rapidly; projections indicate over 50 per cent of glacier volume could be lost by 2100. This cryosphere meltdown is disrupting water cycles, threatening the livelihoods and food security of over 340 million people across nine countries. Without urgent intervention, communities risk worsening poverty, displacement, and instability.

The Glaciers to Farms (G2F) Programme responds to this crisis by integrating climate science into development planning and deploying concrete adaptation measures in glacier-dependent river basins. The programme is dedicated entirely to adaptation and aligns with national priorities, including National Adaptation Plans (NAPs), Nationally Determined Contributions (NDCs) and the Sustainable Development Goals (SDGs).

Excessive heat harms young children’s development, study suggests

Climate change—including high temperatures and heat waves—has been shown to pose serious risks to the environment, food systems, and human health, but new research finds that it may also lead to delays in early childhood development.

Published in the Journal of Child Psychology and Psychiatry, the study found that children exposed to higher-than-usual temperatures—specifically, average maximum temperatures above 86 °F (30 °C)—were less likely to meet developmental milestones for literacy and numeracy, relative to children living in areas with lower temperatures.

Warm oceans seem to be turning even ‘weak’ cyclones into deadly rainmakers

The final week of November was devastating for several countries in South and Southeast Asia. Communities in Sri Lanka, Indonesia and Thailand were inundated as Cyclones Ditwah and Senyar unleashed days of relentless rain. Millions were affected, more than 1,500 people lost their lives, hundreds are still missing, and damages ran into millions of US dollars. Sri Lanka’s president even described it as the most challenging natural disaster the island has ever seen.

Primed to burn: what’s behind the intense, sudden fires burning across New South Wales and Tasmania

Since the megafires of the 2019–20 summer, Australia has had multiple wet years. Vegetation has regrown strongly. In recent months, below-average rainfall has dried out many landscapes, resulting in dry fuels ready to burn. Prime Minister Anthony Albanese has warned these fires point to a “difficult” season ahead.

Do these fires mean Australia is facing another terrible fire season? Not necessarily. The growth of fuel is one thing. But a lot depends on the weather.

Five ways cities across Europe and Central Asia are adapting to extreme heat

Extreme heat is now one of the most urgent and fastest-growing climate risks across Europe and Central Asia. The summer of 2024 was the hottest ever recorded in Europe, while Central Asian cities are experiencing rising average temperatures, more frequent heatwaves and mounting stress on water, energy and health systems. Heat already causes more deaths than any other weather-related hazard, and urban growth is amplifying the risks.

Yet cities across the region are demonstrating that with the right tools, partnerships and planning, meaningful progress is possible.

Estimating Economic Resilience to Climate Impacts: The Gross Resilience Product

Climate impacts are reshaping economic realities across Africa. To understand what this means for future growth, the GCA Resilient Economies Index introduces an important new measure: the Gross Resilience Product (GRP). GRP identifies the share of a country’s GDP that remains resilient under projected climate impacts—offering a clearer picture of how climate hazards influence long-term development trajectories. Built using the Green Economy Model (GEM), it provides a quantitative, comparable estimate of climate-related risks to national economies.

From vulnerability to value: The economic payoff of adaptation in small island states

The economies and livelihoods of Small Island Developing States (SIDS) are particularly vulnerable to the impacts of climate change because of their geography and their economic structures, which often depend heavily on tourism and trade. Critical infrastructure in SIDS is typically located near coasts or in other areas at high risk of climate impacts. The Maldives, where roads and hotels sit in low-lying, hazard-prone zones, and the Marshall Islands, which is at risk of being submerged by rising sea levels, illustrate how exposed SIDS are to large economic damage from climate impacts.

Macroeconomic models can help create a clearer picture of the long-term implications of climate change and how adaptation action has the potential to create economic stability in SIDS.

Climate change and food prices

Food and climate change are closely linked. Food systems account for about one-quarter of all heat-trapping pollution. Meanwhile, extreme events fueled by climate change can damage crops, reduce yields, and disrupt supply chains, all of which can drive food prices higher. The availability, quality, and affordability of food reflect a complex set of climatic and socioeconomic factors. A recent study suggests that projected warming by 2035 would push food price inflation in North America up by 1.4 to 1.8 percentage points per year on average. Recent years have also seen short-term price spikes in coffee, cocoa, California vegetables, and Florida oranges following exceptional heat, drought, and heavy rainfall.

Financing adaptation: 11 financial instruments that help build climate resilience

Climate adaptation has emerged as a high-return investment opportunity. A recent WRI analysis found that adaptation and resilience investments can unlock broad economic, social and environmental benefits that go far beyond simply avoiding losses, even when an extreme event doesn’t occur. The study — which evaluates the expected public benefits of 320 adaptation and resilience investments across agriculture, health, infrastructure and water — found that, on average, $1 invested in adaptation and resilience has the potential to generate more than $10 in benefits over 10 years.

Extreme Heat: The emerging science and its implications for Asia and the Pacific

This paper examines the potential for unprecedented heat events now and into the future and looks at how heat forecasting capabilities are evolving. It considers the compound effects when extreme heat combines with other hazards, and the socioeconomic and sector impacts of increased frequency and scale of extreme heat. The paper highlights how policymakers in Asia and the Pacific can harness this new knowledge to protect people from extreme heat using improved risk tools and warning systems, and heat-resilient public services and infrastructure.

Towards climate and extreme heat resilience: Lessons from African and Asian communities

The latest issue of Southasiadisasters.net brings together powerful insights from Africa and Asia on how communities are confronting climate change and extreme heat. From informal workers adapting to rising temperatures and girls’ education strengthening long-term resilience, to community-led early warnings in Tajikistan, agroforestry in Ghana, and heat-safe urban practices in India, the issue showcases practical, scalable solutions emerging from the ground up. Featuring contributions from 16 authors across the region, it highlights one message clearly: communities are not waiting—they are innovating, adapting, and leading the way toward a more climate-resilient future.

Mapping the impact and informing economic resilience

This joint publication by the World Meteorological Organization (WMO) and the United Nations Development Programme (UNDP) presents a sectoral analysis of 91 weather-, climate-, and water-related Post-Disaster Needs Assessments (PDNAs) conducted between 2000 and 2024. The report highlights the socioeconomic impacts of hydrometeorological hazards, with recurring patterns of losses and damages observed in key sectors such as agriculture, housing, and transport. These sectors face both direct physical destruction and long-term disruptions to services, supply chains, and livelihoods.

The findings emphasize the need to move beyond reactive disaster response toward proactive, risk-informed development. Strengthening early warning systems, integrating National Meteorological and Hydrological Services, and improving access to standardized hazard-impact data are essential for effective preparedness and resilience investments.

Global Environment Outlook 7: A future we choose – Why investing in Earth now can lead to a trillion-dollar benefit for all

The report finds that investing in a stable climate, healthy nature and land, and a pollution-free planet can deliver trillions of dollars each year in additional global GDP, avoid millions of deaths, and lift hundreds of millions of people out of hunger and poverty in the coming decades.

Following current development pathways will bring catastrophic climate change, devastation to nature and biodiversity, debilitating land degradation and desertification, and lingering deadly pollution – all at a huge cost to people, planet and economies. Instead, the world can follow another, better path laid out in the report. This involves whole-of-society and whole-of-government approaches to transform the systems of economy and finance, materials and waste, energy, food and the environment – all backed by behavioural, social and cultural shifts that include respect for Indigenous Knowledge and Local Knowledge.

While there are up-front costs, the economic cost of inaction is much higher and the long-term return on investment of transformation is clear: the global macroeconomic benefits start to appear around 2050, grow to US$20 trillion per year by 2070 and rise thereafter.

Disaster risk management tipping points: impacts of extreme wildfire events and the resulting need for layered disaster risk management solutions

This paper describes several limits of conventional wildfire risk management measures in the face of extreme wildfire events (EWEs) and introduces the concept of disaster risk management tipping points (DRM TPs): critical thresholds beyond which a revised set of disaster risk management strategies becomes necessary. Recent research shows that the number of extreme fire events is increasing exponentially, and events such as the 2025 Los Angeles fires in the U.S., the Hawaii fires (2023), Canada’s record-breaking fires (2023), the largest fires ever recorded in Europe, in Greece (2023), and the 2025 European fire season support this observation.

Building on a bibliographic review, the authors set out the novelty of the concept and apply it to selected illustrative examples. They propose that this conceptualisation is useful when developing “layered” or diversified risk management approaches for different types of wildfire events, including extremes. It may also shift the discussion around responsibility for managing risk: the balance between public and individual contributions, the distribution of investments, and related questions of justice.

Highlights from the extreme heat and agriculture report

Extreme heat is one of the central, interconnected drivers of the climate crisis in agriculture. It threatens global food security and the livelihoods of billions. More than an independent hazard, it acts as a powerful risk multiplier, amplifying drought and triggering wildfires, leading to compound negative outcomes for crops, livestock, fisheries and aquaculture, and forests. These impacts directly endanger the health and productivity of agricultural workers, who are on the frontlines of this crisis. Effective adaptation opportunities exist, particularly because extreme heat is predictable: heat forecasts can enable effective risk management. However, these actions must be supported by interdisciplinary research and integrated risk governance. Building resilience is imperative, but ultimately there are profound limits to adaptation.

Severe floods significantly reduce global rice yields

This paper’s findings highlight the need to develop flood-resistant rice cultivars to mitigate risks and support global food security, and to implement adaptation strategies against rice yield losses from both floods and droughts. Research suggests that as climate change accelerates and intensifies extreme weather events, the stability of future rice production is at considerable risk.

Although rice is a semi-aquatic crop that benefits from controlled inundation—such as irrigation or shallow flooding during early growth stages—uncontrolled or severe flood events can substantially reduce yields. Globally, the increasing frequency of climate-related disasters, such as droughts and floods, has placed substantial pressure on rice yields, historically leading to famine, supply shortages, and regional price volatility.

Climate finance landscape in arid and semi-arid counties of Kenya

The report identifies several opportunities and innovative approaches for enhancing climate finance in the arid and semi-arid counties, including leveraging international support, strengthening local institutions, promoting sustainable investment models, and adopting nature-based solutions. Recommendations are provided to address challenges hindering effective mobilization and utilization of climate finance, such as limited access to financial resources, weak institutional capacity, fragmented coordination, and uncertain policy environments.

Science House – World Economic Forum, Davos

Science House is a new global space putting transformative science at the center of the conversation, spotlighting innovations and solutions that will shape healthy lives on a healthy planet. Nobel laureates, pioneering researchers, policy-makers, business leaders, and philanthropists will convene for high-energy debates, showcases, and strategic roundtables.

The purpose? Translating discovery into lasting impact beyond Davos. Championing open science and fostering collaboration across disciplines and sectors, Science House is the platform where scientific knowledge can inform decisions that define the future.

Webinar on strengthening regional cooperation mechanisms for disaster preparedness and response in West Africa

This webinar will translate the ECOWAS Directorate of Humanitarian and Social Affairs (DHSA)–UNDRR partnership under the ECHO-funded project into action by operationalizing regional cooperation mechanisms that reinforce ECOWAS’s mandate for coordinated disaster response. It will engage stakeholders across government, civil society, academia, and communities to ensure inclusive and sustainable preparedness frameworks, while leveraging international expertise from the European Union Civil Protection Mechanism (UCPM), Copernicus Emergency Management Services, and the Emergency Response Coordination Centre (ERCC) to strengthen West Africa’s disaster resilience.

Date: 16 December 2025; Time: 10:00-14:00 GMT

Workshop on strengthening climate prediction capabilities within WMO Regional Climate Centres and focus countries in the Africa-Caribbean-Pacific regions

The training workshop aims to enable RCC and NMHS experts to provide high-quality climate services using state-of-the-art methodologies and tools. The workshop will comprise practical, hands-on and interactive sessions, including expert-led presentations on relevant tools and WMO approaches.

Date: 26 January 2026, 19:00 – 31 January 2026, 03:00 (UTC: 26 January 2026, 06:00 – 30 January 2026, 14:00)

WCRP – Climate and cryosphere: open science conference 2026

The World Climate Research Programme is pleased to announce that the Climate and Cryosphere – Open Science Conference (CliC) will take place in Wellington, New Zealand from 9-12 February 2026.

The conference will include a diverse, cross-disciplinary town hall meeting, providing space for the international and interdisciplinary cryospheric community to make connections, discuss needs, find synergies, and plan meaningful action.

Marking 30 years of CliC, the Climate and Cryosphere Open Science Conference will contribute to the UN Decade of Action for Cryospheric Sciences (2025-2034) and prepare the community for the 5th International Polar Year (2032-2033).

Bridging Two Worlds: Reflections from the IDW2025 Panel on Research Data and Data Science

At IDW2025, a group of speakers from around the globe gathered to address a long-standing problem: although data is the common currency of both research data management and data science, the two communities often work in parallel worlds—each with its own conferences, training pipelines, infrastructures, and priorities. As Christine Kirkpatrick noted in her opening remarks, this separation persists despite converging challenges around stewardship, reproducibility, education, and ethics. She framed the session as an invitation to rethink how these domains might come together.

What followed was a set of short talks revealing just how interdependent—yet disconnected—these communities have become, and how much potential lies in more intentional collaboration.

From Observation to Interpretation: A Research Lifecycle View (Leo Lahti)

Leo Lahti opened with a fundamental question: How do we move from raw observation to meaningful interpretation in modern research? His answer traced the entire research lifecycle, positioning openness, interoperability, and transparency as essential ingredients. Drawing on studies that show how different choices in data preparation lead to drastically different results, Lahti made a compelling case for shared standards and methodological clarity.

His overarching argument: bridging data science and research data management is not merely technical; it is epistemic. It requires both communities to adopt shared infrastructures, shared educational foundations, and shared norms that elevate transparency as a scientific value.

The Human Infrastructure of Data (Daphne Raban)

Daphne Raban shifted the lens to data stewardship, which she called a “bridge profession” sitting at the intersection of technology, governance, and human judgment. As data volumes grow and automated tools proliferate, she reminded us that stewardship is what keeps data meaningful, contextualized, and ethically sound.

Raban illustrated the diverse impact of stewards across healthcare, finance, government, and research institutions, grounding her argument in the data cycle perspective advanced through the Israeli national initiative on data science education. In her framing, stewardship is not just about compliance; it’s about building trustworthy, reusable data ecosystems sustained by communication, documentation, and collaboration.

Parallel Universes: Awareness Gaps in Data Education (Phil Bourne)

Phil Bourne then highlighted a striking and often overlooked fact: students in data science programs worldwide typically have no exposure to organizations like CODATA, RDA, or WDS. Meanwhile, those global data organizations often operate with limited awareness of the educational and research priorities of academic data science. These are, Bourne argued, parallel universes that urgently need bridges.

His proposed actions were concrete: connect student groups, align leadership networks, embed governance into data science curricula, and convene joint thematic workshops on AI, synthetic data, and data ethics. He framed data as a continuum – from production to engineering to analysis to societal impact – and argued that without collaboration across these steps, sustainability and trustworthiness will remain elusive.

Data Literacy for Everyone: K–12 and Community College Pathways (Padmanabhan Seshaiyer)

Padmanabhan Seshaiyer expanded the conversation into the educational pipeline, urging the community not to wait until university to introduce data literacy. He showcased innovative K–12 and community college bridge programs that pair culturally relevant pedagogy with inquiry-based learning grounded in the Data Cycle.

Students move from no-code tools to higher-code environments, tackling authentic problems—from geometry-based triangulation tasks to investigations of social issues such as bullying and community safety. These programs embed ethics, design thinking, social justice, and civic reasoning alongside technical skills.

Seshaiyer’s message was clear: building an equitable data future requires early, inclusive, and context-aware data education.

Embedding FAIR into Australia’s Climate Modelling Software (Kelsey Druken)

Finally, Kelsey Druken offered a concrete case study of integrating data stewardship directly into infrastructure within Australia’s national climate modelling system, ACCESS. Climate modelling, she reminded the audience, is intensely data-rich, but native model outputs often lack documentation, standards, and consistent metadata. FAIR compliance tends to happen afterward, manually, and inconsistently.

ACCESS-NRI is now working to embed FAIR principles inside the software workflows themselves. By developing versioned data specifications, harmonized naming conventions, controlled vocabularies, and comprehensive metadata at the point of production, they aim to ensure that FAIR becomes the default, not the afterthought.
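To make “FAIR at the point of production” concrete, here is a minimal Python sketch of the general idea: model output is checked against a versioned data specification and a controlled vocabulary before it is written, rather than being cleaned up afterward. This is not ACCESS-NRI’s code. The specification version, the vocabulary entries, the required attributes, and the names `ModelOutput`, `validate` and `write_output` are all hypothetical; the vocabulary simply borrows three CF standard names and their canonical units for illustration.

```python
from dataclasses import dataclass, field
from typing import Dict, List

# Hypothetical data specification: a version string plus a tiny controlled
# vocabulary mapping CF-style standard names to their canonical units.
SPEC_VERSION = "1.0.0"
CONTROLLED_VOCAB: Dict[str, str] = {
    "air_temperature": "K",
    "precipitation_flux": "kg m-2 s-1",
    "sea_surface_temperature": "K",
}
# Hypothetical set of global metadata attributes every output must carry.
REQUIRED_GLOBAL_ATTRS = ("title", "institution", "source", "license")


@dataclass
class ModelOutput:
    variable: str
    units: str
    values: List[float]
    global_attrs: Dict[str, str] = field(default_factory=dict)


def validate(output: ModelOutput) -> List[str]:
    """Return a list of problems; an empty list means the output conforms."""
    problems = []
    expected_units = CONTROLLED_VOCAB.get(output.variable)
    if expected_units is None:
        problems.append(f"unknown variable name: {output.variable!r}")
    elif output.units != expected_units:
        problems.append(
            f"units for {output.variable!r} should be {expected_units!r}, "
            f"got {output.units!r}"
        )
    for attr in REQUIRED_GLOBAL_ATTRS:
        if not output.global_attrs.get(attr):
            problems.append(f"missing global attribute: {attr!r}")
    return problems


def write_output(output: ModelOutput) -> Dict[str, object]:
    """Refuse non-conforming output; stamp the spec version on conforming output.

    A real pipeline would write a file (e.g. netCDF) here; this sketch
    returns a dict so the example stays dependency-free.
    """
    problems = validate(output)
    if problems:
        raise ValueError("; ".join(problems))
    return {
        "variable": output.variable,
        "units": output.units,
        "values": list(output.values),
        "attrs": {**output.global_attrs, "data_specification_version": SPEC_VERSION},
    }
```

The design choice worth noting is that validation sits inside the write path, so a mislabelled variable or missing licence fails immediately at production time instead of surfacing later during manual curation.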

Druken’s work powerfully illustrates what it looks like when practice and infrastructure finally align—a challenge raised repeatedly throughout the panel.

Across these five talks, several themes emerged:

  • Interoperability and transparency must be built into workflows—not bolted on later.
  • Education is the shared foundation, from K–12 to graduate programs to professional development.
  • Stewardship is central, not peripheral, to both science and data science.
  • Organizational silos hinder progress across global and academic communities.
  • A roadmap is needed, and the audience’s input will help shape one for future CODATA, RDA, and ADSA collaborations.

Possible next steps include forming a CODATA Task Group or RDA Interest Group, coordinating ecosystem tools and shared training resources, and proposing a companion session for the next ADSA meeting. Although there was no broad support for creating a new Interest or Task Group, those assembled were keen to find further opportunities to continue the conversation.

Amy Nurnberger (MIT), who attended the session, has already followed up: she and others have proposed a follow-on session at the upcoming Research Data Access and Preservation (RDAP) virtual conference so that the information science and library communities can weigh in on bridging this divide.

The Future of Data Depends on Us Learning to Work Together

If there was one message that resonated across the session, it was this: no single community can build the data ecosystem we need; it requires data stewards, data scientists, educators, and infrastructure providers working together.

Bridging the gaps between these worlds is not simply a matter of efficiency or coordination. It is a matter of scientific integrity, ethical responsibility, and global impact.

The panel made clear that the future of data – open, FAIR, ethical, and societally meaningful – will be built only when we stop treating research data and data science as parallel tracks and instead recognize them as parts of a shared, interdependent community.

November 2025: Publications in the Data Science Journal

Title: Digital Transformation in Materials Science: A User Journey of Nanoindentation, Image Analysis and Simulations
Authors: Hanna Tsybenko, Sarath Menon, Fei Chen, Abril Azocar Guzman, Katharina Grünwald, Steffen Brinckmann, Tilmann Hickel, Tim Dahmen, Volker Hofmann, Stefan Sandfeld, Ruth Schwaiger
URL: http://doi.org/10.5334/dsj-2025-033

Title: How the DS-I Africa Consortium Is Harnessing the Power of Partnerships for Data Science in Africa
Authors: Francis E. Agamah, Amit Mistry, Tino Muzambi, Gifty Dankyi, Laura Povlich, Michelle Skelton
URL: http://doi.org/10.5334/dsj-2025-032

Title: Evaluation of FAIR Principles in Indigenous Water-Climate-Environment (WCE) Data Repositories
Authors: Parisa Sarzaeim, Grace Bulltail
URL: http://doi.org/10.5334/dsj-2025-031

Data Without Borders: Reflections from International Data Week 2025, Brisbane

By Bylhah Mugotitsa, Agnes Kiragga, Steve Cygu and Miranda Barasa, African Population and Health Research Center (APHRC).

If there was ever a place where coffee, curiosity, and code met perfectly, it was International Data Week 2025 in Brisbane. You could feel the excitement from the minute you walked into the Brisbane Convention and Exhibition Centre, the kind of energy that tells you something important is happening here. It wasn’t just about machine learning models or the latest frameworks; it was about people, relationships, and future collaboration. From early-morning panels on open science to late-afternoon debates on ethical AI, every conversation carried the same heartbeat: data is no longer the property of machines; it is the language of humanity.

Across the presentations and panel discussions given by the African Population and Health Research Center (APHRC) team, including Agnes Kiragga’s exploration of building sustainable data science ecosystems in Africa, Miranda Barasa’s insights on co-designing data infrastructures for collaborative research through the Data Science Without Borders (DSWB) project, and Bylhah Mugotitsa’s reflections on longitudinal data harmonization and integration under the INSPIRE Mental Health project, we found ourselves returning to a shared conviction: data must always return home. It must speak back to the communities it comes from, in language they understand, so that it can truly serve.

The Philosophy of Data in a Changing World

Philosophers have long said that progress begins with understanding ourselves. In today’s world, that self-understanding is captured through data. Yet, as the Ghanaian philosopher Kwame Gyekye once said, “Knowledge must carry moral weight. It must serve community.” That is what Africa’s data movement is slowly but surely coming to embody: the moral dimension of measurement. From Nairobi to Dakar, Lusaka to Kampala, we are learning that data is not just about precision. It is about people, policy, and purpose. In Africa, we now promote data innovations developed by Africans, built on African datasets, and grounded in the broader African community.

At IDW2025, the sense of connection was real. Everyone seemed to recognize that the next chapter of data science will not be written in isolation. The conversations around “data without borders” were not just technical. They were about the idea that collaboration should transcend institutions, countries, and disciplines, allowing knowledge to move freely and ethically to where it can do the most good. This aligned well with the purpose of APHRC’s Data Science Without Borders (DSWB) project.

Our Work at APHRC: Building Cohesion Out of Complexity

In our session on ‘Operationalizing Ethical Data Governance in Africa: Applying CARE Principles and Developing of Data Sharing Agreements in African Institutions’, we shared how the INSPIRE Mental Health Initiative at the African Population and Health Research Center (APHRC) is harmonizing longitudinal health data across multiple African Health and Demographic Surveillance System (HDSS) sites. Using frameworks like the OMOP Common Data Model and DDI-Lifecycle metadata, we are building bridges across diverse datasets, from clinical notes to community surveys, to make them comparable, reusable, and meaningful for policymakers.
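The core move behind harmonizing data into a common data model is a mapping step: each site’s local variable names are translated to a shared concept identifier, and records are reshaped into one common table layout. The Python sketch below illustrates that step only and is not APHRC’s pipeline. The site names, local variable names, and concept IDs are hypothetical placeholders (they are not real OMOP vocabulary IDs); the output field names are borrowed from the OMOP CDM `MEASUREMENT` table, and unmapped source values get concept ID 0, following the OMOP convention for “no matching concept”.

```python
from typing import Dict, List, Tuple

# Hypothetical mapping from (site, local variable name) to a shared concept ID.
# The IDs are placeholders for illustration, NOT real OMOP vocabulary concepts.
SOURCE_TO_CONCEPT: Dict[Tuple[str, str], int] = {
    ("site_a", "PHQ9_TOTAL"): 900001,
    ("site_b", "phq_score"): 900001,
    ("site_a", "BMI"): 900002,
    ("site_b", "body_mass_idx"): 900002,
}

NO_MATCHING_CONCEPT = 0  # OMOP convention for source values with no mapping


def harmonize(site: str, records: List[Dict]) -> List[Dict]:
    """Reshape one site's records into MEASUREMENT-like rows.

    The original site and variable name are kept in measurement_source_value
    so every harmonized value stays traceable to its source.
    Note: a real pipeline would also assign globally unique person_ids
    across sites; this sketch passes them through unchanged.
    """
    rows = []
    for rec in records:
        concept_id = SOURCE_TO_CONCEPT.get((site, rec["variable"]), NO_MATCHING_CONCEPT)
        rows.append({
            "person_id": rec["person_id"],
            "measurement_concept_id": concept_id,
            "measurement_source_value": f"{site}:{rec['variable']}",
            "value_as_number": float(rec["value"]),
        })
    return rows


def unmapped(rows: List[Dict]) -> List[str]:
    """List source values that still need a mapping decision."""
    return [r["measurement_source_value"] for r in rows
            if r["measurement_concept_id"] == NO_MATCHING_CONCEPT]
```

For example, a depression score recorded as `PHQ9_TOTAL` at one site and `phq_score` at another ends up under the same concept ID, so the two can be analysed together, while anything the mapping does not recognize is flagged by `unmapped` for review rather than silently dropped.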

We also discussed some of the key challenges uncovered during early data mapping, such as inconsistent data quality checks, and how proper investment in digital infrastructure and training can drive AI innovation. Yet, with AI increasingly shaping health, governance and economic systems across Africa, conversations about context-appropriate AI governance frameworks are becoming essential. It is painfully obvious that simply importing models built elsewhere is not working for Africa; homegrown approaches grounded in local realities and societal norms are what is needed. Collaborative initiatives like DSWB show the potential of AI to become a catalyst for inclusive, sustainable development when African institutions co-design solutions grounded in shared governance and ethical use.

Our DSWB project aims to build a new generation of African data scientists who can work collaboratively across borders, sharing tools, models, and ideas while respecting the sovereignty and privacy of national datasets. It’s a space where African researchers from Kenya, Uganda, Senegal, and South Africa come together to co-create solutions that work for their communities. Through DSWB, we have begun exploring federated learning, an approach that allows models to be trained across multiple datasets without moving sensitive data. This is a quiet revolution in the making: it means we can collaborate globally while keeping data local and secure, and maintaining data sovereignty.
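The essence of federated learning can be shown with federated averaging (FedAvg), one widely used scheme: each site trains on its own data, only the updated model parameters are sent to a coordinator, and the coordinator averages them, weighted by how many records each site holds. The Python sketch below applies this to a toy linear regression. It is not the DSWB implementation; the site datasets, learning rate, number of local steps and number of rounds are hypothetical choices made for illustration.

```python
from typing import List, Tuple

Dataset = List[Tuple[float, float]]  # (x, y) pairs held privately by one site
Weights = Tuple[float, float]        # (slope w, intercept b) of y = w*x + b


def local_update(weights: Weights, data: Dataset,
                 lr: float = 0.01, steps: int = 5) -> Weights:
    """Run a few steps of gradient descent on mean squared error at one site.

    Only the updated (w, b) leaves the site; the raw records never do.
    """
    w, b = weights
    n = len(data)
    for _ in range(steps):
        grad_w = sum(2 * (w * x + b - y) * x for x, y in data) / n
        grad_b = sum(2 * (w * x + b - y) for x, y in data) / n
        w -= lr * grad_w
        b -= lr * grad_b
    return w, b


def federated_averaging(sites: List[Dataset], rounds: int = 500) -> Weights:
    """Coordinator loop: broadcast the current weights, collect each site's
    local update, and average them weighted by the site's record count."""
    weights: Weights = (0.0, 0.0)
    total = sum(len(d) for d in sites)
    for _ in range(rounds):
        updates = [local_update(weights, d) for d in sites]
        w = sum(len(d) * u[0] for d, u in zip(sites, updates)) / total
        b = sum(len(d) * u[1] for d, u in zip(sites, updates)) / total
        weights = (w, b)
    return weights


if __name__ == "__main__":
    # Two hypothetical sites whose data both follow y = 2x + 1.
    site_a = [(0, 1), (1, 3), (2, 5), (3, 7)]
    site_b = [(4, 9), (5, 11), (6, 13)]
    w, b = federated_averaging([site_a, site_b])
    print(f"w = {w:.3f}, b = {b:.3f}")  # should approach w = 2, b = 1
```

Neither site ever sees the other’s records, yet the shared model converges toward the fit both datasets support. In practice, model updates can themselves leak some information about the underlying data, so production systems typically add further safeguards on top of this basic pattern.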

Beyond the Numbers: The Human Moment

What made IDW 2025 truly special was not only the advances in data technology but also the people. The hallway chats, the coffee-break laughter, and the sense of shared curiosity were as valuable as any keynote. We found ourselves among a community that understood both the rigor and the heart behind the numbers. These moments reminded us that behind all the servers, scripts, and dashboards, the data community is really a people community. We might speak in lines of code, but what connects us is a simple truth: we care about the same world.

And then there were the lighter, unforgettable moments, the kind that conferences rarely put on the agenda but everyone remembers. One evening, we hopped onto the CityCat ferry for a boat ride down the Brisbane River to an event at Eat Street Northshore. By the time we docked, the air was filled with music, laughter, and the irresistible scent of global cuisines. Somewhere between sampling Thai noodles, Ethiopian coffee, and Australian doughnuts, the serious talk of algorithms gave way to dancing!

There was something about those small, human moments that made Brisbane feel like home. IDW2025 reminded us that being “home away from home” doesn’t always mean missing where you came from; sometimes, it means discovering that belonging can happen anywhere people share purpose, curiosity, and good humor. For a few days, we weren’t just researchers and programmers; we were a floating global village of storytellers, dancers, and friends who just happened to love data.

Looking Ahead

As we return from Brisbane, the invitation is clear: strengthen the pipelines we have established, engage meaningfully with communities before publicizing our findings, and act on what we discover.

Preparations are now underway for the next IDW, which will take place in South Africa, where we hope to enrich the debates with even greater representation of African voices, develop cross-institutional collaboration, and increase our impact.

In the end, what we took away from IDW2025 was this: in a world often divided by borders, data has given us a new kind of unity, and a reminder that the pursuit of knowledge, when done ethically and inclusively, is a global act of humanity.

And that, perhaps, is the greatest lesson we carried home from Brisbane!

Connections for a Sustainable Open-Science Future

Lili Zhang, Director of the Global Open Science Cloud International Programme Office (GOSC IPO), Computer Network Information Center, Chinese Academy of Sciences

On my first day in Brisbane for SciDataCon and International Data Week (IDW) 2025, I was captivated by the local exhibits, and one keyword in particular stood out to me: ‘connection’. ‘Connections bring people together and are vital in preserving cultural knowledge, strengthening identity, fostering belonging, and tethering past, present, and future generations.’ The same applies to open science infrastructures.

As a key pillar of Open Science, e-infrastructures connect advanced technologies with domain-specific solutions for robust development; they connect cloud services to the widest possible audience; and they tie digital revolutions to sovereignty, ensuring a fair and equitable future. As part of extended and ongoing collaborations and discussions, the Global Open Science Cloud (GOSC) initiative brought us together in an IDW session titled ‘Open science actions toward achieving the SDGs: an infrastructure dialogue with the Global South’ on September 15th in Brisbane, Australia.

The UN 2030 Agenda has become a cornerstone of global science and policy efforts, demanding urgent, collective action to tackle poverty, inequality, climate change, and other systemic problems. The Global South remains at the forefront of the risks and opportunities within the SDGs framework. These regions face the most significant development challenges, but they also possess a wealth of untapped knowledge systems, scientific talents, and emerging infrastructures.

Co-organized by the Global Open Science Cloud International Programme Office (GOSC IPO), the session fostered dialogue on how trusted, cross-national, and cross-regional e-infrastructures can promote science and innovation in line with the SDGs. Dr. Tshiamo Motshegwa, Director of the African Open Science Platform (AOSP), highlighted regional efforts to develop pan-African cyberinfrastructure that supports the SDGs. He sparked conversations around the need and impetus for a federated, policy-aware research environment in Africa, including one that addresses SDG impacts. He pointed to the opportunity to advance these conversations at International Data Week (IDW) 2027, which will be hosted in South Africa. Dr. Motshegwa also shared African perspectives on data sovereignty and human rights, noting that navigating them is a complex, delicate task that often involves carefully balancing related tensions. He also provided African examples of policy tools, including the AU AI Strategy, AU Data Policy Framework, and AU Science, Technology and Innovation Strategy for Africa (STISA) 2034, along with projects on the continent being used to promote open science.

Additionally, Dr. Huajin Wang from CNIC, CAS, introduced the Research Data Collaboration Network (RDCN) and CoNet as cross-domain data toolkits for open collaboration. RDCN is a novel reference architecture designed to facilitate seamless, end-to-end collaborative analysis. CoNet, meanwhile, is a production-level implementation of the RDCN architecture, deployed across the Chinese Academy of Sciences (CAS) data center system to enable complex, cross-domain scientific analysis. Dr. Rania Elsayed Ibrahim from the National Authority for Remote Sensing and Space Sciences in Egypt joined virtually to share insights on sustainable North-South collaboration driven by capacity development frameworks. For my part, I discussed GOSC’s efforts to develop advanced e-infrastructures for the Global South, supporting SDG research and capacity building. The panel discussion that followed broadened the conversation, with Dr. Pankaj Kumar of the International Geographical Union (IGU) stressing the importance of AI-focused training and the implementation of the SDGs. Dr. More Manda, Director of the National Integrated Cyberinfrastructure System (NICIS) from South Africa, underlined the urgent need to develop integrated solutions to boost investments in e-infrastructures for the Global South.

Building on ISOSC 2025, this GOSC IDW discussion deeply impressed me by addressing the SDG challenges we face and affirming the importance of co-developing interoperable technical solutions for open science infrastructures. We hope that future GOSC events will continue to act as a bridge that builds trust across sectors and promotes Open Science and the SDGs. Let’s connect in the next IDW series to be hosted in South Africa in 2027!

More information is available as follows:

GOSC Initiative: https://codata.org/initiatives/making-data-work/global-open-science-cloud/
ISOSC 2025: https://www.cstcloud.net/news/200.jhtml
IDW 2025 GOSC session: https://www.cstcloud.net/news/203.jhtml

Building Participatory Data Infrastructures Across Disciplines and Geographies

Reflections on the ECR-organized session ‘Perspectives on Data Repositories across Disciplines, Geographies, and Cultures’ from CODATA Connect leads Pragya Chaube, Louis Mapatagane, and Cyrus Walther.

At International Data Week 2025, the Early Career Researcher (ECR) session, “Perspectives on Data Repositories Across Disciplines, Geographies, and Cultures,” was more than just a panel; it served as a quiet manifesto. It reminded us that data repositories are not storage systems; they are knowledge goods. And knowledge goods, by their very nature, depend on participation, inclusion, and shared stewardship.

The session presented a rare cross-section of real-world frictions: genomic data that implicate sovereignty, particle-physics data that overwhelm infrastructure, human rights data that require machine readability for accountability, disparities in climate science data, and open research data cultures where practice lags behind aspiration. The ECR panel converged on a single truth: if data are to serve humanity, then the next generation of repositories that hold them must be designed to overcome these frictions and reflect humanity’s diversity.

Sovereignty Before Structure

The scientific and infrastructural requirements of genomics show that repositories are never merely technical artefacts. As Claire Rye reminded us through her work on genomic data in Aotearoa New Zealand, no repository is ethically sound if it fails to respect sovereignty. Te Tiriti o Waitangi (the Treaty of Waitangi, New Zealand’s founding document, which establishes a partnership between the British Crown and Māori) affirms Māori tino rangatiratanga (full chieftainship) over their taonga (treasures), including genetic data, which therefore remain under Indigenous authority. This is not an afterthought to FAIR; it is the foundation of what the CARE (Collective Benefit, Authority to Control, Responsibility, Ethics) principles demand. When we speak of openness, we must first ask: open for whom, and on whose terms? Without community control and benefit-sharing encoded in governance and metadata, openness risks becoming another form of extraction.

Scale Without Participation Reproduces Inequality

In high-energy physics, the problem is not a lack of collaboration or data but the asymmetry of participation. As Cyrus Walther illustrated, large facilities such as the Large Hadron Collider (LHC) or the Square Kilometre Array Observatory (SKAO) generate petabytes of data, but access to those data, and to the computational ecosystems required to analyse them, is not guaranteed. Here lies a paradox: the most open scientific projects can still be closed in practice. Membership fees, specialised software, and critical know-how determine who gets to participate in knowledge production. If repositories are to be truly open, they must democratise capability. That means designing open collaboration structures, such as that of the Cherenkov Telescope Array Observatory (CTAO), and open methods that allow researchers from the Global South and early-career scientists to contribute meaningfully as co-authors of discovery.

Machine Readability as a Justice Technology

Adriana Bora’s Project AIMS (AI against Modern Slavery) demonstrated a different frontier: how data infrastructure can define accountability itself. By converting thousands of corporate modern slavery statements into machine-readable datasets, AIMS transforms what was once bureaucratic reporting into a living, searchable archive for justice. In this context, machine readability is not pedantic; it is ethical. A PDF is a barrier; a CSV is an invitation. When information is locked in static documents, most people are excluded from engaging with it; when data are structured and open, anyone can analyse, question, and act on them. Metadata and semantics aligned with the FAIR principles provide guidance and make the data easier to use consistently and programmatically. When governments require and resource standardised, machine-readable reporting, they shift the cost of use from the vulnerable to the powerful. That is what participatory infrastructure looks like: format that embodies ethics.

Regional Data as Global Equity

Louis Mapatagane’s call to action on climate data inequalities strikes at the core of global disparity. Most climate data reside in the Global North, while communities suffering the harshest effects remain invisible within global systems. Without locally produced and managed data, adaptation policies are based on borrowed evidence. His call for universities to host regional repositories grounded in FAIR + CARE principles redefines such repositories as civic institutions that connect research, education, and local resilience. From this perspective, data justice begins with agency, not access. Regional capacity forms the foundation.

Culture Eats Policy

Pragya Chaube’s reflections on open research data in India brought the conversation full circle: even the best global frameworks falter without supportive cultures. India is the world’s third-largest research producer, yet data-sharing remains low, with fewer than one in ten researchers depositing data in repositories. The barrier is not just technical; it is institutional and cultural. Many institutions lack the policies, incentives, and infrastructure that enable researchers to share confidently and be recognised for doing so. Mandates alone do not create a culture of sharing; incentives, literacy, and recognition do. When researchers are rewarded for curation, not just publication, repositories begin to serve their true purpose: collective advancement.

Toward Participatory Data Futures

The throughline across these talks is unmistakable: data repositories must evolve from mere collections into commons. By commons, we mean data repositories conceived as community-governed public goods, enabling shared stewardship and equitable access for the common benefit. Their legitimacy will depend not on the terabytes they host but on the diversity of those empowered to use them. Participatory design is not an embellishment; it is the architecture of trust.

As we look towards International Data Week 2027 in South Africa, the challenge before us is clear: to move from access as a mere slogan to shared analytical power as a global norm. That is how we turn repositories into living knowledge goods, and knowledge goods into instruments of equity.

L to R: Louis Mapatagane, Adriana Bora, Pragya Chaube, Claire Rye, Cyrus Walther

Ode to LOI (Legal and Organisational Interoperability) – Data moves at the speed of trust

Reflections on Legal and Organisational Interoperability at IDW2025 from Matti Heikkurinen, LOI Lead and Project Portfolio Manager, CODATA

I’ll start with a confession: until quite recently, whenever I heard the word “interoperability”, I thought of the ability of any two systems to reproduce the same bit sequence across a connecting medium (wire, fibre, or airwaves). That is a formidable challenge when you consider all the possible combinations of network vendors, device types, software stacks, noise, hardware failures, and human factors at play. But the engineering community has developed standards and processes to test technical interoperability through protocol analysis, plugtests, and fault-tolerant software and hardware, among others. Blackouts are rare, worms usually stay in their cans, and cloud outages shouldn’t cause insomnia (we hope).

MoUSLA à la réalité

Travelling towards Brisbane, I had a high degree of trust in the technical and semantic interoperability of the aircraft components, backed by Byzantine fault tolerance. I was less certain that purely algorithmic or engineering approaches could fully solve the topic of the IDW 2025 session at my destination: Legal and Organisational Interoperability (LOI). This covers the ability of organisations not only to exchange information (Technical interoperability) and understand its meaning (Semantic interoperability), but also to use the shared meaning to achieve mutually beneficial goals (Organisational interoperability) in a manner that defines and complies with the relevant legal basis for data sharing (Legal interoperability).

Two randomly connected organisations are unlikely to reach the same policy decisions just because they can reproduce each other’s bit sequences, given their unique governance models, legal frameworks, and standard operating procedures. Nevertheless, organisational and legal interoperability are crucial to society, and mechanisms for achieving mutually beneficial goals have been developed.

Memorandums of Understanding (MoUs) and Service Level Agreements (SLAs) have received significant attention as promising solutions for these challenges. And indeed, they do provide formal components to support and document Organisational and Legal interoperability. However, one of the motivations for the session was the (possibly controversial) view that, even if the agreements themselves were fully machine-actionable, they wouldn’t guarantee interoperability. They are the fibre-optic cables of LOI, which need infrastructure around them to become operational; MoUs and SLAs are the dessert, not the whole meal.

CAREless whisper – Guilty machines have got no rhythm 

The opening plenary put any worries about the fit of our session topic to rest. The talks addressed Indigenous data governance (presented in more detail in another blog post), spanning the CARE principles (Collective benefit, Authority to control, Responsibility, Ethics). The interesting weak signals in the session were the immediate pushback on governance models seen as too simplistic, and the mutual benefits of adhering to CARE principles based on the SODA principles. The latter seemed important from a sustainable CARE point of view; ethics backed by enlightened self-interest tends to be a much stronger LOI component than ethics alone.

The broader context of the LOI issues was covered by several other sessions and talks: the session on “Increasing Resilience of Global Earth and Environmental Science Data Supply Chains” brought new kinds of hazard scenarios into the spotlight, and the RDA plenary session “Balancing ethical considerations and Open Science principles when sharing community relevant data” presented the challenges of determining when sharing, or collecting, data might do more harm than good.

“Our” session, “Legal and organisational aspects of data interoperability: climate adaptation case studies”, featured talks and discussions that were, without exception, thought-provoking. Highlights included:

  • The O(N²) challenge of sharing environmental data in Australia: nine jurisdictions, multiple overlapping laws, and inter-state data exchange requiring compliance with federal legislation and differing state legislation (Kheeran Dharmawardena, Australian Dataspaces). Kheeran’s presentation included a table comparing different regulations in the Australian context.
  • Legal and organisational interoperability in the ARDC People Thematic Data Research Commons (Adrian Burton, ARDC). Adrian described work in the ARDC People Commons and the wider OHDSI-OMOP community to establish specific terminologies for sharing agreements and consent.
  • Urban research combining high-value (including high commercial value) and sensitive datasets (Pascal Perez, Director, AURIN). Pascal emphasised the importance of detailed agreements among participating organisations that deal with specifics in a way proportionate to risk and consonant with benefits.
  • The Urban Heat Island use case of the Climate-Adapt4EOSC project (Thanasis Sfetsos), describing the enablers for organisational and legal interoperability identified in the initial analysis.
  • Supporting CARE principles using RAID and local contexts (Rebecca Farrington, Director of Research Data Systems, AuScope), including the use of specific metadata to allow the retrospective application of usage conditions.
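One way to read the “O(N²)” label: if every pair of jurisdictions needs its own arrangement for exchanging data, the number of arrangements grows as N(N − 1)/2. The Python sketch below is our illustration, not material from the talk; it assumes exactly one arrangement per pair, and the jurisdiction labels are an assumed breakdown (the Commonwealth plus six states and two territories).

```python
from itertools import combinations


def pairwise_arrangements(n: int) -> int:
    """Number of distinct jurisdiction pairs: n choose 2 = n(n - 1) / 2."""
    return n * (n - 1) // 2


# Assumed labels for the nine jurisdictions (Commonwealth + six states + two territories).
JURISDICTIONS = ["Cth", "NSW", "VIC", "QLD", "WA", "SA", "TAS", "ACT", "NT"]

# Enumerate every pair explicitly and check it matches the closed-form count.
pairs = list(combinations(JURISDICTIONS, 2))
assert len(pairs) == pairwise_arrangements(len(JURISDICTIONS))

# Show how quickly the count grows as jurisdictions are added.
for n in (3, 6, 9, 18):
    print(n, pairwise_arrangements(n))
```

Under this assumption, nine jurisdictions already imply 36 bilateral arrangements, and doubling to 18 would imply 153, which is why shared frameworks scale better than one-off agreements.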

The session could perhaps be summarised by Adrian Burton’s quote from the first day of the event: “Data travels at the speed of trust”.

LOI Interoperability – Fade to (shades of) grey?

Reading the above summary, one might wonder where the FAIR data community should focus its efforts amid the growing number of data-sharing scenarios that no longer fall clearly into binary “allowed/forbidden” categories. Will the response to critical hazards be hampered by the need to put the benefits and risks on a scale with an arbitrary (and context-dependent) approval line? Should we just reinforce and expand ethics boards to cope with the growing challenges?

Personally, I would argue that growing awareness of the need to operate in this “grey area” makes it more important to develop semantic frameworks and tools to describe CARE issues. Humans undoubtedly need to stay in the loop, but robust, consistent methods for describing the issues are crucial for making decision-making efficient, consistent, and CAREful. The work of initiatives such as the Cross-Domain Interoperability Framework (CDIF) and Climate-Adapt4EOSC received a major boost at the IDW, with the idea of reviving the RDA-CODATA Legal Interoperability IG gaining considerable support. As a result, we hope to observe and report an increase in the speed of trust at IDW 2027! 


Evolving Roles for Data Scientists in the Age of Intelligent Automation

Reflections on Data Science and the role of the CODATA Data Science Journal from IDW and SciDataCon 2025, Brisbane, 15 October 2025, by Gita Yadav, DSJ Editorial Board, Matthew Mayernik, DSJ Editor-in-Chief, and Mark Parsons, past DSJ Editor-in-Chief.

What is the evolving role of data scientists in ensuring that automation and AI serve the long-term goals of science, integrity, and openness? 

This question shaped a lively and deeply reflective session at the 2025 International Data Week (IDW) in Brisbane, Australia, organised by members of the CODATA Data Science Journal (DSJ) editorial board together with partners from CODATA, the World Data System (WDS), and the International Science Council (ISC).

What made this session particularly rewarding was the high level of engagement from both speakers and audience participants, who actively examined practical pathways to support a new generation of research-aware, ethically grounded, and infrastructure-integrated data scientists. Their discussions demonstrated that the evolution of data science is not only a technical or institutional shift, but also a shared community project, one that redefines how knowledge, responsibility, and innovation intersect in the age of intelligent automation.

The session explored how data scientists must adapt to ethical, infrastructural, and community-centric expectations, and how organisations such as CODATA, WDS, ISC, and the Research Data Alliance (RDA) can collectively guide this evolution. Speakers reflected on the historical roots of data science while identifying forward-looking strategies for building capacity, governance, and trust in the expanding data ecosystem.

Background: Data Science as a Field in Motion!

The Data Science Journal was launched by CODATA in 2002 as a peer-reviewed venue to publish, share, and preserve knowledge on data-focused topics. As far as can be determined, it was the first scholarly journal to include the term “data science” in its title. In a retrospective essay, founding editor F. Jack Smith reflected that the title was initially contentious, with some members of the CODATA Publications Committee concerned that “data science” might be misunderstood. Ultimately, the committee agreed that “it was up to CODATA to ensure that it became understood” (Smith, 2023).

Over two decades later, that mission remains both prophetic and relevant. Data science has evolved dramatically, becoming central to academic research, policy, and industrial practice. The term itself has multiplied in meaning, encompassing everything from machine learning and analytics to data curation, visualization, and ethics. Today, as the volume, variety, and velocity of data continue to expand, the field faces another inflection point, while data veracity has become a critical fourth “V” in the landscape.

In an era of intelligent automation, large language models (LLMs) and ubiquitous sensing, data science has become not only a technical discipline but also a pillar of policy, infrastructure, and ethics. Yet, as the IDW2025 meeting made clear, the scientific community places distinct demands on data science, emphasizing provenance, traceability, transparency, and FAIR principles that are not always prioritized in commercial or governmental applications.
This tension raises an important question: What does it now mean to be a data scientist working in and for science, rather than for profit or production?

Mark Parsons’s talk elucidated this tension very effectively. Figure 1 is a screenshot of his half-century plot depicting how the data ecosystem has continuously evolved, from the establishment of the World Data Centres (1952) to the launch of the Data Science Journal (2002) and the creation of the Research Data Alliance (2013). Alongside these milestones, Mark plotted linguistic trends that reveal the rise of data science and data management and the gradual decline of information science, signalling a profound conceptual and cultural shift.

Figure 1: Tracing the Arc of Data Science (1952–2022): This historical trajectory by Mark Parsons in his opening talk at the session illustrates how scientific data work has grown from information handling to knowledge stewardship. The Data Science Journal emerged as both a marker and a driver of this shift, grounding new technical vocabularies in the ethics and infrastructures of open science.

Insights from the Session

Evolution of Roles

Over the past three decades, data work has evolved from isolated roles (data managers, analysts, curators) into more integrated and interdisciplinary practice. In the 1990s, data management was often seen as a technical support activity. In the 2000s, as open data initiatives gained momentum, the need for stewardship became clearer: data were of little use unless structured, documented, and discoverable.

The 2010s witnessed large-scale infrastructure investments for sharing and integrating heterogeneous data. Now, in the 2020s, this infrastructure is being reconfigured in light of new AI/ML capabilities, emerging data governance policies, and community-driven data movements. Gitanjali Yadav highlighted this evolution in her talk, emphasizing that scientific data work is inseparable from issues of responsibility and provenance (Figure 2). The emerging concept of semantic data science extends this arc, emphasizing multilingual, context-aware, and ethically informed practices that keep the human in the loop even as automation advances.

Across all these phases, a recurring insight stands out: Automation can process data, but only humans can interpret meaning.

Figure 2: The Expanding Role of the Data Scientist by Gitanjali Yadav, reflecting on how the data scientist’s identity has evolved, from analyst (focused on computation and inference), to architect (building systems and workflows), to steward of integrity (ensuring ethics, transparency, and trust), making the present-day data scientist a custodian of integrity and a community bridge.

Preservation and Provenance

A major theme during the session was preservation, not just of data, but of the expertise and communities that sustain it. Many repositories worldwide, especially smaller or domain-specific ones, face an existential crisis as funding cycles end or institutional support wanes. The panel emphasized that ownership and stewardship for such repositories should be distributed, balancing control with access, and ensuring that knowledge of the systems themselves is not lost.

Scientific data work depends on formal trust: verification through unambiguous reference to persistently accessible datasets. Provenance graphs, representing relationships between datasets, people, organizations, instruments, and software, are central to this trust. They enable reproducibility, attribution, and understanding of how knowledge is constructed. Examples such as the #SemanticClimate initiative were cited (see references), showing how semantic annotation and linked data frameworks can connect disparate data sources and narratives in ethically transparent ways. Figure 3, presented on behalf of Dr. Debasisa Mohanty, member of the CODATA National Committee in India, is a case study from India demonstrating how large-scale, ethically governed data ecosystems can transform biomedical science. This example underscored the dual role of data scientists as technical innovators and ethical stewards of sensitive information.

Figure 3: AI-Guided Genomic Discovery in India: Screenshot of the Genome India project slide presented by Gitanjali Yadav on behalf of Debasisa Mohanty, depicting AI/ML-guided exploration of genomic diversity across India, discovering over 27 million rare variants, identifying population-specific drug-response markers, and constructing a genome-wide imputation panel that improves precision for Indian genotypes.
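To make the idea of a provenance graph concrete, here is a minimal Python sketch, assuming provenance is stored as simple (subject, relation, object) triples. The relation names are borrowed from the W3C PROV model; every dataset, person, instrument, organization, and software identifier is hypothetical, and the traversal is our illustration rather than anything presented in the session.

```python
from collections import defaultdict

# Hypothetical provenance records as (subject, relation, object) triples.
# Relation names follow W3C PROV terms; all identifiers are invented.
EDGES = [
    ("dataset:cleaned_v2", "wasDerivedFrom", "dataset:raw_v1"),
    ("dataset:cleaned_v2", "wasGeneratedBy", "activity:cleaning_run"),
    ("dataset:cleaned_v2", "wasAttributedTo", "person:curator_1"),
    ("activity:cleaning_run", "wasAssociatedWith", "software:cleaning_script"),
    ("dataset:raw_v1", "wasGeneratedBy", "activity:field_survey"),
    ("activity:field_survey", "used", "instrument:sensor_A"),
    ("activity:field_survey", "wasAssociatedWith", "org:survey_team"),
]


def build_graph(edges):
    """Index triples by subject so outgoing relations can be looked up quickly."""
    graph = defaultdict(list)
    for subject, relation, obj in edges:
        graph[subject].append((relation, obj))
    return graph


def lineage(graph, start):
    """Collect every provenance triple reachable from `start` (depth-first)."""
    seen, stack, trail = {start}, [start], []
    while stack:
        node = stack.pop()
        # dict.get avoids inserting empty entries into the defaultdict.
        for relation, obj in graph.get(node, []):
            trail.append((node, relation, obj))
            if obj not in seen:
                seen.add(obj)
                stack.append(obj)
    return trail


graph = build_graph(EDGES)
for triple in lineage(graph, "dataset:cleaned_v2"):
    print(*triple)
```

Walking from a derived dataset back to its raw inputs, instruments, responsible people, and software is the kind of query that supports attribution and reproducibility checks. A production system would more likely keep such records in an RDF store using the PROV-O vocabulary than in Python lists.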

Ethics, Equity, and Access

Data derived from people raise enduring tensions between the ideal of openness and the necessity of privacy. Participants discussed methods such as anonymization and statistical obfuscation, which allow analysis without enabling re-identification. A broader consensus emerged that equity precedes ethics: before one can speak of fairness or accountability in AI, data access itself must be equitable. Otherwise, ethical deliberations risk reinforcing existing asymmetries in who gets to produce, own, and use data.

Invisible Labour and Recognition

Several participants highlighted the invisible labour that underpins open science, particularly curation, cleaning, and metadata preparation. This work remains undervalued in traditional reward systems that prioritize publications and citations. As one discussant noted, “Everyone celebrates analysis, but few remember who made the data analyzable.”

This invisibility also shapes the status of data professionals. Data scientists and stewards are often told, “you’re just the data person,” despite their central role in making science interoperable and reproducible. The Data Science Journal and CODATA recognise that much of the world’s data ecosystem is sustained by invisible or undervalued labour from data curation, annotation, and metadata stewardship to community-driven capacity building and open infrastructure maintenance. These contributions are essential to the integrity and longevity of scientific data but often fall outside traditional academic reward systems.

Moving forward, the discussion turned to how the Data Science Journal can address these issues, for instance by:

  • Encouraging submissions that explicitly acknowledge data stewardship, curation, and technical maintenance as forms of scholarly contribution.

  • Promoting author contribution statements and citation practices that make invisible labour visible.

  • Partnering with CODATA and WDS initiatives to explore frameworks for credit attribution and recognition of data professionals and infrastructure teams.

  • Supporting cross-sector discussions on how open data infrastructures can better align with equity, inclusion, and fair recognition principles.

By doing so, the Data Science Journal aims to model a more just and transparent data culture, one that values all contributors to the data lifecycle, not only those visible in authorship lines or algorithmic outputs.

Rethinking What “Data Science” Means

There was broad agreement that the term data science has become so expansive as to risk losing coherence. It now covers technical, social, and managerial dimensions. The panel argued for a renewed synthesis: data stewardship needs to become more data sciency, integrating computational and AI techniques, while data science must become more stewardship-aware, incorporating principles of provenance, ethics, and transparency.

Participants also noted the value of philosophical perspectives — as raised by a reference to Naomi Oreskes’ Why Trust Science? — in understanding how we know what we know. This epistemological reflection is increasingly vital as AI systems are trained on limited, often opaque, corpora of data.

Training as ‘Reciprocal Learning’

The conversation repeatedly returned to the challenge of training and upskilling. Many organizations are hiring data stewards from disciplinary backgrounds rather than data or information science programmes, and vice versa. The result is uneven literacy across both communities.

Some argued it is easier to train scientists in data management than to train data managers in domain science; others disagreed. What was clear is that curricula for both data science and data stewardship could benefit from closer integration, potentially as a future CODATA initiative, and that, going forward, we should use the term ‘reciprocal learning’ instead of ‘capacity building’. Embedding data stewards within research teams was highlighted as a successful model: it strengthens stewardship within science and builds new skills within the stewardship community itself.

Looking ahead, the Data Science Journal editors and CODATA executive team in the room discussed with the audience how we could commit to advancing visibility and equity in data science by championing reciprocal learning as a guiding principle for global data collaboration, moving beyond the traditional notion of “capacity building,” which can imply a one-sided transfer of knowledge. Instead, we emphasise mutual exchange, contextual expertise, and co-creation between diverse data communities. We can also encourage publications and dialogues that highlight the human and infrastructural labour underpinning scientific data.

By embracing both the recognition of invisible labour and the ethos of reciprocal learning, the Data Science Journal seeks to nurture a more just, inclusive, and reflexive data culture, one that recognises data science as a community of practice rather than a hierarchy of expertise.

Global Parallels and Technological Impacts

While automation and AI are disrupting employment patterns worldwide, the panel noted that the challenges and opportunities faced by data professionals are strikingly similar across countries. Early-career researchers tend to adopt AI tools faster, while mid-career professionals face greater displacement risks. As history shows, new technologies often reinforce existing power structures, making it even more important to foreground inclusivity and human judgment in the automation age.

Data Scientists as Connectors

A unifying theme across the session was that data scientists serve as connectors, linking disciplines, infrastructures, and communities. They mediate between automated systems and human interpretation, ensuring that AI augments rather than replaces expertise.

This connective role is both technical and social: it requires understanding algorithms, policies, and people. As automation accelerates, these skills, including empathy, interdisciplinarity, and ethical awareness, will define the next generation of scientific data professionals.

Concluding Reflections

The panel concluded that the evolution of data science cannot be understood purely through the lens of tools or technologies. It must also be seen as a cultural and institutional transformation in how science organizes, values, and preserves its knowledge.

As automation, AI, and semantic infrastructures reshape the research landscape, the task of data scientists, and the mission of the CODATA Data Science Journal, remains clear: to uphold transparency, equity, and openness in the systems that increasingly govern knowledge itself.

References

About the Session

Session Title: Evolving Roles for Data Scientists in the Age of Intelligent Automation
Event: International Data Week (IDW) 2025 / SciDataCon 2025, Brisbane, Australia
Date: 24–27 October 2025
Organizers: Data Science Journal Editorial Board (CODATA, ISC, WDS)
Moderator: Dr. Gita Yadav (National Institute of Plant Genome Research, India)
Speakers:

  • Matt Mayernik – NSF National Center for Atmospheric Research, USA
  • Mark Parsons – Research Data Alliance / Arctic Data Community
  • Debasisa Mohanty – Director, National Institute of Immunology, India (presentation by GY)
  • Gitanjali Yadav – National Institute of Plant Genome Research (NIPGR), India and #semanticClimate


Affiliated Organisations: CODATA, World Data System (WDS), International Science Council (ISC), Research Data Alliance (RDA)
Session Link: https://scidatacon.org/event/9/contributions/40/

Reimagining Data Futures – Reflections on Brisbane IDW: The CODATA International Data Policy Committee

Blog Post by Gitanjali Yadav, India

The hum of conversation this week at the sprawling and impressive Brisbane Convention & Exhibition Centre carried a familiar cadence: the sound of the global data community converging once again under the banner of International Data Week! There’s something quietly transformative about being in a room where the world’s leading minds in data gather, not just to talk about datasets and infrastructures, but about what responsibility means in the age of digital abundance. That’s what it felt like at IDW 2025: a coming together of scientists, policymakers, technologists, and communities, each holding a different piece of the world’s data puzzle. This year’s event drew 807 participants, with more than a hundred joining online from across continents, a testament to how inclusive the global data community has become.

Across plenaries and working sessions, one thread remained unbroken: how we, as stewards of data, can embed responsibility, equity, and interoperability at the heart of global science :)

As a member of the CODATA International Data Policy Committee (IDPC), I have been in many such rooms before. But this week felt somehow more grounded and more self-aware. Maybe it was the unmistakable presence of the First Nations voices that opened our discussions, or maybe it was the realization that data policy is no longer a backroom dialogue but now takes center stage in how we imagine a fair and sustainable scientific future.

Beginning with CARE and the Human Face of Data

The opening plenary, “CAREful Indigenous Data Governance,” set a powerful tone. Alfred Lin’s exploration of Taiwan’s Indigenous Peoples’ data practices, Niklas Labba’s Sámi stories, and Marcia Langton’s reflections on Australian Indigenous governance frameworks underscored how data justice is becoming a core tenet of global policy.

For those of us in global data policy, this wasn’t just an inspiring beginning; it was a call to listen more carefully: to communities, to context, and to the histories embedded in our datasets. The conversations that followed about the CARE and FAIR principles weren’t merely academic. They were about dignity, reciprocity, and trust, and above all about placing humanity back into the architecture of data governance.

From the CODATA and IDPC perspectives, these discussions reinforce our ongoing work on developing policy frameworks that operationalize CARE Principles alongside FAIR, ensuring inclusivity without compromising scientific rigor. The need to balance cultural sovereignty with open science ideals is no longer a conceptual debate; it’s a governance imperative.

From FAIR Maturity to Data Stewardship

The technical sessions that followed, covering metadata frameworks, persistent identifiers, and FAIR maturity models, jointly revealed how far the data world has come since these terms first entered our vocabulary. But beyond the acronyms and schemas, what struck us most was a subtle shift in language: from compliance to stewardship. The National PID Strategy case studies and the FAIR Data Maturity Model WG session both illustrated the policy dimension behind technical progress. They asked the hard questions: how do national frameworks align with global infrastructures, and who holds accountability for cross-border interoperability?

As policy actors, we must ensure that these models translate into equitable access and sustained trust, not just technological alignment. Stewardship, after all, is not a technical act; it’s an ethical one that materialises when researchers, repositories, and policymakers see themselves as part of a living ecosystem of data care.

When AI Walks into the Room! 

Artificial intelligence hovered like a quiet undercurrent throughout the week, not as hype, but as a reckoning. The second plenary, Rigorous, Responsible, and Reproducible Science in the Era of FAIR Data and AI, captured this policy challenge: governing machine learning and generative AI systems through the lens of openness and ethics. Presentations on FAIR for Machine Learning, AI-Ready Data Workflows, and Documenting LLM Interactions in Research showed a welcome move toward transparency. The session by the Artificial Intelligence and Data Visitation (AIDV) Working Group went further, connecting AI governance to international policy harmonization.

This theme ties directly to CODATA’s strategic focus on AI policy interoperability! We didn’t just celebrate what AI can do; we questioned what it should do, and how we might govern it. CODATA’s role, and that of the IDPC, is to ensure that our understanding of how to govern AI is shared, equitable, and globally informed.

Policy as Practice, Not Paper

By the time we reached Day 3, it was clear that “policy” is no longer a distant set of guidelines but the pulse that runs through collaboration. The third day’s sessions, from Policy and Practice of Data in Research to From Guidance to Practice: Implementing Open Science Policies in Crisis Situations, revealed the growing role of research data policy as a bridge between disciplines, sectors, and nations. Hearing examples from health data infrastructures, crisis response networks, and university governance frameworks reminded me that policy, at its best, is lived and negotiated in real time, across borders and disciplines.

It also reminded us of how delicate this work is: good policy isn’t just written but trusted, and that trust, once built, must be constantly renewed.

As the IDPC, we have long argued that policy coherence is essential to avoid fragmentation between AI regulation, Indigenous data principles, and open science standards. Seeing these conversations converge across RDA and CODATA sessions was both validating and urgent. The question is no longer whether data policy matters, but how fast we can make it responsive to evolving technologies and societal needs.

Towards a Global Commons for Data Policy: From Brisbane to the World

Across the plenaries and working group sessions, I kept returning to one idea: the vision of a policy commons for data. This vision recurred unmistakably across sessions, a tangible thread connecting FAIR infrastructures, CARE governance, AI ethics, and SDG-aligned data frameworks into a coherent ecosystem. Initiatives like the Global Open Research Commons (GORC) and cross-continental collaborations on metadata standards reflect what such a commons could look like in practice: a shared space where ethical frameworks, AI governance, Indigenous sovereignty, and open science standards can coexist without hierarchy.

It’s an ambitious dream, but one that feels closer than ever.

A Subtle Shift

In the corridors between plenaries and over coffee breaks, I sensed a subtle shift: from technical to human, from compliance to care. Maybe that’s what this week was about: realizing that our modern data systems, no matter how complex, are still part of a continuum of knowledge care! Policy, at its best, is about tending to that continuum and ensuring that as we build the future of data, we don’t forget its human roots.

The conversations were about more than open data; they were about open hearts and the humility it takes to govern wisely in an interconnected world. Leaving the southern hemisphere, I carry with me a renewed sense of purpose: data policy, when done right, is not about control but about connection.

I am also reminded that policy work in data is as much about listening as legislating, about bringing voices from the periphery to the center, ensuring that our data futures are inclusive, ethical, and sustainable. That, perhaps, is the real legacy of International Data Week 2025!

Looking Ahead: Our Shared Data Futures!

As the week closed, there was a sense of momentum, a feeling that the conversations begun in Brisbane would ripple outward. The Research Data Alliance (RDA) announced that its next plenary will be hosted at the Oval Cricket Ground in London, with the call for sessions opening in March 2026 and registration opening in April 2026.

In the closing plenary, Mercè Crosas, President of CODATA, shared an evocative AI-generated “Data Word Cloud”: a collective snapshot of the ideas that shaped International Data Week 2025. Created from the titles of every talk presented during the week, the cloud distills the essence of our shared conversations. Indigenous data governance, open science, FAIR principles, community empowerment, metadata standards, and responsible research practices emerged as themes, together painting a portrait of what global data stewardship means today. This visual tapestry represents the collective language of the week, in which words like Earth, CAREful, AI, Indigenous, FAIR, responsible, and community pulse among others. To me, it captures something more than any summary or report can: the living vocabulary of a global data movement in motion!

AI-generated Data Word Cloud presented by Mercè Crosas, President of CODATA, at the IDW 2025 Closing Plenary. Based on the titles of all conference talks.

Going forward, the community is already looking to Cape Town, South Africa, where the next International Data Week will take place in September 2027. The symbolism of moving from Brisbane to the African continent is powerful, a reminder that the future of data must be as global, diverse, and connected as the challenges it seeks to address.

I think back to the week’s first words, the acknowledgment of the traditional custodians of this land, their knowledge systems, and their deep relationship to data in its most ancient form: “Telling a Story!”

Dr. Gitanjali Yadav is a senior scientist at the National Institute of Plant Genome Research (NIPGR), India, and the cofounder of #semanticClimate, a global citizen science movement for climate action. She also serves as co-chair of the International Data Policy Committee (IDPC) of CODATA and strongly advocates FAIR principles and Open Access. As an editor of CODATA’s Data Science Journal (DSJ), she chaired the IDW 2025 session on “An evolving role of data scientists in the age of intelligent automation”. With CODATA India, Gita is working towards shaping the next generation of open, interoperable, and globally FAIR research assessment tools. She welcomes participation from researchers, data scientists, policy experts, infrastructure developers, and early-career professionals interested in AI-ready, federated data access (data visitation) through metadata, ontology, and semantic enrichment, as well as in reciprocal learning for equitable data ecosystems, especially across the Global South. Gita can be found on LinkedIn and contacted via email (gy@nipgr.ac.in). She will present her reflections on ‘Who Owns Our Knowledge’ at the New York Tech Libraries (NYIT) Open Access Week on October 24, 2025.