Monthly Archives: November 2025

Data Without Borders: Reflections from International Data Week 2025, Brisbane

By Bylhah Mugotitsa, Agnes Kiragga, Steve Cygu and Miranda Barasa, African Population and Health Research Center (APHRC).

If there was ever a place where coffee, curiosity, and code met perfectly, it was International Data Week 2025 in Brisbane. You could feel the excitement from the minute you walked into the Brisbane Convention and Exhibition Centre, the kind of energy that tells you something important is happening here. It wasn’t just about machine learning models or the latest frameworks; it was about people, relations, and future collaboration. From early-morning panels on open science to late-afternoon debates on ethical AI, every conversation carried the same heartbeat: data is no longer the property of machines; it is the language of humanity.

In our presentations and panel discussions on behalf of the African Population and Health Research Center (APHRC), we found ourselves returning to a shared conviction: data must always return home. Agnes Kiragga explored building sustainable data science ecosystems in Africa, Miranda Barasa shared insights on co-designing data infrastructures for collaborative research through the Data Science Without Borders (DSWB) project, and Bylhah Mugotitsa reflected on longitudinal data harmonization and integration under the INSPIRE Mental Health project. Across all three, the message was that data must speak back to the communities it comes from, in language they understand, so that it can truly serve.

The Philosophy of Data in a Changing World

Philosophers have long said that progress begins with understanding ourselves. In today’s world, that self-understanding is captured through data. Yet, as the Ghanaian philosopher Kwame Gyekye once said, “Knowledge must carry moral weight. It must serve community.” That is what Africa’s data movement is slowly but surely coming to embody: the moral dimension of measurement. From Nairobi to Dakar, Lusaka to Kampala, we are learning that data is not just about precision. It is about people, policy, and purpose. In Africa, we now promote data innovations developed by Africans, using African datasets and drawing on the broader African community.

At IDW2025, the sense of connection was real. Everyone seemed to recognize that the next chapter of data science will not be written in isolation. The conversations around “data without borders” were not just technical. They were about the idea that collaboration should transcend institutions, countries, and disciplines, allowing knowledge to move freely and ethically to where it can do the most good. This aligned well with the purpose of APHRC’s Data Science Without Borders (DSWB) project.

Our Work at APHRC: Building Cohesion Out of Complexity

In our session on ‘Operationalizing Ethical Data Governance in Africa: Applying CARE Principles and Developing of Data Sharing Agreements in African Institutions’, we shared how the INSPIRE Mental Health Initiative at the African Population and Health Research Center (APHRC) is harmonizing longitudinal health data across multiple African Health and Demographic Surveillance System (HDSS) sites. Using frameworks like the OMOP Common Data Model and DDI-Lifecycle metadata, we are building bridges across diverse datasets, from clinical notes to community surveys, to make them comparable, reusable, and meaningful for policymakers.
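To give a flavour of what harmonization involves, here is a minimal sketch in Python. It is not the INSPIRE pipeline: the site names, source field names, and the `to_person_row` helper are hypothetical, and the target row is a simplified, OMOP-style person record rather than the full table (for instance, it uses a site-prefixed string where OMOP expects an integer identifier). The gender concept IDs 8507 and 8532 are the standard OMOP concepts for male and female as we understand them; check them against your own vocabulary release.

```python
from typing import Any, Dict

# Standard OMOP gender concepts (assumption: verify against your vocabulary version).
GENDER_CONCEPTS = {"male": 8507, "female": 8532}
UNKNOWN_CONCEPT = 0  # OMOP convention for "no matching concept"

# Hypothetical per-site mappings from local field names to shared names.
SITE_FIELD_MAPS: Dict[str, Dict[str, str]] = {
    "site_a": {"id": "person_id", "sex": "gender", "yob": "year_of_birth"},
    "site_b": {"participant": "person_id", "gender": "gender", "birth_year": "year_of_birth"},
}


def to_person_row(site: str, record: Dict[str, Any]) -> Dict[str, Any]:
    """Rename one site's fields and code gender into a shared, OMOP-style row."""
    field_map = SITE_FIELD_MAPS[site]
    renamed = {target: record[source] for source, target in field_map.items()}

    # Normalise free-text gender values before looking up the concept ID.
    gender = str(renamed["gender"]).strip().lower()
    gender = {"m": "male", "f": "female"}.get(gender, gender)

    return {
        # Prefix with the site so identifiers stay unique after pooling.
        "person_id": f"{site}:{renamed['person_id']}",
        "gender_concept_id": GENDER_CONCEPTS.get(gender, UNKNOWN_CONCEPT),
        "year_of_birth": int(renamed["year_of_birth"]),
    }


if __name__ == "__main__":
    print(to_person_row("site_a", {"id": 17, "sex": "F", "yob": "1990"}))
    print(to_person_row("site_b", {"participant": "K-204", "gender": "male", "birth_year": 1984}))
```

The point of the sketch is that two sites recording the same facts under different names and codes end up with comparable rows, which is what makes pooled analysis possible.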

We also discussed some of the key challenges from early data mapping, such as inconsistent data quality checks, and how proper investment in digital infrastructure and training can drive AI innovation. With AI increasingly shaping health, governance, and economic systems across Africa, there is a growing need for conversations around context-appropriate AI governance frameworks. It is painfully obvious that simply importing models built elsewhere is not working for Africa; what is needed instead are homegrown approaches grounded in local realities and societal norms. Collaborative initiatives like DSWB show the potential of AI to become a catalyst for inclusive, sustainable development when African institutions co-design solutions grounded in shared governance and ethical use.

Our DSWB project aims to build a new generation of African data scientists who can work collaboratively across borders, sharing tools, models, and ideas while respecting the sovereignty and privacy of national datasets. It’s a space where African researchers from Kenya, Uganda, Senegal, and South Africa come together to co-create solutions that work for their communities. Through DSWB, we have begun exploring federated learning, an approach that allows models to be trained across multiple datasets without moving sensitive data. This is a quiet revolution in the making. It means we can collaborate globally while keeping data local and secure, and while maintaining data sovereignty.
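For readers who have not met federated learning before, the idea can be sketched in a few lines of Python. This is an illustrative toy, not the DSWB implementation: it uses federated averaging (one common federated learning scheme) on a two-parameter linear model, and the function names, learning rate, and site data are all assumptions made for the example. Each site trains on its own records and shares only model weights and a record count; the raw records never leave the site.

```python
from typing import List, Tuple

Record = Tuple[float, float]   # (feature x, outcome y)
Weights = Tuple[float, float]  # (slope w, intercept b)


def local_update(weights: Weights, data: List[Record],
                 lr: float = 0.1, epochs: int = 20) -> Tuple[Weights, int]:
    """Run gradient descent on one site's data; only weights and a count are returned."""
    w, b = weights
    n = len(data)
    for _ in range(epochs):
        grad_w = sum((w * x + b - y) * x for x, y in data) / n
        grad_b = sum((w * x + b - y) for x, y in data) / n
        w -= lr * grad_w
        b -= lr * grad_b
    return (w, b), n


def federated_average(updates: List[Tuple[Weights, int]]) -> Weights:
    """Combine site models, weighting each by how many records it was trained on."""
    total = sum(n for _, n in updates)
    w = sum(wt[0] * n for wt, n in updates) / total
    b = sum(wt[1] * n for wt, n in updates) / total
    return (w, b)


def train(sites: List[List[Record]], rounds: int = 100) -> Weights:
    """Coordinator loop: broadcast weights, collect local updates, average them."""
    weights: Weights = (0.0, 0.0)
    for _ in range(rounds):
        updates = [local_update(weights, data) for data in sites]
        weights = federated_average(updates)
    return weights


if __name__ == "__main__":
    # Three hypothetical sites whose records all follow y = 2x + 1.
    sites = [
        [(x, 2 * x + 1) for x in (0.0, 0.5, 1.0)],
        [(x, 2 * x + 1) for x in (0.2, 0.8)],
        [(x, 2 * x + 1) for x in (0.1, 0.4, 0.7, 0.9)],
    ]
    print(train(sites))  # should approach slope 2, intercept 1
```

Real deployments add much more, such as secure aggregation, handling of sites with very different data distributions, and governance agreements about what may be shared, but the division of labour between sites and coordinator is the same.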

Beyond the Numbers: The Human Moment

What made IDW 2025 truly special was not only the advances in data technology but also the people. The hallway chats, the coffee-break laughter, and the sense of shared curiosity were as valuable as any keynote. We found ourselves among a community that understood both the rigor and the heart behind the numbers. These moments reminded us that behind all the servers, scripts, and dashboards, the data community is really a people community. We might speak in lines of code, but what connects us is a simple truth: we care about the same world.

And then there were the lighter, unforgettable moments, the kind that conferences rarely put on the agenda but everyone remembers. One evening, we hopped onto the CityCat ferry for a boat ride down the Brisbane River to an event at Eat Street Northshore. By the time we docked, the air was filled with music, laughter, and the irresistible scent of global cuisines. Somewhere between sampling Thai noodles, Ethiopian coffee, and Australian doughnuts, the serious talk of algorithms gave way to dancing!!

There was something about those small, human moments that made Brisbane feel like home. IDW2025 reminded us that being “home away from home” doesn’t always mean missing where you came from; sometimes, it means discovering that belonging can happen anywhere people share purpose, curiosity, and good humor. For a few days, we weren’t just researchers and programmers; we were a floating global village of storytellers, dancers, and friends who just happened to love data.

Looking Ahead

As we return from Brisbane, the invitation is clear: strengthen the pipelines we have established, engage meaningfully with communities before publicizing our findings, and take action based on our discoveries.

Preparations are now underway for the next IDW, which will take place in South Africa, where we hope to enhance debates with an even bigger representation of African voices, develop cross-institutional collaboration, and increase our impact.

In the end, what we carried home from IDW2025 was that in a world often divided by borders, data has given us a new kind of unity. A reminder that the pursuit of knowledge, when done ethically and inclusively, is a global act of humanity.

And that, perhaps, is the greatest lesson we carried home from Brisbane!

Connections for a Sustainable Open-Science Future

Lili Zhang, Director of the Global Open Science Cloud International Programme Office (GOSC IPO), Computer Network Information Center, Chinese Academy of Sciences

On my first day in Brisbane for SciDataCon and International Data Week (IDW) 2025, I was captivated by local exhibits, especially the keyword ‘connection’ that stood out to me. ‘Connections bring people together and are vital in preserving cultural knowledge, strengthening identity, fostering belonging, and tethering past, present, and future generations.’ The same applies to open science infrastructures.

As a key pillar of Open Science, e-infrastructures connect advanced technologies with domain-specific solutions for robust development; they connect cloud services to the widest possible audience; and they tie digital revolutions to sovereignty, ensuring a fair and equitable future. As part of extended and ongoing collaborations and discussions, the Global Open Science Cloud (GOSC) initiative brought us together in an IDW session titled ‘Open science actions toward achieving the SDGs: an infrastructure dialogue with the Global South’ on September 15th in Brisbane, Australia.

The UN 2030 Agenda has become a cornerstone of global science and policy efforts, demanding urgent, collective action to tackle poverty, inequality, climate change, and other systemic problems. The Global South remains at the forefront of the risks and opportunities within the SDGs framework. These regions face the most significant development challenges, but they also possess a wealth of untapped knowledge systems, scientific talents, and emerging infrastructures.

Co-organized by the Global Open Science Cloud International Programme Office (GOSC IPO), the session fostered a dialogue about how trusted, cross-national, and cross-regional e-infrastructures can promote science and innovation in line with the SDGs. Dr. Tshiamo Motshegwa, Director of the African Open Science Platform (AOSP), highlighted regional efforts to develop pan-African cyberinfrastructure that supports the SDGs. He sparked conversations around the need and impetus for creating a federated and policy-aware research environment in Africa, including one that addresses the impacts of the SDGs, and pointed to the opportunity to advance these conversations at International Data Week (IDW) 2027, which will be hosted in South Africa. Dr. Motshegwa also shared African perspectives on data sovereignty and human rights, noting that navigating them is a complex, delicate task that often involves carefully balancing related tensions. Finally, he provided African examples of policy tools, including the AU AI Strategy, the AU Data Policy Framework, and the AU Science, Technology and Innovation Strategy for Africa (STISA) 2034, along with projects on the continent being used to promote open science.

Additionally, Dr. Huajin Wang from CNIC, CAS, introduced the Research Data Collaboration Network (RDCN) and CoNet as cross-domain data toolkits for open collaboration. RDCN is a novel reference architecture designed to facilitate seamless, end-to-end collaborative analysis. CoNet, meanwhile, is a production-level implementation of the RDCN architecture, deployed across the Chinese Academy of Sciences (CAS) data center system to enable complex, cross-domain scientific analysis. Dr. Rania Elsayed Ibrahim from the National Authority for Remote Sensing and Space Sciences in Egypt virtually shared insights on sustainable North-South collaboration driven by capacity development frameworks. For my part, I discussed GOSC’s efforts to develop advanced e-infrastructures for the Global South, supporting SDG research and capacity building. The following panel discussion broadened the conversation, with Dr. Pankaj Kumar of the International Geographical Union (IGU) stressing the importance of AI-focused training and the implementation of the SDGs. Dr. More Manda, Director of the National Integrated Cyberinfrastructure System (NICIS) from South Africa, stressed the urgent need to develop integrated solutions to boost investments in e-infrastructures for the Global South.

Building on ISOSC 2025, this GOSC IDW discussion deeply impressed me by addressing the SDG challenges we face and affirming the importance of co-developing interoperable technical solutions for open science infrastructures. We hope that future GOSC events, acting as a bridge that builds trust across sectors, will continue to promote Open Science and the SDGs. Let’s connect at the next IDW, to be hosted in South Africa in 2027!

More information is available as follows:

GOSC Initiative: https://codata.org/initiatives/making-data-work/global-open-science-cloud/
ISOSC 2025: https://www.cstcloud.net/news/200.jhtml
IDW 2025 GOSC session: https://www.cstcloud.net/news/203.jhtml

Building Participatory Data Infrastructures Across Disciplines and Geographies

Reflections on the ECR-organized session ‘Perspectives on Data Repositories across Disciplines, Geographies, and Cultures’ from CODATA Connect leads Pragya Chaube, Louis Mapatagane, and Cyrus Walther.

At International Data Week 2025, the Early Career Researcher (ECR) session, “Perspectives on Data Repositories Across Disciplines, Geographies, and Cultures,” was more than just a panel; it served as a quiet manifesto. It reminded us that data repositories are not storage systems; they are knowledge goods. And knowledge goods, by their very nature, depend on participation, inclusion, and shared stewardship.

The session presented a rare cross-section of real-world frictions: genomic data that implicate sovereignty, particle-physics data that overwhelm infrastructure, human rights data requiring machine readability for accountability, disparities in climate science data, and open research data cultures where practice lags behind aspiration. The ECR panel converged on a single truth: if data are to serve humanity, then the design architecture of the next generation of repositories that hold them must overcome these frictions and reflect humanity’s diversity.

Sovereignty Before Structure

The scientific and infrastructural requirements of genomics highlight that repositories are never merely technical artefacts. As Claire Rye reminded us through her work on genomic data in Aotearoa New Zealand, no repository is ethically sound if it fails to respect sovereignty. Te Tiriti o Waitangi (the Treaty of Waitangi, New Zealand’s founding document, which signifies partnership between the British Crown and Māori) affirms that taonga Māori (treasured Māori possessions, over which the Treaty guarantees full chieftainship), including genetic data, remain under Indigenous authority. This is not an afterthought to FAIR; it is the foundation of what the CARE (Collective Benefit, Authority to Control, Responsibility, Ethics) principles demand. When we speak of openness, we must first ask: open for whom, and on whose terms? Without community control and benefit-sharing encoded in governance and metadata, openness risks becoming another form of extraction.

Scale Without Participation Reproduces Inequality

In high-energy physics, the problem is not a lack of collaboration or data but the asymmetry of participation. As Cyrus Walther illustrated, experiments such as the Large Hadron Collider (LHC) or the Square Kilometre Array Observatory (SKAO) generate petabytes of data, but access to those data, and to the computational ecosystems required to analyse them, is not guaranteed. Here lies a paradox: the most open scientific projects can still be closed in practice. Membership fees, specialised software, and critical know-how determine who gets to participate in knowledge production. If repositories are to be truly open, they must democratise capability. That means designing open collaboration structures, such as the Cherenkov Telescope Array Observatory (CTAO), and open methods that allow researchers from the Global South and early-career scientists to contribute meaningfully as co-authors of discovery.

Machine Readability as a Justice Technology

Adriana Bora’s Project AI against Modern Slavery (AIMS) demonstrated a different frontier: how data infrastructure can define accountability itself. By converting thousands of corporate modern slavery statements into machine-readable datasets, AIMS transforms what was once bureaucratic reporting into a living, searchable archive for justice. In this context, machine readability is not pedantic; it is ethical. A PDF is a barrier; a CSV is an invitation. When information is locked in static documents, it excludes most people from engaging with it, but when data is structured and open, it empowers anyone to analyse, question, and act on it. Metadata and semantics in line with the FAIR principles provide a guide and make the data easier to use consistently and programmatically. When governments require and resource standardised, machine-readable reporting, they shift the cost of use from the vulnerable to the powerful. That is what participatory infrastructure looks like: when format embodies ethics.
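A small Python sketch shows why structure matters. It is not the AIMS pipeline or schema: the field names, the example companies, and both helper functions are hypothetical, and it assumes the hard work of extracting these fields from the original statements has already been done. Once the statements are rows in a CSV, a question like "how many statements report no remediation policy?" becomes a few lines of code that anyone can run and check.

```python
import csv
import io
from typing import Any, Dict, List

# Hypothetical fields extracted from each statement (not the AIMS schema).
FIELDS = ["company", "reporting_year", "has_risk_assessment", "has_remediation_policy"]


def statements_to_csv(statements: List[Dict[str, Any]]) -> str:
    """Write extracted statement fields as CSV text; missing fields become empty cells."""
    buffer = io.StringIO()
    writer = csv.DictWriter(buffer, fieldnames=FIELDS, lineterminator="\n")
    writer.writeheader()
    for statement in statements:
        writer.writerow({field: statement.get(field, "") for field in FIELDS})
    return buffer.getvalue()


def count_without(csv_text: str, field: str) -> int:
    """Count rows where a yes/no field is not affirmatively 'True'."""
    rows = csv.DictReader(io.StringIO(csv_text))
    return sum(1 for row in rows if row[field] != "True")


if __name__ == "__main__":
    statements = [
        {"company": "Acme", "reporting_year": 2023,
         "has_risk_assessment": True, "has_remediation_policy": False},
        {"company": "Globex", "reporting_year": 2023,
         "has_risk_assessment": False},  # remediation field was not reported
    ]
    csv_text = statements_to_csv(statements)
    print(csv_text)
    print(count_without(csv_text, "has_remediation_policy"))
```

Note that the sketch counts an unreported field the same way as a reported "no". Whether that is the right choice is itself a governance decision, which is one reason shared metadata and semantics matter.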

Regional Data as Global Equity

Louis Mapatagane’s call to action on climate data inequalities strikes at the core of global disparity. Most climate data reside in the Global North, while communities suffering the harshest effects remain invisible within global systems. Without locally produced and managed data, adaptation policies are based on borrowed evidence. His call for universities to host regional repositories grounded in FAIR + CARE principles redefines such repositories as civic institutions that connect research, education, and local resilience. In this perspective, data justice begins with agency, not access. Regional capacity forms the foundation.

Culture Eats Policy

Pragya Chaube’s reflections on open research data in India brought the conversation full circle: even the best global frameworks falter without supportive cultures. India is the world’s third-largest research producer, yet data-sharing remains low, with fewer than one in ten researchers depositing data in repositories. The barrier is not just technical; it is institutional and cultural. Many institutions lack the policies, incentives, and infrastructure that enable researchers to share confidently and be recognised for doing so. Mandates alone do not create a culture of sharing; incentives, literacy, and recognition do. When researchers are rewarded for curation, not just publication, repositories begin to serve their true purpose: collective advancement.

Toward Participatory Data Futures

The throughline across these talks is unmistakable: data repositories must evolve from being mere collections to become commons. By commons, we refer to data repositories conceived as community-governed public goods, enabling shared stewardship and equitable access for the common benefit. Their legitimacy will depend not on the terabytes they host but on the diversity of those empowered to use them. Participatory design is not an embellishment; it is the architecture of trust.

As we look towards International Data Week 2027 in South Africa, the challenge before us is clear: to shift from access as a mere slogan to the reality of shared analytical power as a global norm. That is how we turn repositories into living knowledge goods, and knowledge goods into instruments of equity.

L to R: Louis Mapatagane, Adriana Bora, Pragya Chaube, Claire Rye, Cyrus Walther

Ode to LOI (Legal and Organisational Interoperability) – Data moves at the speed of trust

Reflections on Legal and Organisational Interoperability at IDW2025 from Matti Heikkurinen, LOI Lead and Project Portfolio Manager, CODATA

I’ll start with a confession: until quite recently, whenever I heard the word “interoperability”, I thought of the ability of any two systems to reproduce the same bit sequence across a connecting medium (wire, fibre, or airwaves). Formidable challenge, when you consider all the possible combinations of network vendors, device types, software stacks, noise, hardware failures, and human factors at play. But the engineering community has developed standards and processes to test technical interoperability through protocol analysis, plugtests, fault-tolerant software and hardware, among others. The blackouts are rare, worms usually stay in the cans, and Cloud outages shouldn’t cause insomnia (we hope).

MoUSLA à la réalité

Travelling towards Brisbane, I had a high degree of trust in the technical and semantic interoperability of the aircraft components, backed by Byzantine fault tolerance. I was less certain about finding pure algorithmic solutions or engineering approaches to provide complete solutions for the topic of the IDW 2025 session at the destination: Legal and Organisational Interoperability (LOI). This covers the ability of organisations not only to exchange information (Technical interoperability) and understand its meaning (Semantic interoperability), but also to use the shared meaning to achieve mutually beneficial goals (Organisational interoperability) in a manner that defines and complies with the relevant legal basis for data sharing (Legal interoperability).

Two randomly connected organisations are unlikely to reproduce the same policy decisions just because they can reproduce bit sequences sent by the other party, given their unique governance models, legal frameworks, and standard operating procedures. Nevertheless, organisational and legal interoperability are crucial to society, and mechanisms to achieve mutually beneficial goals have been developed.

Memorandums of Understanding (MoUs) and Service Level Agreements (SLAs) have received significant attention as promising solutions for these challenges. And indeed, they do provide formal components to support and document Organisational and Legal interoperability. However, one of the motivations for the session was the (possibly controversial) view that, even if the agreements themselves were fully machine-actionable, they wouldn’t guarantee interoperability. They are the fiber-optic cables of LOI that need infrastructure around them to become operational; MoUs and SLAs are the desserts, not the whole meal.

CAREless whisper – Guilty machines have got no rhythm

The opening plenary put any worries about the fit of our session topic to rest. The talks addressed indigenous data governance (presented in more detail in another blog) through different aspects of the CARE principles (Collective benefit, Authority to control, Responsibility, Ethics). The interesting weak signals in the session were the immediate pushback on governance models that were seen as too simplistic, and the emphasis on the mutual benefits of adhering to CARE principles, based on the SODA principles. The latter seemed important from the sustainable CARE point of view; ethics backed by enlightened self-interest tends to be a much stronger LOI component than ethics alone.

The broader context of the LOI issues was covered by several other sessions and talks: the session on “Increasing Resilience of Global Earth and Environmental Science Data Supply Chains” brought new kinds of hazard scenarios into the spotlight, and the RDA plenary session “Balancing ethical considerations and Open Science principles when sharing community relevant data” presented the challenges in determining when sharing, or collecting, data might do more harm than good.

“Our” session – “Legal and organisational aspects of data interoperability: climate adaptation case studies” – included talks and discussions that were thought-provoking without exception. Some of the highlights included:

O(N²) challenge of sharing environmental data in Australia: nine jurisdictions, multiple overlapping laws, inter-state data exchange requiring compliance with federal and differing state legislation (Kheeran Dharmawardena, Australian Dataspaces). Kheeran’s presentation included a table comparing different regulations in the Australian context.
Legal and organisational interoperability in the ARDC People Thematic Data Research Commons (Adrian Burton, ARDC). Adrian described work in the ARDC People Commons and the wider OHDSI-OMOP community to establish specific terminologies for sharing agreements and consent.
Urban research combining high-value (including high commercial value) and sensitive datasets (Pascal Perez, Director, AURIN). Pascal emphasised the importance of detailed agreements among participating organisations that deal with specifics in a way proportionate to risk and consonant with benefits.
Thanasis Sfetsos’s presentation of the Urban Heat Island use case of the Climate-Adapt4EOSC project described the enablers for organisational and legal interoperability that have been identified in the initial analysis.
Supporting CARE principles using RAID and local contexts (Rebecca Farrington, Director of Research Data Systems, AuScope), including the use of specific metadata to allow the retrospective application of usage conditions.
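The O(N²) point in Kheeran’s talk is easy to check with a few lines of Python. This is my own back-of-the-envelope illustration, not something from the presentation: the labels J1 to J9 are placeholders for the nine jurisdictions, and the "shared framework" comparison is an assumption about what a common reference point would look like. If every pair of jurisdictions has to be reconciled separately, the number of pairs grows as N(N-1)/2; if each jurisdiction instead maps once to a shared framework, the number of mappings grows only as N.

```python
from itertools import combinations
from typing import List, Tuple


def pairwise_agreements(jurisdictions: List[str]) -> List[Tuple[str, str]]:
    """Every unordered pair of jurisdictions that would need its own reconciliation."""
    return list(combinations(jurisdictions, 2))


def shared_framework_mappings(jurisdictions: List[str]) -> List[Tuple[str, str]]:
    """One mapping per jurisdiction to a single (hypothetical) shared framework."""
    return [(j, "shared-framework") for j in jurisdictions]


if __name__ == "__main__":
    # Placeholder labels for the nine jurisdictions mentioned in the talk.
    jurisdictions = [f"J{i}" for i in range(1, 10)]
    print(len(pairwise_agreements(jurisdictions)))        # 36 pairs
    print(len(shared_framework_mappings(jurisdictions)))  # 9 mappings
```

Thirty-six bilateral reconciliations versus nine mappings is the practical argument for common terminologies and semantic frameworks, although agreeing on the shared framework is of course the hard part.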

The session could perhaps be summarised by Adrian Burton’s quote from the first day of the event: “Data travels at the speed of trust”.

LOI Interoperability – Fade to (shades of) grey?

Reading the above summary, one might wonder where the FAIR data community should focus its efforts amid the growing number of data-sharing scenarios that no longer clearly fall into the binary “allowed-forbidden” categories? Will the response to critical hazards be hampered by the need to put the benefits and risks on a scale with an arbitrary (and context-dependent) approval line? Should we just reinforce and expand ethics boards to cope with the growing challenges?

Personally, I would argue that growing awareness of the need to operate in this “grey area” makes it more important to develop semantic frameworks and tools to describe CARE issues. Humans undoubtedly need to stay in the loop, but robust, consistent methods for describing the issues are crucial for making decision-making efficient, consistent, and CAREful. The work of initiatives such as the Cross-Domain Interoperability Framework (CDIF) and Climate-Adapt4EOSC received a major boost at the IDW, with the idea of reviving the RDA-CODATA Legal Interoperability IG gaining considerable support. As a result, we hope to observe and report an increase in the speed of trust at IDW 2027!

References

F. Schiller, L. van Beethoven: Ode to Joy
N. Lawson: Instant Chocolate Mousse
G. Michael et al.: Careless Whisper (Official Video)
J. Gum et al.: Increasing Resilience of Global Earth and Environmental Science Data Supply Chains
L. Bezuidenhout et al.: Balancing ethical considerations and Open Science principles when sharing community relevant data
Burton et al.: Legal and organisational aspects of data interoperability: climate adaptation case studies
Visage: Fade to Grey
World Fair CDIF working group: Introducing CDIF

October 2025: Publications in the Data Science Journal

	Title: Information Needs and Data Harmonization—Two Sides of the Same Coin? Author: A. J. Million, Jenny Bossaller, Sanja Gidakovic URL: http://doi.org/10.5334/dsj-2025-030
	Title: Financing Models for Sustainable Data Reuse Infrastructure Author: Rob W. W. Hooft, Marco Roos URL: http://doi.org/10.5334/dsj-2025-029

CODATA Blog

News from the CODATA community and from Simon Hodson, CODATA Executive Director