Reflections on the ECR-organized session ‘Perspectives on Data Repositories across Disciplines, Geographies, and Cultures’ from CODATA Connect leads Pragya Chaube, Louis Mapatagane, and Cyrus Walther.
At International Data Week 2025, the Early Career Researcher (ECR) session, “Perspectives on Data Repositories Across Disciplines, Geographies, and Cultures,” was more than just a panel; it served as a quiet manifesto. It reminded us that data repositories are not storage systems; they are knowledge goods. And knowledge goods, by their very nature, depend on participation, inclusion, and shared stewardship.
The session presented a rare cross-section of real-world frictions: genomic data that implicate sovereignty, particle-physics data that overwhelm infrastructure, human rights data that require machine readability for accountability, disparities in climate science data, and open research data cultures where practice lags behind aspiration. The ECR panel converged on a single truth: if data are to serve humanity, then the design of the next generation of repositories that hold them must overcome these frictions and reflect humanity’s diversity.
Sovereignty Before Structure
The scientific and infrastructural requirements of genomics show that repositories are never merely technical artefacts. As Claire Rye reminded us through her work on genomic data in Aotearoa New Zealand, no repository is ethically sound if it fails to respect sovereignty. Te Tiriti o Waitangi (the Treaty of Waitangi, New Zealand’s founding document, which signifies a partnership between the British Crown and Māori) affirms full chieftainship over taonga Māori (treasured Māori possessions), so that taonga, including genetic data, remain under Indigenous authority. This is not an afterthought to FAIR; it is the foundation of what the CARE (Collective Benefit, Authority to Control, Responsibility, Ethics) principles demand. When we speak of openness, we must first ask: open for whom, and on whose terms? Without community control and benefit-sharing encoded in governance and metadata, openness risks becoming another form of extraction.
Scale Without Participation Reproduces Inequality
In high-energy physics, the problem is not a lack of collaboration or data but the asymmetry of participation. As Cyrus Walther illustrated, experiments such as the Large Hadron Collider (LHC) or the Square Kilometre Array Observatory (SKAO) generate petabytes of data, but access to those data, and to the computational ecosystems required to analyse them, is not guaranteed. Here lies a paradox: the most open scientific projects can still be closed in practice. Membership fees, specialised software, and critical know-how determine who gets to participate in knowledge production. If repositories are to be truly open, they must democratise capability. That means designing open collaboration structures, such as that of the Cherenkov Telescope Array Observatory (CTAO), and open methods that allow researchers from the Global South and early-career scientists to contribute meaningfully as co-authors of discovery.
Machine Readability as a Justice Technology
Adriana Bora’s Project AI against Modern Slavery (AIMS) demonstrated a different frontier: how data infrastructure can define accountability itself. By converting thousands of corporate modern slavery statements into machine-readable datasets, AIMS transforms what was once bureaucratic reporting into a living, searchable archive for justice. In this context, machine readability is not pedantic; it is ethical. A PDF is a barrier; a CSV is an invitation. When information is locked in static documents, it excludes most people from engaging with it, but when data are structured and open, they empower anyone to analyse, question, and act on them. Metadata and semantics in line with the FAIR principles provide a guide and make the data easier to use consistently and programmatically. When governments require and resource standardised, machine-readable reporting, they shift the cost of use from the vulnerable to the powerful. That is what participatory infrastructure looks like: format that embodies ethics.
Regional Data as Global Equity
Louis Mapatagane’s call to action on climate data inequalities strikes at the core of global disparity. Most climate data reside in the Global North, while communities suffering the harshest effects remain invisible within global systems. Without locally produced and managed data, adaptation policies are based on borrowed evidence. His call for universities to host regional repositories grounded in FAIR + CARE principles redefines such repositories as civic institutions that connect research, education, and local resilience. In this perspective, data justice begins with agency, not access. Regional capacity forms the foundation.
Culture Eats Policy
Pragya Chaube’s reflections on open research data in India brought the conversation full circle: even the best global frameworks falter without supportive cultures. India is the world’s third-largest research producer, yet data-sharing remains low, with fewer than one in ten researchers depositing data in repositories. The barrier is not just technical; it is institutional and cultural. Many institutions lack the policies, incentives, and infrastructure that enable researchers to share confidently and be recognised for doing so. Mandates alone do not create a culture of sharing; incentives, literacy, and recognition do. When researchers are rewarded for curation, not just publication, repositories begin to serve their true purpose: collective advancement.
Toward Participatory Data Futures
The throughline across these talks is unmistakable: data repositories must evolve from mere collections into commons. By commons, we mean data repositories conceived as community-governed public goods, enabling shared stewardship and equitable access for the common benefit. Their legitimacy will depend not on the terabytes they host but on the diversity of those empowered to use them. Participatory design is not an embellishment; it is the architecture of trust.
As we look towards International Data Week 2027 in South Africa, the challenge before us is clear: to shift from access as a mere slogan to the reality of shared analytical power as a global norm. That is how we turn repositories into living knowledge goods, and knowledge goods into instruments of equity.
L to R: Louis Mapatagane, Adriana Bora, Pragya Chaube, Claire Rye, Cyrus Walther