This is the seventh in the series of short statements from candidates in the coming CODATA Elections at the General Assembly to be held on 17-18 October 2025. Donald Hobern is a candidate for the CODATA Executive Committee as an Ordinary Member. He was nominated by Australia.
I hold a degree in Classics and worked for IBM for 16 years as a software engineer and then data architect. I have worked since 2002 in research data management, much of this in the field of biodiversity informatics, first as a Programme Officer for the Global Biodiversity Information Facility (GBIF) with responsibility for its data standards (including work on the earliest versions of Darwin Core) and data architecture, and subsequently as Director first for the Atlas of Living Australia and then for seven years for GBIF. From these roles, I learned the intertwined nature of technological and sociological aspects of research infrastructure and the importance and benefits of maximising international perspectives and collaboration. In association with these positions, and then following my return to Australia in 2019, I have also contributed to data standards, data management, infrastructure delivery and working groups for Biodiversity Information Standards (TDWG), Encyclopedia of Life, GEO BON, Catalogue of Life and the International Barcode of Life Consortium and provided inputs to many national and international biodiversity data activities.
I am currently employed as Data Management Director for the Australian Plant Phenomics Network (APPN), a government-funded network of nine research facilities supporting phenotyping studies for crops and other plants in both controlled and field environments. I lead a team working to develop end-to-end FAIR data management for this network, using RO-Crate as a consistent packaging framework for all APPN-supported studies and enhancing the Minimum Information About a Plant Phenotyping Experiment (MIAPPE) model with SSN/SOSA and other shared ontologies in order to provide both human and machine interoperability. Our focus is both on pragmatic solutions that minimise complexity for plant scientists or facility technicians and on ensuring that all datasets contribute seamlessly to a richly connected linked-data graph for both domain and transdisciplinary uses. Data patterns from each domain can readily inform solutions in other fields. I believe that this APPN work aligns well with data management requirements in many other earth science and life science domains.
Robust data engineering and mainstreaming all aspects of the FAIR data principles are critical if we wish to maximise benefits from research activities and other data collection efforts. Even within a single research domain, low interoperability and reusability make meta-analysis or time-series studies extraordinarily expensive. Linking data across domains is essential if we are to provide the evidence base for modeling complex systems and addressing interconnected sustainability challenges, but every linkage multiplies the impact from poorly described and structured data. The recommendations from the WorldFAIR project and particularly the Cross-Domain Interoperability Framework (CDIF) are invaluable pointers for how to proceed. Simplifying publication and reuse of vocabularies (conceived as a vehicle for transfer and adoption of expert knowledge) is an important and achievable component that needs more attention.
The continued and growing global focus on artificial intelligence and machine learning solutions also underscores the importance of coordinated effort to describe and document data. Better standardisation and richer metadata will increase the volume of data suitable for any AI/ML applications and reduce the probability of serious misuse. Perhaps more importantly, the same improvements, particularly with appropriate attention to data provenance and transformations, are necessary to ensure that data collected by human and machine observations and data modeled from these using deterministic algorithms can be separated from synthetic data from other models and sources. Clarity on data sources is in effect a cybersecurity issue to prevent pollution of key information resources.
I champion open sharing of data wherever feasible and collaborate to ensure that data from different institutions and different regions can readily be combined and reused to expand understanding across time and space and to enable evidence-based research and policy responses.
CODATA has a central role to play in advocacy and developing best practice for modern data management and data-driven products. I would be excited to be able to contribute what I can to help achieve its mission.




























