Author Archives: codata_blog

My new experience in Italy at the CODATA-RDA Research Data Science Summer School

This post was written by Neema Mduma. Neema recently attended the CODATA-RDA School of Research Data Science, hosted at ICTP, near Trieste, Italy. Her participation was kindly supported by the African Development Bank (AfDB).

This post is a syndicated copy of the one at https://neylicious.github.io/ml/2018/10/03/italy.html

My PhD journey has been great so far, apart from the sleepless nights (totally worth it though!). Last year I attended several events in the USA; it was great exposure, and I enjoyed both the scientific and the social programmes. I was looking forward to finding new opportunities and travelling elsewhere to gain new experiences and extend my existing network.

In June 2018 I received an invitation to attend the CODATA-RDA Research Data Science Summer School, held at the Abdus Salam International Centre for Theoretical Physics (ICTP) in Trieste, Italy. The summer school ran from 6 to 17 August 2018, with the aim of building competence in data analysis and security among participants from all disciplines and backgrounds, from the sciences to the humanities.

The level of engagement and interaction between participants and instructors at this summer school was outstanding, and helpers were always there to provide technical assistance. I was introduced to useful Machine Learning techniques that I will apply in my ongoing study. The Executive Director of CODATA, Simon Hodson, presented various opportunities to us, such as the CODATA journal and many others.

This summer school gave me an opportunity to extend my network with other academics and experts in the field of Machine Learning. Additionally, I had a chance to experience a new culture and explore new places like Rome, Venice and Ljubljana. Sadly, I was the only participant from Tanzania, so I encourage my fellow Tanzanians to respond to calls and seize opportunities in Data Science workshops and summer schools. Lastly, I would like to thank the organisers for making the summer school a great success, and the African Development Bank (AfDB) for the financial support.

Jianhui LI: Candidacy for CODATA Vice President

This is the eleventh in the series of short statements from candidates in the forthcoming CODATA Elections at the General Assembly to be held on 9-10 November in Gaborone, Botswana, following International Data Week. Jianhui LI is a candidate for the CODATA Executive Committee as a Vice President. He was nominated by China, PASTD, LODGD, USA.

Dr. Jianhui Li is the department director of the Computer Network Information Center (CNIC), Chinese Academy of Sciences (CAS), a Professor and PhD Supervisor at the University of Chinese Academy of Sciences, and an Ordinary Member of the Executive Committee of CODATA (2014-2016, 2016-2018). He has worked on data infrastructure, data management and data-intensive computing since 1999, and has led the scientific data infrastructure development and open data activities in CAS for more than 10 years.

He has always promoted the implementation of open data principles and data policies, and has put them into practice. In 2017, he launched China’s first national-level large-scale survey on data sharing, which helped the Ministry of Science and Technology (MOST) formulate China’s first national policy on scientific data management and openness. To nourish a local data sharing culture among the Chinese research community, he launched a bilingual open-access data journal, China Scientific Data (www.csdata.org), together with the data repository ScienceDB (http://www.sciencedb.cn). Moreover, he is a very active data scientist and leader in advancing the frontiers of data science. He is in charge of the development of the CAS Scientific Data Cloud for Big Data Analysis and Large Scale Data-Intensive Scientific Research. He is now leading the development of a scientific big data management system funded by the National Key R&D Plan, and serves as co-chair of the technology working group in the CASEarth Programme (http://www.casearth.com), a CAS Strategic Pioneer Research and Development Programme focused mainly on building a global big data network to study the Earth, support research on climate change, and predict and mitigate natural disasters.

He has served as Secretary General of CODATA-China for 10 years and has organized a series of successful international and domestic activities, including the China-US Roundtable on Scientific Data, the Training Workshop for Developing Countries on Scientific Data, and the National Scientific Data Conference. He initiated the International Training Workshop for Developing Countries on Scientific Data, sponsored jointly by CAS, CODATA and CODATA-China. The training workshop has been held four times, in 2012, 2014, 2016 and 2017, attracting more than 80 participants from over 20 developing countries. The Annual National Scientific Data Conference, another event, initiated in 2014, is now the most important national academic conference on scientific data in China, providing a friendly platform for exploring the frontiers of data science and exchanging knowledge and experience among thousands of scientists. He has been a member of the CODATA Executive Committee since 2014, contributing to the CODATA Strategy and its implementation.

With his extensive experience in national and international data activities and outstanding research on data science, scientific data management and sharing, he will continue to help CODATA carry out its mission, objectives and key initiatives as articulated in the Strategic Plan, contributing especially to data policy, data management and data science.

Ernie Boyko: Candidacy for CODATA Executive Committee

This is the tenth in the series of short statements from candidates in the forthcoming CODATA Elections at the General Assembly to be held on 9-10 November in Gaborone, Botswana, following International Data Week. Ernie Boyko is a candidate for the CODATA Executive Committee as an ordinary member. He was nominated by Canada, Israel, South Africa, TGFC.

Ernie Boyko is an agricultural economist with extensive experience in data development, data dissemination and research data management.

His wide experience at Canada’s national statistics agency, Statistics Canada, involved working at senior levels in a variety of areas including agriculture statistics, corporate planning, electronic dissemination, census operations and library and information services.

Ernie has long advocated for data access; his crowning achievement at Statistics Canada was the creation of the Data Liberation Initiative (DLI) with Wendy Watkins of Carleton University. This program gave post-secondary institutions affordable access to all public microdata and aggregate files for the first time. It recently celebrated its 25th anniversary with 79 institutional members. DLI has been cited internationally as a model program for statistical institutions.

Ernie was involved in several assignments with the World Bank and OECD’s PARIS21 projects. This involved work in several African and Asian countries on data development for agriculture, and on dissemination and data management policies for statistical agencies. He was able to put his Statistics Canada and international experience to use during a decade as an Adjunct Data Librarian at Carleton University, where he taught the basics of research data management to faculty and graduate student researchers.

Mr. Boyko is a past president of the International Association for Social Science Information Services and Technology (IASSIST), an organization he has been part of for nearly 30 years. This has given him exposure to the challenges of social science data services. Under this umbrella, he was part of a working group that developed a metadata standard, the Data Documentation Initiative (DDI). DDI is now widely used as a standard for documenting research data. In 2018, IASSIST presented Ernie with a lifetime achievement award for his work.

Ernie has been involved with CODATA for a decade, serving as an observer, a member and currently the Chair of the Canadian National Committee for CODATA. As chair, he is leading a project that will realign the emphasis of the national committee with the new CODATA constitution’s focus on research data management. The creation of the International Science Council through the merger of ICSU and the ISSC has led CODATA to focus more squarely on research data management and outreach. The goal of the current work of the Canadian committee is to develop a process that will allow reporting on the status of RDM in Canada in a way that facilitates international comparisons.

His varied experience stands Ernie in good stead to help meet the challenges CODATA faces in its transition to being part of the International Science Council. It is hoped that Canada’s RDM project will interest other countries, enabling a broader understanding of the state of research data for the benefit of science.

Open Consultancy – Development of interactive tools to support the use of data collected by Major Groups and other Stakeholders to measure progress towards the Sustainable Development Goals (SDGs)

Background

The availability of, and access to, high-quality, timely and reliable data, disaggregated by relevant characteristics and supplemented with the contextual information necessary for its interpretation and use, is fundamental to the successful implementation and monitoring of the 2030 Agenda. National Statistical Systems, however, face many challenges not only in producing statistics to fill data gaps in national, regional and global SDG indicator frameworks, but also in integrating the vast amounts of existing SDG-related data and information and making them accessible to decision makers in a meaningful way.

Different actors are often able to contribute timely and disaggregated data on specific issues, geographies or groups, which can supplement and provide additional context to understand the data on official statistical indicators at the national and global levels (such as data on informal settlements or needs in specific communities). Therefore, several important initiatives are being undertaken by National Statistical Offices in coordination with other members of national statistical systems, and in partnership with international and regional organizations and stakeholders from civil society, academia, and the private sector, to integrate new data sources into the production and dissemination of official statistics for sustainable development.

The UN DESA Division for Sustainable Development Goals’ (DSDG) ongoing EU grant entitled “SD2015: delivering on the promise of the SDGs” seeks to strengthen and support Major Groups and other Stakeholders’ (MGoS) engagement, including by strengthening their capacity to monitor and contribute to the 2030 Agenda.

Major Groups and other Stakeholders play an important role in reporting on the implementation of the Sustainable Development Goals. MGoS are often able to contribute timely and disaggregated data on specific issues, geographies or groups, which can supplement and provide additional context to understand the data on official statistical indicators at the national and global levels.

Citizen-generated data, as well as data produced by such constituencies as the private sector and local authorities, faces complex challenges in being integrated and used to support monitoring and decision-making to accelerate implementation of the 2030 Agenda. In particular, it is often difficult to find and link these supplementary data sources, due to discrepancies in the metadata structures and vocabularies used to describe and organize their content, as well as the lack of adherence to common technical and statistical standards. These challenges hinder disaggregation and lead to lower visibility and use of existing data sources that could be leveraged by governments to help fill monitoring gaps for the SDGs.

Currently, there is no common framework to structure, aggregate, and give visibility to innovative data sources which can be leveraged by policy makers, analysts, and the general public to help implement and monitor progress towards the SDGs.

Objectives

Building on the UN Statistics Division’s existing work in providing guidance on the use of common data standards to monitor the SDGs, the project aims to provide a common framework and guidelines to improve the visibility, interoperability and usability of supplementary sources of data on sustainable development, complementing the work of national statistical offices and consequently raising overall awareness of progress towards the SDGs. Semantic web tools will be employed to address the need for a lightweight data-interchange format across the web.

Work Assignment

The Consultant will perform the following duties:

Review existing prototypes of SDG data ontologies and other related vocabularies (e.g., the UNBIS Thesaurus)
Draft a document with guidelines for the implementation of linked-data on statistical data provided through a web API, and for digital text exposed through web documents
Draft a document with general guidelines for the consumption of the SDG linked data
Develop an application to pilot the collection and integration of multiple alternative data sources in one

Duration of contract

The proposed duration of the contract is 3 months, starting in November 2018.

Duty Station or Location of Assignment

The consultant is not required to work in a UN office, but must be available for regular phone and web conference meetings with DSDG, UNSD and project partners during office hours. Travel or commute time to and from the United Nations Headquarters, as well as related expenses, are not part of the consultancy.

Travel

The Consultant is not required to travel for the performance of the assignment.

Expected outputs

The consultant will develop the following deliverables:

The following set of on-line guidelines (to be delivered in the form of GitHub wiki pages, Jupyter notebook(s) or similar on-line documents):
- Guidelines for the practical implementation of semantic web standards (JSON-LD, microdata or any other linked data artifact) to achieve the greatest exposure of SDG statistical data, using unique URIs defined in various publicly available ontologies and
- General guidelines for the consumption of the SDG linked data (to be delivered in a Word document)
An application to pilot the collection and integration of multiple alternative data sources with official SDG statistics in one This application will use the linked data infrastructure created to connect nontraditional data sources with official statistical data.
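The call asks for guidelines on exposing SDG statistical data with JSON-LD and unique URIs drawn from public ontologies. The Python snippet below is a minimal sketch of what such markup might look like, assuming schema.org's `Dataset` vocabulary. It describes one non-official data source and links it to an SDG indicator by URI. The function name, the `example.org` namespace, the indicator code and all field values are hypothetical placeholders. They are not part of the consultancy's specification or of any official SDG ontology.

```python
import json

# HYPOTHETICAL namespace for SDG indicator URIs. A real deployment would use
# identifiers from whichever published SDG ontology the guidelines recommend.
SDG_INDICATOR_NS = "http://example.org/sdg/indicator/"


def dataset_to_jsonld(indicator_code: str, title: str, area_name: str, data_url: str) -> dict:
    """Describe one supplementary data source as a schema.org Dataset in JSON-LD.

    The indicator is referenced by a unique URI (via "about"), so a consumer
    can link this record to official statistics on the same indicator.
    """
    return {
        "@context": "https://schema.org/",
        "@type": "Dataset",
        "name": title,
        "about": {"@id": SDG_INDICATOR_NS + indicator_code},
        "spatialCoverage": {"@type": "Place", "name": area_name},
        "url": data_url,
    }


if __name__ == "__main__":
    # All values below are illustrative placeholders, not real data.
    record = dataset_to_jsonld(
        indicator_code="6.1.1",
        title="Community water-point survey (placeholder)",
        area_name="Example District",
        data_url="https://example.org/data/water-points.csv",
    )
    # Serialise, e.g. for embedding in a <script type="application/ld+json"> tag.
    print(json.dumps(record, indent=2))
```

A consumer could then match records on the `about` URI to join supplementary sources with official indicator series. Whichever vocabulary the final guidelines recommend would replace the placeholder namespace.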

Delivery dates of output

First progress report submitted by 15 December 2018
Second progress report submitted by 15 January 2019
All outputs submitted by 15 February 2019.

Performance indicators

Timely delivery of all components specified;
Quality of the documentation and tutorials developed;
Effectiveness of solution provided on an architectural level;
Timely preparation of all activities and regular progress updates;

Qualifications

Fluency in English
An advanced degree in the field of data or information science or related fields
A minimum of five years of working experience on information science including experience in the use of linked data and semantic web
Demonstrated experience in the implementation of linked data projects using common linked data standards, such as RDFa, JSON-LD, Microdata or other linked data technologies
Familiarity with the UN Sustainable Development Goals and their monitoring framework
Knowledge of Agile methodology is desirable

Interested Applicants

Please send your CV, Cover Letter and financial proposal (daily fee) by 19 October 2018 to:

Ms. Nan Jiang, Division for Sustainable Development Goals, UN Department of Economic and Social Affairs jiang2n@un.org; +19173674426

Humans of Data 26

“What motivates me to keep going is teaching people who are going to keep this going after us. Managing data will find its place in the world. I don’t mean analytics, I mean taking care of the data so that people can run legitimate analytics.

It annoys me just now – a lot of these data science and data analytics programs, they’re all about statistics, visualisation, analysis, but very little about actually curating the data underneath. Not to say that data curators don’t need to know a little bit about analysis but people who do data science in the business environment, they often don’t know much about curation. People working for businesses, they complain that they spend 80% of their time cleaning data and without that, the data wasn’t usable. But I feel like saying, ‘If you hired data curators you wouldn’t have to deal with that problem!’”

Daisy Selematsela: Candidacy for CODATA Executive Committee

This is the ninth in the series of short statements from candidates in the forthcoming CODATA Elections at the General Assembly to be held on 9-10 November in Gaborone, Botswana, following International Data Week. Daisy Selematsela is a candidate for the CODATA Executive Committee as an ordinary member. She was nominated by PASTD TG.

Daisy Selematsela holds a PhD and is Professor of Practice in Information and Knowledge Management at the University of Johannesburg. She has a combined 27 years’ experience in the Higher Education sector and within the National System of Innovation (NSI). She serves as a mentor for emerging researchers in interdisciplinary areas and as an external examiner for undergraduate and postgraduate students in Library, Information Science and Knowledge Management.

Daisy serves on a number of scientific bodies, as an editorial board member of the South African Journal of Library and Information Science (SAJLIS) and of the Global Change Research Data Publishing and Repository, and as a reviewer for several programs.

She serves on a number of national boards and advisory councils. Internationally, she is a member of the Board of Directors of ORCID (representing EMEA: Europe, the Middle East and Africa) and of the Confederation of Open Access Repositories (COAR).

Daisy has served the then ICSU and CODATA on a number of forums, contributed to position papers, co-ordinated workshops, chaired conference sessions and made numerous local and international presentations on areas related to CODATA objectives. She has served CODATA in the following areas:

Data Science Journal Review – corresponding Editor 2009
Served as ex-officio member of the South African National Committee for CODATA for 11 years.
World Data Centre on Biodiversity and Human Health prototype proposal and hosting;
Executive member: International Council for Science (ICSU SCID) ad hoc Committee on Information and Data in 2007.
Chair: International Council for Science: Committee on Data for Science & Technology (ICSU: CODATA) Task Group on Data Sources for Sustainable Development in SADC 2007-2011.
Executive member: International Council for Science World Data Centre Panel (ICSU EDC Panel) 2008.
Member: CODATA Task Group on Preservation of and Access to Scientific and Technical Data in/for/with Developing Countries. Co-chairs: CODATA – WDS joint subgroup 2011 to date.

She was a founding and executive member of the International Data Forum (IDF) 2007-2010. She was also instrumental in formulating the Statement on Open Access for grant funding and the Statement on ORCID ID and Predatory Publishing.

She holds a PhD in Information Science from the University of Johannesburg and is a Fellow of the Higher Education Resource Service for Women in Higher Education (HERS) South Africa and Bryn Mawr College in Philadelphia, USA. In 2016 she received the Knowledge Management Award from the World Education Congress.

Muliaro Wafula: Candidacy for CODATA Executive Committee

This is the eighth in the series of short statements from candidates in the forthcoming CODATA Elections at the General Assembly to be held on 9-10 November in Gaborone, Botswana, following International Data Week. Muliaro Wafula is a candidate for the CODATA Executive Committee as an ordinary member. He was nominated by Kenya and Agriculture TG.

I am a Kenyan national who has served Jomo Kenyatta University of Agriculture and Technology (JKUAT) for 25 years. I am currently a member and the Chair of CODATA Kenya. I hold the following additional responsibilities:

Chair, Technical Advisory Board of the Africa Open Science Platform Project, funded by the South African government’s Department of Science and Technology (DST), the International Science Council (ISC) and CODATA (http://africanopenscience.org.za/?p=230);
Member editorial board of the Data Science Journal (https://datascience.codata.org/);
Member editorial board of the African Journal of Food, Agriculture, Nutrition and Development;
Chair, Innovative Open Data and Visualization Sub-taskforce of the AFRICA-ai-JAPAN Project sponsored by JICA;
Chair, Training committee of the National Industrial Training Authority-Kenya (https://www.nita.go.ke/);
Member, committee of the United Nations SDGs Agricultural and Climate Change Pillars of Kenya.
Member, CODATA Data Policy Committee

I am currently the founding Director of the ICT Centre of Excellence and Open Data at JKUAT. I coordinate all ICT-related Memoranda of Understanding between JKUAT and partners. I previously served as ICT Director for 5 years and as director of the Institute of Computer Science and Information Technology for 4 years.

I hold B.Sc. Science (Hons) (Kenyatta University), M.Sc Physics (University of Nairobi), M.Phil. Microelectronic Engineering and Semiconductor Physics (University of Cambridge –UK), Summer Doctoral Programme (Berkman Centre for Internet & Society/Oxford Internet Institute’s -Harvard University Law School), and PhD Information Technology (JKUAT).

I am a recipient of two IBM awards: the 2016 IBM Shared University Research Award on the Open Data Cloud Project for JKUAT, which has enabled JKUAT to be at the frontier of building an open data platform for researchers in Africa, and the 2014 IBM MEA Award for capacity building in Mobile Application development, which enabled JKUAT to train and professionally certify a large number of application developers. I lead training on Big Data Analytics, Cyber Security, and Blockchain Technology.

I am professionally certified in various fields including Cyber Security, Mobile Application, ISO/IEC 27001:2005 Information Security Management System, Leadership and Management Capacity Development, Sage Accpac ERP Financial and Operations Management Systems, and ISO 9001:2000 on Quality Management Systems.

I am a fellow of the Computer Society of Kenya and the Cambridge Commonwealth Society. I have published a book (https://www.amazon.com/ICT-Policy-Strategies-Government-Sustainable/dp/3639515137) and several research papers in peer-reviewed international journals. I have attended and participated in several data science, big data and open data trainings, workshops and conferences.

I am an Associate Professor in the Department of Computing at JKUAT and Director of the ICT Centre of Excellence and Open Data (iCEOD). As director of iCEOD, I have accomplished the following key activities in line with CODATA objectives:

Development and implementation of the JKUAT Open Research Data (JORD) Policy. This policy is now regarded as pioneering in Kenya and serves as a reference for other research institutions to spur the data revolution in Kenya and the region;
Won the Kenya Open Data Champion Award 2017, thus promoting open science and data practices.
Organized and hosted an international Data, Information and Scientific Visualization Conference in August 2018, called VizAfrica 2018 (http://vizsymposium.jkuat.ac.ke/)
Designed, and now implementing, a cloud-based value chain open data platform built on FAIR open data principles and standards in order to promote research data storage, preservation, sharing and reuse. The platform aims at:

Promoting conformity to the open data principles, policies and standards.
Linking to, and being linked from, other open data platforms
Offering data analytics and visualization tools
Supporting and enabling research on ICT policies and strategies for open development at postgraduate level.
Enabling use of research data to accelerate achievement of the UN Sustainable Development Goals (SDGs) in Kenya and the region
Establishing a call centre
Creating an ecosystem of strictly research data.

The establishment of iCEOD played a leading role in the Kenyan Ministry of Higher Education, Science and Technology declaring JKUAT the ICT Centre of Excellence for the Northern Corridor Integration Project (NCIP) (http://www.nciprojects.org/), which involves Kenya, Uganda, Rwanda and South Sudan.

If elected as a member of the CODATA Executive Committee, I will continue to promote CODATA activities and goals. I will contribute to the strategy of increasing CODATA national membership through the community of the Pan African University Institute of Basic Sciences, Technology and Innovation (http://www.jkuat.ac.ke/pauisti/), which is hosted at my university, JKUAT. Now that I am engaged in open data and data science research, and having led JKUAT in developing and implementing both an open research data policy (JORD) and an open data platform (https://opendata.jkuat.ac.ke/), I am ready to share lessons learnt and possible best practices by offering technical advice to members of the CODATA community who need it. I am currently supervising PhD students researching open data and data science solutions towards achievement of the SDGs in developing countries.

I am developing the Africa Open Science Framework that will guide African countries and institutions to develop open science policies. The Framework will be ready by November 2018 for launch during International Data Week in Botswana.

Barend Mons: Candidate for the CODATA Executive Committee and CODATA President

This is the seventh in the series of short statements from candidates in the forthcoming CODATA Elections at the General Assembly to be held on 9-10 November in Gaborone, Botswana, following International Data Week. Barend Mons is a candidate for the CODATA Executive Committee as CODATA President. He was nominated by USA.

CODATA: 2018-2022

Barend Mons

Vision: CODATA serving the global community as a global champion of machine-actionable data publishing, according to FAIR principles, in a well-coordinated ecosystem of global organisations

A phase transition in Science
Both science and innovation are in the process of a methodological transformation. Because of the unprecedented amount of data we deal with, we are in the midst of a significant landslide away from a closed system based on individual privilege, patents and ‘centres of excellence’ towards a system that has to support fully distributed, collective human intelligence much more effectively. But even more critically, a modern data science and innovation ecosystem should be able to make maximal use of powerful, distributed digital assistants.

The roughly ten-million-fold increase in storage and compute power over the past three decades, accompanied by a roughly hundred-thousand-fold decrease in storage costs, has finally brought us to a point where ‘ICT’ is frequently misconceived as a commodity. Consequently, we capture so much data, and reveal such complex patterns in it, that the human mind can no longer make sense of these patterns. That is, not without massive international collaboration and digital assistance. So, on top of the Internet for People, we now need an Internet for Machines, in which machine-actionable data and services will play a central role.

A need for data stewardship
Unfortunately, our ability to deal responsibly with data, the principal first-phase output of the scientific process, has not kept pace with our capacity to generate and store it. The current reality is a glaring lack of expertise; a crippled practice of cottage-industry data stewardship; an almost complete lack of interoperability of data in domain silos; and a hopelessly outdated scholarly publication and reward system, which effectively prohibits open science and innovation.

Many reports have recently highlighted the unacceptable loss of valuable data, and the waste of time and effort, as an estimated 70% of researcher time is spent on ‘data wrangling’. Furthermore, the persistence of narrative publishing in formats meant solely for human consumption is a nightmare for machine processing of results and data. It amounts to hiding the data behind paywalls, embedded in and difficult to extract from figures and tables, with remote and volatile links to ‘supplementary data’, and without proper metadata and provenance. The picture is gloomier still because it is precisely the lack of access to, and reusability of, data that results in the emerging and well-documented reproducibility crisis in science.

Much of this lament is painfully familiar and has been made repeatedly with too little real impact for over two decades…

The role of mandated organisations such as CODATA
Data-focused organisations with a global mandate can play a major converging role in the decades to come. With its mandate from the ISC (representing nearly 200 national members and the international scientific unions and associations), CODATA should be in a unique position to assist and, where appropriate, guide the transition to modern data stewardship for open science. It is high time for CODATA to become a global champion of machine-actionable data publishing, according to FAIR principles, supplemented with narrative for humans, and so to help ensure an optimal data substrate for modern, data-intensive and (thus) machine-assisted science and innovation. The emergence of open science, and the recent merger of the two parent councils, provide a timely occasion to recalibrate what the role of CODATA during and after the landslide may be. As a servant of the science and innovation communities worldwide, CODATA must first redefine its goals in the new data reality. This should be done in line with the several high-level reports from the European Commission (such as the various reports and the SWD roadmap for the European Open Science Cloud) and from the United States (such as the consensus study Open Science by Design), while also taking full account of simultaneous efforts on other continents, including BRICS activities and efforts for open science in Africa and Latin America. Next, CODATA needs to redefine its unique added-value niche vis-à-vis other data-related initiatives, such as the Research Data Alliance (RDA) and the Global Open FAIR (GO FAIR) initiative. These are relatively young organisations compared to CODATA, but they enjoy rapid uptake in the community, and in the turmoil associated with any landslide there is confusion about the various roles they and CODATA play.

Multiple roles
The CODATA strategic plan 2013-2018 showed deep insight into the data revolution that was upon us even then. However, the current rate of data production, and the associated analysis challenges, have far surpassed even the boldest predictions of the time. In many scientific disciplines, the learning algorithm, frequently hyped as ‘artificial intelligence’, is now a predominant part of the methodology. Contemporary science, even in disciplines where the other hype term, ‘big data’, is not yet mainstream, increasingly relies on complex pattern recognition by powerful, self-learning algorithms, followed by human decisions on the ‘actionable knowledge’ emerging from ‘meaningful’ patterns. What we have seen in the past three years is a rapid development of machine-oriented initiatives such as the formulation of the FAIR principles (https://www.nature.com/articles/sdata201618), describing how data should be formulated, published and stewarded in a way that supports optimal reuse in open science and innovation by both machines and humans.

The Research Data Alliance (RDA) has also seen a remarkable growth pattern. Given the importance of these global 'data-driven science' movements, it is not surprising that RDA, GO FAIR and other organisations have arisen to address the opportunities and challenges of reusable data and services, each addressing different aspects and filling different, complementary niches in this tumultuous field. In fact, the time is now right to ensure that we create and support an efficient, mutually reinforcing ecosystem of these organisations. That means staking out the appropriate ground for each, clarifying each organisation's working space and synergies, and eliminating unnecessary duplication. This vision includes a clear definition of missions, comprising both bottom-up and, where appropriate, more top-down approaches, and a focusing of our efforts.

A vision of mutually reinforcing collaboration
The following section represents my initial thinking in this area, sharpened by many discussions with leading colleagues in this field:

The oldest international coordinating organisation in the data space is CODATA, which has been in existence for more than 50 years. CODATA is a committee of the ISC, which was formed by the merger of the International Council for Science and the International Social Science Council. The ISC has a second data-related initiative, the World Data System (WDS), which is ten years old as an International Programme Office (but has roots in the International Geophysical Year of 1958). In addition to these ISC-affiliated organisations, the Research Data Alliance (RDA) is a five-year-old grassroots organisation mainly supported by the EC, the US NSF and NIST, and the Australian Department of Innovation. RDA has rapidly developed into a large organisation (more than 7,000 individual members) that serves a crucial public role, namely bottom-up consensus building about approaches, protocols and standards in expert communities. The EC and the EU member states have been particularly active in the data space, also conceiving and supporting the European Open Science Cloud (EOSC) initiative. The supporting GO FAIR initiative (Global Open FAIR), initiated as a kick-start approach for the EOSC by the governments of the Netherlands, Germany and France, is rapidly growing into a practical network of existing networks of excellence in the early implementation of community-adopted approaches, protocols, standards and training. These four key organisations (CODATA, WDS, RDA and GO FAIR) are all international and cross-disciplinary in scope, mandated and poised to support the global science enterprise, including pan-European and global domain-specific research infrastructures and e-infrastructures. To better support global science, I propose investigating ways to better coordinate and differentiate the work space of these four, and perhaps of other, more disciplinary and regional, science data organisations.
In the spirit of community-wide consensus I have discussed these issues for several months now with the leadership of CODATA, RDA and GO FAIR and the following section represents my resulting view.

A triangular shape
The key roles of CODATA, RDA and GO FAIR are distinct, complementary and synergistic. They are depicted in the triangle model below. Like any model, it will always fall short of describing reality in all aspects and dimensions, but it is a way to visualise the various complementary roles. It should be stated as a preamble that in many concrete actions the roles of the three organisations will overlap, for instance in training and education, and in advocacy for best practices. Therefore, the triangle model is also meant to visualise how the tasks that follow from the focus described at each corner of the triangle dovetail whenever appropriate. With the recent establishment of the ISC, complementarity between the three organisations becomes even more pertinent. It should also be emphasised that each of the organisations has additional activities outside the scope of this collaborative structure.

RDA's principal working mechanism is bottom-up, centred around interest groups and working groups that address, and where possible solve, intellectual challenges associated with the solutions needed around research data in the broadest sense (data analytics services, software and basic compute issues are also in scope). This is done in a community-driven manner and leads to recommended solutions and designs. Obviously, these have to be tested for feasibility in practice. That can be done anywhere, but GO FAIR has a strong mandate, and a basis in a growing number of so-called GO FAIR Implementation Networks, to rapidly test recommended solutions. These are expert communities with 'critical mass' (community leadership) and impact that can implement proposed solutions (from RDA or others) in practice. This also provides an early testing ground for such applications (in social change, training or actual module building for the Internet of FAIR data and services). Obviously, many key stakeholders in the community play a role both in RDA, in its working groups and in the organisation itself, and in GO FAIR implementation networks.

CODATA is mandated by ISC as the international body for research data in the broadest sense, focusing on data policies, data science, and data skills and education. Despite being a lean organisation, CODATA is involved in various implementation activities on interoperability, capacity building, training and dissemination, but it has also played a crucial role in the development of key data policies and principles (including those of the OECD, GEO and the ISC-endorsed 'Open Data in a Big Data World'), which have effectively become 'soft law' for the scientific community.

In continuous and structural collaboration, the three organisations, which have already established very good practical working relationships and participate in each other's activities whenever appropriate, can collectively serve the community by providing organised consensus building and design around emerging issues, coordinated early expert implementation, and broad adoption of best practices. This is all pre-competitive, but it can form the basis for the certification of providers in the EOSC, the US, BRICS and beyond.

What is my motivation to serve as CODATA president for the 2018-2022 term?
Having been involved in the early days of RDA and (GO) FAIR, I have seen many critical decision points where the development of an effective ecosystem for open, FAIR science and innovation could have gone astray.

Risks for science in the data-intensive age include (re)centralisation, recidivist monopoly formation, exclusion of the private sector (critical for innovation and scaling), defence of powerhouses built in the transition phase, and further propagation of 'yet more standards'. In addition, there are many misperceptions around frequently used hype terms such as 'open' (versus FAIR), AI, Big Data, Data Sharing, Open Access (articles) versus Open Science, Linked (Open) Data, Semantic Modelling, etc. It is therefore imperative to support global, community-compliant consensus building on commonly accepted definitions of these central concepts. CODATA, in close collaboration with RDA, GO FAIR and others, could prevent many of these potential mistakes and play a key intellectual leadership role in the transition phase described here.

As CODATA President I would like to work with the core staff in this multi-organisational ecosystem and ensure that CODATA has a solid, specific, recognised and effective role. I was the organiser of the foundational meeting in 2014 where the FAIR principles were conceived, and the Chair of the European Commission's High Level Expert Group (HLEG) on the EOSC, and I currently co-lead the GO FAIR International Support and Coordination Offices, with branches in the Netherlands, Germany and France. I have extensive connections to RDA, and serve on the US National Academy of Sciences Board on Research Data and Information (which is the US National Committee for CODATA). If I were also to help lead CODATA, I would concentrate on an ambassadorial role for the joint ecosystem of the various organisations, as summarised in the triangle model above. I would thus be well placed, and keen, to use these relationships to bring RDA, GO FAIR and CODATA closer together, and to determine the role of WDS in the new reality, each with their specific and complementary expertise networks, thereby creating greater strength for the common good.

For all this to happen, it will be of critical importance that each of the supporting organisations is mandated and properly funded (although at the leanest possible level) to serve the science and innovation communities, without ever competing for the same funds. They should focus on those supra-level tasks that never make it to the top of the priority list of individual researchers and innovators.

If you agree that the time has come to better coordinate and possibly consolidate the international organisations in this important area, and if appropriate mandates and resources to achieve this goal are put in place, I would be happy to serve you as CODATA President.

Toshihiro Ashino: Candidacy for CODATA Executive Committee

This is the sixth in the series of short statements from candidates in the forthcoming CODATA Elections at the General Assembly to be held on 9-10 November in Gaborone, Botswana, following International Data Week. Toshihiro Ashino is a candidate for the CODATA Executive Committee as an Ordinary Member. He was nominated by Japan.

Professor Toshihiro Ashino's research focuses mainly on data and knowledge representation for materials science and engineering. His article on materials ontology in the CODATA Data Science Journal (2010) is regarded as advanced research in the current materials informatics field. He is participating in a Japanese national project, "Materials Integration", and plays an important role in developing materials data and knowledge representation to integrate the heterogeneous information resources of the project.

He is also working on the standardization of materials data representation, participating in a series of CEN workshops: the workshop on 'Economics and Logistics of Standards compliant Schemas and ontologies for Interoperability – Engineering Materials Data' (WS/ELSSI-EMD, 2009-2010), the workshop on 'Standards for Electronic Reporting in the Engineering Sector' (WS/SERES, 2012-2014), the workshop on 'FAtigue TEsting DAta' (WS/FATEDA, 2016-2017) and the workshop on 'MEchanical TEsting DAta' (WS/METEDA, 2017-2018).

International collaboration on data standards and ontologies is one of CODATA's important roles, and one of increasing importance for promoting open data and open science. He is continuously working to establish such standards and ontologies for materials science and engineering, which will extend CODATA's activity in this field.

Professor Ashino's work also extends beyond the field of materials science and engineering: he serves on the organizing committee of JOSS (Japan Open Science Summit) and the program committee of RDUF (Research Data Utilization Forum), working to promote open data and open science activities in Japan. Since June 2018, he has been chair of the CODATA Sub-Committee of the International Scientific Data Committee of the Science Council of Japan, working to contribute to the achievement of CODATA's objectives.

CODATA Blog

News from the CODATA community and from Simon Hodson, CODATA Executive Director