Humans of Data 28

“I’m an ethnographer.  Well, I’m not only an ethnographer, but the heftiest part of my dissertation work was ethnography.  And as one who also has rigorous training in engineering and other natural science fields – I was surprised at how much of my research design required transformational change.

While I did the research – as I collected data – I changed what my hypotheses were, what I’d use the data for and what outcomes I’d obtain from analysing the data.  It was simultaneously confusing and exciting, but eventually I was much prouder of the outcomes than I would have been had I stuck to the plan.

It was a lesson in valuing methodological adaptation and change.  As researchers, we don’t know everything, and more people in science should have, and value, that experience.”

A milestone in the history of science based on the work of CODATA, the Committee on Data of the International Science Council


The General Conference on Weights and Measures will meet in Versailles to vote on whether to redefine the kilogram, ampere, kelvin and mole in the International System of Units (SI) based on fundamental laws rather than measurement.

A unique event in the history of science is scheduled for Friday 16th November, when a meeting in Versailles, France, will vote on whether to redefine the International System of Units (SI) based on exact values of the fundamental constants. This would mean, for example, that the International Prototype of the Kilogram – a lump of metal that has defined the kilogram since 1889 – will be replaced by a precise value deduced from fundamental laws of science.


The values are the work of the CODATA Task Group on Fundamental Physical Constants. Every few years since 1969, the Task Group has summarised and evaluated the cumulative work of scientists and technologists by publishing a recommended, self-consistent set of values of the fundamental constants of nature. Its most recent work [1] has been to determine exact values of the Planck constant h, the elementary charge e, the Boltzmann constant k and the Avogadro constant NA. As a result, four of the SI base units (the kilogram, ampere, kelvin and mole, which measure mass, electric current, temperature and amount of substance, respectively) are no longer fixed by measurement but are deduced from fundamental laws. They will join the other three base units (the second, the metre and the candela, a measure of a light’s perceived brightness), which are already defined in this way. The change will make the units more stable and allow investigators to develop ever more precise and flexible techniques for converting the constants into measurement units.
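For reference, the exact values that the Task Group published for the revision [1] are listed below. Under the revised SI they are fixed by definition and carry no uncertainty. The last line is an illustrative rearrangement, not taken from the source. It shows how the kilogram follows from h once the metre and second are themselves defined, using 1 J s = 1 kg m² s⁻¹.

```latex
\begin{aligned}
h   &= 6.626\,070\,15 \times 10^{-34}\ \mathrm{J\,s} \\
e   &= 1.602\,176\,634 \times 10^{-19}\ \mathrm{C} \\
k   &= 1.380\,649 \times 10^{-23}\ \mathrm{J\,K^{-1}} \\
N_A &= 6.022\,140\,76 \times 10^{23}\ \mathrm{mol^{-1}} \\[4pt]
1\ \mathrm{kg} &= \frac{h}{6.626\,070\,15 \times 10^{-34}}\ \mathrm{m^{-2}\,s}
\end{aligned}
```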

The decision will be made by the General Conference on Weights and Measures (CGPM), the supreme authority of the International Bureau of Weights and Measures (BIPM). The BIPM is an inter-governmental organization established in 1875 through which member states (60 member states + 42 associate states) act together to agree on measurement standards, including, in this case, the values of the fundamental constants. Does this matter? Yes, it matters profoundly. Since the earliest human civilisations, precise measurement has been a staple of honest trade and exchange. As science and technology have advanced, it has also become vital to science’s understanding of the universe, to the precision and utility of technological devices, to the trustworthiness of commerce, and to the everyday lives of citizens.

Geoffrey Boulton, retiring President of CODATA and member of the ISC Governing Board, commented that

“it is important that we scientists recognize the magnitude and potential significance of this achievement, as one of CODATA’s proudest moments in its 50-year history, and to applaud not only the members of its current Task Group, under its co-chairs David Newell and Barry Wood, but all their predecessors since 1969. In recognition of their achievements, the General Assembly of CODATA, held last week, agreed, by unanimous acclamation, to award the 2018 biennial CODATA Prize to the Task Group.”

The prize rewards outstanding achievement in advancing data for science. It is the first time that the prize has been awarded to a group rather than an individual.

Images are CC BY-ND 4.0 BIPM

[1] “The CODATA 2017 values of h, e, k, and NA for the revision of the SI,” Newell et al., Metrologia 55, L13–L16 (2018)

Humans of Data 27

“I’m trying to create awareness for researchers on opening up their own data.  Before working [on open data] I didn’t know anything about it.  If people open up their data there is so much more that can be done.  You can share ideas, review someone else’s data and find something that they didn’t find before.  I feel that’s very important.

I feel when people open up their data, it’s no longer people competing with each other – they’re helping each other to make the world a better place.  If you keep something to yourself you might not see everything in it that might improve the world.  But if you share it with peers, someone else might discover something that might make the world a better place.

We have the data but [currently] we’re not the owners of the data.  Africa needs to step up.  We need to be partners in research.  We need to open up our data and also to own our data.”


Data visualization uses plain visual language to simplify information into concepts that can be easily grasped in graphical form. It can also provide insight into cross-cutting, complex issues and help identify patterns that inform decisions on strategies and solutions. Before today’s formal written languages, pictures (a form of visualization) were the key medium for sharing history, plans and ideas. Recognising the importance of data visualization, CODATA Kenya and the ICT centre of Open Data (iCEOD) organised the first Data, Information and Scientific Visualization Symposium in Africa (VizAfrica), held on 20-21 August 2018 in Nairobi, Kenya. iCEOD is led by its Director, Prof. Muliaro Wafula of Jomo Kenyatta University of Agriculture and Technology (JKUAT). He is also the Chair of CODATA Kenya and an elected member of the CODATA International Executive Committee.

The sole purpose of the ICT centre of Open Data (iCEOD) is to promote data publication and data reuse through the development and implementation of Open Data management. It also links with other global Open Data centres and interested parties. iCEOD organised and hosted the VizAfrica symposium through the iODaV (innovative Open Data and Visualization) sub-taskforce of the Africa-ai-Japan project, whose main objective is to facilitate the African innovation process through data management, analytics and visualization. The Communications Authority of Kenya (CAK), IBM, United Nations University Computing and Society, Kyoto University, the Academy of Science of South Africa (ASSAf) and Lagos State University were key sponsors of this international symposium.

The first VizAfrica symposium, held from 20th to 21st August 2018, was a successful event with an impressive turnout. Close to 200 participants from across the globe attended, drawn from government ministries, universities, research organizations, corporations, small and medium-sized enterprises (SMEs), policy makers in various sectors of the economy and international organizations. The symposium was preceded by pre-conference training from 13th to 17th August 2018. The theme of the symposium was “Advancing Multi-disciplinary Data, Information and Scientific Visualization for Strategic and Sustainable Development.”

Data visualization is practised in many disciplines; to this end, the symposium adopted a multi-disciplinary approach with the following six tracks:

  • Manufacturing and Industry
  • Policy Regulation and Strategic Management
  • Logistics and Supply Chain Management
  • Universal Healthcare
  • Computer Graphics, Media and Animation
  • Agriculture, Food Security and Nutrition

Each of the six tracks comprised research, industry, keynote and panel discussion sessions.

Prof. Muliaro Wafula made the following remark during the opening session: “There is a need to cultivate a balanced ecosystem of data and information value chain. Data, Information and Scientific Visualization provides a mirror for the supply side and a lens for the demand side. VizAfrica 2018 Visualization Symposium is the beginning of one of the best approaches to better understand causality in Manufacturing and Industry; Policy, Regulation and Strategic Management; Logistics and Supply Chain Management; Universal Healthcare; Media; Agriculture, Food and Nutrition Security”.

Keynote Speeches

Prof. Xiaoru Yuan from Peking University, China, gave a keynote speech titled ‘Data visualization for Everyone’, which showed how visualization sits between technology and humans. He referred to data visualization conferences held in Kyoto, Beijing and now Nairobi to emphasize that visualization is happening globally. Data visualization is relevant to researchers, for example in disseminating scientific work. It is also relevant to citizens, for example in traffic jam management, discussion maps and flood mapping. The presentation also noted that data visualization tools exist for non-programmers.

‘Data Visualization in the Context of World Food Programme (WFP)’ by Adrian Van Der Knap analysed WFP’s experience with data in its various countries of operation, as well as how WFP collects data. Adrian emphasized that better data fuels better decision making, resulting in better relationships during operations. To this end, WFP has developed a strategy to make its decision making data-driven. The presentation noted that WFP currently uses the Optimus system to optimize the food basket during planning and food distribution.

Prof. Koji Koyamada from Kyoto University, Japan, made a presentation titled ‘Data visualisation for Better Understanding of causality’. It acknowledged that we are living in an era of big data, which demands visualization research techniques such as information visualization and graphic design. These techniques also apply to causality, the relationship between cause and effect, in fields such as brain science, fluid science, communication and earth science. Prof. Koyamada noted that data visualization has been added to the traditional research methodology protocol.

Track Presentations

Manufacturing and Industry Track

‘A GIS based Intelligent Transportation System for Traffic Incident Management’ by Khadiala Ligono Lisah illustrated how mapping traffic GIS data to manage traffic automatically provides an efficient way of managing traffic incidents. The research used near real-time data, and a non-linear SVM was used for data transformation.

‘Driving Behavior Analysis Based on Vehicle OBD II Information and Location Analytics’ by Accadeius Benard Sabwa included a practical demonstration of how OBD devices are used to manage and clear engine errors. The devices rely on data from the different sensors, including information and location data. In China, insurance companies use OBD devices to rate drivers as good or bad. Most vehicles, especially German cars, collect 3D data, particularly when cornering, for stability. Forecasting for a car involves many variables that are not constant, while predictive models are based on previous driving data. Solutions for rally cars already exist.

‘Revolutionizing the Manufacturing Industry using Business Intelligence Technology’ by Barry Okwaro provided a framework for Business Intelligence (BI). The framework uses different technologies for its different areas in order to provide more flexibility and avoid vendor lock-in. This approach differs from integrated software solutions such as SAP. Rather than buying fully integrated off-the-shelf software, it develops customized approaches based on the customer’s needs and available open source software. He noted that the visualization tools and applications required for BI in manufacturing (e.g. Tableau, Power BI, SSRS, Sisense, QlikView) often depend on the type and nature of the applications and data. It was noted that BI gives manufacturers a competitive advantage: information gathered on competitors can be put in a data warehouse before analytics are performed.

‘Collision Visualization of Laser-scanned Point Clouds’ by Weite Li simulated the Ofune-hoko procession using laser-scanned point clouds rather than the polygon method. Commercial software uses polygon models and polygon collision detection, whereas in this approach the street data are represented as point clouds. The presentation demonstrated how laser scanning can capture data for a whole city, more accurately than polygon-based methods.

Computer Graphics, Media and Animation Track

‘Automatic Comprehension of Tweets Using Jumping Finite Automata’ by Stephen Obare, George Okeyo, Abejide Ade-Ibijola and Kennedy Ogala from JKUAT focused on the analysis, annotation and formalization of tweets using jumping finite automata. It also discussed their usefulness, compared with conventional finite automata, in reporting variations in tweets.

‘Interactive Visualization System of Precipitation’s Probability by Using Percentage Area Graph’ by Lei Puwen from Kyoto University, Japan, delved into visualizing water-level forecasts made with deep learning. The aim was to mitigate the effects of flash flooding in small rivers. Time series of environmental factors were the main inputs, and the results were visualized alongside them. Linear regression was also used in the research, and real data were converted to binary data using a Gaussian distribution.

‘Separation of Overlapping Image Objects Using Morphological Operations’ by Patrick O. Ajwang (JKUAT) focused on separating overlapping objects in images using morphological operations. The aim was to mechanize agriculture by identifying mature flowers. The pipeline runs through image acquisition, pre-processing, segmentation, feature extraction and classification. The size and shape of the structuring element are defined in the source code as neighbourhood matrices and chosen by trial and error, which is difficult for irregular shapes. The size used for erosion and dilation could be automated instead of being found by trial and error.
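To make the erosion idea concrete, here is a minimal sketch in plain Python. It is not the presenter’s implementation. Two square blobs touching through a one-pixel bridge count as a single object until erosion with a square structuring element removes the bridge. The function names and the ASCII mask format are illustrative assumptions. The half-width `k` plays the role of the erosion size that is usually picked by trial and error.

```python
# Illustrative sketch (not the presenter's code): separate two touching
# objects in a binary image by erosion, then count connected components.

def from_ascii(rows):
    """Build a binary mask from strings; '#' marks a foreground pixel."""
    return [[ch == "#" for ch in row] for row in rows]


def erode(mask, k=1):
    """Binary erosion with a (2k+1) x (2k+1) square structuring element.

    A pixel stays foreground only if its whole neighbourhood is foreground;
    neighbours outside the image count as background.
    """
    h, w = len(mask), len(mask[0])
    return [
        [
            all(
                0 <= r + dr < h and 0 <= c + dc < w and mask[r + dr][c + dc]
                for dr in range(-k, k + 1)
                for dc in range(-k, k + 1)
            )
            for c in range(w)
        ]
        for r in range(h)
    ]


def count_components(mask):
    """Count 4-connected foreground regions with an iterative flood fill."""
    h, w = len(mask), len(mask[0])
    seen = [[False] * w for _ in range(h)]
    count = 0
    for r in range(h):
        for c in range(w):
            if mask[r][c] and not seen[r][c]:
                count += 1
                seen[r][c] = True
                stack = [(r, c)]
                while stack:
                    y, x = stack.pop()
                    for ny, nx in ((y + 1, x), (y - 1, x), (y, x + 1), (y, x - 1)):
                        if 0 <= ny < h and 0 <= nx < w and mask[ny][nx] and not seen[ny][nx]:
                            seen[ny][nx] = True
                            stack.append((ny, nx))
    return count


if __name__ == "__main__":
    # Two 3x3 "flowers" joined by a one-pixel bridge in the middle row.
    blobs = from_ascii([
        ".........",
        ".###.###.",
        ".#######.",
        ".###.###.",
        ".........",
    ])
    print(count_components(blobs))            # 1: the bridge merges the objects
    print(count_components(erode(blobs, 1)))  # 2: erosion removes the bridge
```

A full pipeline would typically grow each surviving seed back towards its original extent (for example by dilation) before feature extraction. That step is omitted here.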

The presentation by Constance M. Ngila and Catherine W. Wangari, titled ‘Color Impact on the Perception of the Emotions Portrayed in 3D Animations’, captured clients’ emotional responses to colour intensities in videos, that is, to different colours, contrasts and lighting. The aim was to study their effects on mood and emotion. It was suggested that the study should also consider cultural beliefs, gender and age, alongside other data collection methodologies, and that these methodologies should be automated.

Yuki Ueno from Kyoto University, Japan, presented ‘Classification of Task Performance during an Evaluation of Visualization method based on physiological Signals’. The study considered EEG brain-wave patterns and eye blinks in young adults. Future work will examine how heart rate is affected when a task takes a very long or very short time to complete.

‘Realtime DHIS2 Data Capture, Integration and Operability as a Potential Driver towards Health Data Analytics, Health Intelligence and Visualisation for Mother Child Health Data in Kenya’ by Sarah Waiganjo, Muliaro Wafula and Simon Karanja from JKUAT focused on the challenges of capturing data at health facilities in public hospitals. Most data are entered in registers and aggregated later, causing delays in decision making. The presentation emphasised the need to provide accurate data in real time.

‘A Computational Evaluation of Eye-track Measures in Group-in-A-Box’ by Nozomi Aoyama, Yuki Ueno and Koji Koyamada from Kyoto University, Japan, emphasized the need to use good visualization effectively, although it is difficult to judge whether a layout is actually good. The presentation aimed to produce guidelines for using eye-tracking measures effectively, so that researchers can reach findings easily and quickly. The eye-tracking analysis serves both as confirmation and as a prerequisite for further analysis using visual analytics and computational methods.

‘A method for Extracting Data Points from an Image of a Plotted Graph’ by Lincoln Kamau, Philip Kibet, Christopher Maina and Robert Macharia from JKUAT, Kenya, noted that data in graphs can be deceiving. To address this, the image is converted to greyscale and binarized using a threshold value. The plotted values are then detected and scaled, with the results depending on the quality of the graph in use. Accuracy is tested manually by comparing the original data with the output data of the experiment. The method was applied to black-and-white images.
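The threshold, detect and scale steps can be sketched as follows. This is an illustrative sketch under stated assumptions, not the authors’ method. The greyscale image is a list of pixel rows (0 = black, 255 = white), and the curve is darker than the threshold. Each axis is linear, and two reference tick positions per axis are known. All function names are hypothetical.

```python
# Illustrative sketch only (not the authors' code): recover data points
# from a greyscale image of a line plot by thresholding, per-column curve
# detection and linear axis scaling. Assumes linear axes and a curve that
# is darker than the background.

def binarize(gray, threshold=128):
    """Mark pixels darker than `threshold` (0 = black, 255 = white) as curve pixels."""
    return [[px < threshold for px in row] for row in gray]


def column_centres(mask):
    """For each column containing curve pixels, return (column, mean row)."""
    n_rows = len(mask)
    n_cols = len(mask[0]) if n_rows else 0
    points = []
    for c in range(n_cols):
        rows = [r for r in range(n_rows) if mask[r][c]]
        if rows:  # skip columns where no curve pixel was detected
            points.append((c, sum(rows) / len(rows)))
    return points


def make_axis(p0, v0, p1, v1):
    """Linear pixel-to-data map calibrated on two reference ticks (pixel p -> value v)."""
    scale = (v1 - v0) / (p1 - p0)
    return lambda p: v0 + (p - p0) * scale


def extract(gray, x_axis, y_axis, threshold=128):
    """Threshold the image, detect the curve column by column, then scale to data units."""
    mask = binarize(gray, threshold)
    return [(x_axis(c), y_axis(r)) for c, r in column_centres(mask)]


if __name__ == "__main__":
    # A 4x4 image with a dark diagonal line rising from bottom-left to top-right.
    image = [
        [255, 255, 255, 0],
        [255, 255, 0, 255],
        [255, 0, 255, 255],
        [0, 255, 255, 255],
    ]
    x_axis = make_axis(0, 0.0, 3, 3.0)  # column 0 -> x = 0, column 3 -> x = 3
    y_axis = make_axis(3, 0.0, 0, 3.0)  # row 3 -> y = 0, row 0 -> y = 3 (rows grow downward)
    print(extract(image, x_axis, y_axis))
    # [(0.0, 0.0), (1.0, 1.0), (2.0, 2.0), (3.0, 3.0)]
```

Taking the mean row per column is one simple way to turn a thick line into a single value per x position. Real graphs with gridlines, legends or several series would need more filtering before this step.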

‘Visualization of Tsunami Simulation Data using Multi-dimensional transfer functions in HSVA Color Space’ by Ikuya Morimoto, Satoshi Nakada, Kyoko Hasegawa, Liang Li and Satoshi Tanaka from Ritsumeikan University, Japan, aimed to find ways to minimize the effects of tsunamis on aquaculture. It visualized a simulated tsunami from a Nankai Trough earthquake, using fusion visualization of salinity change and flow velocity. The method used multi-dimensional transfer functions in HSVA colour space (hue, saturation, value or brightness, and alpha). These give a clearer picture of a tsunami than hue and brightness alone. The simulation can also be applied to rivers and lakes.
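As a rough illustration of what a multi-dimensional transfer function does, the sketch below maps two simulated quantities at a point, flow speed and salinity change, to a colour plus opacity via HSV. It uses Python’s standard `colorsys.hsv_to_rgb`. The particular mapping (speed to hue, salinity change to saturation, fixed brightness, opacity from the stronger signal) and the normalisation constants are assumptions for illustration only, not the Ritsumeikan group’s design.

```python
import colorsys


def transfer(speed, salinity_change, speed_max, salinity_max):
    """Hypothetical 2-D transfer function returning (r, g, b, alpha) in [0, 1].

    speed           -> hue (blue for slow water, red for fast water)
    salinity change -> saturation (grey for none, vivid for large)
    brightness      -> held at 1.0 in this sketch
    opacity         -> the larger normalised input, so calm water stays transparent
    """
    u = min(max(speed / speed_max, 0.0), 1.0)               # normalised flow speed
    w = min(max(salinity_change / salinity_max, 0.0), 1.0)  # normalised salinity change
    hue = (1.0 - u) * 2.0 / 3.0  # 2/3 is blue and 0 is red on colorsys' hue circle
    r, g, b = colorsys.hsv_to_rgb(hue, w, 1.0)
    return r, g, b, max(u, w)


if __name__ == "__main__":
    print(transfer(0.0, 0.0, 5.0, 2.0))  # still, unmixed water: white, fully transparent
    print(transfer(5.0, 2.0, 5.0, 2.0))  # fast, strongly mixed water: opaque red
```

Because each output channel depends on a different input variable, both quantities stay visible in a single rendering. A one-dimensional transfer function cannot do this.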

‘A Big Data Analytics and Visualization Model for Enhancing Security Within Smart Cities: A Case Study of Nairobi Metropolitan Area’ by Geoffrey Wekesa Chemwa from JKUAT focused on the need for automatic facial recognition to supplement human efforts in this era of terrorism. The model uses the Hadoop framework, which is efficient given that facial recognition libraries have advanced and are evolving quickly.

‘The Psychological Perception of Matatu Graffiti on Passenger Attraction’ by Michael S. Wafula and Joseph J. Musakali (Moi University, Kenya) highlighted the impact of graffiti on our minds and the acceptance of graffiti as an art. Graffiti and psychology are linked in that people form impressions from the images they see. In the transport industry, graffiti is a tool of persuasion. Results from the study showed that young people loved graffiti that mainly featured celebrities and political and religious images. Owners of matatus, Kenya’s public transport vehicles, spend a lot of money on graffiti, while passengers like or dislike it depending on their age, beliefs and the like. Graffiti is a source of livelihood but can also be a menace. Hence the need for a graffiti policy or law that promotes graffiti in a regulated format.

Panel Discussions and Way Forward:

The VizAfrica symposium concluded with a closing ceremony in which it was agreed by acclamation that:

  • Visualization is diverse and we should all be part of it and nurture it so as to make each other better.
  • The VizAfrica symposium will be held annually.
  • A summer program for visualization lasting 2 to 3 weeks should be instituted. This can then progress into certificate, diploma, degree, post graduate and exchange programs.


Virginia Murray: Candidacy for CODATA Executive Committee

This is the seventeenth in the series of short statements from candidates in the forthcoming CODATA Elections at the General Assembly to be held on 9-10 November in Gaborone, Botswana, following International Data Week. Virginia Murray is a candidate for the CODATA Executive Committee as an ordinary member. She was nominated by the USA.

I qualified in medicine, joined Guy’s and St Thomas’ Hospital Poisons Unit and was appointed consultant medical toxicologist. In 1989 I started the Chemical Incident Research Programme, and from 1995 I was Director of the Chemical Incident Response Service. There we supported emergency services and other partners in acute and chronic chemical incident response and developed evidence-informed guidance for preparedness and incident management. In 2011 I was appointed Head of Extreme Events and Health Protection at Public Health England. In that role I developed evidence-based information and advice on flooding, heat, cold, volcanic ash, and other extreme weather and natural hazard events. This built on my work as first author of one of the chapters of the IPCC special report on extreme events and disasters [i]. I am currently Head of Global Disaster Risk Reduction (GDRR) for Public Health England. This role has supported my work as a member and former vice-chair of the UN International Strategy for Disaster Reduction (ISDR) Scientific and Technical Advisory Group from 2008 to 2017, where I was actively engaged in supporting the negotiations.

Data are critical to implementing the landmark UN agreements adopted together in 2015: the Sendai Framework for Disaster Risk Reduction 2015 – 2030 [ii], the Sustainable Development Goals (SDGs) [iii], and the Paris Agreement from the COP21 Paris Climate Conference [iv]. Data are also imperative for the use of the WHO’s International Health Regulations (2005) [v]. Together, these agreements have created a rare but significant opportunity to build coherence across different but overlapping policy areas. In my GDRR role I have engaged with many science and technology partners in supporting the UNISDR STAG/ICSU/IAP partnership to make the Sendai Framework for Disaster Risk Reduction 2015-2030 very reflective of data needs.

“Disaster risk reduction requires a multi-hazard approach and inclusive risk-informed decision-making based on the open exchange and dissemination of disaggregated data, including by sex, age and disability, as well as on easily accessible, up-to-date, comprehensible, science-based, non-sensitive risk information, complemented by traditional knowledge;”

When considered together, these frameworks make for a more complete agenda to build resilience and take action in areas including health, climate and disaster risk reduction. This integrated thinking will serve to strengthen existing risk frameworks for multi-hazard assessments and aim to develop a dynamic, local, preventive and adaptive urban governance system at global, national, and local levels.

To do this we need to measure and manage data. The frameworks must be ‘strengthened [by] effective implementation and monitoring’ calling for ‘a data revolution, rigorous accountability mechanisms and renewed global partnerships’.[vi]

The benchmarking of countries’ performance against indicators linked to global agreements is a powerful way to engage governments and mobilise resources—no country wants to fall behind.[vii]

During 2017, CODATA initiated and led a discussion with data science groups and international scientific unions and associations about the timeliness of a major initiative on interdisciplinary data integration.  Meetings at the ICSU HQ in Paris in June 2017 and at the Royal Society of London in November 2017 produced a report and communiqué supporting a long-term initiative and outlining some of the essential issues to be addressed.   The key priorities for this initiative are to address data integration in support of major global challenges and to develop relevant data capacities across all the disciplines of science.

The CODATA initiative on interdisciplinary data integration is seeking to explore these challenges and opportunities through three specific case studies in interdisciplinary research: infectious disease outbreaks, disaster risk and resilient cities. I am the lead for the disaster risk case study and work very closely with the infectious diseases and Resilient Cities programmes. I want to continue to advocate for these case studies as we develop this programme over the next three years. These developments are best summarised in the figure below:

To me these case studies provide a concrete focus for exploring the potential of interoperability and data integration through metadata alignment via CODATA. The Interoperability of Metadata Standards in Cross-Domain Science, Health, and Social Science Applications  has shown that standards are a vital tool enabling integration and semantic linking of data within and between disciplines.

However, standards tend to get developed and adopted within disciplines or application domains with little consideration of cross-discipline requirements and technologies, so data integration can often only be easily achieved within and between closely allied fields. For example:

  • Addressing global scientific challenges that depend on cross-discipline integration remains difficult. The challenge is to make cross-discipline data integration a routine aspect of data-driven science.
  • Metadata support data discovery, selection, access and use, and are critical for data integration.

I believe the commitment to the delivery of these pilots would benefit from active CODATA executive committee engagement.

More widely, my current roles include being a member of the Integrated Research on Disaster Risk (IRDR) scientific committee, co-sponsored by the International Science Council (ISC) and the United Nations Office for Disaster Risk Reduction (UNISDR). I am also Co-Chair of IRDR’s Disaster Loss Data (DATA) and co-chair of CODATA’s Linked Open Data for Global Disaster Risk Research. In addition, I am a member of the UNSDSN Data for Sustainable Development and co-chair of the recently developed WHO Thematic Platform Health and Disaster Risk Management Research Network. I am a visiting/honorary Professor at several universities, including University College London (2013) and the United Nations University International Institute for Global Health (2017).

[i]     Murray, V., G. McBean, M. Bhatt, S. Borsch, T.S. Cheong, W.F. Erian, S. Llosa, F. Nadim, M. Nunez, R. Oyun, and A.G. Suarez, 2012: Case studies. In: Managing the Risks of Extreme Events and Disasters to Advance Climate Change Adaptation [Field, C.B., V. Barros, T.F. Stocker, D. Qin, D.J. Dokken, K.L. Ebi, M.D. Mastrandrea, K.J. Mach, G.-K. Plattner, S.K. Allen, M. Tignor, and P.M. Midgley (eds.)]. A Special Report of Working Groups I and II of the Intergovernmental Panel on Climate Change (IPCC). Cambridge University Press, Cambridge, UK, and New York, NY, USA, pp. 487-542.

[ii]    United Nations Office for Disaster Risk Reduction. Sendai Framework for Disaster Risk Reduction 2015-2030. 2015.

[iii]    United Nations. Sustainable Development Goals. 2015.

[iv]    United Nations Framework Convention on Climate Change. The Paris Agreement. 2015.

[v]    World Health Organization. International Health Regulations (2005).

[vi]    United Nations – The Road to Dignity by 2030: Ending Poverty, Transforming All Lives and Protecting the Planet Synthesis Report of the Secretary-General On the Post-2015 Agenda December 2014

[vii]    Maini, R., Law, R., Duque III, F., Balboa, G., Noda, H., Nakamura, S. and Murray V. Monitoring progress towards planetary health – International agreements must include appropriate indicators, published regularly. BMJ 2017;359:j5279

Tony Hey: Candidate for the CODATA Executive Committee and CODATA President

This is the fifteenth in the series of short statements from candidates in the forthcoming CODATA Elections at the General Assembly to be held on 9-10 November in Gaborone, Botswana, following International Data Week. Tony Hey is a candidate for the CODATA Executive Committee as CODATA President. He was nominated by the UK.

CODATA’s Mission

“CODATA exists to promote global collaboration to advance Open Science and to improve the availability and usability of data for all areas of research.  CODATA supports the principle that data produced by research and susceptible to be used for research should be as open as possible and as closed as necessary.  CODATA works also to advance the interoperability and the usability of such data: research data should be FAIR (Findable, Accessible, Interoperable and Reusable). By promoting the policy, technological and cultural changes that are essential to promote Open Science, CODATA helps advance ISC’s vision and mission of advancing science as a global public good” [1]


In my present position as the Chief Data Scientist of the UK’s Science and Technology Facilities Council (STFC), I am based at the Rutherford Appleton Laboratory (RAL), on the Harwell Campus near Oxford. The Harwell site hosts the Diamond Synchrotron, the ISIS Neutron Source and the UK’s Central Laser Facility.  My primary role is to support the university users of these large-scale experimental facilities at RAL in managing and analyzing their research data.  The users of these facilities now perform experiments that generate increasingly large and complex datasets which need to be curated, analyzed, visualized and archived, and their new scientific discoveries published in a manner consistent with the FAIR principles. In addition, I work with the Hartree Supercomputing Centre at the STFC’s Daresbury Lab near Manchester. The Hartree Centre works mainly with industry and supports their computer modelling and data science requirements.

I am therefore intimately acquainted with the challenges of open science. I believe that over the last five years the global scientific research community has made significant progress towards the goals of the CODATA mission. This is thanks in part to the activities of CODATA, together with its fellow ISC organization, the World Data System (WDS), and now also their younger partner, the Research Data Alliance (RDA). However, there is still much more to do before we can realize anything close to Jim Gray’s vision: the full text of all publications online and accessible, linked to the original datasets with sufficient metadata that other researchers can reuse them and add new data to generate new scientific discoveries. In his last talk before he went missing at sea, he summed up this vision in the ‘pyramid’ diagram below [2]:

The European Open Science Cloud (EOSC) has a similar vision but aims to provide a much more detailed roadmap towards global research whose data are Findable, Accessible, Interoperable and Reusable (FAIR). A good start is the EOSCpilot project’s work to define a core set of metadata properties, the EOSC Dataset Minimum Information (EDMI), that are “sufficient to enable data to be findable by users and suitably ‘aware’ programmatic services” [3]. The Australian Research Data Commons (ARDC), established in 2018 and subsuming the Australian National Data Service (ANDS), also has a similar vision.

My Vision for CODATA

I very much support the three major strategic programs put forward in CODATA’s Strategic Plan 2013 – 2018, namely:

  • Data Principles and Practice
  • Frontiers of Data Science
  • Capacity Building

However, given the promising developments of the last five years, it is now time to develop a third strategic plan covering the next five years of the CODATA organization. Developing this new strategic plan must be a major priority for CODATA, and it will be important to reach out to all the relevant national and international stakeholder organizations for their input. Beyond CODATA’s traditional stakeholders, I would also like to learn from the experience of other major efforts in this space. In the US, for example, this could include input from the NIH’s National Library of Medicine, the DOE’s OSTI organization and the NSF’s DataONE project. In Europe, there will be much activity in creating an implementation of the European Open Science Cloud (EOSC). I would also look for input from other major data science initiatives in Asia and Australia.

In addition to developing detailed plans and deliverables for the three broad CODATA priority areas for the next five years, I would like to give my support to two other areas. During my career in data-intensive science – in the UK with e-Science and in my work with Microsoft Research in the US – I have worked closely with universities and funding agencies in Europe, North and South America, Asia and Australia. I now think it is important to dedicate more attention to Africa where I think CODATA can play a significant role. I am therefore personally very supportive of the existing CODATA initiative to develop an African Open Science Platform and would look for ways to extend this initiative and increase its impact. One way in which to do this is to harness CODATA’s global reach and influence which can successfully bring together countries at many different levels of economic development. The international SKA project will also generate many interesting computing, data science and networking challenges in Africa.

The second focus I would like to develop is related to my present role as leader of the Scientific Machine Learning research group at the Rutherford Appleton Laboratory (RAL). There is now much activity worldwide in applying the latest advances in AI and Machine Learning technologies to scientific data. This is one of the few areas where the academic research community has large and complex data sets that can compete with the ‘Big Data’ available to industry. Extracting new scientific insights from these datasets will require advanced statistical techniques, including Bayesian methods and ‘deep learning’ technologies. In addition, an extensive education program to train researchers in these data analytic technologies will be necessary, and it can build upon practical experience in applying such methods to ‘Big Scientific Data.’ In this way CODATA can help train a new generation of data analysts who are not only able to generate new insights from scientific data but also to spur innovation with industry and aid economic development.

While at Microsoft Research, I was a founding Board member of the Research Data Alliance (RDA). As an RDA Board member, I liaised extensively with both the NSF in the USA and the European Commission, and helped to facilitate constructive cooperation between RDA and CODATA. I will bring extensive management experience to the leadership of CODATA: from the university sector as research group leader, department chair and dean of engineering; from UK research funding councils as a program director and chief data scientist; and from industry as manager of a globally distributed outreach team. I am disappointed by the absence of many European countries from the CODATA membership and, drawing on my experience in European research projects, I would seek to encourage these missing nations to join the organization. In addition, in my role at Microsoft Research, I spent considerable time visiting universities and funding agencies in Central and South America and in Asia, and I believe there is considerable potential to interest non-member countries in these regions in CODATA’s data science agenda. Finally, although I will certainly bring my vision, enthusiasm and energy to the role of CODATA President, I believe that we must harness the energy and enthusiasm of the entire CODATA community to take the organization forward to a new level of influence and effectiveness.


My Background

I am standing for election to the CODATA Presidency because I have long been an advocate for Open Access and Open Science. My passion for this topic and for the era of ‘Big Scientific Data’ dates back to 2001–2005, when I was director of the UK’s eScience program. In 2003, Anne Trefethen and I wrote a paper titled “The Data Deluge: An e-Science Perspective”, which was certainly one of the earliest papers to discuss the transformative effects on science of the imminent deluge of scientific data [4]. In 2006, I was invited to give a keynote talk on eScience at the CODATA Conference in Beijing. While I was a Vice President at Microsoft Research, we celebrated the achievements of my late colleague, the Turing Award winner Jim Gray, by publishing a collection of essays in 2009 that illustrated the emergence of a new ‘Fourth Paradigm’ of Data-Intensive Science [2].

During the eScience program, which received significant funding from both the UK Research Councils and Jisc, the UK research community explored many issues about the scientific data pipeline that are still important and relevant today. One project, for example, examined the preservation and sharing of scientific workflows. Another looked in detail at recording the provenance of a dataset; this effort ultimately led in 2013 to the W3C ‘PROV’ standard for provenance. Several other eScience projects explored the use of RDF and semantic web technologies such as OWL and SPARQL to enhance research metadata. Although these technologies have proved popular with several academic research communities, it is probably fair to say that they have not so far been broadly adopted by most research communities or by the major IT companies.

In my role as chair of Jisc’s research committee, I supported the establishment of the Digital Curation Centre (DCC) in Edinburgh in 2004. The DCC was one of the first organizations to propose a set of guidelines for scientific data management plans (DMPs). The Jisc research committee also funded the National Centre for Text Mining (NaCTeM) in Manchester, which offers a broad range of text-mining services. In the age of ‘Big Scientific Data’, high-bandwidth, end-to-end networking performance is an increasingly necessary element of a nation’s e-infrastructure. The Jisc research committee therefore funded Janet, the UK’s national research and education network (NREN), to follow the lead of SURFnet in the Netherlands by introducing optical fibre ‘lambda’ technology. Janet can now provide dedicated ‘lightpath’ support to users requiring long-term, persistent high-volume data transfers between locations. I believe that these e-Science examples remain relevant to CODATA and to the practice of data science today.

Progress towards Open Science, Open Access and ‘Open’ Data

Open science must start with genuine open access to the full text of research papers. Before becoming Director of the UK’s eScience program, I was Head of the Department of Electronics and Computer Science (ECS) and then Dean of Engineering at the University of Southampton. Recognizing the crisis that university library budgets faced from rising journal subscriptions, the ECS Department, with the support of Wendy Hall, Stevan Harnad and Les Carr, funded and developed the well-known EPrints repository software and established one of the first ‘Green Open Access’ institutional repositories. In the UK, there is now widespread deployment of university research repositories that contain the full text of research papers, albeit with access usually subject to a publisher embargo period of 6 or 12 months.

By contrast, in the US, a historic memo from the White House Office of Science and Technology Policy (OSTP) in 2013 required US funding agencies “to develop a plan to support increased public access to the results of research funded by the Federal Government.” More importantly for CODATA’s agenda, the memo also specified that the ‘results of research’ include not only the scientific research papers but also the accompanying research data. It defined research data as “the digital recorded factual material commonly accepted in the scientific community as necessary to validate research findings including data sets used to support scholarly publications, but does not include laboratory notebooks, preliminary analyses, drafts of scientific papers, plans for future research, peer review reports, communications with colleagues, or physical objects, such as laboratory specimens.”

The OSTP memo has led all the major US funding agencies to develop open science policies and to establish research repositories that contain the full text of research papers linked to the corresponding datasets generated by the researchers they fund. The two most prominent repository systems are the National Institutes of Health’s PubMed Central with its associated databases, and the Department of Energy’s PAGES system managed by the Office of Scientific and Technical Information (OSTI). In contrast to this US funding-agency-centred view, UK Research Councils now require all researchers to include a Data Management Plan in their research proposals, and look to universities and specialist subject repositories to take responsibility for the outputs of the research they fund. For example, one Council, the Engineering and Physical Sciences Research Council, now requires that “Research organizations will ensure that EPSRC-funded research data is securely preserved for a minimum of 10 years from the date that any researcher ‘privileged access’ period expires”.

The developments described above have taken place over the last five years and constitute significant progress toward open science. I am therefore optimistic that CODATA, together with WDS and RDA, and supported by national and international research funding agencies, can continue to make major strides towards changing researchers’ culture around their research data. My optimism is further fuelled by the steady increase in university researchers registering ORCID iDs and obtaining DOIs for their datasets and software.

Two very recent developments are also exciting. The first is the announcement of Science Europe’s ‘cOAlition S’ for open access. Eleven European research funding agencies have agreed to focus their open science efforts on the very ambitious ‘Plan S’, which aims at ‘Making Open Access a Reality by 2020’ [5]. The second notable development is Google’s introduction of a new Dataset Search service [6], which has the potential to become a significant aid to data discoverability. The service makes use of the industry-supported schema.org initiative, which aims to add semantic information to the metadata describing a dataset.

“Schema.org is a collaborative, community activity with a mission to create, maintain, and promote schemas for structured data on the Internet, on web pages, in email messages, and beyond. Schema.org vocabulary can be used with many different encodings, including RDFa, Microdata and JSON-LD. These vocabularies cover entities, relationships between entities and actions, and can easily be extended through a well-documented extension model” [7].

The recent work by the ELIXIR collaboration on the Bioschemas extension to schema.org is intended to improve data interoperability in life sciences research. Bioschemas is a collection of specifications that provide guidelines to facilitate a more consistent adoption of schema.org markup within the life sciences [8]. Such initiatives are important indicators of the direction of open science. I therefore believe that a pragmatic approach to machine-actionable metadata, based on schema.org and subject-specific extensions, represents a practical way forward for the majority of scientific research communities.
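To make the idea of machine-actionable dataset metadata concrete, here is a minimal Python sketch, assuming the core schema.org Dataset type encoded as JSON-LD, of how a repository might generate the markup embedded in a dataset landing page. The helper names `dataset_jsonld` and `as_script_tag` and all example values are hypothetical, chosen only for illustration; Bioschemas profiles and Google’s own Dataset Search guidelines add further requirements that are not shown here.

```python
import json


def dataset_jsonld(name, description, identifier=None, license_url=None,
                   creators=(), keywords=()):
    """Build a minimal schema.org Dataset record as a JSON-LD dictionary.

    'name' and 'description' are always included; the optional
    properties are added only when a value is supplied.
    """
    record = {
        "@context": "https://schema.org/",
        "@type": "Dataset",
        "name": name,
        "description": description,
    }
    if identifier:
        record["identifier"] = identifier  # e.g. a persistent URL or DOI for the dataset
    if license_url:
        record["license"] = license_url  # URL of the licence text
    if creators:
        # Each creator is described as a schema.org Person.
        record["creator"] = [{"@type": "Person", "name": c} for c in creators]
    if keywords:
        record["keywords"] = list(keywords)
    return record


def as_script_tag(record):
    """Serialise the record inside the <script> element a landing page would embed."""
    body = json.dumps(record, indent=2, ensure_ascii=False)
    return '<script type="application/ld+json">\n' + body + "\n</script>"


if __name__ == "__main__":
    # Hypothetical values, for illustration only.
    example = dataset_jsonld(
        name="Example field observations",
        description="Hypothetical dataset used to illustrate schema.org markup.",
        identifier="https://example.org/datasets/42",
        license_url="https://creativecommons.org/licenses/by/4.0/",
        creators=["A. Researcher"],
        keywords=["open data", "citizen science"],
    )
    print(as_script_tag(example))
```

Embedding the record this way lets search services that understand schema.org, such as Dataset Search, index a dataset’s metadata; a subject-specific extension like Bioschemas would layer further property requirements on top of the same basic record.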

Tony Hey
Rutherford Appleton Laboratory
Science and Technology Facilities Council
Harwell Campus
Didcot, OX11 0QX, UK


[1] CODATA Mission statement,

[2] “The Fourth Paradigm: Data-Intensive Scientific Discovery”, edited by Tony Hey, Stewart Tansley and Kristin Tolle, published by Microsoft Research, 2009, ISBN: 978-0-9825442-0-4

[3] EOSCpilot, D6.3: 1st Report on Data Interoperability: Findability and Interoperability,

[4] Tony Hey and Anne Trefethen, “The Data Deluge: An e-Science Perspective”, chapter in “Grid Computing – Making the Global Infrastructure a Reality”, edited by F. Berman, G. C. Fox and A. J. G. Hey, Wiley, pp. 809–824 (2003)

[5] Plan-S,

[6] Google Dataset Search,


[8] Bioschemas,


Tyng-Ruey Chuang: Candidacy for CODATA Executive Committee

This is the fourteenth in the series of short statements from candidates in the forthcoming CODATA Elections at the General Assembly, to be held on 9–10 November in Gaborone, Botswana, following International Data Week. Dr. Tyng-Ruey Chuang is a candidate for the CODATA Executive Committee as an ordinary member. He was nominated by the Academy of Sciences located in Taipei.

I have read, admire, and agree with CODATA’s strategic priorities (as detailed in its 2015 report) on Data Principles and Practices, Frontiers of Data Science, and Capacity Building. For the last 15 years I have been working with researchers from multiple disciplines on data management systems, copyright and public licenses, and open data policies. The goal of these collaborations has always been to make better use of research data. My training and experience in information science and engineering align strongly with CODATA’s priorities.

In the past few years, I have collaborated with the Taiwan Endemic Species Research Institute on a communal data workflow for the Taiwan Roadkill Observation Network [1]. The result was presented at SciDataCon 2016 [2], and the dataset was deposited with GBIF for wide reuse [3]. I have also worked with memory institutions on setting up the Sunflower Movement Archive [4], and the result was reported at Digital Humanities 2017 [5]. Both collaborations emphasize building the frameworks necessary for community involvement, as well as the use of Creative Commons licenses to facilitate public access to research materials.

I was the public lead of Creative Commons Taiwan from its beginning in early 2003 until its transition to a community project in June 2018. I was also a co-PI of the Open Source Software Foundry (2003–2017). These two long-running projects were supported by Academia Sinica in Taipei to reach out to the general public, researchers, and policy makers in Taiwan about the principles and practices of public licenses and free software. Capacity building is an integral part of both projects.

I am currently a member of CODATA’s International Data Policy Committee and a co-chair of CODATA’s Task Group on Citizen Science and Crowdsourced Data. It has been an honor working with CODATA colleagues in these endeavors, and the experience has confirmed my view that capacity building in data principles and practices is an urgent issue for many research institutions.

I am part of CODATA Taiwan and served as its executive secretary (2007–2013). I have participated in the CODATA General Assembly since 2008, and have organized sessions at the 2010 and 2012 CODATA International Conferences and at the 2014, 2016, and 2018 SciDataCon conferences. The 2012 CODATA International Conference was held in Taipei; I led a local team that worked with the CODATA Secretariat to make the conference a great success.

I am an associate research fellow at the Institute of Information Science, Academia Sinica, Taipei, with a joint appointment at the Research Center for Information Technology Innovation and the Research Center for Humanities and Social Sciences. I was a fellow at the Berkman Center for Internet and Society, Harvard University (2011–2012), supported in part by a Fulbright senior research grant. I have been a member of the Creative Commons Policy Advisory Council since 2016. I have served several times as a board member of the Taiwan Association of Human Rights and of the Software Liberty Association of Taiwan.

[1] <>

[2] <>

[3] <>

[4] <>

[5] <>

Alena Rybkina: Candidate for the CODATA Executive Committee and CODATA Vice President

This is the thirteenth in the series of short statements from candidates in the forthcoming CODATA Elections at the General Assembly, to be held on 9–10 November in Gaborone, Botswana, following International Data Week. Alena Rybkina is a candidate for the CODATA Executive Committee as CODATA Vice President. She was nominated by Russia and IUGG.

Dr. Alena Rybkina is the deputy director of the Geophysical Center of the Russian Academy of Sciences (GC RAS).

She has served as a member of the CODATA Executive Committee since 2014, and is secretary general of the Russian National Committee and a co-opted member of the Union Commission on Data and Information (UCDI). She was an active member of the CODATA Task Group “Earth and Space Science Data Interoperability” and co-authored the “Atlas of the Earth’s Magnetic Field”.

She has taken part in a number of international and national projects, including programs of the Ministry of Education and Science, the Russian Science Foundation and the Foundation for Basic Research. She is an active member of the RAS Committee on System Analysis, which serves as the Russian National Member Organization (NMO) of the International Institute for Applied Systems Analysis.

She is experienced in organizing international and national events devoted to the promotion of data science in Russia and other countries. In particular, she was the principal organizer of the conferences “Electronic Geophysical Year: State of the Art and Results” (Pereslavl-Zalessky, 2009), “Artificial Intelligence in the Earth’s Magnetic Field Study. INTERMAGNET Russian Segment” (Uglich, 2011), “Geophysical Observatories, Multifunctional GIS and Data Mining” (Kaluga, 2013) and “Data Intensive System Analysis for Geohazard Studies” (Sochi, 2016). In 2017 she initiated the CODATA regional conference “Global Challenges and Data-Driven Science”, which brought together leading data scientists, data managers and specialists, as well as Big Data experts, from more than 35 countries. Such international events give greater visibility to existing studies and present the community with new goals. The growing practical importance of science diplomacy is reflected in various international science activities, and the CODATA international conference in Saint Petersburg played an important role in this respect.

Over recent decades, Alena has worked in the fields of data collection, data mining and visualization. She is a specialist in implementing modern information and visualization technologies in research and industry. Her principal goals include developing a spherical projection system and software for visualizing various geodata sets, popularizing the Earth sciences, and implementing these tools within scientific and educational organizations in Russia and abroad. Her research background is in geology, with a focus on reconstructing paleoenvironments. She took part in geological expeditions in Russia, Ukraine, France and Italy to collect geomagnetic data.

As Vice-President, she will aim to promote efficient global collaboration for improved knowledge, a better understanding of the Earth system, and sustainable development. Her experience in scientific management will help to build an effective system for integrating and managing research needs. She will focus on the visibility of current and future CODATA projects across the global research community. Her principal goals include involving new national members and securing support from stakeholders. An effective data science dialogue should be established between nations and continents, and CODATA should be recognized as a new platform for future collaboration.

The merger of ICSU and the ISSC requires a strategic review of current CODATA activities in order to build an effective system for global collaboration in data science.

Pamela Maras: Candidacy for CODATA Executive Committee

This is the twelfth in the series of short statements from candidates in the forthcoming CODATA Elections at the General Assembly, to be held on 9–10 November in Gaborone, Botswana, following International Data Week. Pamela Maras is a candidate for the CODATA Executive Committee as an ordinary member. She was nominated by IUPsyS.

Professor Pam Maras (CSci, CPsychol, FBPS)

Professor Pam Maras is the President of the International Union of Psychological Science (IUPsyS), which is a full member of the International Science Council.

CODATA is important as a scientific committee of ICSU in promoting “the effective exploitation of data as the single most important international issue of ‘policy for science’”. CODATA is in a unique position at this pivotal time for the newly inaugurated International Science Council (ISC), the global body representing international science in all its forms in the promotion and dissemination of science. Ethical and open access to ‘big data’, including for the public good, is essential in this fast-changing environment and can only really be achieved through geographic and disciplinary collaboration that includes all areas of the science community (represented in the ISC) and all regions of the world. A challenge for us is to ensure that scientists collectively ‘buy in’ to shared processes, including for data from the social sciences, which are less easy to curate. It is in this area that Professor Maras would offer expertise.

As a psychologist, Professor Maras’ contribution, if elected, would relate to human behavior: both the behavior of the scientists likely to draw on ‘big data’ and the outcomes of research drawing on ‘big data’, including in areas of interdisciplinary relevance. Professor Maras has expertise directly relevant to the impact of large datasets and to the development and implementation of policy and its adoption in an ethical manner. This can only be achieved effectively and with integrity if common processes for curation, storage and inclusion are not only designed but adopted; the latter is likely to be as hard as, or harder than, the former, and requires a shared understanding and commitment to act that can only be achieved through cooperation and agreed compromise.

Pam Maras is Professor of Social and Educational Psychology at the University of Greenwich, London, UK, where she holds a senior leadership position (including as Chair of the University Ethics Committee and in international collaborations). She researches and publishes in the applied area of social inclusion, particularly in relation to children and young people’s self-concept, social identity, learning and behavior across the world. Her publications included in the UK national assessment of research excellence in 2014 (the Research Excellence Framework) were independently rated as internationally excellent or outstanding. She has attracted considerable personal research funding and has research collaborations in Africa, Australasia, China, Europe (including France, the Netherlands, Spain and Italy), the Nordic countries (including Norway), North and Latin America, and South-East Asia.

Professor Maras has international leadership experience outside her employed post. She has held elected positions in the British Psychological Society (BPS), including as President, where she led the portfolio for international links; during this time she forged links with other associations, leading to memoranda of understanding as a means of ensuring collective activity in Europe and more widely. As a member of the IUPsyS leadership team for international capacity building for eight years, Professor Maras has taken a principled approach to the involvement of geographic regions in setting their own agenda. This has included work in Eastern Europe, the ASEAN region, the Caribbean and Latin America, including activity leading to declarations of regional collaboration in the Caribbean and Africa.

My new experience in Italy at the CODATA-RDA Research Data Science Summer School

This post was written by Neema Mduma, who recently attended the CODATA-RDA School of Research Data Science, hosted at ICTP, near Trieste, Italy. Her participation was kindly supported by the African Development Bank (AfDB).

This post is a syndicated copy of the one at

My PhD journey has been great so far, apart from the sleepless nights (totally worth it, though!). Last year I attended several events in the USA; it was great exposure, and I enjoyed both the scientific and the social programmes. I was looking forward to finding new opportunities, travelling elsewhere to gain new experience and extending my existing network.

In June 2018 I received an invitation to attend the CODATA-RDA Research Data Science Summer School, held at the Abdus Salam International Centre for Theoretical Physics (ICTP) in Trieste, Italy, from 6 to 17 August 2018. The school aimed to build competence in data analysis and security for participants from all disciplines and backgrounds, from the sciences to the humanities. The level of engagement and interaction between participants and instructors was outstanding, and helpers were always on hand to provide technical assistance. I was introduced to useful Machine Learning techniques that I will apply in my ongoing study. Simon Hodson, the Executive Director of CODATA, presented various opportunities to us, such as the CODATA Data Science Journal, and many others.

This summer school gave me an opportunity to extend my network with other academics and experts in the field of Machine Learning. I also had a chance to experience a new culture and explore new places such as Rome, Venice and Ljubljana. Sadly, I was the only participant from Tanzania, so I encourage my fellow Tanzanians to respond to calls and seize opportunities in Data Science workshops and summer schools. Lastly, I would like to thank the organisers for making the summer school a great success, and the African Development Bank (AfDB) for the financial support.