Monthly Archives: October 2018

Tony Hey: Candidate for the CODATA Executive Committee and CODATA President

This is the fifteenth in the series of short statements from candidates in the forthcoming CODATA Elections at the General Assembly to be held on 9-10 November in Gaborone, Botswana, following International Data Week. Tony Hey is a candidate for the CODATA Executive Committee as CODATA President. He was nominated by the UK.

CODATA’s Mission

“CODATA exists to promote global collaboration to advance Open Science and to improve the availability and usability of data for all areas of research. CODATA supports the principle that data produced by research and susceptible to be used for research should be as open as possible and as closed as necessary. CODATA works also to advance the interoperability and the usability of such data: research data should be FAIR (Findable, Accessible, Interoperable and Reusable). By promoting the policy, technological and cultural changes that are essential to promote Open Science, CODATA helps advance ISC’s vision and mission of advancing science as a global public good” [1]

Preface

In my present position as the Chief Data Scientist of the UK’s Science and Technology Facilities Council (STFC), I am based at the Rutherford Appleton Laboratory (RAL), on the Harwell Campus near Oxford. The Harwell site hosts the Diamond Synchrotron, the ISIS Neutron Source and the UK’s Central Laser Facility. My primary role is to support the university users of these large-scale experimental facilities at RAL in managing and analyzing their research data. The users of these facilities now perform experiments that generate increasingly large and complex datasets which need to be curated, analyzed, visualized and archived, and their new scientific discoveries published in a manner consistent with the FAIR principles. In addition, I work with the Hartree Supercomputing Centre at the STFC’s Daresbury Lab near Manchester. The Hartree Centre works mainly with industry and supports their computer modelling and data science requirements.

I am therefore intimately acquainted with the challenges of open science and believe that, thanks in part to the activities of CODATA, together with its fellow ISC organization, the World Data System (WDS), and now also with their younger partner, the Research Data Alliance (RDA), the global scientific research community has made significant progress towards the goals of the CODATA mission over the last five years. However, there is still much more to do before we can realize anything close to Jim Gray’s vision of the full text of all publications online and accessible, linked to the original datasets with sufficient metadata that other researchers can reuse and add new data to generate new scientific discoveries. In his last talk before he went missing at sea, he summed up this vision in a ‘pyramid’ diagram [2].

The European Open Science Cloud (EOSC) has a similar vision but is aiming to provide a much more detailed roadmap towards realizing a vision of global research that is Findable, Accessible, Interoperable and Reusable (FAIR). The work of the EOSCpilot project to define a core set of metadata properties – the EOSC Dataset Minimum Information or EDMI – that are “sufficient to enable data to be findable by users and suitably ‘aware’ programmatic services” is a good start [3]. The Australian Research Data Commons (ARDC), established in 2018 and subsuming the Australian National Data Service (ANDS), also has a similar vision.

My Vision for CODATA

I very much support the three major strategic programs put forward in CODATA’s Strategic Plan 2013 – 2018, namely:

Data Principles and Practice
Frontiers of Data Science
Capacity Building

However, given the promising developments of the last five years it is now time to develop a third strategic plan covering the next five years of the CODATA organization. Development of this new strategic plan must be a major priority for CODATA and it will be important to reach out to all the relevant national and international stakeholder organizations for their input. However, in addition to CODATA’s traditional stakeholders, I would also like to learn from the experience of other major efforts in this space. For example, from the US, this could include input from the NIH’s National Library of Medicine, the DOE’s OSTI organization and the NSF’s DataONE project. From Europe, there will be much activity in creating an implementation of the European Open Science Cloud (EOSC). I would also look for input from other major data science initiatives in Asia and Australia.

In addition to developing detailed plans and deliverables for the three broad CODATA priority areas for the next five years, I would like to give my support to two other areas. During my career in data-intensive science – in the UK with e-Science and in my work with Microsoft Research in the US – I have worked closely with universities and funding agencies in Europe, North and South America, Asia and Australia. I now think it is important to dedicate more attention to Africa where I think CODATA can play a significant role. I am therefore personally very supportive of the existing CODATA initiative to develop an African Open Science Platform and would look for ways to extend this initiative and increase its impact. One way in which to do this is to harness CODATA’s global reach and influence which can successfully bring together countries at many different levels of economic development. The international SKA project will also generate many interesting computing, data science and networking challenges in Africa.

The second focus I would like to develop is related to my present role as leader of the Scientific Machine Learning research group at RAL. There is now much activity world-wide in the application of the latest advances in AI and Machine Learning technologies to scientific data. This is one of the few areas where the academic research community has large and complex data sets that can compete with the ‘Big Data’ available to industry. Extracting new scientific insights from these datasets will require the use of advanced statistical techniques, including Bayesian methods and ‘deep learning’ technologies. In addition, an extensive education program to train researchers in the application of these data analytic technologies will be necessary and can build upon practical experience in applying such methods to ‘Big Scientific Data.’ In this way CODATA can help train a new generation of data analysts who are not only able to generate new insights from scientific data but also to spur innovation with industry and aid economic development.

While at Microsoft Research, I was a founding Board member of the RDA organization. As an RDA Board member, I liaised extensively with both the NSF in the USA, and with the Commission in Europe, and assisted in facilitating the constructive cooperation of RDA with CODATA. I will therefore bring extensive management experience to the leadership of CODATA – from my experience in the university sector as research group leader, department chair and dean of engineering, in UK research funding councils as a program director and chief data scientist, and in industry as manager of a globally distributed outreach team. I am disappointed to see the absence of many European countries from the CODATA membership and, through my experience in European research projects, I would seek to encourage these missing nations to become members of the organization. In addition, in my role at Microsoft Research, I spent considerable time visiting universities and funding agencies in Central and South America, and in Asia. I believe there is considerable potential to interest non-member countries in these regions in the relevance of the data science agenda of CODATA. Finally, although I will certainly bring my vision, enthusiasm and energy to the role of CODATA President, I believe that we must harvest the energy and enthusiasm of the entire CODATA community to take the organization forward to a new level of influence and effectiveness.

My Background

I am standing for election to the CODATA Presidency because I have long been an advocate for Open Access and Open Science. My passion for this topic and for the era of ‘Big Scientific Data’ dates back to the years from 2001 to 2005 when I was director of the UK’s eScience program. With Anne Trefethen, I wrote a paper in 2003 with the title “The Data Deluge: An e-Science Perspective”. This paper was certainly one of the earliest papers to talk about the transformative effects on science of the imminent deluge of scientific data [4]. In 2006, I was invited to give a keynote talk on eScience at the CODATA Conference in Beijing. While I was a Vice President at Microsoft Research, we celebrated the achievements of my late colleague, the Turing Award winner Jim Gray, by publishing a collection of essays in 2009 that illustrated the emergence of a new ‘Fourth Paradigm’ of Data-Intensive Science [2].

During the eScience program, which received significant funding from both the UK Research Councils and from Jisc, the UK research community explored many issues about the scientific data pipeline that are still important and relevant today. One project, for example, examined the preservation and sharing of scientific workflows. Another project looked in detail at recording the provenance of a dataset. This effort ultimately led in 2013 to the emergence of the W3C ‘PROV’ standard for provenance. Several other eScience projects explored the use of RDF and semantic web technologies such as OWL and SPARQL for enhancing research metadata. Although these technologies have proved popular with several academic research communities, it is probably fair to say that they have not so far been broadly adopted by most research communities nor by the major IT companies. In my role as chair of Jisc’s research committee, I supported the establishment of the Digital Curation Centre (DCC) in Edinburgh in 2004. The DCC was one of the first organizations to propose a set of guidelines for scientific data management plans (DMPs). The Jisc research committee also funded the National Centre for Text Mining (NaCTeM) in Manchester which offers a broad range of text-mining services. In the age of ‘Big Scientific Data’, high-bandwidth, end-to-end networking performance is an increasingly necessary element of a nation’s e-infrastructure. As a result, the Jisc research committee funded Janet, the UK’s NREN, to follow the lead of SURFnet in the Netherlands by introducing optical fibre ‘lambda’ technology. Janet can now provide dedicated ‘lightpath’ support to users requiring long-term, persistent high-volume data transfers between locations. I believe that these e-Science examples still have relevance to CODATA and to the practice of data science today.

Progress towards Open Science, Open Access and ‘Open’ Data

Open science must start with genuine open access to the full text of research papers. Before becoming Director of the UK’s eScience program, I was Head of the Electronics and Computer Science (ECS) Department and then Dean of Engineering at the University of Southampton. Recognizing the crisis that university library budgets were facing in terms of rising journal subscriptions, and with the support of Wendy Hall, Stevan Harnad and Les Carr, the ECS Department funded and developed the well-known ePrints repository software and established one of the first ‘Green Open Access’ institutional repositories. In the UK, there is now widespread deployment of university research repositories that contain the full text of research papers, albeit with access usually subject to a publisher embargo period of 6 or 12 months.

By contrast, in the US, a historic memo from the White House Office of Science and Technology Policy (OSTP) in 2013 required US funding agencies “to develop a plan to support increased public access to the results of research funded by the Federal Government.” More importantly for CODATA’s agenda, the memo also specified that the ‘results of research’ include not only the scientific research papers but also the accompanying research data. It defined research data as “the digital recorded factual material commonly accepted in the scientific community as necessary to validate research findings including data sets used to support scholarly publications, but does not include laboratory notebooks, preliminary analyses, drafts of scientific papers, plans for future research, peer review reports, communications with colleagues, or physical objects, such as laboratory specimens.”

The OSTP memo has led to all the major US funding agencies developing open science policies and establishing research repositories that contain the full text of research papers linked to the corresponding datasets generated by the researchers that they fund. The two most prominent repository systems are the National Institutes of Health’s PubMed Central with its associated databases, and the Department of Energy’s PAGES system managed by the Office of Scientific and Technical Information (OSTI). In contrast to this US funding agency centred view, UK Research Councils now require all researchers to have a Data Management Plan in their research proposals and look to the universities and specialist subject repositories to be responsible for the outputs of research that they fund. For example, one Council, the Engineering and Physical Sciences Research Council, now requires that “Research organizations will ensure that EPSRC-funded research data is securely preserved for a minimum of 10 years from the date that any researcher ‘privileged access’ period expires”.

The developments described above have taken place over the last five years and constitute significant progress toward open science. I am therefore optimistic that CODATA, together with WDS and RDA, and supported by national and international research funding agencies, can continue to make major strides towards changing the culture of researchers about their research data. My optimism is further fuelled by the steady increase in registrations for ORCID IDs and for DOIs for datasets and software by university researchers.

Two very recent developments are also exciting. The first is the announcement of Science Europe’s ‘cOAlition S’ for open access. Eleven European research funding agencies have agreed to focus their open science efforts on the very ambitious ‘Plan S’ – which aims at ‘Making Open Access a Reality by 2020’ [5]. The second notable development is Google’s introduction of a new Dataset Search service which has the potential to become a significant aid to data discoverability [6]. The service makes use of the industry-supported ‘schema.org’ initiative which aims to add some semantic information to the metadata describing the dataset.

“Schema.org is a collaborative, community activity with a mission to create, maintain, and promote schemas for structured data on the Internet, on web pages, in email messages, and beyond. Schema.org vocabulary can be used with many different encodings, including RDFa, Microdata and JSON-LD. These vocabularies cover entities, relationships between entities and actions, and can easily be extended through a well-documented extension model” [7].

The recent work by the ELIXIR collaboration on the Bioschemas extension to schema.org is intended to improve data interoperability in life sciences research. Bioschemas is a collection of specifications that provide guidelines to facilitate a more consistent adoption of schema.org markup within the life sciences [8]. Such initiatives are important indicators of the direction of open science. I therefore believe that a pragmatic approach to machine-actionable metadata that is based on schema.org and subject-specific extensions represents a practical way forward for the majority of scientific research communities.
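
To make the idea of schema.org-based metadata concrete, the following is a minimal sketch of how a dataset description might be expressed in JSON-LD, one of the encodings mentioned above. It is an illustration only, not a description of how any particular repository or search service works. The property names (name, description, url, identifier, license, keywords) are standard schema.org Dataset properties, while the helper function dataset_jsonld and all of the example values, including the example.org addresses and the DOI, are hypothetical.

```python
import json


def dataset_jsonld(name, description, url, identifier, license_url, keywords):
    """Build a schema.org Dataset description as a JSON-LD dictionary."""
    return {
        "@context": "https://schema.org/",
        "@type": "Dataset",
        "name": name,
        "description": description,
        "url": url,
        "identifier": identifier,
        "license": license_url,
        "keywords": list(keywords),
    }


# Hypothetical example values; none of these refer to a real dataset.
record = dataset_jsonld(
    name="Example neutron scattering measurements",
    description="Illustrative placeholder record for a facility dataset.",
    url="https://example.org/datasets/1234",
    identifier="https://doi.org/10.0000/example.1234",
    license_url="https://creativecommons.org/licenses/by/4.0/",
    keywords=["neutron scattering", "materials science"],
)

# JSON-LD is commonly embedded in a dataset landing page inside a script tag.
markup = (
    '<script type="application/ld+json">\n'
    + json.dumps(record, indent=2)
    + "\n</script>"
)
print(markup)
```

A subject-specific extension such as Bioschemas would, under this approach, add further agreed properties on top of the same basic structure rather than replace it.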

Tony Hey
Rutherford Appleton Laboratory
Science and Technology Facilities Council
Harwell Campus
Didcot, OX11 0QX, UK

References

[1] CODATA Mission statement, http://www.codata.org/about-codata/our-mission

[2] “The Fourth Paradigm: Data-Intensive Scientific Discovery”, edited by Tony Hey, Stewart Tansley and Kristin Tolle, published by Microsoft Research, 2009, ISBN: 978-0-9825442-0-4

[3] EOSCpilot, D6.3: 1st Report on Data Interoperability: Findability and Interoperability, https://eoscpilot.eu/sites/default/files/eoscpilot-d6.3.pdf

[4] Tony Hey and Anne Trefethen. “The Data Deluge: An e-Science Perspective”, Chapter in “Grid Computing – Making the Global Infrastructure a Reality”, edited by F Berman, G C Fox and A J G Hey, Wiley, pp.809-824 (2003)

[5] Plan-S, https://www.scienceeurope.org/wp-content/uploads/2018/09/cOAlitionS_Press_Release.pdf

[6] Google Dataset Search, https://ai.googleblog.com/2018/09/building-google-dataset-search-and.html

[7] Schema.org, https://schema.org/

[8] Bioschemas, http://bioschemas.org/

Tyng-Ruey Chuang: Candidacy for CODATA Executive Committee

This is the fourteenth in the series of short statements from candidates in the forthcoming CODATA Elections at the General Assembly to be held on 9-10 November in Gaborone, Botswana, following International Data Week. Dr. Tyng-Ruey Chuang is a candidate for the CODATA Executive Committee as an ordinary member. He was nominated by the Academy of Sciences located in Taipei.

I read, admire, and agree with CODATA’s strategic priorities (as detailed in its 2015 report) on Data Principles and Practices, Frontiers of Data Science, and Capacity Building. I have been working for the last 15 years with researchers from multiple disciplines on data management systems, copyrights and public licenses, as well as open data policies. The goal of these collaborations, always, is to make better use of research data. My training and experience in information science and engineering align strongly with the CODATA priorities.

In the past few years, I have collaborated with the Taiwan Endemic Species Research Institute on a communal data workflow for the Taiwan Roadkill Observation Network [1]. The result was presented at SciDataCon 2016 [2] and the dataset deposited to GBIF for wide reuse [3]. I have worked with memory institutions on setting up the Sunflower Movement Archive [4]. The result was reported at Digital Humanities 2017 [5]. Both collaborations emphasize building up the necessary frameworks for community involvement, as well as the use of Creative Commons Licenses to facilitate public access to research materials.

I was the public lead of Creative Commons Taiwan from its beginning in early 2003 until its transition to a community project in June 2018. I was a co-PI of the Open Source Software Foundry (2003 – 2017). These two long-running projects were supported by Academia Sinica in Taipei to reach out to the general public, researchers, and policy makers in Taiwan about the principles and practices of public licenses and free software. Capacity building is an integral part of the two projects.

Currently I am a member of CODATA’s International Data Policy Committee, and a co-chair of CODATA’s task group on Citizen Science and Crowdsourced Data. It has been an honor working with CODATA colleagues in these endeavors. The experience confirms my view that capacity building in data principles and practices is an urgent issue for many research institutions.

I am a part of CODATA Taiwan, and once served as its executive secretary (2007 – 2013). I have participated in the CODATA General Assembly since 2008, and have organized sessions at the 2010 and 2012 CODATA International Conferences, and at the 2014, 2016, and 2018 SciDataCon Conferences. The 2012 CODATA International Conference was held in Taipei; I led a local team working with the CODATA Secretariat to make the conference a great success.

I am an associate research fellow at the Institute of Information Science, Academia Sinica, Taipei, with a joint appointment at both the Research Center for Information Technology Innovation and the Research Center for Humanities and Social Sciences. I was a fellow at the Berkman Center for Internet and Society, Harvard University, supported in part by a Fulbright senior research grant (2011 – 2012). I am currently a member of the Creative Commons’ Policy Advisory Council (2016 – ). I have served several times as a board member of the Taiwan Association of Human Rights, and as a board member of the Software Liberty Association of Taiwan.

[1] <https://roadkill.tw/en>

[2] <https://roadkill.tw/sites/roadkill/files/content/communal_data_workflow_in_tairon.pdf>

[3] <https://www.gbif.org/dataset/db09684b-0fd1-431e-b5fa-4c1532fbdb14>

[4] <http://public.318.io/>

[5] <https://dh2017.adho.org/abstracts/350/350.pdf>

Alena Rybkina: Candidate for the CODATA Executive Committee and CODATA Vice President

This is the thirteenth in the series of short statements from candidates in the forthcoming CODATA Elections at the General Assembly to be held on 9-10 November in Gaborone, Botswana, following International Data Week. Alena Rybkina is a candidate for the CODATA Executive Committee as CODATA Vice President. She was nominated by Russia, IUGG.

Dr. Alena Rybkina is the deputy director of the Geophysical Center of the Russian Academy of Sciences (GC RAS).

She has served as a member of the CODATA Executive Committee since 2014, and is secretary general of the Russian National Committee and a co-opted member of the Union Commission on Data and Information (UCDI). She was an active member of the CODATA Task Group “Earth and Space Science Data Interoperability” and co-authored the “Atlas of the Earth’s Magnetic Field”.

She has taken part in a number of international and national projects, including programs of the Ministry of Education and Science, the Russian Science Foundation and the Foundation for Basic Research. She is an active member of the RAS Committee on System Analysis, which serves as the Russian NMO of the International Institute for Applied Systems Analysis.

She is experienced in organizing international and national events devoted to the promotion of data science in Russia and other countries. In particular, she was the principal organizer of the conferences “Electronic Geophysical Year: State of the Art and Results” (2009, Pereslavl-Zalessky), “Artificial Intelligence in the Earth’s Magnetic Field Study. INTERMAGNET Russian Segment” (2011, Uglich), “Geophysical Observatories, Multifunctional GIS and Data Mining” (2013, Kaluga) and “Data Intensive System Analysis for Geohazard Studies” (2016, Sochi). In 2017 she initiated the CODATA regional conference “Global challenges and data-driven science”. The conference brought together leading data scientists, data managers and specialists as well as Big Data experts from more than 35 countries. Such international events give existing studies higher visibility and present the community with new goals. The growing practical importance of science diplomacy is reflected in various international science activities, and the CODATA international conference in Saint Petersburg played an important role in this respect.

Over the last decades, Alena has been working in the field of data collection, data mining and visualization. She is a specialist in the implementation of modern information and visualization technologies in the research and industrial domains. Among her principal goals is the development of a spherical projection system and software for visualizing various geodata sets and popularizing the Earth sciences, and its implementation within scientific and educational organizations in Russia and abroad. Her research background is in geology, with a focus on the reconstruction of the paleoenvironment. She took part in geological expeditions in Russia, Ukraine, France and Italy to collect geomagnetic data.

As Vice-President, she will aim to promote efficient global collaboration for improved knowledge and understanding of the Earth system and for sustainable development. Her experience in scientific management will help to build an effective system for integrating and managing research needs. She will focus on the visibility of current and future CODATA projects across the global research community. Among her principal goals are the involvement of new national members and support from stakeholders. An effective data science dialogue should be established between nations and continents, and CODATA should be recognized as a new platform for future collaboration.

The merger of ICSU and the ISSC requires a strategic overview of the current CODATA activities to build an effective system for global collaboration in data science.

Pamela Maras: Candidacy for CODATA Executive Committee

This is the twelfth in the series of short statements from candidates in the forthcoming CODATA Elections at the General Assembly to be held on 9-10 November in Gaborone, Botswana, following International Data Week. Pamela Maras is a candidate for the CODATA Executive Committee as an ordinary member. She was nominated by IUPsyS.

Professor Pam Maras (CSci, CPsychol, FBPS)

Professor Pam Maras is the President of the International Union of Psychological Science (IUPsyS), which is a full member of the International Science Council.

CODATA is important as a scientific committee of ICSU in promoting “the effective exploitation of data as the single most important international issue of ‘policy for science’”. CODATA is in a unique position at this pivotal time, with the inauguration of the International Science Council (ISC) as the global body representing international science in all its forms in the promotion and dissemination of science. Ethical and open access to ‘big data’, including for the public good, is essential in a fast-changing environment and can only really be achieved through geographic and disciplinary collaboration that includes all areas of the science community (represented in the ISC) and all regions of the world. A challenge for us is to ensure that scientists collectively ‘buy in’ to these processes, including for data generated by the social sciences that are less easy to curate. It is in this area that Professor Maras would offer expertise.

As a psychologist, Professor Maras’ contribution, if elected, would relate to human behavior: both that of the scientists likely to draw on ‘big data’ and the outcomes of research drawing on ‘big data’, including in areas of interdisciplinary relevance. Professor Maras has expertise directly relevant to the impact of large data, and to the development and implementation of policy and its adoption in an ethical manner. This can only be achieved effectively and with integrity if common processes for curation, storage and inclusion are not only designed but adopted; the latter is likely to be as hard as or harder than the former and requires a shared understanding and a commitment to act, which can only be achieved by cooperation and agreed compromise.

Pam Maras is Professor in Social and Educational Psychology at the University of Greenwich, London, U.K., where she holds a senior leadership position (including as Chair of the University Ethics Committee and in international collaborations). She researches and publishes in the applied area of social inclusion, particularly in relation to children and young people’s self-concept, social identity, learning and behavior across the world. Her publications included in the UK national assessment of research excellence in 2014 (Research Excellence Framework, 2014) were independently rated as internationally excellent or outstanding. She has attracted considerable personal research funding and has research collaborations including in Africa, Australasia, China, Europe (including France, the Netherlands, Spain and Italy), the Nordic Countries (including Norway), North and Latin America and SE Asia.

Professor Maras has international leadership experience outside of her employed post. She has held elected positions in the British Psychological Society (BPS), including as President, where she led the portfolio for international links; during that time she forged links with other associations, leading to memoranda of understanding as a means of ensuring collective activity in Europe and more widely. As a member of the IUPsyS leadership team for international capacity building for eight years, Professor Maras has taken a principled approach to the involvement of geographic regions in setting their own agenda. This has included work in Eastern Europe, the ASEAN region, the Caribbean and Latin America, and involvement in activity leading to declarations of regional collaboration in the Caribbean and Africa.

My new experience in Italy at the CODATA-RDA Research Data Science Summer School

This post was written by Neema Mduma. Neema recently attended the CODATA-RDA School of Research Data Science, hosted at ICTP, near Trieste, Italy. Her participation was kindly supported by AfDB.

This post is a syndicated copy of the one at https://neylicious.github.io/ml/2018/10/03/italy.html

My PhD journey has been great so far, apart from the sleepless nights (totally worth it though!). Last year I attended different events in the USA; it was great exposure, and I enjoyed both the scientific and the social programmes. I was looking forward to finding new opportunities and travelling elsewhere to gain new experience and extend my existing network.

In June 2018 I received an invitation to attend the CODATA-RDA Research Data Science Summer School, which was held at the Abdus Salam International Centre for Theoretical Physics (ICTP) in Trieste, Italy. The summer school was held from 6th to 17th August 2018 with the aim of building competence in data analysis and security for participants from all disciplines and backgrounds, from Sciences to Humanities.

The level of engagement and interaction between participants and instructors in this summer school was outstanding, and helpers were always there to provide technical assistance. I was exposed to useful Machine Learning techniques that I will apply in my ongoing study. The Executive Director of CODATA, Simon Hodson, presented to us various opportunities such as the CODATA journal and many others.

This summer school gave me an opportunity to extend my networks with other academics and experts in the field of Machine Learning. Additionally, I had a chance to experience a new culture and explore new places like Rome, Venice and Ljubljana. Sadly, I was the only participant from Tanzania, so I encourage my fellow Tanzanians to apply for calls and seize opportunities in Data Science workshops and summer schools. Lastly, I would like to thank the organisers of the summer school for making it a great success, and the African Development Bank (AfDB) for the financial support.

Jianhui LI: Candidacy for CODATA Vice President

This is the eleventh in the series of short statements from candidates in the forthcoming CODATA Elections at the General Assembly to be held on 9-10 November in Gaborone, Botswana, following International Data Week. Jianhui LI is a candidate for the CODATA Executive Committee as a Vice President. He was nominated by China, PASTD, LODGD, USA.

Dr. Jianhui Li is a department director at the Computer Network Information Center (CNIC), Chinese Academy of Sciences (CAS), a Professor and PhD Supervisor at the University of Chinese Academy of Sciences, and an Ordinary Member of the Executive Committee of CODATA (2014-2016, 2016-2018). He has worked on data infrastructure, data management and data-intensive computing since 1999, and has led the scientific data infrastructure development and open data activities in CAS for more than 10 years.

He has always promoted the implementation of open data principles and data policies, and has put them into practice as well. In 2017, he launched China’s first national-level large-scale survey on data sharing, which helped MOST formulate the first national policy on scientific data management and openness in China. To nourish a local data sharing culture among the Chinese research community, he launched a bilingual open-access data journal, China Scientific Data (www.csdata.org), together with the data repository ScienceDB (http://www.sciencedb.cn). Moreover, he is a very active data scientist and leader in advancing the frontiers of data sciences. He is in charge of the development of the CAS Scientific Data Cloud for Big Data Analysis and Large Scale Data-Intensive Scientific Research. He is now leading the development of a scientific big data management system funded by the National Key R&D Plan, and serves as co-chair of the technology working group in the CASEarth Programme (http://www.casearth.com) – a CAS Strategic Pioneer Research and Development Programme mainly focusing on building a global big data network to study Earth and support research on climate change, as well as to predict and mitigate natural disasters.

He has served as Secretary General of CODATA-China for 10 years and organized a series of successful international and domestic activities, including the China-US Roundtable on Scientific Data, the Training Workshop for Developing Countries on Scientific Data, and the National Scientific Data Conference. He initiated the International Training Workshop for Developing Countries on Scientific Data, sponsored jointly by CAS, CODATA and CODATA-China. The training workshop has been held four times, in 2012, 2014, 2016 and 2017, and has attracted more than 80 participants from over 20 developing countries. The annual National Scientific Data Conference is another event, initiated in 2014, and is now the most important national academic conference on scientific data in China, providing a friendly platform for exploring the frontiers of data science and exchanging knowledge and experience among thousands of scientists. He has been a member of the CODATA EC since 2014, contributing to the CODATA Strategy and its implementation.

With his extensive experience in national and international data activities and outstanding research on data science, scientific data management and sharing, he will continue to help CODATA carry out its mission, objectives and key initiatives as articulated in the Strategic Plan, contributing especially to data policy, data management and data science.

Ernie Boyko: Candidacy for CODATA Executive Committee

This is the tenth in the series of short statements from candidates in the forthcoming CODATA Elections at the General Assembly to be held on 9-10 November in Gaborone, Botswana, following International Data Week. Ernie Boyko is a candidate for the CODATA Executive Committee as an ordinary member. He was nominated by Canada, Israel, South Africa, TGFC.

Ernie Boyko is an agricultural economist with extensive experience in data development, data dissemination and research data management.

His wide experience at Canada’s national statistics agency, Statistics Canada, involved working at senior levels in a variety of areas including agriculture statistics, corporate planning, electronic dissemination, census operations and library and information services.

As an advocate for data access, his crowning achievement at Statistics Canada was the creation of the Data Liberation Initiative with Wendy Watkins from Carleton University. This program allowed affordable access to all public microdata and aggregate files for the first time to post-secondary institutions. It has recently celebrated its 25th anniversary with 79 institutional members. DLI has been cited internationally as a model program for statistical institutions.

Ernie was involved in several assignments with the World Bank and OECD’s PARIS21 projects. This involved work in several African and Asian countries on data development for agriculture and on dissemination and data management policies for statistical agencies. He was able to put his Statistics Canada and international experience to use while spending a decade as an Adjunct Data Librarian at Carleton University, where he taught the basics of research data management to faculty and graduate student researchers.

Mr. Boyko is a past president of the International Association for Social Science Information Services and Technology (IASSIST), an organization he has been part of for nearly 30 years. This has given him exposure to the challenges of social science data services. It was under this umbrella that he was part of a working group that developed a metadata standard, the Data Documentation Initiative (DDI). DDI is now widely used as a standard for documenting research data. In 2018, IASSIST presented Ernie with a lifetime achievement award for his work.

Ernie has been involved with CODATA for a decade, serving as an observer, a member and currently the Chair of the Canadian National Committee for CODATA. As chair, he is leading a project that will realign the emphasis of the national committee with the new CODATA constitution’s focus on research data management. The creation of the International Science Council through the merger of ICSU and the ISSC has led CODATA to focus more squarely on research data management and outreach. The goal of the current work of the Canadian committee is to develop a process that will allow reporting on the status of RDM in Canada in a way that facilitates international comparisons.

His varied experience stands Ernie in good stead to help meet the challenges faced by CODATA in its transition to being part of the International Science Council. It is hoped that Canada’s RDM project will be of interest to other countries, thus enabling a broader understanding of the state of research data for the benefit of science.

Open Consultancy – Development of interactive tools to support the use of data collected by Major Groups and other Stakeholders to measure progress towards the Sustainable Development Goals (SDGs)

Background

The availability and access to high-quality, timely, and reliable data, disaggregated by relevant characteristics and supplemented with necessary contextual information for its interpretation and use, is fundamental to the successful implementation and monitoring of the 2030 Agenda. National Statistical Systems, however, face many challenges not only in producing statistics to fill data gaps in national, regional and global SDG indicator frameworks, but also in integrating and making the vast amounts of existing SDG-related data and information accessible to decision makers in a meaningful way.

Different actors are often able to contribute timely and disaggregated data on specific issues, geographies or groups, which can supplement and provide additional context to understand the data on official statistical indicators at the national and global levels (such as data on informal settlements or needs in specific communities). Therefore, several important initiatives are being undertaken by National Statistical Offices in coordination with other members of national statistical systems, and in partnership with international and regional organizations and stakeholders from civil society, academia, and the private sector, to integrate new data sources into the production and dissemination of official statistics for sustainable development.

The UN DESA Division for Sustainable Development Goals’ (DSDG) ongoing EU grant entitled “SD2015: delivering on the promise of the SDGs” seeks to strengthen and support Major Groups and other Stakeholders’ (MGoS) engagement, including by strengthening their capacity to monitor and contribute to the 2030 Agenda.

Major Groups and other Stakeholders play an important role in reporting on the implementation of the Sustainable Development Goals. MGoS are often able to contribute timely and disaggregated data on specific issues, geographies or groups, which can supplement and provide additional context to understand the data on official statistical indicators at the national and global levels.

Citizen-generated data, as well as data produced by such constituencies as the private sector and local authorities, faces complex challenges in being integrated and used to support monitoring and decision-making to accelerate implementation of the 2030 Agenda. In particular, it is often difficult to find and link these supplementary data sources, due to discrepancies in the metadata structures and vocabularies used to describe and organize their content, as well as the lack of adherence to common technical and statistical standards. These challenges counter disaggregation, and lead to lower visibility and use of existing data sources that could be leveraged by governments to help fill in monitoring gaps for the SDGs.

Currently, there is no common framework to structure, aggregate, and give visibility to innovative data sources which can be leveraged by policy makers, analysts, and the general public to help implement and monitor progress towards the SDGs.
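
The linking problem described above can be illustrated with a minimal sketch in Python. It assumes two hypothetical sources that label the same concept differently, and maps both labels to one shared concept URI so that their records can be grouped together. The source names, labels and the example.org URI are all placeholders for illustration and are not taken from any real SDG vocabulary.

```python
# Hypothetical shared vocabulary: the concept URI is a placeholder.
INFORMAL_SETTLEMENTS = "https://example.org/vocab/informal-settlements"

# Each (source, local label) pair maps to a shared concept URI.
# Sources and labels are assumed examples, not real datasets.
LOCAL_TO_SHARED = {
    ("source_a", "slum population"): INFORMAL_SETTLEMENTS,
    ("source_b", "informal settlement residents"): INFORMAL_SETTLEMENTS,
}


def harmonize(source, records):
    """Attach a shared concept URI to each record.

    Records with an unmapped label are kept with concept=None,
    so nothing is silently dropped.
    """
    harmonized = []
    for rec in records:
        key = (source, rec["label"].strip().lower())
        harmonized.append({**rec, "source": source, "concept": LOCAL_TO_SHARED.get(key)})
    return harmonized


def group_by_concept(records):
    """Group harmonized records by their shared concept URI."""
    groups = {}
    for rec in records:
        groups.setdefault(rec["concept"], []).append(rec)
    return groups
```

Once both sources point at the same URI, their records land in the same group regardless of the wording each source used, while unmapped labels remain visible under `None` for later review.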

Objectives

Building on the UN Statistics Division’s existing work in providing guidance towards the use of common data standards to monitor the SDGs, the project aims to provide a common framework and guidelines to improve the visibility, interoperability and usability of supplementary sources of data on sustainable development to complement the work of national statistical offices, consequently raising overall awareness on progress towards the SDGs. Semantic web tools will be employed to address the need for a lightweight data-interchange format across the web.

Work Assignment

The Consultant will perform the following duties:

Review existing prototypes of SDG data ontology and other related vocabularies (e.g., UNBIS Thesaurus)
Draft a document with guidelines for the implementation of linked-data on statistical data provided through a web API, and for digital text exposed through web documents
Draft a document with general guidelines for the consumption of the SDG linked
Develop an application to pilot the collection and integration of multiple alternative data sources in one

Duration of contract

The proposed duration of the contract is for 3 months starting in November 2018.

Duty Station or Location of Assignment

The consultant is not required to work in a UN office, but must be available for regular phone and web conference meetings with DSDG, UNSD and project partners during office hours. Travel or commute time to and from the United Nations Headquarters, as well as related expenses, are not part of the consultancy.

Travel

The Consultant is not required to travel for the performance of the assignment.

Expected outputs*

The consultant will develop the following deliverables:

The following set of on-line guidelines (to be delivered in the form of GitHub wiki pages, Jupyter notebook(s) or similar on-line documents):
- Guidelines for the practical implementation of semantic web standards (JSON-LD, microdata or any other linked data artifact) to achieve the greatest exposure of SDG statistical data, using unique URIs defined in various publicly available ontologies and
- General guidelines for the consumption of the SDG linked (to be delivered in a word document)
An application to pilot the collection and integration of multiple alternative data sources with official SDG statistics in one This application will use the linked data infrastructure created to connect nontraditional data sources with official statistical data.
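
As a rough illustration of the kind of JSON-LD markup these guidelines would cover, the following Python sketch builds one statistical observation as a JSON-LD document and wraps it in a script tag for embedding in a web page. It uses terms from the RDF Data Cube and SDMX vocabularies; the example.org base URI, the indicator property, the function names and all values are hypothetical placeholders, not part of the consultancy specification.

```python
import json

# Hypothetical base URI; a real deployment would use URIs from a published SDG ontology.
EX = "https://example.org/sdg/"

# JSON-LD context mapping short keys to vocabulary URIs.
CONTEXT = {
    "qb": "http://purl.org/linked-data/cube#",
    "sdmx-dimension": "http://purl.org/linked-data/sdmx/2009/dimension#",
    "sdmx-measure": "http://purl.org/linked-data/sdmx/2009/measure#",
    # Hypothetical property linking an observation to an indicator URI.
    "indicator": {"@id": EX + "indicator", "@type": "@id"},
    "refArea": {"@id": "sdmx-dimension:refArea"},
    "refPeriod": {"@id": "sdmx-dimension:refPeriod"},
    "obsValue": {"@id": "sdmx-measure:obsValue"},
}


def make_observation(obs_id, indicator_code, area, period, value):
    """Build one observation as a JSON-LD document (a plain dict)."""
    return {
        "@context": CONTEXT,
        "@id": EX + "observation/" + obs_id,
        "@type": "qb:Observation",
        "indicator": EX + "indicator/" + indicator_code,
        "refArea": area,
        "refPeriod": period,
        "obsValue": value,
    }


def to_script_tag(doc):
    """Serialize a JSON-LD document for embedding in an HTML page."""
    body = json.dumps(doc, indent=2)
    return '<script type="application/ld+json">\n' + body + "\n</script>"


# Placeholder values only; this is not real SDG data.
example = make_observation("obs-1", "1.1.1", "XX", "2015", 12.5)
```

Because each observation carries a unique URI and points to an indicator URI, an application collecting alternative data sources could match them against official statistics by URI rather than by free-text label.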

Delivery dates of output

First progress report submitted by 15 December 2018
Second progress report submitted by 15 January 2019
All outputs submitted by 15 February 2019.

Performance indicators

Timely delivery of all components specified;
Quality of the documentation and tutorials developed;
Effectiveness of solution provided on an architectural level;
Timely preparation of all activities and regular progress updates;

Qualifications

Fluency in English
An advanced degree in the field of data or information science or related fields
A minimum of five years of working experience on information science including experience in the use of linked data and semantic web
Demonstrated experience in the implementation of linked data projects using common linked data standards, such as RDFa, JSON-LD, Microdata or other linked data technologies
Familiarity with the UN Sustainable Development Goals and monitoring framework
Knowledge of Agile methodology is desirable

Interested Applicants

Please send your CV, Cover Letter and financial proposal (daily fee) by 19 October 2018 to:

Ms. Nan Jiang, Division for Sustainable Development Goals, UN Department of Economic and Social Affairs jiang2n@un.org; +19173674426

Humans of Data 26

“What motivates me to keep going is teaching people who are going to keep this going after us. Managing data will find its place in the world. I don’t mean analytics, I mean taking care of the data so that people can run legitimate analytics.

It annoys me just now – a lot of these data science and data analytics programs, they’re all about statistics, visualisation, analysis, but very little about actually curating the data underneath. Not to say that data curators don’t need to know a little bit about analysis but people who do data science in the business environment, they often don’t know much about curation. People working for businesses, they complain that they spend 80% of their time cleaning data and without that, the data wasn’t usable. But I feel like saying, ‘If you hired data curators you wouldn’t have to deal with that problem!’”

Daisy Selematsela: Candidacy for CODATA Executive Committee

This is the ninth in the series of short statements from candidates in the forthcoming CODATA Elections at the General Assembly to be held on 9-10 November in Gaborone, Botswana, following International Data Week. Daisy Selematsela is a candidate for the CODATA Executive Committee as an ordinary member. She was nominated by PASTD TG.

Daisy Selematsela holds a PhD and is Professor of Practice of Information and Knowledge Management of the University of Johannesburg. She has a combined 27 years’ experience in the Higher Education sector and within the National System of Innovation (NSI). She serves as mentor for emerging researchers in interdisciplinary areas and an external examiner for undergraduate and postgraduate students in Library, Information Science and Knowledge Management.

Daisy serves on a number of scientific bodies, as an editorial board member of the South African Journal of Library and Information Science (SAJLIS) and the Global Change Research Data Publishing and Repository, and as a reviewer of several programs.

She serves on a number of national boards and Advisory Councils. Internationally she is a member of the Board of Directors of ORCID (representing EMEA: Europe, the Middle East and Africa) and of the Confederation of Open Access Repositories (COAR).

Daisy has served the then ICSU and CODATA on a number of forums, contributed to position papers, co-ordinated workshops, chaired conference sessions and made numerous local and international presentations on areas related to CODATA objectives. She has served CODATA in the following areas:

Data Science Journal Review – corresponding Editor 2009
Served as ex-officio member of the South African National Committee for CODATA for 11 years.
World Data Centre on Biodiversity and Human Health prototype proposal and hosting;
Executive member: International Council for Science Union (ICSU SCID) ad hoc Committee on Information and Data in 2007.
Chair: International Council for Science: Committee on Data for Science & Technology (ICSU: CODATA) Task Group on Data Sources for Sustainable Development in SADC 2007-2011.
Executive member: (ICSU EDC Panel) International Science Union World Data Centre Panel 2008.
Member: CODATA Task Group on Preservation of and Access to Scientific and Technical Data in/for/with Developing Countries. Co-chairs: CODATA – WDS joint subgroup 2011 to date.

She was one of the Founding and Executive Members of the International Data Forum (IDF), 2007-2010. She was instrumental in the formulation of the Statement on Open Access for grant funding and the Statement on ORCID ID and Predatory Publishing.

She holds a PhD in Information Science from the University of Johannesburg and is a Fellow of the Higher Education Resource Service for Women in Higher Education (HERS) South Africa and Bryn Mawr College in Philadelphia, USA. She was acknowledged with the Knowledge Management Award in 2016 by the World Education Congress.

CODATA Blog

News from the CODATA community and from Simon Hodson, CODATA Executive Director