Author Archives: codata_blog

Humans of Data 30

“The things that made me interested in data twenty-five years ago are the same things that make me interested in it now.  It’s the way structures and narratives are going to define our culture and who we are.  I was interested initially in the historical perspective: how is data going to change the subtlety about how we understand past cultures?  And how future generations are going to access data, manipulate it and study it.  Very few people were interested in that, and that was exciting.  Now I’m a bit scared of it.  It’s increasingly clear that we can use data in a contemporary context for social evils.

Data – and the systems that store it – can be ugly, but they can also be beautiful.  Some day, people are going to be interested in the beauty of system architectures and the beauty of database design – or the ugliness.  It’s like studying a Gothic cathedral or a contemporary city: the architecture defines how we feel and the narrative that plays out in our lives.

I see emerging possibilities in machine learning, in the blockchain and in other areas.  I share an understanding of the risks of data.  And I believe that what kind of research you actually do – and how you enable others to be creative to do things – these are equally important.  More focus needs to go on developing the community and helping our colleagues to progress and excel.  The ways forward in digital curation and data presentation are very much going to come from human collaboration and not from one lone person who’s thought of a technical solution that no-one has thought of before.  This is not the field of lone scholars – this is the field of general community effort.”

World Data System: Early Career Researcher Training Workshop 2019

https://www.flickr.com/photos/sbszine/9353420967/

Data Curation and Management: Current Achievements and Future Challenges

The management and curation of research data is a very timely topic. All researchers rely on data they have themselves collected or that are the outputs of previous studies. Moreover, researchers are increasingly required to organize the long-term storage and access of the data used to obtain their results. As such, data training is highly relevant to budding scientists as they embark on their careers!

Thanks to the generous sponsorship of the European Geosciences Union (EGU), the World Data System (WDS) of the International Science Council is delighted to offer a Research Data Management (RDM) Training Workshop aimed at early career researchers and scientists (ECRs).

The Workshop will take place on 6–8 November 2019 at Institut de Physique du Globe in Paris, France. Twenty-four (24) seats are available; dormitory accommodation (FIAP) and meals will be covered for those participants. Limited funding is also available towards some participants’ travel costs. If more people apply than there are places available, the WDS Scientific Committee (WDS-SC) will select participants based on their interest in the Workshop.

Apply Now!

The Regional Master program in Biodiversity informatics opens registration to students

The master program in Biodiversity informatics opened its doors at the Faculty of Agricultural Sciences of the University of Abomey-Calavi, Benin, in October 2017. It will welcome its third student cohort in the next academic year.

The program is regional in that it receives students of several nationalities, and both national and international teachers (especially from the United States and Europe) come to teach in it.

The objective of this Regional Master is to train a new generation of biodiversity specialists, able to integrate threats to biodiversity into research questions and to derive relevant results that inform decision-making on biodiversity in the context of climate and global change.

A television outreach segment explaining the master program is available at https://youtu.be/4ajSBF1Jdyk (in French).

We invite you to join the program, or to enroll your staff in large numbers, in order to meet the challenges of mobilizing and using biodiversity data for decision-making.

Applications opened on 1 June and will close on 30 June 2019.

The program is taught in French and English.

May 2019: Publications in the Data Science Journal


Title: Interdisciplinary Comparison of Scientific Impact of Publications Using the Citation-Ratio
Author: Arthur R. Bos, Sandrine Nitza
URL: http://doi.org/10.5334/dsj-2019-019

Title: Diversity of Woody Species in Djamde Wildlife Reserve, Northern Togo, West Africa
Author: Tchagou Awitazi, Raoufou Radji, Kotchikpa Okoumassou
URL: http://doi.org/10.5334/dsj-2019-018

Title: A Generic Research Data Infrastructure for Long Tail Research Data Management
Author: Atif Latif, Fidan Limani, Klaus Tochtermann
URL: http://doi.org/10.5334/dsj-2019-017

Title: Time Series Prediction Model of Grey Wolf Optimized Echo State Network
Author: Huiqing Wang, Yingying Bai, Chun Li, Zhirong Guo, Jianhui Zhang
URL: http://doi.org/10.5334/dsj-2019-016

Title: Fostering Data Sharing in Multidisciplinary Research Communities: A Case Study in the Geospatial Domain
Author: Martina Zilioli, Simone Lanucara, Alessandro Oggioni, Cristiano Fugazza, Paola Carrara
URL: http://doi.org/10.5334/dsj-2019-015

Urban Data Science School from May 13 – May 23, 2019

This article was first published by instructors Dr. Shaily R. Gandhi and Felix Emeka Anyiam https://shailygandhi.github.io/UrbanDataScience2019/ – Shaily and Felix are both alumni of the CODATA-RDA School of Research Data Science.

The second summer school on Urban Data Science followed the successful first edition in 2018, which itself grew out of a collaboration begun at the CODATA-RDA Research Data Science Summer School in Trieste, Italy, in 2017. This year the Urban Data Science course was hosted by the Summer Winter School at CEPT University, Ahmedabad, India, from May 13 – May 23, 2019.

With the growing trend towards data-driven solutions for making city operations more efficient and effective, the next generation of city planners will need to be as comfortable with advanced simulation algorithms as they are with design. This course helped to address the poor use of available open data in decision making, while keeping the urban context in focus. The course was designed to get students started with the basic components of data science in a short span of 10 days. The additional 4 days this year helped students draw more insightful results from the open data sets than in last year’s 6-day course. Open data sets allow a deeper understanding of urban dynamics and the associated challenges, giving students firmer control over possible bias as they analyse these challenges and propose solutions to them.

This year’s course was carefully modified using feedback from students of the 2018 summer school, keeping in mind that the 24 new participants came from different backgrounds, such as planning, architecture, civil engineering, geomatics and other disciplines, at both bachelor’s and master’s level, and with both IT and non-IT backgrounds. The curriculum covered the basics of Git and GitHub, where students gained intensive hands-on experience in using the software and learned how to open up their projects on GitHub. OpenRefine, R and Excel were covered for data cleaning. The lessons on the basics of R were prepared using material from the Software Carpentry lessons Programming with R and R for Reproducible Scientific Analysis, and from the Geospatial Data workshop. The concepts were taken from various sources and the lessons were redesigned to focus on urban problems and analysis.

The school began with students learning how to set their study objectives, refine their research titles, develop a research theory and plan their area of study. In doing so they kept in mind the type of data available from open data sets (continuous, discrete, ordinal or nominal) and the stages of statistical analysis that could be conducted in order to produce the expected outcomes. Knowledge of research methodologies and of statistical software to support data analysis was one of the vital goals of the course. The statistical software package “R” was used, as it has become a powerful and useful tool for data cleaning, management, statistical analysis and graphical visualization. Once mastered, it is user friendly and can reduce the time and effort required of researchers, students and professionals. The word cloud below shows the range of technologies students explored during the summer school.

Urban Data Science Summer School 2019

Innovative teaching techniques, such as mixing theory and practicals with real-life examples, were used in this course: the students were diverse, and special attention was needed to keep the whole class at the same pace. Although the course was intense, running from 9:30am to 5:30pm, it was very motivating to see the students following the topics and keeping up with the pace of the instructors. To better understand the various levels of the 24 students, we conducted pre- and post-school surveys, which gave us an idea of how far the school had changed the students’ confidence in programming in R and in using Git and OpenRefine. As in last year’s school, daily feedback was taken from the students to inform tutors’ decisions on class activities. Continuous constructive comments from the students made the course more effective, as the tutors were able to achieve the desired output by adapting the teaching method to the students’ needs. This process of understanding the students’ capabilities was well appreciated.

Urban Data Science Summer School 2019 was well appreciated by the students, and the outcomes of the course were insightful and backed by statistical evidence. The topics selected by the students, and their frequency, are shown in the word cloud below. Urban planning and decision making depend on insights, and these insights are collected and analysed using open data sets in order to know how things are in our environment today, an approach this course strongly promoted. The role of Urban Data Science is to enhance urban planning and policy-making with more data-driven decisions, which are needed at this time. The students of this summer school came up with wonderful insights and results. It was a great pleasure to receive case studies on various topics such as crime, the economy, education, governance and planning, environment, public health, road accidents and sports.

Urban Data Science Summer School 2019

The project of Linda Reeba Koshy, a student of the summer school, was a case study on the Prevalence of Obesity among Socially Vulnerable Groups in the United States. Its results indicated that obesity is prevalent among ethnic/racial minorities, and that socioeconomic and racial factors influence obesity in children and the elderly. Persons from low-income households and with lower educational levels were also more likely to be obese, owing to poor dietary choices. A second study, by Kavina Mehta, analysed the performance of Indian states and union territories on the Sustainable Development Goals (SDGs) for 2018 and recommended that law enforcement and policy interventions, along with political willingness, should be the first steps towards achieving India’s sustainable development targets. The study on Understanding the Pattern of Terrorist Attacks in India, by Pooja Toshniwal, concluded that more attacks occur in Jammu and Kashmir, using various types of weapons. This analysis helps in understanding the pattern of attacks, which could be used by defence forces to help prevent future attacks. Contribution of Education in Development of Countries across the World, by Surabhi Samant, shed light on some unexpected findings about the literacy rate, which is significantly affected by child marriage, child labour and poverty. Government expenditure showed no significant impact, which suggests that the issue is not only spending money but also implementing the right mechanisms. This would be contextual to every country and its economic status. The study also concluded that the literacy rate has a significant impact on the Human Development and Happiness Indices of a country, and a moderate impact on Gross Domestic Product (GDP). In principle, education not only encourages economic growth but also supports quality of life and the overall development of a country. Many more interesting studies were carried out under this course.

In conclusion, including Urban Data Science in the SWS curriculum proved highly valuable: it markedly improved participants’ learning in data and spatial analytics, particularly in visualization.

Disaster Risk Reduction and Open Data Newsletter: May 2019 Edition

WHO releases first guideline on digital health interventions

The WHO has released new recommendations on 10 ways that countries can use digital health technology, accessible via mobile phones, tablets and computers, to improve people’s health and essential services.

United Nations Office for Disaster Risk Reduction – 2018 Annual Report

The 2018 annual report provides an overview of the results achieved by the UNISDR in relation to the three Strategic Objectives and two Enablers of its Work Programme 2016-2019.

UNISDR – Global Platform for Disaster Risk Reduction

Taking place in Geneva, Switzerland from 13 May – 17 May, the global platform is an opportunity for the DRR community to come together to renew and accelerate efforts to implement the Sendai Framework for Disaster Risk Reduction.

Read the full newsletter here

April 2019: Publications in the Data Science Journal


Title: A Survey of Machine Learning Approaches and Techniques for Student Dropout Prediction
Author: Neema Mduma, Khamisi Kalegele, Dina Machuve
URL: http://doi.org/10.5334/dsj-2019-014

Title: GeoSimMR: A MapReduce Algorithm for Detecting Communities based on Distance and Interest in Social Networks
Author: Zaher Al Aghbari, Mohammed Bahutair, Ibrahim Kamel
URL: http://doi.org/10.5334/dsj-2019-013

Title: Building an International Consensus on Multi-Disciplinary Metadata Standards: A CODATA Case History in Nanotechnology
Author: John Rumble, John Broome, Simon Hodson
URL: http://doi.org/10.5334/dsj-2019-012

CODATA is pleased to announce Mark Parsons as the new Editor-in-Chief of the Data Science Journal

In his blog post, Mark writes: ‘I am especially interested in helping DSJ build its niche as an influential journal of the ‘science of data’ in the sense that CODATA described it decades ago. We need more fora that encourage dialog across research and practice to understand all the issues around the socio-technical work necessary for data to be findable, accessible, interoperable, reusable, ethical, secure, etc.’ …

‘I have been a member of the DSJ editorial board since the journal moved to Ubiquity Press, and I have been impressed at how Sarah Callaghan and other editors have worked to increase the journal’s quality. I want to continue this momentum. I want to further bolster the review quality and also raise the possibility of open reviews. The nature of DSJ is that it often attracts submissions and requires reviews from practitioners who have much less of a mandate to publish than researchers. I believe practitioners should be encouraged to contribute (with research as well as practice papers), so we should do what we can to recognize and model excellent contributions in this area. …

‘Thanks to Sarah’s great work, DSJ has a bright future as submissions continue to increase in number and quality. DSJ was ahead of its time when it was founded in the 1990s. I am eager to explore how it can continue to push important conversations forward. I welcome all your ideas. Please tell me what you think. Better yet, tell the community through a submission to DSJ!’

Read more at https://codata.org/blog/2019/04/29/mark-parsons-joins-codata-as-editor-in-chief-data-science-journal/

Mark replaces Sarah Callaghan, who has served since 2015, when the Data Science Journal was moved to its current platform with Ubiquity Press.

Sarah writes:

‘In my four year tenure, I am very proud of the fact that 135 papers have been published, along with 6 Special Collections with another 5 Special Collections in the pipeline. The journal has grown more popular and is steadily publishing research that is more impactful as time goes on, and this is a testament to the hard work of all involved – including our reviewers and authors.

‘It is time for me to hand over the role of EiC to another, and it is with no small amount of sadness that I do so. Being EiC has been incredibly rewarding (and occasionally infuriating) and I have learned a great deal from it. I am very pleased to know that Mark Parsons is taking over the role, and know that the journal will be in safe, knowledgeable hands.

‘It only remains for me to say my farewells and thank yous. Thank you to the authors, without whom there would be no articles to publish. A thousand thank yous to all my editors, reviewers, colleagues and friends – your efforts on behalf of the journal are deeply, deeply appreciated, as is your wisdom and expertise. I wish you all the very best for the future, and look forward to reading more excellent papers published in the DSJ!’

Read more at https://codata.org/blog/2019/04/29/so-long-and-thanks-for-all-the-fish-a-farewell-from-outgoing-data-science-journal-editor-in-chief-sarah-callaghan/

Growing the Conversation on the Science of Data

Image CC-BY-NC Laura Molloy @LM_HATII from the art intervention series ‘Humans of Data’ https://codata.org/blog/category/humans-of-data/

Mark Parsons joins CODATA as Editor-in-Chief, Data Science Journal

I am honored and excited to take on the role of Editor in Chief for the Data Science Journal.

I have had a bit of history with DSJ. One of my earliest peer-reviewed papers was published with Ruth Duerr in DSJ (Parsons and Duerr 2005). I vividly remember hurrying to make revisions in Costa Rica before heading offline for several weeks. I’d still like to meet one of the reviewers (perhaps I have) who made really helpful comments on how to organize and present the paper to get my points across in a more rigorous and impactful way. I was a data practitioner, not a researcher, and was largely unschooled in formal scientific writing. The guidance was most valuable, and the paper still gets cited now and again.

Years later, Peter Fox and I published what was one of my most controversial and influential papers (Parsons and Fox 2013). This time, DSJ allowed me to publish after an unconventional public review process involving reams of open review comments from more than two dozen people.

In short, DSJ has been a catalyst for my career. So I am eager to help foster the journal’s growth and influence and maybe help a few more data scientists along their way.

I am especially interested in helping DSJ build its niche as an influential journal of the ‘science of data’ in the sense that CODATA described it decades ago. We need more fora that encourage dialog across research and practice to understand all the issues around the socio-technical work necessary for data to be findable, accessible, interoperable, reusable, ethical, secure, etc.

I have been a member of the DSJ editorial board since the journal moved to Ubiquity Press, and I have been impressed at how Sarah Callaghan and other editors have worked to increase the journal’s quality. I want to continue this momentum. I want to further bolster the review quality and also raise the possibility of open reviews. The nature of DSJ is that it often attracts submissions and requires reviews from practitioners who have much less of a mandate to publish than researchers. I believe practitioners should be encouraged to contribute (with research as well as practice papers), so we should do what we can to recognize and model excellent contributions in this area.

While improving the content of DSJ, we should also continue to modernize its presentation. We need to actively consider machine-readable papers and content negotiation for both the papers and the metadata. Much like at its founding, DSJ needs to advance the whole concept of scholarly communication.

Thanks to Sarah’s great work, DSJ has a bright future as submissions continue to increase in number and quality. DSJ was ahead of its time when it was founded in the 1990s. I am eager to explore how it can continue to push important conversations forward. I welcome all your ideas. Please tell me what you think. Better yet, tell the community through a submission to DSJ!

‘So Long, and Thanks for All the Fish’: a farewell from outgoing Data Science Journal Editor-in-Chief, Sarah Callaghan

Back in early 2015, I was approached at a coffee break at a conference, and invited to take on the role of Editor-in-Chief of the Data Science Journal. This was a little bit of a surprise, I will confess, as my previous academic journal experience had been as an associate editor, along with some projects working on data citation and data publishing. The opportunity was too good to resist, however, and with the support of my employer, CEDA, I was very pleased to take on the role.

My tenure as EiC also coincided with the move of the journal to its current platform on Ubiquity Press, and with it came the need to appoint a new editorial board, develop a new scope and guidance, collate a new reviewer database, and attend to the other minutiae of re-launching an academic journal. All these things were achieved with the help of my colleagues on the editorial board and the section editors, along with the help and support of the Ubiquity Press staff and the CODATA Executive Committee.

In my four year tenure, I am very proud of the fact that 135 papers have been published, along with 6 Special Collections with another 5 Special Collections in the pipeline. The journal has grown more popular and is steadily publishing research that is more impactful as time goes on [https://www.scimagojr.com/journalsearch.php?q=4700152809&tip=sid], and this is a testament to the hard work of all involved – including our reviewers and authors.

It is time for me to hand over the role of EiC to another, and it is with no small amount of sadness that I do so. Being EiC has been incredibly rewarding (and occasionally infuriating) and I have learned a great deal from it. I am very pleased to know that Mark Parsons is taking over the role, and know that the journal will be in safe, knowledgeable hands.

It only remains for me to say my farewells and thank yous. Thank you to the authors, without whom there would be no articles to publish. A thousand thank yous to all my editors, reviewers, colleagues and friends – your efforts on behalf of the journal are deeply, deeply appreciated, as is your wisdom and expertise. I wish you all the very best for the future, and look forward to reading more excellent papers published in the DSJ!

Sarah

The advent of big data heralds huge opportunities

This article was first published by the Jomo Kenyatta University of Agriculture and Technology http://www.jkuat.ac.ke/prof-muliaro-the-advent-of-big-data-heralds-huge-opportunities/

Prof. Muliaro delivering his public lecture presentation

The emergence of “Big Data” and its related technologies has brought immense opportunities, which can be seized if a new era of openness is developed, one that leverages the technologies and the institutional and organizational frameworks critical to harnessing data.

This was revealed during a public lecture titled Openness in Data, Science and Governance, delivered by Muliaro Wafula, an Associate Professor in the Department of Computing, School of Computing and Information Technology and Director of the ICT Centre of Excellence and Open Data at Jomo Kenyatta University of Agriculture and Technology (JKUAT), on Monday, April 15, 2019.

Addressing an audience that included the President of CODATA, Prof. Barend Mons, and the Executive Director, Dr. Simon Hodson, Prof. Muliaro gave an exposition on the concepts of open data, open science and open governance.

Prof. Mons makes his brief remarks

Characterizing open science as a combination of concepts, tools, platforms and media to promote the creation and dissemination of knowledge in free, open and more inclusive ways, Prof. Muliaro stated that “the goal of open science is to accelerate scientific progress and discoveries to benefit all, guaranteeing that scientific outputs are publicly available and easily accessible for others to use, re-use, and build upon.”

He identified several key open science challenges as drawbacks to the full realization of open science: the lack of established best practices for open science, competition among scientists, existing credit systems that favour closed science, non-disclosure agreements, copyright laws and intellectual property guidelines.

Citing the partnership that brings together Jomo Kenyatta University of Agriculture and Technology, the JICA AFRICA-ai-JAPAN Project, IBM East Africa, CODATA and the Kenya Open Data Initiative (KODI), Prof. Muliaro said the parties were working closely to promote the value of open research data by organizing hackathons on selected datasets of interest to the public in disciplines such as public health and agriculture.

Leveraging their synergies, the initiative seeks to build, among others, “innovative mobile and web applications that make access and consumption of research data easy for the benefit of the society; encourage scientists to open their research data for public consumption and use, showcase open data capability in providing innovative solutions to societal challenges,” Prof. Muliaro stated.

Prof. Abukutsa delivers the opening remarks

He mentioned Smart Health Application based on indigenous vegetables data, Children Food Nutrition Formula Application based on local Kenyan foods, and Effects of Mugukaa on Health, as some of the key outputs under the initiative.

The Vice Chancellor, Prof. Victoria Wambui Ngumi, said in her opening message that the journey towards embracing open data at JKUAT began five years ago, when the institution established the ICT Centre of Excellence and Open Data (iCEOD), which is expected to serve Kenya and Africa as a region. She added that the Centre had already taken its strategic role seriously, making contributions at the national and global level.

Prof. Ngumi further observed in the remarks read on her behalf by the Deputy Vice Chancellor in charge of Research, Production and Extension, Prof. Mary Abukutsa, that “JKUAT is among few leading universities that have taken a bold step towards creating an enabling environment for open data by formulating and adopting an Open Data Research (JORD) Policy in line with the CODATA – led Nairobi Open Data Principles of 2014.”

She however decried insufficient and poor public sensitization on issues such as open data, open science and open governance, arguing that “the tradition and culture for most people has been to be private by default.” Prof. Ngumi called for a deliberate strategy towards changing that mindset.

A section of the academic community, including guests, who attended the public lecture presentation.

The President of CODATA, Prof. Barend Mons, said Africa could lead the initiative to use data at the global level, noting that “data or knowledge is the new oil or gold and it could be more useful if it is shared,” while CODATA Executive Director, Dr. Simon Hodson, underscored the importance of data in implementing sustainable development goals by “creating and measuring data to make meaningful, mindful informed decisions.”

Those present at the public lecture included the Deputy Vice Chancellor (Administration), Prof. Bernard Ikua; the Principal, College of Pure and Applied Sciences, Prof. David Mulati; Deans of Schools, including the Dean, School of Computing and Information Technology, Prof. Stephen Kimani; Heads of Departments; and faculty and students.