Publishing an article in CODATA Data Science Journal

This article was first published by Ms. Neema Mduma at https://neylicious.github.io/ml/2019/05/11/paper.html. Neema is an alumna of the CODATA-RDA School of Research Data Science.

In early 2017, I was privileged to work as a researcher on the Dropwall project (by Rose Funja), which was one of the winning projects of the Data for Local Impact Innovation Challenge (DLIIC). The main focus of the project was to develop a tool to help fight dropout among secondary school girls. The findings from this project show a high rate of dropout among secondary school students, particularly girls, and coincide with reports from other studies showing that school dropout is a major challenge in developing countries. Machine learning techniques for addressing this problem have gained much attention in recent years. However, most of the work has been carried out in developed countries; only a handful of studies in developing countries have applied machine learning to school dropout while taking into account the local context and the data imbalance problem. This motivated me to continue working on school dropout using machine learning in my PhD.

In August 2018, I attended the CODATA-RDA Research Data Science Summer School, held at the Abdus Salam International Centre for Theoretical Physics (ICTP) in Trieste, Italy. The aim was to build competence in data analysis and security for participants from all disciplines and backgrounds, from the sciences to the humanities. The level of engagement and interaction between participants and instructors was outstanding. The Executive Director of CODATA, Dr. Simon Hodson, introduced us to various opportunities such as the CODATA Data Science Journal, where I later managed to publish the breathtaking findings from the Dropwall project in a paper titled A Survey of Machine Learning Approaches and Techniques for Student Dropout Prediction.

June 2019: Publications in the Data Science Journal

Title: Developing a Model Guidelines Addressing Legal Impediments to Open Access to Publicly Funded Research Data in Malaysia
Author: Haswira Nor Mohamad Hashim
URL: http://doi.org/10.5334/dsj-2019-027
Title: Proposed Guideline for Minimum Information Stroke Research and Clinical Data Reporting
Author: Judit Kumuthini, Lyndon Zass, Melek Chaouch, Michael Thompson, Paul Olowoyo, Mamana Mbiyavanga, Faniyan Moyinoluwalogo, Gordon Wells, Victornia Nembeware, Nicola J. Mulder, Mayowa Owolabi
URL: http://doi.org/10.5334/dsj-2019-026
Title: A Column Styled Composable Schema Matcher for Semantic Data-Types
Author: Xiaofeng Liao, Jordy Bottelier, Zhiming Zhao
URL: http://doi.org/10.5334/dsj-2019-025
Title: Importance and Incorporation of User Feedback in Earth Science Data Stewardship
Author: Hampapuram Ramapriyan, Jeanne Behnke
URL: http://doi.org/10.5334/dsj-2019-024
Title: Establishing, Developing, and Sustaining a Community of Data Champions
Author: James L. Savage, Lauren Cadwallader
URL: http://doi.org/10.5334/dsj-2019-023
Title: The Definition of Reuse
Author: Stephanie van de Sandt, Sünje Dallmeier-Tiessen, Artemis Lavasa, Vivien Petras
URL: http://doi.org/10.5334/dsj-2019-022
Title: Geoscientists’ Perspectives on Cyberinfrastructure Needs: A Collection of User Scenarios
Author: Karen I. Stocks, Sam Schramski, Arika Virapongse, Lisa Kempler
URL: http://doi.org/10.5334/dsj-2019-021
Title: Data Distribution Centre Support for the IPCC Sixth Assessment
Author: Martina Stockhause, Martin Juckes, Robert Chen, Wilfran Moufouma Okia, Anna Pirani, Tim Waterfield, Xiaoshi Xing, Rorie Edmunds
URL: http://doi.org/10.5334/dsj-2019-020

Humans of Data 30

“The things that made me interested in data twenty-five years ago are the same things that make me interested in it now.  It’s the way structures and narratives are going to define our culture and who we are.  I was interested initially in the historical perspective: how is data going to change the subtlety about how we understand past cultures?  And how future generations are going to access data, manipulate it and study it.  Very few people were interested in that, and that was exciting.  Now I’m a bit scared of it.  It’s increasingly clear that we can use data in a contemporary context for social evils.

Data – and the systems that store it – can be ugly, but they can also be beautiful.  Some day, people are going to be interested in the beauty of system architectures and the beauty of database design – or the ugliness.  It’s like studying a Gothic cathedral or a contemporary city: the architecture defines how we feel and the narrative that plays out in our lives.

I see emerging possibilities in machine learning, in the blockchain and in other areas.  I share an understanding of the risks of data.  And I believe that what kind of research you actually do – and how you enable others to be creative to do things – these are equally important.  More focus needs to go on developing the community and helping our colleagues to progress and excel.  The ways forward in digital curation and data presentation are very much going to come from human collaboration and not from one lone person who’s thought of a technical solution that no-one has thought of before.  This is not the field of lone scholars – this is the field of general community effort.”

World Data System: Early Career Researcher Training Workshop 2019

Data Curation and Management: Current Achievements and Future Challenges

The management and curation of research data is a very timely topic. All researchers rely on data they have themselves collected or that are the outputs of previous studies. Moreover, researchers are increasingly required to organize the long-term storage and access of the data used to obtain their results. As such, data training is highly relevant to budding scientists as they embark on their careers!

With many thanks to the generous sponsorship of the European Geosciences Union (EGU), the World Data System of the International Science Council (WDS) is delighted to offer a Research Data Management (RDM) Training Workshop aimed at early career researchers and scientists (ECRs).

The Workshop will take place on 6–8 November 2019 at Institut de Physique du Globe in Paris, France. Twenty-four (24) seats are available, and dormitory accommodation (FIAP) and meals will be covered for the selected participants. Limited funding is also available towards some participants’ travel costs. If more people apply than there are places available, the WDS Scientific Committee (WDS-SC) will select participants based on their interest in the Workshop.

Apply Now!

The Regional Master program in Biodiversity informatics opens registration to students

The master program in Biodiversity informatics opened its doors at the Faculty of Agricultural Sciences of the University of Abomey-Calavi, Benin, in October 2017. It will welcome its third cohort of students in the next academic year.

It is regional in that it receives students of several nationalities, and both national and international teachers (especially from the United States and Europe) come to teach in the program.

The objective of this Regional Master is to train a new generation of biodiversity specialists who can integrate threats to biodiversity into research questions and derive relevant results to inform decision-making on biodiversity in the context of climate and global change.

A television outreach segment that explains the master program is available at https://youtu.be/4ajSBF1Jdyk (in French).

We invite you to join the program, or to enroll your staff in large numbers, in order to meet the challenges of mobilizing and using biodiversity data for decision-making.

Applications have been open since 1 June and close on 30 June 2019.

The program is taught in French and English.

May 2019: Publications in the Data Science Journal

Title: Interdisciplinary Comparison of Scientific Impact of Publications Using the Citation-Ratio
Author: Arthur R. Bos, Sandrine Nitza
URL: http://doi.org/10.5334/dsj-2019-019
Title: Diversity of Woody Species in Djamde Wildlife Reserve, Northern Togo, West Africa
Author: Tchagou Awitazi, Raoufou Radji, Kotchikpa Okoumassou
URL: http://doi.org/10.5334/dsj-2019-018
Title: A Generic Research Data Infrastructure for Long Tail Research Data Management
Author: Atif Latif, Fidan Limani, Klaus Tochtermann
URL: http://doi.org/10.5334/dsj-2019-017
Title: Time Series Prediction Model of Grey Wolf Optimized Echo State Network
Author: Huiqing Wang, Yingying Bai, Chun Li, Zhirong Guo, Jianhui Zhang
URL: http://doi.org/10.5334/dsj-2019-016
Title: Fostering Data Sharing in Multidisciplinary Research Communities: A Case Study in the Geospatial Domain
Author: Martina Zilioli, Simone Lanucara, Alessandro Oggioni, Cristiano Fugazza, Paola Carrara
URL: http://doi.org/10.5334/dsj-2019-015

Urban Data Science School from May 13 – May 23, 2019

This article was first published by instructors Dr. Shaily R. Gandhi and Felix Emeka Anyiam https://shailygandhi.github.io/UrbanDataScience2019/ – Shaily and Felix are both alumni of the CODATA-RDA School of Research Data Science.

The second summer school on Urban Data Science followed the successful completion of the first in 2018, which was itself an outcome of a collaboration that began at the CODATA-RDA Research Data Science Summer School in Trieste, Italy, in 2017. This year the Urban Data Science course was hosted by the Summer Winter School at CEPT University, Ahmedabad, India, from May 13 to May 23, 2019.

With the growing trend towards data-driven solutions used at the central level to make city operations more efficient and effective, the next generation of city planners will need to be as comfortable using advanced simulation algorithms as they are with design. This course helped to address the poor use of available open data in decision making while keeping the urban context in focus. The summer school course was modified to get students started with the basic data science components in a short span of 10 days. This year the course had an additional 4 days, which helped produce more insightful results from the open data sets than last year’s 6-day course. Open data sets allow for a deeper understanding of urban dynamics and their associated challenges, giving students firmer control over possible bias so that they can analyse these challenges and propose solutions to them.

This year’s course was carefully modified using feedback from students of the 2018 summer school, keeping in mind that the 24 new participants came from different backgrounds such as planning, architecture, civil engineering, geomatics and other disciplines, at both bachelor’s and master’s level, and from both IT and non-IT backgrounds. The curriculum covered the basics of Git and GitHub, where students got intense hands-on practical experience in using the software and learned how to open up their projects on GitHub. In addition, OpenRefine, R and Excel were covered for data cleaning. The lessons on the basics of R were prepared using material from the Software Carpentry lessons Programming with R and R for Reproducible Scientific Analysis, and from the Geospatial Data workshop. The concepts were taken from various sources and the lessons were redesigned to focus on urban problems and analysis.

The school began with students learning how to set up their study objectives, refine their research titles, use techniques to develop a research theory, and plan the area of their study. They did so bearing in mind the type of data available from open data sets (continuous, discrete, ordinal or nominal) and the different stages of statistical analysis that can be conducted in order to produce the expected outcomes. Knowledge of research methodologies and the use of statistical software to support data analysis was one of the vital goals of the course. The statistical software package R was used, as it has become a very powerful and useful tool for data cleaning, management, statistical analysis and graphical data visualization. Once mastered, it is user friendly and can reduce the time and effort required of researchers, students and professionals. The word cloud below shows the technologies students explored during the summer school.

Urban Data Science Summer School 2019

Innovative teaching techniques, such as mixing theory and practicals with real-life examples, were followed in this course, because the diverse group of students required special attention to keep the whole class at the same pace. Despite the course being intense, running from 9:30am until 5:30pm, it was very motivating to see the students following the topics and keeping up with the pace of the instructors. To better understand the various levels of the 24 students, we conducted pre- and post-summer-school surveys, which gave us an idea of how much the school changed the students’ outlook, from programming in R to being confident in using Git and OpenRefine. As in last year’s practice, daily feedback was taken from the students to inform the tutors’ decisions on class activities. Continuous constructive comments from the students made the course more effective, as the tutors were able to achieve the desired output by adapting the teaching method to the students’ needs. This process of understanding the capability of the students was well appreciated and implemented.

Urban Data Science Summer School 2019 was well appreciated by the students, and the outcomes of the course were very insightful and backed by statistical evidence. The topics selected by the students and their frequency are shown in the word cloud below. Urban planning and decision making depend on insight, and this insight is gathered by collecting and analysing open data sets in order to understand the current state of our environment, an approach this course strongly promoted. The role of Urban Data Science is to enhance urban planning and policy-making with more data-driven decisions, which are needed at this time. The students of this summer school came up with wonderful insights and results. It was a great pleasure to receive case study outputs on various topics such as crime, the economy, education, governance and planning, environment, public health, road accidents and sports.

Urban Data Science Summer School 2019

The project of Linda Reeba Koshy, a student of the summer school, was a case study on the Prevalence of Obesity among Socially Vulnerable Groups in the United States. Its interesting results showed that obesity is prevalent among ethnic and racial minorities, and that socioeconomic and racial factors influence obesity in children and the elderly. Persons from low-income households and with lower educational levels were also more likely to be obese due to their poor dietary choices. A second study, by Kavina Mehta, analysed the performance of Indian states and union territories in terms of the Sustainable Development Goals (SDG) for the year 2018, and recommended that law enforcement and policy interventions, along with political willingness, should be the first steps towards meeting India’s sustainable development targets. The study on Understanding the Pattern of Terrorist Attacks in India by Pooja Toshniwal concluded that a larger number of attacks occur in Jammu and Kashmir, using various types of weapons. This analysis helps in understanding the pattern of attacks, which could be used by defence forces to halt future attacks. Contribution of Education in Development of countries across the world by Surabhi Samant threw more light on some unexpected factors behind the literacy rate, which is significantly affected by child marriage, child labour and poverty. Government expenditure showed no significant impact, which suggests that what matters is not only spending money but also implementing the right mechanisms. This would be contextual to every country and its economic status. The study also concluded that the literacy rate has a significant impact on the Human Development and Happiness Index of a country, and a moderate impact on Gross Domestic Product (GDP). In principle, education not only encourages economic growth but also assures quality of life and the overall development of a country. Many more interesting studies were carried out under this course.

In conclusion, the inclusion of Urban Data Science in the SWS curriculum has been invaluable, as it greatly improved the participants’ learning in data and spatial analytics, including visualization.

Disaster Risk Reduction and Open Data Newsletter: May 2019 Edition

WHO releases first guideline on digital health interventions

The WHO has released new recommendations on 10 ways that countries can use digital health technology, accessible via mobile phones, tablets and computers, to improve people’s health and essential services.

United Nations Office for Disaster Risk Reduction – 2018 Annual Report

The 2018 annual report provides an overview of the results achieved by the UNISDR in relation to the three Strategic Objectives and two Enablers of its Work Programme 2016-2019.

UNISDR – Global Platform for Disaster Risk Reduction

Taking place in Geneva, Switzerland from 13 May – 17 May, the global platform is an opportunity for the DRR community to come together to renew and accelerate efforts to implement the Sendai Framework for Disaster Risk Reduction.

Read the full newsletter here

April 2019: Publications in the Data Science Journal

Title: A Survey of Machine Learning Approaches and Techniques for Student Dropout Prediction
Author: Neema Mduma, Khamisi Kalegele, Dina Machuve
URL: http://doi.org/10.5334/dsj-2019-014
Title: GeoSimMR: A MapReduce Algorithm for Detecting Communities based on Distance and Interest in Social Networks
Author: Zaher Al Aghbari, Mohammed Bahutair, Ibrahim Kamel
URL: http://doi.org/10.5334/dsj-2019-013
Title: Building an International Consensus on Multi-Disciplinary Metadata Standards: A CODATA Case History in Nanotechnology
Author: John Rumble, John Broome, Simon Hodson
URL: http://doi.org/10.5334/dsj-2019-012

CODATA is pleased to announce Mark Parsons as the new Editor-in-Chief of the Data Science Journal

In his blog post, Mark writes: ‘I am especially interested in helping DSJ build its niche as an influential journal of the ‘science of data’ in the sense that CODATA described it decades ago. We need more fora that encourage dialog across research and practice to understand all the issues around the socio-technical work necessary for data to be findable, accessible, interoperable, reusable, ethical, secure, etc.’ …

‘I have been a member of the DSJ editorial board since the journal moved to Ubiquity Press, and I have been impressed at how Sarah Callaghan and other editors have worked to increase the journal’s quality. I want to continue this momentum. I want to further bolster the review quality and also raise the possibility of open reviews. The nature of DSJ is that it often attracts submissions and requires reviews from practitioners who have much less of a mandate to publish than researchers. I believe practitioners should be encouraged to contribute (with research as well as practice papers), so we should do what we can to recognize and model excellent contributions in this area. …

‘Thanks to Sarah’s great work, DSJ has a bright future as submissions continue to increase in number and quality. DSJ was ahead of its time when it was founded in the 1990s. I am eager to explore how it can continue to push important conversations forward. I welcome all your ideas. Please tell me what you think. Better yet, tell the community through a submission to DSJ!’

Read more at http://codata.org/blog/2019/04/29/mark-parsons-joins-codata-as-editor-in-chief-data-science-journal/

Mark replaces Sarah Callaghan, who has served since 2015, when the Data Science Journal was moved to its current platform with Ubiquity Press.

Sarah writes:

‘In my four year tenure, I am very proud of the fact that 135 papers have been published, along with 6 Special Collections with another 5 Special Collections in the pipeline. The journal has grown more popular and is steadily publishing research that is more impactful as time goes on, and this is a testament to the hard work of all involved – including our reviewers and authors.

‘It is time for me to hand over the role of EiC to another, and it is with no small amount of sadness that I do so. Being EiC has been incredibly rewarding (and occasionally infuriating) and I have learned a great deal from it. I am very pleased to know that Mark Parsons is taking over the role, and know that the journal will be in safe, knowledgeable hands.

‘It only remains for me to say my farewells and thank yous. Thank you to the authors, without whom there would be no articles to publish. A thousand thank yous to all my editors, reviewers, colleagues and friends – your efforts on behalf of the journal are deeply, deeply appreciated, as is your wisdom and expertise. I wish you all the very best for the future, and look forward to reading more excellent papers published in the DSJ!’

Read more at http://codata.org/blog/2019/04/29/so-long-and-thanks-for-all-the-fish-a-farewell-from-outgoing-data-science-journal-editor-in-chief-sarah-callaghan/

Growing the Conversation on the Science of Data

Image CC-BY-NC Laura Molloy @LM_HATII from the art intervention series ‘Humans of Data’ http://codata.org/blog/category/humans-of-data/

Mark Parsons joins CODATA as Editor-in-Chief, Data Science Journal

I am honored and excited to take on the role of Editor in Chief for the Data Science Journal.

I have had a bit of history with DSJ. One of my earliest peer-reviewed papers was published with Ruth Duerr in DSJ (Parsons and Duerr 2005). I vividly remember hurrying to make revisions in Costa Rica before heading offline for several weeks. I’d still like to meet one of the reviewers (perhaps I have) who made really helpful comments on how to organize and present the paper to get my points across in a more rigorous and impactful way. I was a data practitioner, not a researcher, and was largely unschooled in formal scientific writing. The guidance was most valuable, and the paper still gets cited now and again.

Years later, Peter Fox and I published what was one of my most controversial and influential papers (Parsons and Fox 2013). This time, DSJ allowed me to publish after an unconventional public review process involving reams of open review comments from more than two dozen people.

In short, DSJ has been a catalyst for my career. So I am eager to help foster the journal’s growth and influence and maybe help a few more data scientists along their way.

I am especially interested in helping DSJ build its niche as an influential journal of the ‘science of data’ in the sense that CODATA described it decades ago. We need more fora that encourage dialog across research and practice to understand all the issues around the socio-technical work necessary for data to be findable, accessible, interoperable, reusable, ethical, secure, etc.

I have been a member of the DSJ editorial board since the journal moved to Ubiquity Press, and I have been impressed at how Sarah Callaghan and other editors have worked to increase the journal’s quality. I want to continue this momentum. I want to further bolster the review quality and also raise the possibility of open reviews. The nature of DSJ is that it often attracts submissions and requires reviews from practitioners who have much less of a mandate to publish than researchers. I believe practitioners should be encouraged to contribute (with research as well as practice papers), so we should do what we can to recognize and model excellent contributions in this area.

While improving the content of DSJ, we should also continue to modernize its presentation. We need to actively consider machine-readable papers and content negotiation for both the papers and the metadata. Much like at its founding, DSJ needs to advance the whole concept of scholarly communication.

Thanks to Sarah’s great work, DSJ has a bright future as submissions continue to increase in number and quality. DSJ was ahead of its time when it was founded in the 1990s. I am eager to explore how it can continue to push important conversations forward. I welcome all your ideas. Please tell me what you think. Better yet, tell the community through a submission to DSJ!