International Training workshop on Big Data for Science and Sustainability

Opening ceremony of training workshop

The CODATA PASTD – IGU joint International Training Workshop on Big Data for Science and Sustainability in Developing Countries was successfully held from 17 to 19 March 2017 in Hyderabad, India. The training workshop was an academic event of the Xth IGU International Conference on “Urbanization, Health & Well Being and Sustainable Development Goals”. Supported by the International Geographical Union (IGU) and the Institute of Geographic Sciences and Natural Resources Research, Chinese Academy of Sciences (IGSNRR, CAS), the Hyderabad training workshop is one of CODATA PASTD’s three capacity building activities in 2017. Other training activities will be held in Madagascar in September and in China in November.

The training course introduced young scientists to the ideas of open data, data sharing and data publication. It also covered Big Data, data analysis and applications, in order to develop their skills as ‘data scientists’. The three-day workshop combined lectures and hands-on practice, aiming to develop the skills and capacity necessary for the preservation of, and open access to, research data in developing countries.

Prof. R. B. Singh, Vice President of the International Geographical Union (IGU), delivered an opening speech

Prof. R. B. Singh, Vice President of the International Geographical Union (IGU) and Co-Chair of the Strategy and Policy Sub-group of CODATA PASTD, and Yukio Himiyama, President of the IGU, attended the opening and closing ceremonies respectively. Fifty-six students from 13 universities in India attended the training courses. Dr. Yunqiang Zhu, Co-chair of the Capacity Building Sub-group of CODATA PASTD and professor at IGSNRR, CAS, and Dr. B. Srinagesh of Osmania University organized the training as co-chairs. Chinese scientists worked alongside Indian colleagues to give courses on open Big Data discovery, data publication and sharing, the Indian Earth observation system, geospatial data interoperability, geospatial data infrastructure and data sharing principles.

Yukio Himiyama, President of the International Geographical Union (IGU), awarded certificates to the trainees

Participation in the training workshop was active and enthusiastic, and students reported that they found it beneficial. Professor R. B. Singh and Dr. V. Raghavaswamy, Deputy Director of the National Remote Sensing Centre, India, expressed their hope that the PASTD training course will continue in future and cultivate a new generation of young data scientists with a growing awareness of developments in data science and the benefits of international cooperation.

The closing ceremony of the training workshop

Humans of Data 13

“Science is about discovering that things aren’t as you expected. The more I learn, the more I realise I don’t know. One of the fun things about what I do just now is that I get to see a lot of different research communities and how they conceive of and represent data, and what data mean to them.

There are really a lot of different discipline-oriented communities. I come from a domain repository – we just called it a data center – and for me, it’s interesting coming from that environment as opposed to the library, institution, repository, or iSchool environments, who are dealing with very similar issues and approaching them with different perspectives.

I do think in some areas there is emerging consensus and that’s exciting to see. The very fact that everyone accepts PIDs on data, that’s almost universal, we might argue about which one, but the strong consensus is that there should be something.  We’re seeing greater convergence about metadata standards, too, particularly in my field.  I think we’re getting better at listening to each other from different domains – historians and ecologists discover they have the same data problems.  This makes them feel they’re not alone but also that their problems are generic and can have common solutions.  There is a community.  When I first started at a data center 25 years ago, I’d be the data person at science conferences. That’s not the way it works any more.

We are in dire times just now. We seem to be in an age of growing authoritarianism, and some people are trying to pretend there isn’t evidential knowledge. This makes research all the more important. Data sharing, open knowledge, open data, it’s more important than it’s ever been.”

Humans of Data 12

“My first job after university, I was doing computer stuff in a medical research place. I got a reputation as someone who was good at rescuing things off of old tapes and punchcards.  It had been expensive to collect that data, and people had sometimes suffered in providing it. But it was also a detective job and it was important.  But it was disappointing (though great for me professionally) when years later, I could come back into the field, and the sense of what was wrong then was still there. We still lose data because it’s on some piece of media that someone neglects or we’ve lost the documentation. Or we lose it because nobody knows where it is. If we don’t know it exists someone goes and repeats the work.

Now being able to work with this community of other people is great, making sure that stuff that could be of value in the future gets kept – it matters in lots of ways.  It matters because it saves us money, and that is important because it’s our taxes. And it matters because collectively as a society we’ll learn stuff from it: data can help prevent disasters, it can help improve crops, and many other important things in society.

This community is important to what I do every day. The only negative thing is that it gives you the sense of too many possibilities.  And you think, ‘Yeah, I can help you do this thing’.  And you don’t have time to do it all, which can be a crushing disappointment.  But it’s so nice to learn a bunch of things, and it’s an embarrassment of riches – things you can go and do, people you can collaborate with.  My job is often telling one group of people, ‘Hey, you should know about this other group’.  If that helps someone to reach out and collaborate, I feel like I’ve done something positive.”

Humans of Data 11

“I really love this group of people who work on data management and sharing.  I’m excited to be part of this very welcoming community.  I never experienced this elsewhere – it’s very nice to collaborate, to network. People are really happy to do work voluntarily. They are people who want to do not just their day to day job, but to change the world!”

Humans of Data 10

“Helping researchers to manage and share their data is what really motivates me. I was a researcher before, and much of research is not shared because the only incentive is to publish in ‘high impact factor’ journals. Nobody cares about what you’ve found out as an early career researcher, unless it’s published in a ‘high impact factor’ journal. I want to share more of the science of discovery. I love contributing to this change.

Data sharing is such an important part of opening up science. What’s really rewarding is when you explore with researchers how they can open up their research. People get a sparkle in their eyes. For me to get one convert really matters.  That’s what I’m most happy about.

It’s really important to understand the people you’re speaking with, to have this connection. There is never enough talking and advocacy, having a personal connection and understanding their motivation. That can’t be solved by any technical solution. It’s social change, cultural change. I strongly believe as an ex-scientist that it’s so important to change the reward system for research. It’s got to be transparent and get beyond only valuing what’s in the ‘high impact factor’ journal.”

Humans of Data 9

“We need more south–south collaborations.  I’d like to approach this and get in touch with people I’ve met here, and I’m trying to identify other people in Latin America that have the same interests.  Our data problems might be different from England or Canada or elsewhere in the north.  We have a lot of data that might be at risk of disappearing in the next few years, and this might be a bigger problem in developing countries.

I’m also concerned about how the southern hemisphere is going to contribute.  How do I get the funds that I need to get the work done that I need to do?  Trying to be part of this community is going to be a challenge for financial reasons.  I would surely not be here except for GEO and CODATA support; this was very special for me to receive that funding.  Otherwise I would miss this incredible opportunity for networking and knowledge sharing.

I think that open science is the only way forward to answer the complex problems that have been presented by society.  These problems are not local and involve so many different knowledge domains.  We need to do science from a more collaborative perspective to be able to tackle these challenges.  Collaboration is what I’m really passionate about.  When I return to Brazil I’ll start to talk to people and see how we can go from here.”

Humans of Data 8

“I’m a molecular biologist, not a data scientist. My recent PhD, however, was in information sciences and my concern was with knowledge management inside research organisations, but now I understand this includes data management.

Technology is very important, but I would like to know more about the social and cultural barriers that people are faced with when managing data, and how organisations may overcome these. Of course, the data professional is not a single person that meets all the requirements. We need capacity building: it’s all brand new for us in Brazil, and there are many challenges. One of them is the participation of women in these discussions. One of the talks that was very special at International Data Week this year was Christine Borgman’s. She had this very broad perspective, a holistic view on open research data. I expect for the future we can see more and more women engaged and have an active voice.”

Humans of Data 7

“I entered into the data profession about three and a half years ago. I found the community to be very welcoming. The ideas of ethics and sustainability are starting to be brought forward more strongly now. Data aren’t just digits in the memory. They have real world effects in real world situations.

One of the things that drew me particularly to the idea of preserving data is to build on the research investments that people have made. People spend their lives exploring questions. If the information and data those answers are based on aren’t kept useable in an understandable way, then the answers themselves are also lost. The end result is so many wasted lives when you add it up. It’s the time invested in exploring these questions, but even more, in a broadly humanitarian way, these answers are pursued to improve the lot of humanity. If the data collected through research are lost, the answers themselves are lost, and so the people, the environmental effects are also lost. So I think that’s my most important concern.

Look, I like efficiency. I like effectiveness. Not taking care of things you’ve spent time making, not making sure they can be used effectively – that’s a waste of everyone’s time and effort. It just bugs me. Data is the starting point for any answers we achieve through research. Let’s not waste that effort. If there’s anything this community could respond more in, it’s the human-related areas – the marketing and advertising of the importance of data and the importance of making sure the data is there to go back to. There’s no reason to reinvent wheels, but improving them is vital.”

Hackathon on open research data: encouraging a balanced demand and supply of open research data

This post comes from Professor Joseph Muliaro Wafula, Director of ICT Centre of Excellence and Open Data (iCEOD) of the Jomo Kenyatta University of Agriculture and Technology, chair of CODATA Kenya and a member of the CODATA International Executive Committee. iCEOD is collaborating with IBM Cloud on an open data cloud platform:

“We wanted to maximize the value of our datasets—both to save money and, crucially, to encourage innovation and collaboration in the wider community,” says Professor Wafula.

“If we could take our data outside the confines of academia and allow developers and social entrepreneurs to harness it, we hope that they would start building applications to use this information for the public good.”

On Saturday 12 November, the ICT Centre of Excellence and Open Data (iCEOD) of the Jomo Kenyatta University of Agriculture and Technology organized a hackathon on Kenyan open research data sets. The hackathon was intended to enable public access to and use of research data through the creation of mobile and web applications. It targeted research data sets from scientists in agriculture and public health. The hackathon was opened by the Dean of the School of Computing and Information Technology, Dr. Stephen Kimani, who commended students for their readiness to participate in finding solutions to some of the existing challenges.

  1. Prof. Mary Abukutsa (http://www.jkuat.ac.ke/departments/horticulture/academic/) provided a research data set on the nutritional value, germination and yields of indigenous African vegetables.
  2. Prof. John Wesonga (http://www.jkuat.ac.ke/departments/horticulture/prof-john-wesonga-m/) provided a research data set on the best conditions for growing French beans with minimum application of pesticides.
  3. Prof. Simon Karanja (http://www.jkuat.ac.ke/schools/soph/prof-simon-karanja/) provided research data on the effects of khat on users in Kenya.
  4. Dr. Frida Wanzala (http://www.jkuat.ac.ke/departments/horticulture/academic/) provided research data on papaya growing areas in Kenya.
  5. Dr. Peter Kahenya (http://www.jkuat.ac.ke/departments/foodscience/?page_id=49) provided research data on various conditions that affect the cooking time of different types of beans.
  6. Dr. John Kinyuru (http://www.jkuat.ac.ke/departments/foodscience/?page_id=49) provided research data on nutritional formulas of various local Kenyan foods.
  7. Mr. Francis Ombwara (http://www.jkuat.ac.ke/departments/horticulture/non-academic/) provided research data on identifying the sweetness of papaya by color.

All the developers were invited to explain the objective of their research.

The iCEOD team curated the data sets and organized them in formats that were easily consumable by software developers. The team also created Application Programming Interfaces (APIs) and SQL queries, which were uploaded to a database in a local computer environment, allowing developers to move straight into developing applications without having to worry about the raw data sets.
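
As a rough illustration of this kind of preparation, the sketch below loads a small curated table into a local SQLite database and wraps one SQL query in a function that an application could call. The table name, columns, values and function names are all hypothetical; the post does not describe the actual iCEOD schema, database engine or APIs.

```python
import sqlite3

# Hypothetical, made-up rows standing in for a curated research data set.
SAMPLE_ROWS = [
    ("vegetable_a", "region_x", 2.5),
    ("vegetable_b", "region_x", 4.0),
    ("vegetable_c", "region_y", 1.5),
]


def build_database():
    """Create an in-memory database holding the illustrative table."""
    conn = sqlite3.connect(":memory:")
    conn.execute(
        "CREATE TABLE vegetables (name TEXT, region TEXT, iron_mg REAL)"
    )
    conn.executemany("INSERT INTO vegetables VALUES (?, ?, ?)", SAMPLE_ROWS)
    conn.commit()
    return conn


def vegetables_in_region(conn, region):
    """Return (name, iron_mg) rows for one region, highest iron value first."""
    cursor = conn.execute(
        "SELECT name, iron_mg FROM vegetables "
        "WHERE region = ? ORDER BY iron_mg DESC",
        (region,),
    )
    return cursor.fetchall()


conn = build_database()
print(vegetables_in_region(conn, "region_x"))
```

Wrapping the query in a function with a bound parameter is one way to let application developers work against a stable interface rather than the raw files.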

The objectives of the Hackathon

  1. To build innovative mobile and web applications that make access to and consumption of research data easy, for the benefit of society.
  2. To encourage scientists to open their research data for public consumption and use.
  3. To showcase the capability of open data to provide innovative solutions to societal challenges.
  4. To engage partners’ support for the open research data initiative.

Hackathon Partners

  1. Jomo Kenyatta University of Agriculture and Technology (http://www.jkuat.ac.ke/)
  2. IBM East Africa (http://www.ibm.com/connect/ibm/ke/en/branch/ibm.html)
  3. CODATA (http://www.codata.org/)
  4. AFRICA ai JAPAN Project (http://jkuat.ac.ke/projects/africa-ai-japan/?page_id=469)
  5. Kenya Open Data Initiative (https://opendata.go.ke/)

List of Judges

  1. Dr. Agnes Mindila – JKUAT
  2. Mr. Philip Oyier – JKUAT
  3. Mr. Silas Macharia – IBM Kenya
  4. Mr. Phebian – ICT Authority

The three winning applications

Position 1: Team ACELORDS, comprising Alex Maina, Emmah Kimari and Peter Kamore – used the nutritional dataset

  • This group developed a web and mobile application that helps users find out the nutritional value of common types of food in Kenya.
  • A healthy diet boosts immunity against most diseases, so such a system is beneficial.
  • The user can select any combination of the foods offered and have the application calculate the resulting nutritional values of that combination.
  • In addition, the system can save the choices made by a registered or logged-in user. With this information, weekly or monthly trends can be generated, allowing further processing and interpretation of results.
  • A user can be alerted when the system detects potential health risks, such as levels of certain nutrients that fall above or below set thresholds. With this information, users can change their diet in a way that benefits their health.
  • The application can be used by mothers to monitor their babies’ health trends and focus on nurturing healthy children, which is one of the major concerns in Africa.
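
The calculation and alerting described in the points above can be sketched roughly as follows. This is not the ACELORDS team’s code: the food names, nutrient values, thresholds and function names are invented for illustration, and the thresholds are not dietary advice.

```python
# Hypothetical nutrient content per serving; values are invented.
FOODS = {
    "food_a": {"protein_g": 8.0, "iron_mg": 2.0},
    "food_b": {"protein_g": 3.0, "iron_mg": 1.0},
    "food_c": {"protein_g": 12.0, "iron_mg": 4.5},
}

# Hypothetical (lower, upper) limits per nutrient; not dietary advice.
THRESHOLDS = {
    "protein_g": (10.0, 60.0),
    "iron_mg": (5.0, 7.0),
}


def combined_nutrients(selection):
    """Sum each nutrient over the selected foods."""
    totals = {}
    for food in selection:
        for nutrient, amount in FOODS[food].items():
            totals[nutrient] = totals.get(nutrient, 0.0) + amount
    return totals


def alerts(totals):
    """List the nutrients whose totals fall outside their thresholds."""
    messages = []
    for nutrient, (low, high) in sorted(THRESHOLDS.items()):
        value = totals.get(nutrient, 0.0)
        if value < low:
            messages.append(f"{nutrient} below {low}")
        elif value > high:
            messages.append(f"{nutrient} above {high}")
    return messages


totals = combined_nutrients(["food_a", "food_b"])
print(totals)
print(alerts(totals))
```

Saving each user’s totals with a date would then make it possible to produce the weekly or monthly trends mentioned above.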

Position 2: Nahayo B. Patrick and Pius Dan Nyongesa – used the khat dataset

  • This group created an informational application.
    • The dataset was large and covered different aspects of khat in Kenya, such as its effects on local citizens, the population taking khat, users’ spending on khat, the age of those taking it and where khat grows, among other issues.
    • The group created a real-time data analysis tool that takes this data and plots charts and graphs, making it easy to visualize the data and gain insights from it.
    • Potential users of such applications include health service providers, counselors and security personnel, among others.

Position 3: Team ELITE (used Indigenous African Vegetables data)

  • This dataset shows the nutritional value of different Kenyan indigenous vegetables and the areas where these vegetables are commonly found. It was used to develop an application that helps users determine which vegetables best suit their diets, depending on the nutritional content they need most.
  • The application shows where different vegetables can be found and which vegetables users need most, depending on their nutritional needs.
  • Potential end users include manufacturers, processors, nutritionists, health providers and nursing mothers.

All data sets are being prepared for registration so that they can be assigned DOIs. Their APIs will be published on the JKUAT Open Data Platform at https://opendata.jkuat.ac.ke/ when ready.

Acknowledgement

Jomo Kenyatta University of Agriculture and Technology; Prof. Muliaro Wafula, Director of iCEOD; Dr. Simon Hodson, CODATA Executive Director; Prof. Manabu Tsunoda, AFRICA ai JAPAN Project Chief Adviser; Silas Macharia, IBM Kenya; Dr. Agnes Mindila, Lecturer in the Computing Department and iCEOD collaborator; Noriaki Tanaka, AFRICA ai JAPAN Project Coordinator; Pascal Ouma, Deputy Director ICT; Francis Musyoki, attached to iCEOD and MSc student in Software Engineering at JKUAT; Tom Nyongesa, attached to iCEOD and final-year Computer Science student at JKUAT; Gyle Odhiambo, attached to iCEOD and final-year Computer Science student at JKUAT; and Alice Ebela, iCEOD Administrative Assistant.

List of Participants

1 Mwangi Shadrack 2 Nixon Thuo 3 Dominic Kithinji
4 Wycliff Obuya 5 Sydney Mainga 6 Enock Chesire
7 Alex Maina 8 Winnie Karanja 9 Collins Njoroge
10 Charles Wachira 11 John Makau 12 Winnie Karanja
13 Collins Njoroge 14 Charles Wachira 15 Kyalo Ian
16 Okumu Ian 17 Joseph Njenga 18 Irene Kimani
19 Simon Mulwa 20 Ancentus Makau 21 Emmah Kimari
22 Peter Kamoro 23 Polycap Okeyo 24 Khwolo Kabara
25 Chris Mureithi 26 Stephen Mwangi 27 Jimmy Koskei
28 Bashir Shelkh 29 Gaylord Odhiambo 30 Charles Waitiki
31 Mary Waweru 32 Rose Kinuthia 33 Vicky Gatobu
34 Jeremiah Kuria 35 Mercy Maina 36 Njoroge Daniel
37 Muthiani Jayson 38 Sammy Mugambi 39 Peter Kariuki
40 Alan Mwathe 41 Gitau Isaac 42 Daniel Biwott
43 Ngumo Nthenge 43 Gitau Moses 44 Njoroge Edward
45 Ambrose Mbae 46 Stephen Oduor 47 Elvis Shida
48 Mwembe Emmanuel 49 Waithaka Kennedy 50 Tabitha Akinyi
51 Brian Kimathi 52 Lusenaka Alvin 53 Kamau Karogo
54 Nechewnje Ian 55 Daniel Ondigo 56 Nyongesa K Tom
57 Kevin Ruo 58 Christopher Mulwa 59 Ngacha Duncan
60 Mark Ngatia 61 Sarah Waiganjo 62 Eugene Ogongo
63 Jacob Kenneth 65 Pius Nyongesa

Humans of Data 6

“I’m passionate about the transfer we’re seeing in research: moving from a cottage industry to a place where knowledge is increasingly coming through trusted processes. Research data will be an output that can be used by lots of people. The problem we have in research is that lots of people can’t use the data. If we can create a trusted environment we can make a big difference to the way data is used.

Look, I’m old. I’m 62, and yet I’m passionate and I don’t want to give this up whilst this change is happening. I want to help get this set up for the next phase.  We can make it so that research data is much more available.  Our mission is to make research data more available for researchers, research institutions, the nation, and the global community.  This means that every day you jump from astronomy to history to social sciences, and you have to think about why it’s valuable to different sorts of people.  If you think about this problem in the right way, yes, you have to have technical support for this, but the heart of this is that you have to have trust.  That’s how you get things to happen.  So I measure data in trust rather than petabytes.  I measure data in people rather than petabytes.”