Monthly Archives: November 2016

Hackathon on open research data: encouraging a balanced demand and supply of open research data

This post comes from Professor Joseph Muliaro Wafula, Director of ICT Centre of Excellence and Open Data (iCEOD) of the Jomo Kenyatta University of Agriculture and Technology, chair of CODATA Kenya and a member of the CODATA International Executive Committee. iCEOD is collaborating with IBM Cloud on an open data cloud platform:

“We wanted to maximize the value of our datasets—both to save money and, crucially, to encourage innovation and collaboration in the wider community,” says Professor Wafula.

“If we could take our data outside the confines of academia and allow developers and social entrepreneurs to harness it, we hope that they would start building applications to use this information for the public good.”

On Saturday 12 November, the ICT Centre of Excellence and Open Data (iCEOD) of the Jomo Kenyatta University of Agriculture and Technology organized a hackathon on Kenyan open research data sets. The hackathon was intended to enable public access to and use of research data through the creation of mobile and web applications. It targeted research datasets from scientists in agriculture and public health. The hackathon was opened by the Dean of the School of Computing and Information Technology, Dr. Stephen Kimani, who commended students for their readiness to participate in finding solutions to some of the existing challenges.

  1. Prof. Mary Abukutsa provided a research data set on the nutritional value, germination and yields of indigenous African vegetables.
  2. Prof. John Wesonga provided a research data set on the best conditions for growing French beans with minimum application of pesticides.
  3. Prof. Simon Karanja provided research data on the effects of Khat on users in Kenya.
  4. Dr. Frida Wanzala provided research data on papaya growing areas in Kenya.
  5. Dr. Peter Kahenya provided research data on various conditions that affect the cooking time of different types of beans.
  6. Dr. John Kinyuru provided research data on nutritional formulas of various local Kenyan foods.
  7. Mr. Francis Ombwara provided research data on identification of the sweetness of papaya by color.

All the developers were invited to explain the objective of their research.

The iCEOD team curated the data sets and organized them in formats that were easily consumable by software developers. The team also created Application Programming Interfaces (APIs) and SQL queries, which were uploaded to a database in a local computer environment, allowing developers to get straight into developing applications without having to worry about the raw datasets.

The objectives of the Hackathon

  1. To build innovative mobile and web applications that make access to and consumption of research data easy, for the benefit of society.
  2. To encourage scientists to open their research data for public consumption and use.
  3. To showcase the capability of open data to provide innovative solutions to societal challenges.
  4. To engage partners’ support for the open research data initiative.

Hackathon Partners

  1. Jomo Kenyatta University of Agriculture and Technology
  2. IBM East Africa
  3. CODATA
  4. AFRICA ai JAPAN Project
  5. Kenya Open Data Initiative

List of Judges

  1. Dr. Agnes Mindila – JKUAT
  2. Mr. Philip Oyier – JKUAT
  3. Mr. Silas Macharia – IBM Kenya
  4. Mr. Phebian – ICT Authority

The three winning applications

Position 1: Team ACELORDS, comprising Alex Maina, Emmah Kimari and Peter Kamore – Used Nutritional dataset

  • This group developed a web and mobile application that helps users find out the nutritional value of the common types of food in Kenya.
  • A healthy diet boosts immunity against most diseases, so such a system is beneficial.
  • The user can select any combination of the foods offered and have the application calculate the resulting nutritional values of that combination.
  • In addition, the system can save the choices made by a registered or logged-in user. With this information, critical data such as weekly or monthly trends can be generated, allowing further processing and interpretation of results.
  • A user can be alerted when the system senses potential health risks, such as levels of certain nutrients that are above or below set thresholds. With such information, users can change their diet to a trend that benefits their health.
  • This application can be used by mothers to monitor their babies’ health trends and to focus on nurturing healthy children, which is one of the major concerns in Africa.
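
The calculation and alert steps described above can be illustrated with a minimal sketch. It is not Team ACELORDS' code: the food names, nutrient figures, thresholds and function names below are all hypothetical placeholders, and none of the numbers come from the JKUAT nutritional dataset.

```python
from typing import Dict, Iterable, List, Tuple

# Hypothetical per-serving nutrient values, for illustration only.
# These are NOT figures from the JKUAT dataset.
FOODS: Dict[str, Dict[str, float]] = {
    "ugali": {"energy_kcal": 360.0, "protein_g": 8.0, "iron_mg": 2.0},
    "sukuma wiki": {"energy_kcal": 50.0, "protein_g": 3.0, "iron_mg": 1.5},
    "beans": {"energy_kcal": 130.0, "protein_g": 9.0, "iron_mg": 2.5},
}

# Hypothetical (low, high) thresholds per nutrient for one meal.
THRESHOLDS: Dict[str, Tuple[float, float]] = {
    "energy_kcal": (400.0, 900.0),
    "protein_g": (15.0, 60.0),
    "iron_mg": (5.0, 45.0),
}


def combine(selected: Iterable[str]) -> Dict[str, float]:
    """Sum each nutrient over the selected foods."""
    totals: Dict[str, float] = {}
    for name in selected:
        for nutrient, amount in FOODS[name].items():
            totals[nutrient] = totals.get(nutrient, 0.0) + amount
    return totals


def alerts(totals: Dict[str, float]) -> List[str]:
    """Return one message per nutrient that falls outside its thresholds."""
    messages: List[str] = []
    for nutrient, (low, high) in THRESHOLDS.items():
        value = totals.get(nutrient, 0.0)
        if value < low:
            messages.append(f"{nutrient} is below the threshold ({value} < {low})")
        elif value > high:
            messages.append(f"{nutrient} is above the threshold ({value} > {high})")
    return messages


if __name__ == "__main__":
    meal = combine(["ugali", "beans"])
    print(meal)
    for message in alerts(meal):
        print("ALERT:", message)
```

Saving each user's totals with a date would be enough to produce the weekly or monthly trends mentioned above; that part is left out of the sketch.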

Position 2: Nahayo B. Patrick and Pius Dan Nyongesa – Used Khat dataset

  • This group created an informational application.
    • This large dataset covers different aspects of Khat in Kenya, such as its effects on local citizens, the population that takes khat, users’ financial expenditure on khat, the age of those who take it and where khat grows, among other issues.
    • This group created a real-time data analysis tool that takes this data and plots charts and graphs, making it easy to visualize the data and gain insights from it.
    • Potential users of such applications are health service providers, counselors, security personnel, among others.

Position 3: Team ELITE (used Indigenous African Vegetables data)

  • This dataset shows the nutritional value of the different Kenyan indigenous vegetables and the areas where these vegetables are commonly found. It was used to develop an application that can help any user determine which vegetables best suit their diet, depending on the nutritional content they need most.
  • The application shows users where they can find different vegetables and which vegetables they most require, depending on their nutritional needs.
  • Potential end users include manufacturers, processors, nutritionists, health providers and nursing mothers.

All data sets are being prepared for registration so that they can be assigned DOIs. Their APIs will be published on the JKUAT Open Data Platform when ready.


Jomo Kenyatta University of Agriculture and Technology, Prof Muliaro Wafula – Director iCEOD, Dr Simon Hodson – CODATA Executive Director, Prof Manabu Tsunoda – AFRICA ai JAPAN Project Chief Adviser, Silas Macharia – IBM Kenya, Dr Agnes Mindila – Lecturer Computing Department & iCEOD Collaborator, Noriaki Tanaka – AFRICA ai JAPAN Project Coordinator, Pascal Ouma – Deputy Director ICT, Francis Musyoki – iCEOD attached & MSc Student Software Engineering at JKUAT, Tom Nyongesa – iCEOD attached & Computer Science Final Year student JKUAT, Gyle Odhiambo – iCEOD attached & Computer Science Final Year student JKUAT, and Alice Ebela – iCEOD Administrative Assistant.

List of Participants

1 Mwangi Shadrack 2 Nixon Thuo 3 Dominic Kithinji
4 Wycliff Obuya 5 Sydney Mainga 6 Enock Chesire
7 Alex Maina 8 Winnie Karanja 9 Collins Njoroge
10 Charles Wachira 11 John Makau 12 Winnie Karanja
13 Collins Njoroge 14 Charles Wachira 15 Kyalo Ian
16 Okumu Ian 17 Joseph Njenga 18 Irene Kimani
19 Simon Mulwa 20 Ancentus Makau 21 Emmah Kimari
22 Peter Kamoro 23 Polycap Okeyo 24 Khwolo Kabara
25 Chris Mureithi 26 Stephen Mwangi 27 Jimmy Koskei
28 Bashir Shelkh 29 Gaylord Odhiambo 30 Charles Waitiki
31 Mary Waweru 32 Rose Kinuthia 33 Vicky Gatobu
34 Jeremiah Kuria 35 Mercy Maina 36 Njoroge Daniel
37 Muthiani Jayson 38 Sammy Mugambi 39 Peter Kariuki
40 Alan Mwathe 41 Gitau Isaac 42 Daniel Biwott
43 Ngumo Nthenge 43 Gitau Moses 44 Njoroge Edward
45 Ambrose Mbae 46 Stephen Oduor 47 Elvis Shida
48 Mwembe Emmanuel 49 Waithaka Kennedy 50 Tabitha Akinyi
51 Brian Kimathi 52 Lusenaka Alvin 53 Kamau Karogo
54 Nechewnje Ian 55 Daniel Ondigo 56 Nyongesa K Tom
57 Kevin Ruo 58 Christopher Mulwa 59 Ngacha Duncan
60 Mark Ngatia 61 Sarah Waiganjo 62 Eugene Ogongo
63 Jacob Kenneth 65 Pius Nyongesa

Humans of Data 6

“I’m passionate about the transfer we’re seeing in research: moving from a cottage industry to a place where knowledge is increasingly coming through trusted processes.  Research data will be an output that can be used by lots of people.  The problem we have in research is that lots of people can’t use the data.  If we can create a trusted environment we can make a big difference to the way data is used.

Look, I’m old. I’m 62, and yet I’m passionate and I don’t want to give this up whilst this change is happening. I want to help get this set up for the next phase.  We can make it so that research data is much more available.  Our mission is to make research data more available for researchers, research institutions, the nation, and the global community.  This means that every day you jump from astronomy to history to social sciences, and you have to think about why it’s valuable to different sorts of people.  If you think about this problem in the right way, yes, you have to have technical support for this, but the heart of this is that you have to have trust.  That’s how you get things to happen.  So I measure data in trust rather than petabytes.  I measure data in people rather than petabytes.”

HarassMap at SciDataCon on their Data Management Project with IDRC

This post comes from Reem Wael, Director of HarassMap. Reem was assisted in part by CODATA and GEO to attend SciDataCon and contributed a paper to a session on ‘Data Sharing in a Development Context: The experience of the IDRC Data Sharing Pilot’.

HarassMap launched five years ago with the mission to end the social acceptability of sexual harassment in Egypt. This mission, unexpectedly, led to the accumulation of a lot of data from both online and offline sources, and the more we grow the more data we have. Our methodology is to combine online and offline work to achieve our mission: we crowdsource reports on sexual harassment, receive reports through our social media outlets, and gather information from outreach activities and trainings. We analyze this information and give it back to the community in the form of research reports, public campaigns, trainings and policies.

A few years ago, we started receiving requests for access to our data from researchers working on topics that intersect with sexual harassment. We responded to these requests by providing an Excel sheet with the downloaded crowdsourced reports, but this was the limit of our assistance. When an opportunity came along to design and implement a data management plan, supported by IDRC, it was very relevant to our needs. IDRC is an international research organization and was interested in exploring how grantees can make their data more open to the public.

The main point that IDRC focused on was openly sharing the data. However, when we started to work on the project, we realized that the earlier stages are more challenging: which data do we store, and how? We have accumulated a massive amount of data over the last five years. Other than the crowdsourced reports and the reports that we receive on social media, we also own a huge library of photographs and video footage, reports from trainings, evaluations from trainings that reflect the impact that we had, reports from outreach activities, and social media posts and replies. We reached some decisions in the planning phase and we are continuing to make these decisions as we move on.

We formed a ‘data management team’ from HarassMap staff who work on research and data, and we tried to identify the data that we want to collect, organize and share, raising the following questions: why are we sharing data, and with whom? How can we organize it in a way that would be helpful to researchers, or others who request access to the data? Are there any ethical issues that we need to consider while sharing the data? These questions brought up some challenges. We were not sure what kind of data would be interesting to researchers, for instance. We found that even though crowdsourced reports are more coveted by researchers, the more interesting data is the discussions on social media (our posts, including all the comments that we get), in addition to field reports. This data mirrors and tracks the development of myths and misconceptions about sexual harassment, especially when analyzed over a long span of time, as it can show whether attitudes and opinions on sexual harassment have changed.

Embarking upon data management revealed some challenges as well. One is a linguistic and technical challenge, especially with the crowdsourced reports, as we receive them in both English and Arabic. Privacy was a challenge for HarassMap’s library of photos and videos, because it shows many volunteers, going back to 2010, from whom we did not obtain consent to share their photos publicly. We did not find an ethical problem with publishing and sharing crowdsourced reports because they are all anonymous, and we also filter them to remove any information that could make us legally liable, such as accusations against people or places by name.

That said, we are now in the process of accumulating and organizing data from the last five years and putting it on our web server. The next phase, sharing, has its share of challenges. The first and most important is that we must have some kind of screening of who uses our data, for several reasons. Sometimes researchers completely misuse the data, which puts HarassMap in a bad position. For instance, the claim that crowdsourced reports reflect ‘hotspots’ of sexual harassment is essentially flawed, yet widely made. We always assert that crowdsourced reports provide biased data, because access to the internet and technology differs hugely with the affluence of an area. Receiving reports from a specific area therefore doesn’t necessarily mean that harassment is more prevalent there; it may mean that people have better access and more knowledge about reporting. At other times, researchers have taken the data without giving credit to HarassMap, and some have asked for the data and then disappeared without informing us of what they wrote.

Being part of this project has benefited HarassMap greatly, not only because we started thinking about the idea of sharing our data on a searchable engine, but also because we did not know how much data we possessed in the first place until we started looking for it. While making our data completely public is something that HarassMap is still hesitant to do, we are definitely happy to provide researchers and other interested parties with data in a format that is more user