Author Archives: codata_blog

Humans of Data 13

“Science is about discovering that things aren’t as you expected. The more I learn, the more I realise I don’t know. One of the fun things about what I do just now is that I get to see a lot of different research communities and how they conceive of and represent data, and what data mean to them.

There are really a lot of different discipline-oriented communities. I come from a domain repository – we just called it a data center – and for me, it’s interesting coming from that environment as opposed to the library, institution, repository, or iSchool environments, which are dealing with very similar issues but approaching them from different perspectives.

I do think in some areas there is emerging consensus, and that’s exciting to see. Everyone accepts PIDs (persistent identifiers) on data; that’s almost universal. We might argue about which one, but there is strong consensus that there should be something. We’re seeing greater convergence on metadata standards, too, particularly in my field. I think we’re getting better at listening to each other across domains: historians and ecologists discover they have the same data problems. This makes them feel they’re not alone, and also that their problems are generic and can have common solutions. There is a community. When I first started at a data center 25 years ago, I’d be the data person at science conferences. That’s not the way it works any more.

We are in dire times just now. We seem to be in an age of growing authoritarianism, and some people are trying to pretend there isn’t evidential knowledge. This makes research all the more important. Data sharing, open knowledge, open data: it’s more important than it’s ever been.”

Humans of Data 12


“My first job after university, I was doing computer stuff in a medical research place. I got a reputation as someone who was good at rescuing things off old tapes and punchcards. It had been expensive to collect that data, and people had sometimes suffered in providing it. It was also a detective job, and it was important. So it was disappointing (though great for me professionally) when, years later, I came back into the field and found that what was wrong then was still wrong. We still lose data because it’s on some piece of media that someone neglects, or because we’ve lost the documentation. Or we lose it because nobody knows where it is. If we don’t know it exists, someone goes and repeats the work.

Now being able to work with this community of other people is great, making sure that stuff that could be of value in the future gets kept – it matters in lots of ways.  It matters because it saves us money, and that is important because it’s our taxes. And it matters because collectively as a society we’ll learn stuff from it: data can help prevent disasters, it can help improve crops, and many other important things in society.

This community is important to what I do every day. The only negative thing is that it gives you the sense of too many possibilities.  And you think, ‘Yeah, I can help you do this thing’.  And you don’t have time to do it all, which can be a crushing disappointment.  But it’s so nice to learn a bunch of things, and it’s an embarrassment of riches – things you can go and do, people you can collaborate with.  My job is often telling one group of people, ‘Hey, you should know about this other group’.  If that helps someone to reach out and collaborate, I feel like I’ve done something positive.”

Humans of Data 11


“I really love this group of people who work on data management and sharing.  I’m excited to be part of this very welcoming community.  I never experienced this elsewhere – it’s very nice to collaborate, to network. People are really happy to do work voluntarily. They are people who want to do not just their day to day job, but to change the world!”

Humans of Data 10


“Helping researchers to manage and share their data is what really motivates me. I was a researcher before, and much of research is not shared because the only incentive is to publish in ‘high impact factor’ journals. Nobody cares about what you’ve found out as an early career researcher, unless it’s published in a ‘high impact factor’ journal. I want to share more of the science of discovery. I love contributing to this change.

Data sharing is such an important part of opening up science. What’s really rewarding is when you explore with researchers how they can open up their research. People get a sparkle in their eyes. For me to get one convert really matters.  That’s what I’m most happy about.

It’s really important to understand the people you’re speaking with, to have this connection. There is never enough talking and advocacy, having a personal connection and understanding their motivation. That can’t be solved by any technical solution. It’s social change, cultural change. I strongly believe as an ex-scientist that it’s so important to change the reward system for research. It’s got to be transparent and get beyond only valuing what’s in the ‘high impact factor’ journal.”

Humans of Data 9


“We need more south–south collaborations. I’d like to pursue this and get in touch with people I’ve met here, and I’m trying to identify other people in Latin America who have the same interests. Our data problems might be different from those in England or Canada or elsewhere in the north. We have a lot of data that might be at risk of disappearing in the next few years, and this might be a bigger problem in developing countries.

I’m also concerned about how the southern hemisphere is going to contribute. How do I get the funds that I need to get the work done that I need to do? Trying to be part of this community is going to be a challenge for financial reasons. I would certainly not be here without GEO and CODATA support; receiving that funding was very special for me. Otherwise I would miss this incredible opportunity for networking and knowledge sharing.

I think that open science is the only way forward to answer the complex problems that have been presented by society.  These problems are not local and involve so many different knowledge domains.  We need to do science from a more collaborative perspective to be able to tackle these challenges.  Collaboration is what I’m really passionate about.  When I return to Brazil I’ll start to talk to people and see how we can go from here.”

Humans of Data 8


“I’m a molecular biologist, not a data scientist. My recent PhD, however, was in information sciences and my concern was with knowledge management inside research organisations, but now I understand this includes data management.

Technology is very important, but I would like to know more about the social and cultural barriers that people face when managing data, and how organisations may overcome them. Of course, the data professional is not a single person who meets all the requirements. We need capacity building: it’s all brand new for us in Brazil, and there are many challenges. One of them is the participation of women in these discussions. One of the talks that was very special at International Data Week this year was Christine Borgman’s. She had a very broad perspective, a holistic view of open research data. I hope that in the future we will see more and more women engaged and with an active voice.”

Humans of Data 7

“I entered the data profession about three and a half years ago. I found the community to be very welcoming. The ideas of ethics and sustainability are starting to be brought forward more strongly now. Data aren’t just digits in the memory. They have real-world effects in real-world situations.

One of the things that drew me particularly to the idea of preserving data is the chance to build on the research investments that people have made. People spend their lives exploring questions. If the information and data those answers are based on aren’t kept usable and understandable, then the answers themselves are also lost. The end result, when you add it up, is so many wasted lives. It’s the time invested in exploring these questions, but even more, in a broadly humanitarian way, these answers are pursued to improve the lot of humanity. If the data collected through research are lost, the answers themselves are lost, and so are their effects on people and the environment. So I think that’s my most important concern.

Look, I like efficiency. I like effectiveness. Not taking care of things you’ve spent time making, not making sure they can be used effectively – that’s a waste of everyone’s time and effort. It just bugs me. Data is the starting point for any answers we achieve through research. Let’s not waste that effort. If there’s any area where this community could do more, it’s the human-related areas – the marketing and advertising of the importance of data, and of making sure the data is there to go back to. There’s no reason to reinvent wheels, but improving them is vital.”

Hackathon on open research data: encouraging a balanced demand and supply of open research data

This post comes from Professor Joseph Muliaro Wafula, Director of ICT Centre of Excellence and Open Data (iCEOD) of the Jomo Kenyatta University of Agriculture and Technology, chair of CODATA Kenya and a member of the CODATA International Executive Committee. iCEOD is collaborating with IBM Cloud on an open data cloud platform:

“We wanted to maximize the value of our datasets—both to save money and, crucially, to encourage innovation and collaboration in the wider community,” says Professor Wafula.

“If we could take our data outside the confines of academia and allow developers and social entrepreneurs to harness it, we hoped that they would start building applications to use this information for the public good.”

On Saturday 12 November, the ICT Centre of Excellence and Open Data (iCEOD) of the Jomo Kenyatta University of Agriculture and Technology organized a hackathon on Kenyan open research data sets. The hackathon was intended to enable public access to and use of research data through the creation of mobile and web applications. It targeted research data sets from scientists in agriculture and public health. The hackathon was opened by the Dean of the School of Computing and Information Technology, Dr. Stephen Kimani, who commended students for their readiness to participate in finding solutions to some of the existing challenges. The following researchers provided data sets:

  1. Prof. Mary Abukutsa (http://www.jkuat.ac.ke/departments/horticulture/academic/) provided a research data set on the nutritional value, germination and yields of indigenous African vegetables.
  2. Prof. John Wesonga (http://www.jkuat.ac.ke/departments/horticulture/prof-john-wesonga-m/) provided a research data set on the best conditions for growing French beans with minimal application of pesticides.
  3. Prof. Simon Karanja (http://www.jkuat.ac.ke/schools/soph/prof-simon-karanja/) provided research data on the effects of khat on users in Kenya.
  4. Dr. Frida Wanzala (http://www.jkuat.ac.ke/departments/horticulture/academic/) provided research data on papaya-growing areas in Kenya.
  5. Dr. Peter Kahenya (http://www.jkuat.ac.ke/departments/foodscience/?page_id=49) provided research data on the various conditions that affect the cooking time of different types of beans.
  6. Dr. John Kinyuru (http://www.jkuat.ac.ke/departments/foodscience/?page_id=49) provided research data on the nutritional formulas of various local Kenyan foods.
  7. Mr. Francis Ombwara (http://www.jkuat.ac.ke/departments/horticulture/non-academic/) provided research data on identifying the sweetness of papaya by color.

All the data providers were invited to explain the objectives of their research.

The iCEOD team curated the data sets and organized them in formats that were easily consumable by software developers. The team also created Application Programming Interfaces (APIs) and SQL queries, which were uploaded to a database in a local computer environment, allowing developers to get straight into developing applications without having to worry about the raw data sets.
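The post does not describe iCEOD’s actual schema or API, but the idea of wrapping curated data in ready-made queries so that developers never handle the raw files can be sketched. The following is a minimal sketch using Python’s standard-library `sqlite3` module, assuming a hypothetical long-format `nutrients` table (food, nutrient, amount); the table name, column names and the functions `load_curated_rows`, `list_foods` and `nutrients_for` are illustrative only, not iCEOD’s interface.

```python
import sqlite3
from typing import Iterable

# Hypothetical long-format layout: one row per (food, nutrient) pair.
SCHEMA = """
CREATE TABLE IF NOT EXISTS nutrients (
    food     TEXT NOT NULL,
    nutrient TEXT NOT NULL,
    amount   REAL NOT NULL,
    PRIMARY KEY (food, nutrient)
)
"""


def load_curated_rows(conn: sqlite3.Connection, rows: Iterable[tuple[str, str, float]]) -> None:
    """Create the table if needed and bulk-insert curated (food, nutrient, amount) rows."""
    conn.execute(SCHEMA)
    conn.executemany("INSERT INTO nutrients (food, nutrient, amount) VALUES (?, ?, ?)", rows)
    conn.commit()


def list_foods(conn: sqlite3.Connection) -> list[str]:
    """Return the distinct foods available, in alphabetical order."""
    return [row[0] for row in conn.execute("SELECT DISTINCT food FROM nutrients ORDER BY food")]


def nutrients_for(conn: sqlite3.Connection, food: str) -> dict[str, float]:
    """Return {nutrient: amount} for one food; the ? placeholder keeps user input out of the SQL text."""
    cursor = conn.execute(
        "SELECT nutrient, amount FROM nutrients WHERE food = ? ORDER BY nutrient", (food,)
    )
    return {nutrient: amount for nutrient, amount in cursor}


if __name__ == "__main__":
    conn = sqlite3.connect(":memory:")
    # Placeholder numbers for illustration only, not values from the JKUAT data sets.
    load_curated_rows(
        conn,
        [("food_a", "protein_g", 1.0), ("food_a", "iron_mg", 2.0), ("food_b", "protein_g", 3.0)],
    )
    print(list_foods(conn))
    print(nutrients_for(conn, "food_a"))
```

Because an app calls `nutrients_for(conn, "food_a")` rather than reading the table directly, the curators can change the underlying layout without breaking developers’ code, which is one plausible reading of “without having to worry about the raw datasets”.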

The objectives of the hackathon

  1. To build innovative mobile and web applications that make access to and consumption of research data easy, for the benefit of society.
  2. To encourage scientists to open their research data for public consumption and use.
  3. To showcase the capability of open data to provide innovative solutions to societal challenges.
  4. To engage partners’ support for the open research data initiative.

Hackathon partners

  1. Jomo Kenyatta University of Agriculture and Technology (http://www.jkuat.ac.ke/)
  2. IBM East Africa (http://www.ibm.com/connect/ibm/ke/en/branch/ibm.html)
  3. CODATA (http://www.codata.org/)
  4. AFRICA ai JAPAN Project (http://jkuat.ac.ke/projects/africa-ai-japan/?page_id=469)
  5. Kenya Open Data Initiative (https://opendata.go.ke/)

List of Judges

  1. Dr. Agnes Mindila, JKUAT
  2. Mr. Philip Oyier, JKUAT
  3. Mr. Silas Macharia, IBM Kenya
  4. Mr. Phebian, ICT Authority

The three winning applications

Position 1: Team ACELORDS, comprising Alex Maina, Emmah Kimari and Peter Kamore – used the nutritional dataset

  • This group developed a web and mobile application that helps users find out the nutritional value of common foods in Kenya.
  • A healthy diet helps boost immunity against many diseases, so such a system can be beneficial.
  • The user can select any combination of the foods offered and have the application calculate the combined nutritional values of that selection.
  • The system can also save the choices made by a registered, logged-in user. From this information, weekly or monthly trends can be generated, allowing further processing and interpretation of the results.
  • The user can be alerted when the system detects potential health risks, such as nutrient levels above or below set thresholds, and can then adjust their diet to one that benefits their health.
  • Mothers can use the application to monitor their babies’ health trends and focus on nurturing healthy children, which is one of the major concerns in Africa.
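The combination calculator and threshold alerts described above can be sketched in a few lines. This is a minimal Python sketch, not Team ACELORDS’ code: it assumes hypothetical per-serving nutrient profiles for each food and user-supplied (minimum, maximum) thresholds per nutrient, and the names `combine` and `check_thresholds` are illustrative.

```python
from collections import Counter
from typing import Mapping

# Hypothetical per-serving profiles: food name -> {nutrient name: amount per serving}.
Profiles = Mapping[str, Mapping[str, float]]


def combine(profiles: Profiles, selection: Mapping[str, float]) -> dict[str, float]:
    """Total each nutrient over the selected foods, weighted by number of servings."""
    totals: Counter = Counter()
    for food, servings in selection.items():
        if food not in profiles:
            raise KeyError(f"unknown food: {food}")
        for nutrient, amount in profiles[food].items():
            totals[nutrient] += amount * servings
    return dict(totals)


def check_thresholds(
    totals: Mapping[str, float], limits: Mapping[str, tuple[float, float]]
) -> list[str]:
    """Return one alert per nutrient whose total is below its minimum or above its maximum."""
    alerts = []
    for nutrient, (low, high) in sorted(limits.items()):
        value = totals.get(nutrient, 0.0)  # a nutrient absent from every selected food counts as 0
        if value < low:
            alerts.append(f"{nutrient}: {value:g} is below {low:g}")
        elif value > high:
            alerts.append(f"{nutrient}: {value:g} is above {high:g}")
    return alerts


if __name__ == "__main__":
    # Placeholder numbers for illustration only, not values from the JKUAT data sets.
    profiles = {"food_a": {"protein_g": 2.0, "iron_mg": 1.0}, "food_b": {"protein_g": 3.0}}
    totals = combine(profiles, {"food_a": 2, "food_b": 1})
    print(totals)
    print(check_thresholds(totals, {"protein_g": (10.0, 50.0), "iron_mg": (1.0, 5.0)}))
```

Saving each day’s `totals` per logged-in user would be enough to build the weekly or monthly trends the team mentions; how the real application stores them is not described in the post.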

Position 2: Nahayo B. Patrick and Pius Dan Nyongesa – used the Khat dataset

  • This group created an informational application.
    • The dataset was large, covering the different effects of khat in Kenya, such as its effects on local citizens, the population taking khat, users’ financial expenditure on khat, the ages of those taking khat, and where khat grows, among other issues.
    • This group created a real-time data analysis tool that takes this data and plots charts and graphs, making it easy to visualize and gain insights from the data.
    • Potential users of such applications are health service providers, counselors and security personnel, among others.

Position 3: Team ELITE – used the Indigenous African Vegetables dataset

  • This dataset shows the nutritional value of the different Kenyan indigenous vegetables and the areas where these vegetables are commonly found. It was used to develop an application that helps users determine which vegetables best suit their diets, depending on the nutrients they need most.
  • The application shows where different vegetables can be found and which vegetables users need most, depending on their nutritional needs.
  • Potential end users include manufacturers, processors, nutritionists, health providers and nursing mothers.

All data sets are being prepared for registration so that they can be assigned DOIs. Their APIs will be published on the JKUAT Open Data Platform at https://opendata.jkuat.ac.ke/ when ready.

Acknowledgement

Jomo Kenyatta University of Agriculture and Technology; Prof Muliaro Wafula, Director, iCEOD; Dr Simon Hodson, CODATA Executive Director; Prof Manabu Tsunoda, AFRICA ai JAPAN Project Chief Adviser; Silas Macharia, IBM Kenya; Dr Agnes Mindila, Lecturer, Computing Department, and iCEOD collaborator; Noriaki Tanaka, AFRICA ai JAPAN Project Coordinator; Pascal Ouma, Deputy Director ICT; Francis Musyoki, attached to iCEOD and MSc student in Software Engineering at JKUAT; Tom Nyongesa, attached to iCEOD and final-year Computer Science student at JKUAT; Gyle Odhiambo, attached to iCEOD and final-year Computer Science student at JKUAT; and Alice Ebela, iCEOD Administrative Assistant.

List of Participants

1 Mwangi Shadrack 2 Nixon Thuo 3 Dominic Kithinji
4 Wycliff Obuya 5 Sydney Mainga 6 Enock Chesire
7 Alex Maina 8 Winnie Karanja 9 Collins Njoroge
10 Charles Wachira 11 John Makau 12 Winnie Karanja
13 Collins Njoroge 14 Charles Wachira 15 Kyalo Ian
16 Okumu Ian 17 Joseph Njenga 18 Irene Kimani
19 Simon Mulwa 20 Ancentus Makau 21 Emmah Kimari
22 Peter Kamoro 23 Polycap Okeyo 24 Khwolo Kabara
25 Chris Mureithi 26 Stephen Mwangi 27 Jimmy Koskei
28 Bashir Shelkh 29 Gaylord Odhiambo 30 Charles Waitiki
31 Mary Waweru 32 Rose Kinuthia 33 Vicky Gatobu
34 Jeremiah Kuria 35 Mercy Maina 36 Njoroge Daniel
37 Muthiani Jayson 38 Sammy Mugambi 39 Peter Kariuki
40 Alan Mwathe 41 Gitau Isaac 42 Daniel Biwott
43 Ngumo Nthenge 43 Gitau Moses 44 Njoroge Edward
45 Ambrose Mbae 46 Stephen Oduor 47 Elvis Shida
48 Mwembe Emmanuel 49 Waithaka Kennedy 50 Tabitha Akinyi
51 Brian Kimathi 52 Lusenaka Alvin 53 Kamau Karogo
54 Nechewnje Ian 55 Daniel Ondigo 56 Nyongesa K Tom
57 Kevin Ruo 58 Christopher Mulwa 59 Ngacha Duncan
60 Mark Ngatia 61 Sarah Waiganjo 62 Eugene Ogongo
63 Jacob Kenneth 65 Pius Nyongesa

Humans of Data 6

“I’m passionate about the transfer we’re seeing in research: moving from a cottage industry to a place where knowledge increasingly comes through trusted processes. Research data will be an output that can be used by lots of people. The problem we have in research is that lots of people can’t use the data. If we can create a trusted environment, we can make a big difference to the way data is used.

Look, I’m old. I’m 62, and yet I’m passionate and I don’t want to give this up whilst this change is happening. I want to help get this set up for the next phase.  We can make it so that research data is much more available.  Our mission is to make research data more available for researchers, research institutions, the nation, and the global community.  This means that every day you jump from astronomy to history to social sciences, and you have to think about why it’s valuable to different sorts of people.  If you think about this problem in the right way, yes, you have to have technical support for this, but the heart of this is that you have to have trust.  That’s how you get things to happen.  So I measure data in trust rather than petabytes.  I measure data in people rather than petabytes.”

HarassMap at SciDataCon on their Data Management Project with IDRC

This post comes from Reem Wael, Director of HarassMap (http://harassmap.org/en/). Reem was supported in part by CODATA and GEO to attend SciDataCon and contributed a paper to a session on ‘Data Sharing in a Development Context: The experience of the IDRC Data Sharing Pilot’ (http://www.scidatacon.org/2016/sessions/56/).

HarassMap launched five years ago with the mission to end the social acceptability of sexual harassment in Egypt. This mission, unexpectedly, led to the accumulation of a lot of data from both online and offline sources, and the more we grow, the more data we have. Our methodology is to combine online and offline work to achieve our mission: we crowdsource reports of sexual harassment, including through our social media outlets, and we receive information from outreach activities and trainings. We analyze this information and give it back to the community in the form of research reports, public campaigns, trainings and policies.

A few years ago, we started receiving requests for access to our data from researchers working on topics that cross-cut with sexual harassment. We responded to these requests by providing an Excel sheet with the downloaded crowdsourced reports, but this was the limit of our assistance. When an opportunity came along to design and implement a data management plan, supported by IDRC, it was very relevant to our needs. IDRC is an international research organization and was interested in exploring how grantees can make their data more open to the public.

IDRC’s main focus was on openly sharing the data. However, when we started to work on the project, we realized that the earlier stages are more challenging: which data do we store, and how? We have accumulated a massive amount of data over the last five years. Besides the crowdsourced reports and the reports that we receive on social media, we also own a huge library of photographs and video footage, reports from trainings, evaluations from trainings that reflect the impact we have had, reports from outreach activities, and social media posts and replies. We reached some decisions in the planning phase, and we are continuing to make these decisions as we move on.

We formed a ‘data management team’ from HarassMap staff who work on research and data, and we tried to identify the data that we want to collect, organize and share, raising the following questions: Why are we sharing data, and with whom? How can we organize it in a way that would be helpful to researchers, or others who request access to the data? Are there any ethical issues that we need to consider while sharing the data? These questions brought up some challenges. We were not sure what kind of data would interest researchers, for instance. We found that even though researchers covet the crowdsourced reports most, the more interesting data are the discussions on social media (our posts, including all the comments that we get), together with field reports. These data mirror and track the development of myths and misconceptions about sexual harassment, especially when analyzed over a long span of time, as they can show whether attitudes and opinions on sexual harassment have changed.

Embarking upon data management showed some challenges as well. One is a linguistic/technical challenge, especially with the crowdsourced reports, as we receive them in both English and Arabic. Privacy was a challenge for HarassMap’s library of photos and videos, since it shows many volunteers, going back to 2010, from whom we did not obtain consent to share their photos publicly. We did not find an ethical problem with publishing and sharing crowdsourced reports because they are all anonymous, and we also filter them to remove any information that could make us legally liable, such as accusations against people or places by name.

That said, we are now in the process of gathering and organizing data from the last five years and putting it on our web server. The next phase, sharing, has its own challenges. The first and most important is that we need some kind of screening of who uses our data, for several reasons. Sometimes researchers completely misuse the data, which puts HarassMap in a bad position. For instance, the claim that crowdsourced reports reflect ‘hotspots’ of sexual harassment is fundamentally flawed, yet widely made. We always stress that crowdsourced reports provide biased data: access to the internet and technology differs hugely depending on the affluence of an area, so receiving more reports from a specific area doesn’t necessarily mean that harassment is more prevalent there; it may mean that people there have better access and more knowledge about reporting. At other times, researchers have taken the data without giving credit to HarassMap, and some researchers have asked for the data and then disappeared without informing us of what they wrote.

Being part of this project has benefited HarassMap greatly, not only because we started thinking about sharing our data in a searchable form, but also because we did not know how much data we possessed until we started looking for it. While making our data completely public is something that HarassMap is still hesitant to do, we are definitely happy to provide researchers and other interested parties with data in a more user-friendly format.