This post comes from Reem Wael, Director of HarassMap (http://harassmap.org/en/). Reem was assisted in part by CODATA and GEO to attend SciDataCon and contributed a paper to a session on ‘Data Sharing in a Development Context: The experience of the IDRC Data Sharing Pilot’ (http://www.scidatacon.org/2016/sessions/56/).
HarassMap launched five years ago with the mission to end the social acceptability of sexual harassment in Egypt. This mission, unexpectedly, led to the accumulation of a lot of data from both online and offline sources, and the more we grow, the more data we have. Our methodology is to combine online and offline work to achieve our mission: we crowdsource reports on sexual harassment, receive reports through our social media outlets, and gather information from outreach activities and trainings. We analyze this information and give it back to the community in the form of research reports, public campaigns, trainings and policies.
A few years ago, we started receiving requests for access to our data from researchers working on topics that intersect with sexual harassment. We responded to these requests by providing an Excel sheet with the downloaded crowdsourced reports, but this was the limit of our assistance. When an opportunity came along to design and implement a data management plan, supported by IDRC, it was very relevant to our needs. IDRC is an international research organization and was interested in exploring how grantees can make their data more open to the public.
The main point that IDRC focused on was sharing the data openly. However, when we started to work on the project, we realized that the earlier stages are more challenging: which data do we store, and how? We have a massive amount of data accumulated over the last five years. Besides the crowdsourced reports and the reports that we receive on social media, we also own a huge library of photographs and video footage, reports from trainings, training evaluations that reflect the impact we have had, reports from outreach activities, and social media posts and replies. We reached some decisions in the planning phase, and we are continuing to make these decisions as we move on.
We formed a ‘data management team’ from HarassMap staff who work on research and data, and we tried to identify the data that we want to collect, organize and share by raising the following questions: why are we sharing data, and with whom? How can we organize it in a way that would be helpful to researchers, or others who request access to the data? Are there any ethical issues that we need to consider while sharing the data? These questions brought up some challenges. We were not sure, for instance, what kind of data would be interesting to researchers. We found that even though crowdsourced reports are more coveted by researchers, the more interesting data are the discussions on social media (our posts, including all the comments that we get) and the field reports. This data mirrors and tracks the development of myths and misconceptions about sexual harassment, especially when analyzed over a long span of time, as it can show whether attitudes and opinions on sexual harassment have changed.
Embarking upon data management revealed some challenges as well. One is a linguistic and technical challenge, especially with the crowdsourced reports, as we receive them in both English and Arabic. Privacy was a challenge for HarassMap’s library of photos and videos, because it shows many volunteers, going back to 2010, from whom we did not obtain consent to share their photos publicly. We did not find an ethical problem with publishing and sharing crowdsourced reports because they are all anonymous, and we also filter them to remove any information that could make us legally liable, such as accusations against people or places by name.
That said, we are now in the process of gathering and organizing data from the last five years and putting it on our web server. The next phase, sharing, has its share of challenges. The first and most important is that we must have some kind of screening of who uses our data, for several reasons. Sometimes researchers completely misuse the data, which puts HarassMap in a bad position. For instance, the claim that crowdsourced reports reflect ‘hotspots’ of sexual harassment is essentially flawed, yet widely used. We always assert that crowdsourced reports provide biased data, because access to the internet and technology differs hugely with the affluence of an area. Receiving reports from a specific area therefore doesn’t necessarily mean that harassment is more prevalent there; it may mean that people there have better access and more knowledge about reporting. At other times, researchers have taken the data without giving credit to HarassMap, and some researchers have asked for the data and then disappeared without informing us of what they wrote.
Being part of this project has benefited HarassMap greatly, not only because we started thinking about the idea of sharing our data on a searchable engine, but also because we did not know how much data we possessed in the first place until we started looking for it. While making our data completely public is something that HarassMap is still hesitant to do, we are definitely happy to provide researchers and other interested parties with data in a format that is more user-friendly.