Author Archives: codata_blog

Humans of Data 7

“I entered into the data profession about three and a half years ago. I found the community to be very welcoming. The ideas of ethics and sustainability are starting to be brought forward more strongly now. Data aren’t just digits in the memory. They have real world effects in real world situations.

One of the things that drew me particularly to the idea of preserving data, is to build on the research investments that people have made. People spend their lives exploring questions. If the information and data those answers are based on, aren’t kept useable in an understandable way, then the answers themselves are also lost. The end result is so many wasted lives when you add it up. It’s the time invested in the exploring these questions, but even more in a broadly humanitarian way, these answers are pursued to improve the lot of humanity. If the data collected through research are lost, the answers themselves are lost, and so the people, the environmental effects are also lost. So I think that’s my most important concern.

Look, I like efficiency. I like effectiveness. Not taking care of things you’ve spent time making, not making sure they can be used effectively – that’s a waste of everyone’s time and effort. It just bugs me. Data is the starting point for any answers we achieve through research. Let’s not waste that effort. If there’s anything this community could respond more in, it’s the human-related areas – the marketing and advertising of the importance of data and the importance of making sure the data is there to go back to. There’s no reason to reinvent wheels, but improving them is vital.”

Hackathon on open research data: encouraging a balanced demand and supply of open research data

This post comes from Professor Joseph Muliaro Wafula, Director of ICT Centre of Excellence and Open Data (iCEOD) of the Jomo Kenyatta University of Agriculture and Technology, chair of CODATA Kenya and a member of the CODATA International Executive Committee. iCEOD is collaborating with IBM Cloud on an open data cloud platform:

“We wanted to maximize the value of our datasets—both to save money and, crucially, to encourage innovation and collaboration in the wider community,” says Professor Wafula.

“If we could take our data outside the confines of academia and allow developers and social entrepreneurs to harness it, we hope that they would start building applications to use this information for the public good.”

On Saturday 12 November, the ICT Centre of Excellence and Open Data (iCEOD) of the Jomo Kenyatta University of Agriculture and Technology organized a hackathon on Kenyan open research data sets. The hackathon was intended to enable public access and use of research data through creation of mobile and web applications. It targeted research datasets from scientists in agriculture and public health. The hackathon was opened by the Dean School of Computing and Information Technology, Dr. Stephen Kimani who commended students for their readiness to participate in finding solutions to some of the existing challenges.

Prof Mary Abukutsa (http://www.jkuat.ac.ke/departments/horticulture/academic/) provided research data set on nutritional value, germination and yields of indigenous African vegetables.
Prof John Wesonga (http://www.jkuat.ac.ke/departments/horticulture/prof-john-wesonga-m/) provided research data set on best conditions of growing French beans with minimum application of pesticides.
Prof. Simon Karanja (http://www.jkuat.ac.ke/schools/soph/prof-simon-karanja/) provided research data on the effects of Khat on users in Kenya.
Dr. Frida Wanzala (http://www.jkuat.ac.ke/departments/horticulture/academic/) provided research data on papaya growing areas in Kenya.
Dr. Peter Kahenya(http://www.jkuat.ac.ke/departments/foodscience/?page_id=49) provided research data on various conditions that affect cooking time of different types of beans.
Dr. John Kinyuru (http://www.jkuat.ac.ke/departments/foodscience/?page_id=49) provided research data on nutritional formulas of various local Kenyan foods.
Mr. Francis Ombwara (http://www.jkuat.ac.ke/departments/horticulture/non-academic/) provided research data on identification of sweetness of papaya by color. All the developers were invited to explain the objective of their research.

The iCEOD team curated the data sets and organized it in formats that were easily consumable by software developers. The team also created Application Programming Interfaces (API’s) and SQL queries which were uploaded to a database in a local computer environment, allowing developers to delve into development of applications without having to worry about the raw datasets.

The objectives of the Hackathon

To build innovative mobile and web applications that made access and consumption of research data easy for the benefit of the society.
To encourage scientists to open their research data for public consumption and use.
To showcase open data capability in providing innovative solutions to societal challenges
To engage partners support on the open research data initiative.

Hackathon Partners.

Jomo Kenyatta University of Agriculture and Technology (http://www.jkuat.ac.ke/)
IBM East Africa (http://www.ibm.com/connect/ibm/ke/en/branch/ibm.html )
CODATA (http://www.codata.org/ )
AFRICA ai JAPAN Project (http://jkuat.ac.ke/projects/africa-ai-japan/?page_id=469)
Kenya Open Data Initiative (https://opendata.go.ke/ )

List of Judges

Dr. Agnes Mindila-JKUAT
Mr. Philip Oyier-JKUAT
Mr. Silas Macharia –IBM Kenya
Mr. Phebian –ICT Authority

The three winning applications

Position 1: Team ACELORDS comprised of Alex Maina, Emmah Kimari and Peter Kamore – Used Nutritional dataset

This group developed a web and mobile application that would help know the nutritional value of the different common types of foods in Kenya.
A healthy diet ensures boosted immunity against most diseases, thus such a system becomes beneficial.
The user has the option of selecting any combination of foods offered and have the application calculate the resultant nutritional values of the selected food combination.
As an addition to the system functions, it has the ability to save choices made by a registered or logged in user. With this information, critical data such as weekly trends or monthly trends can be generated, thereby allowing further processing and interpretation of results.
A user can be alerted when the system senses potential health risks, such as higher or lower levels beyond thresholds of certain nutrients. With such information, the user can change his/her diet to a trend that would be beneficial to one’s health.
This application can be used by mothers to monitor their baby’s health trends and focus towards nurturing healthy children which is one of the major concerns in Africa.

Position 2: Nahayo B. Patrick and Pius Dan Nyongesa – Used Khat dataset

This group created an informational application.
• This dataset was large with the different effects of Khat in Kenya such as the effects of Khat on local citizens, population that is taking khat, financial expenditure on the users of khat, the age of those involved in taking the khat, where khat grows among other issues.
• This group created a real-time data analysis tool that takes this data and plots charts and graphs making easy to visualize and gain insights on the data.
• Potential users of such applications are health service providers, counselors, security personnel, among others.

Position 3: Team ELITE (used Indigenous African Vegetables data)

This dataset shows the nutritional value of the different Kenyan indigenous vegetables and also the areas where this vegetables are commonly found. This dataset was used to develop an application that can help any user to determine which vegetables best suit their diets depending on the nutritional content they need most.
The application shows where you can find different vegetables and also which vegetables they require most depending on their need of nutrition.
Potential end users include Manufactures, Processors, Nutritionist, Health Providers, Nursing Mothers

All data sets are being prepared for registration so as to be assigned DOI. There APIs will be published on the JKUAT open Data Platform at https://opendata.jkuat.ac.ke/ when ready.

Acknowledgement

Jomo Kenyattta University of Agriculture and Technology, Prof Muliaro Wafula –Director iCEOD, Dr Simon Hudson-CODATA Executive Director, Prof Manabu Tsunoda- AFRICA ai JAPAN Project Chief Adviser, Silas Macharia-IBM Kenya, Dr Agnes Mindila –Lecturer Computing Department & iCEOD Collaborator, Noriaki Tanaka — AFRICA ai JAPAN Project Coordinator, Pascal Ouma –Deputy Director ICT, Francis Musyoki – iCEOD attached & MSc Student Software Engineering at JKUAT, Tom Nyongesa- iCEOD attached & Computer Science Final Year students JKUAT, Gyle Odhiambo- iCEOD attached & Computer Science Final Year students JKUA, and Alice Ebela –iCEOD Administrative Assistant .

List of Participants

1	Mwangi Shadrack	2	Nixon Thuo	3	Dominic Kithinji
4	Wycliff Obuya	5	Sydney Mainga	6	Enock Chesire
7	Alex Maina	8	Winnie Karanja	9	Collins Njoroge
10	Charles Wachira	11	John Makau	12	Winnie Karanja
13	Collins Njoroge	14	Charles Wachira	15	Kyalo Ian
16	Okumu Ian	17	Joseph Njenga	18	Irene Kimani
19	Simon Mulwa	20	Ancentus Makau	21	Emmah Kimari
22	Peter Kamoro	23	Polycap Okeyo	24	Khwolo Kabara
25	Chris Mureithi	26	Stephen Mwangi	27	Jimmy Koskei
28	Bashir Shelkh	29	Gaylord Odhiambo	30	Charles Waitiki
31	Mary Waweru	32	Rose Kinuthia	33	Vicky Gatobu
34	Jeremiah Kuria	35	Mercy Maina	36	Njoroge Daniel
37	Muthiani Jayson	38	Sammy Mugambi	39	Peter Kariuki
40	Alan Mwathe	41	Gitau Isaac	42	Daniel Biwott
43	Ngumo Nthenge	43	Gitau Moses	44	Njoroge Edward
45	Ambrose Mbae	46	Stephen Oduor	47	Elvis Shida
48	Mwembe Emmanuel	49	Waithaka Kennedy	50	Tabitha Akinyi
51	Brian Kimathi	52	Lusenaka Alvin	53	Kamau Karogo
54	Nechewnje Ian	55	Daniel Ondigo	56	Nyongesa K Tom
57	Kevin Ruo	58	Christopher Mulwa	59	Ngacha Duncan
60	Mark Ngatia	61	Sarah Waiganjo	62	Eugene Ogongo
63	Jacob Kenneth	65	Pius Nyongesa

Humans of Data 6

“I’m passionate about the transfer we’re seeing in research: moving from a cottage industry to a place where knowledge is increasingly coming through trusted processes. Research data will be an output that can be used by lots of people. The problem we have in research is that lots of people can’t use the data. If we can create a trusted environment we can make a big difference to the way data is used.

Look, I’m old. I’m 62, and yet I’m passionate and I don’t want to give this up whilst this change is happening. I want to help get this set up for the next phase. We can make it so that research data is much more available. Our mission is to make research data more available for researchers, research institutions, the nation, and the global community. This means that every day you jump from astronomy to history to social sciences, and you have to think about why it’s valuable to different sorts of people. If you think about this problem in the right way, yes, you have to have technical support for this, but the heart of this is that you have to have trust. That’s how you get things to happen. So I measure data in trust rather than petabytes. I measure data in people rather than petabytes.”

HarassMap at SciDataCon on their Data Management Project with IDRC

This post comes from Reem Wael, Director of HarassMap http://harassmap.org/en/: Reem was assisted in part by CODATA and GEO to attend SciDataCon and contributed a paper to a session on ‘Data Sharing in a Development Context: The experience of the IDRC Data Sharing Pilot’ http://www.scidatacon.org/2016/sessions/56/

HarassMap launched five years ago with the mission to end social acceptability of sexual harassment in Egypt. This mission, unexpectedly, led to the accumulation of a lot of data coming from both online and offline sources, and the more we grow the more data we have. Our methodology is to combine online and offline work to achieve our mission and therefore we crowdsource reports on sexual harassment and through our social media outlets, and we receive information from outreach activities and trainings. We analyze this information and give it back to the community in the form of research reports, public campaigns, trainings and policies.

A few years ago, we started receiving requests from researchers who are working on topics that cross-cut with sexual harassment, to access our data. We responded to these requests by providing an excel sheet with the downloaded crowdsourced reports, but this was the limit of our assistance. When an opportunity came along to design and implement a data management plan, supported by IDRC, it was very relevant to our needs. IDRC is an international research organization and was interested in exploring how grantees can have their data more open to the public.

The main point that IDRC focused on is openly sharing the data. However, when we started to work on the project, we realized that the earlier stages are more challenging; which data do we store and how? We have a massive amount of data accumulating in the last five years. Other than the crowdsourced reports and the reports that we receive on social media, we also own a huge library of photographs and video footage, reports from trainings, evaluations from trainings that reflect the impact that we had, reports from outreach activities, and social media posts and replies. We reached some decisions in the planning phase and we are continuing to make these decisions as we move on.

We formed a ‘data management team’ from HarassMap staff who works on research and data and we tried to identify the data that we want to collect, organize and share, raising the following questions: why are we sharing data, and with whom? How can we organize it in a way that would be helpful to researchers, or others who request access to the data? Are there any ethical issues that we need to consider while sharing the data? These questions brought up some challenges. We were not sure what kind of data would be interesting to researchers, for instance. We found that even though crowdsourced reports are more coveted by researchers the more interesting data is the discussions on social media (our posts, including all the comments that we get) in addition to field reports. This data mirrors and tracks the development of myths and misconceptions of sexual harassment, especially when analyzed over a long span of time as it can show if a difference in attitudes and opinions on sexual harassment had occurred.

Embarking upon data management showed some challenges as well. One is a linguistic/technical challenge especially with the crowdsourced reports as we receive them in both English and Arabic. Privacy was a challenge regarding HarassMap’s library of photos and videos since it shows a lot of volunteers since 2010 from whom we did not take consent to share their photos publicly. We did not find an ethical problem with publishing and sharing crowdsourced reports because they are all anonymous, and we also filter them to remove any information that can hold us legally liable such as accusations against people or places by name.

That said, we are now in the process of accumulating and organizing data from the last five years, and putting it on our web server. The next phase – sharing – has its share of challenges. The first and most important is that we must have some kind of screening over who uses our data for various reasons; sometimes researchers completely misuse the data which puts HarassMap in a bad position. For instance claiming that crowdsourced reports reflect ‘hotspots’ of sexual harassment is essentially flawed yet a widely used claim. We always assert that crowdsourced reports provide biased data because there is a huge difference to access to internet and technology based on the affluence of the area and therefore receiving reports from a specific area doesn’t necessarily mean that harassment is more prevalent there, it may mean that people have better access and more knowledge about reporting. At other times, researchers have taken the data without giving credit to HarassMap; and some researchers have asked for the data and then disappeared without informing us of what they wrote.

Being part of this project has benefited HarassMap greatly not only because we started thinking about the idea of sharing our data on searchable engine, but also because we did not know the amount of data that we possess in the first place until we started looking for it. While putting our data completely public is something that HarassMap is still hesitant to do, we are definitely happy to provide researchers and other interested parties with data in a format that is more user friendly.

Report to ICTP, Trieste CODATA-RDA Research Data Science Summer School (1st – 12th August, 2016)

This post was written by Elias P.M. Mwakilama, Serving as the deputy head and Coordinator of Research, Seminar and Consultancies in the Mathematical Sciences Department at University of Malawi-Chancellor College, Elias Mwakilama is a young computational and applied mathematician in the field of operations research. Elias recently attended the CODATA-RDA School of Research Data Science, hosted at ICTP, near Trieste, Italy – his participation was kindly supported by ACU.

This report seeks to highlight key beneficial outcomes and recommendations based on the international summer on CODATA-RDA research data science school that I attended at ICTP, Trieste-Italy from 1st to 12th August, 2016 with financial support on travelling and lodging from the Association of Commonwealth Universities (ACU).

First and foremost, I must sincerely thank the school organizing committee for providing a positive answer to my application and for taking such an initiative to seek funding opportunities from ACU. I also thank ACU for providing such funding. I do not take that for granted.

Overall, the summer school was well organized as it brought together early career leading scientists across the world to discuss scientific ideas on importance of data sharing in scientific research from computational and applied statistics perspective in enhancing the scientific and technological capacity of developing countries.

Considering the fact that there exist issues of ethics and confidentiality in data handling, we need to seek means of protecting such data and promote spirit of sharing and publishing among us research scientists so as to cut down costs of replication or duplication of data collection and analysis processes. As such, it is relevant to seek for the right platforms for ensuring quality data handling and sharing in form of well-organized schools such as ICTP CODATA, networking and collaborations. Across the globe, mainly in Africa, financial support for early leading career scientist to attend to or participate in such research platforms and events remains a barrier. However, provision of research funds from scientific institutions such as ICTP-Trieste campus and The Association of Commonwealth Universities assists such financially challenged young leading scientists who consider internationally and locally organized research schools as a platform of the great learning and development of their opportunities. It is through support from institutions such ACU that I had access to the CODATA-RDA research school in my field and the chance to meet influential people who have shown interest to collaborate with and share data for scientific research and policy analysis.

By working closely with leading academics and invited guests at such school, I was exposed to a number of data programming skills which are of genuine and practical importance to my field of computational and applied mathematics in operations research. I have already begun using such skills in my academic research and also when teaching undergraduate students in our institution, Chancellor College. For instance, programming languages such as R-studio and Open source languages have aided me in successfully handling my fourth undergraduate course in Mathematics research.

Besides, being a pioneer of a newly established Mathematics and Statistics research group, Fibonacci Research Group (FRG) in our department (Mathematical Sciences), the summer school has assisted both me and the group in establishing research links with other academic applied statisticians which can develop far beyond the problem posed by the industry in Malawi to our existing research group. Leadership skills learnt from the summer school by observing the way the moderators coordinated their work on the problem and skills in presenting research material and in scientific communication would be used in coordinating research projects in the research group for the entire group benefits.

The summer school has also provided me with an opportunity to work on problems of genuine practical importance and do good computational mathematics in the process of designing new research areas that can be opened up leading to publications and new research collaborations. Skills acquired from the school would also provide an opportunity to our research group by applying knowledge and skills to significant practical problems and then stimulate in industry the awareness of the power of data sharing in statistical modeling and scientific computing.

With these many remarks, I therefore encourage your office to continue supporting young African early scientists through provision of similar financial support so as to develop our continent, in particular developing countries such as Malawi. I also wish to request for an idea of establishing more data sharing and analysis scientific centres such as ICTP in Africa where more early career research scientists such as myself could be able to get similar trainings at low costs and easier. I attach a certificate of participation that I got from the ICTP-CODATA RDA School of research science as an evidence of my full attendance and participation at the school.

PASTD – Task group report of SciDataCon2016 Activities

Continuation of Task Group approved by CODATA General Assembly
On behalf CODATA PASTD, Dr. Xiang ZHOU attended the CODATA General Assembly in Denver on 11^th September 2016. Dr. Zhou gave a 5-minute presentation on the CODATA PASTD Task Group’s activities over the past two years. He also presented the updated objectives, action plan, and expected cooperation with multiple stakeholders in developing and developed countries. The updated CODATA PASTD action plan is a response to the ‘Open Data in a Big Data World’ International Accord issued by ICSU, IAP, ISSC and TWAS. In next two years, CODATA PASTD activities will pay more attention to open data policies, best practices and capacity building in low and middle income countries.CODATA PASTD was on of eight Task Groups approved by the General Assembly (from 15 proposals). The renewal of PASTD as a CODATA Task Group was in line with the CODATA Executive Committee’s recommendations and reflects its wide partnerships, past achievements and high priority activities which are line with the CODATA Strategic Plan.
Key Issues and Action Plan for the Preservation and Access of Research Data in Developing Countries – Panel Discussion Session in SciDataCon 2016 in Denver
At SciDataCon 2016, held as part of International Data Week in Denver, 11-17 September 2016, the CODATA PASTD Task Group organized a panel discussion to address key Issues and an action plan for the preservation and access of research data in developing countries.The group was held in the afternoon of 13 September and chaired by Dr. Xiang Zhou, the co-chair of the PASTD Task Group. Dr. Zhou made welcome speech and gave a brief introduction about the CODATA PASTD Task Group and practices of preservation of and access to research data in developing countries. The session provided a forum for participants from many disciplines to exchange ideas about key Issues on data preservation, access and sharing of research data in developing countries. The session consisted of seven invited talks covering issues of policy, technologies, capacity building and best practices, followed by an open discussion (Table.1).

Wim Hugo, from South African Environmental Observation Network (SAEON), presented the concept of a Network Data Centre for Africa.
Joseph Muliaro Wafula, from ICT Centre of Excellence and Open Data (iCEOD), Jomo Kenyatta University of Agriculture and Technology (JKUAT), shared the experiences of developing an open data policy and infrastructure taking JKUAT as a case study.
Daisy Selematsela from the Knowledge Management Corporate of the National Research Foundation, South Africa, presented developments and transition of open data access in Africa, and challenges and sustainability of Open Access.
Paul Uhlir, as a consultant in research data policy and management, and Scholar of the National Academy of Sciences, described recent developments in open data policies and discussed data sharing and data management principles in developing countries.
Kostiantyn Yefremov, Director of the World Data Center for Geoinformatics and Sustainable Development, National Technical University of Ukraine, described the development of the interdisciplinary research infrastructure in Ukraine.
Yuanqiang ZHU, Professor of the Institute of Geographical Sciences and Natural Resources Research, Chinese Academy of Sciences (IGSNRR/CAS), discussed publishing and sharing research data for sciences and sustainability in developing countries, and the practice of the Global Change Research Data Publishing and Repository.
Maria Natalia Norori from Universidad Latina de Costa Rica, analyzed the benefits, development and challenges of Open Data Empowerment. There is a consensus that Open data is an important weapon to ﬁght the educational barriers imposed by socioeconomic factors in developing countries.
Furthermore, participants discussed challenges, policies and actions for implementation of “Nairobi Data Sharing Principles” in developing countries, especially in LMICs. Implementation guidelines for data sharing principles can help promote the infrastructures and the development of data sharing capacity and best practices in future years.

The following tasks are given priority on implementation of task group objectives：

Increase awareness and consensus on Nairobi Guidelines / Open Data Principles in Developing Countries;
Workshop and training for francophone countries, proposed in Madagascar in 2017;
Special issue to the Journal of Data Science: best practices and show cases of implementing Open Data in a Big Data World in developing countries;
Enhancement of the online services of the global change research data publishing and repository

Table 1．Agenda of PASTD Session in SciDataCon

Key Issues and Action Plan for the Preservation and Access of Research Data in Developing Countries

13 September 2016
Agenda
Chaired by: Xiang ZHOU

	Topic	Who
1	Welcome and Introductions	Chair	5’
2	Invited Talks: Concept – Network Data Centre for Africa Open Data Policy and Infrastructure: Case of Jomo Kenyatta University of Agriculture and Technology (JKUAT), Kenya Transition to Open Access in Africa – developments, challenges and sustainability. Recent Developments in Open Data Policy Development of the interdisciplinary research infrastructure in Ukraine Global Change Data Encyclopaedia for Local Studies in Developing Countries Open Data Empowerment: A weapon to fight the educational barriers imposed by socioeconomic factors in developing countries	Wim Hugo Wafula Muliaro Daisy Selematsela Paul Uhlir Kostiantyn Yefremov Yunqiang ZHU Maria Natalia Norori	35′
3	Open Discussion		45’
4	Review of Actions	Chair	5’

Humans of Data 5

“We’re at this turning point where archivists are working in the area of research data – it’s just so cool to feel like you’re at the cutting edge of something and you can facilitate that conversation. Being an archivist and saying I work with research data can help expand people’s expectations of what archivists do and what we’re interested in. People should consider that archivists are appropriate to data, but archivists should also consider a broader view of what they do. The things we work with can be data. And we need to talk about terminology – we need to find ways of talking that make sense to archivists and also to the research data audience. I love having those conversations across domains. When else would I talk to a physicist or biologist about what they do?”

Humans of Data 4

“So when I was a kid, obviously Star Trek was the thing, because it was our better selves in the 23rd century. Civil rights, women’s rights, all those issues that were happening at that time in the 1960s were simplified in that show. But the thing that got me was the computer. Spock would have this conversation: ‘Computer, what is this thing? What was the global temperature in 1934?’ And there was always an answer. My start with data was looking at how instruments recorded it. As I’ve started to get into managing people, writing code, I’ve realised that we’re the people in someone else’s past. If we don’t get it right, they will suffer. They’ll ask the question, and the computer won’t have an answer. These people are all trying to get to that better 23rd century. It’s slow progress, baby steps. But being able to make sense of the research results that we take now, consolidating that, is really important to me.”

Open Data as a Moving Target: What Does it Take to Allow Reuse?

By Irene Pasquetto

As we all know too well, making all scientific data technically and legally accessible to all researchers is an ambitious task complicated by constantly evolving social and technical barriers. It is fair to say that we are making progresses in this direction. At Scidatacon 2016, we examined several concrete solutions that can facilitate openness of scientific data or, if you prefer, make sure data are FAIR (findable, accessible, interoperable and reusable).

However, it seems that the more we learn about how to make data open, the least we know about how exactly data will be reused by the scientific community, which means by the researchers who generated the data and should have a primary interest in accessing it. Very few empirical studies exist on the extent to which open data are used and reused once deposited in open repositories.

The fulcrum of the problem is that data take many forms, and are produced, managed and used by diverse communities for different purposes. Nevertheless, different stakeholders (publishers, data curators, digital librarians, funders, scientists etc.) bear competing points of view on the kind of policies, values, and infrastructural solutions necessary to make data open. During a session moderated by Christine Borgman (UCLA) and titled “How, When, and Why are Data Open? Competing Perspectives on Open Data in Science”, Matthew Mayernik (National Center for Atmospheric Research), Parsons Mark (National Snow and Ice Data Center) and Irene Pasquetto (UCLA – Center for Knowledge Infrastructures) presented on some of those challenges that make the use and reuse of “open data” such a complicated and heterogeneous process.

Mayernik argued that the integration of the Internet into research institutions has changed the kinds of accountabilities that apply to research data. On one hand, open data policies expect researchers to be accountable for creating data and metadata that support data sharing and reuse in a broad sense, in many cases, to any possible digital user in the world. On the other hand, providing accounts of data practices that satisfy every possible user is in most cases impossible.

In his talk, Parsons effectively showed that data access is an ongoing process, not a one-time event. Parsons and his team examined how the data repositories products and their curation have evolved over time in response to environmental events and increasing scientific and public demand over several decades. The products have evolved in conjunction with the needs of a changing and expanding designated user community. In other words, Parsons’ case study shows that it is difficult to predict the users of a data service because new and unexpected audiences (with specific needs) could emerge at any time. Parsons also argued that, for this reason, “data generators” may not be the best individuals to predict future uses of their own data.

Because open data users change over time, it is also necessary to built open repositories that provide data in formats flexible enough to allow different approaches to data analysis and integration, for different audiences. This was the point made by Pasquetto, whose case study is a consortium for data sharing in craniofacial research, with a focus on the subfield of developmental/evolutionary biology that recently adopted genomics approaches to knowledge discovery. Pasquetto found flexible data integration to be a necessary precursor to using and reusing data. “Data integration work” is the most contested and problematic task faced by the community, where data need to be integrated at two or more levels and these levels require extensive collaboration between engineers, biologists, and bioinformaticians.

Borgman also presented a paper on the beneath of Ashley Sands, who recently graduated from the department of Information Studies at UCLA and is now senior program officer at the Institute of Museum and Library Services in Washington DC. This talk examined characteristics of openness in the collection, dissemination, and reuse of data in two astronomy sky survey case studies: the Sloan Digital Sky Survey (SDSS) and the Large Synoptic Survey Telescope (LSST). Discussion included how the SDSS and LSST data, and datasets derived from the projects by end users, become available for reuse. Sands found that the rate at which data are released, the populations to which the data are made open, the length of time data creators plan to make the data available, the scale at which these endeavors take place, and the stages of these two projects all have great impact to the extent in which data and then reused.

Moral of the story: open data is a fast moving target. In order to enable reuse, data repositories better start to run.

Humans of Data 3

“I find it relaxing to work with data. I’m a mathematician by training and much more into applied mathematics, so I find recursive formulas very relaxing and linear algebra is like a fun puzzle, like a crossword. I like problem solving. ‘Big data’ is an excellent field for problem solving. I like finding elegant solutions to complex problems. I approach problem solving slightly off-kilter from others – I would often get weird grades in school, but it also means that if people give me problems they’re struggling with, I could look at it and come up with something different from them. This is my first data science meeting. I’m enjoying the opportunity and being around mathematicians and database people and folks who get excited by data. And I’m pleased that there are other women I can talk to.”

CODATA Blog

News from the CODATA community and from Simon Hodson, CODATA Executive Director