Report to ICTP, Trieste CODATA-RDA Research Data Science Summer School (1st – 12th August, 2016)

This post was written by Elias P.M. Mwakilama. Serving as the deputy head and Coordinator of Research, Seminar and Consultancies in the Mathematical Sciences Department at University of Malawi-Chancellor College, Elias Mwakilama is a young computational and applied mathematician in the field of operations research. Elias recently attended the CODATA-RDA School of Research Data Science, hosted at ICTP, near Trieste, Italy. His participation was kindly supported by ACU.


This report highlights the key beneficial outcomes of, and recommendations based on, the international CODATA-RDA Research Data Science Summer School that I attended at ICTP, Trieste, Italy from 1st to 12th August, 2016, with financial support for travel and lodging from the Association of Commonwealth Universities (ACU).

First and foremost, I must sincerely thank the school organizing committee for providing a positive answer to my application and for taking such an initiative to seek funding opportunities from ACU. I also thank ACU for providing such funding. I do not take that for granted.

Overall, the summer school was well organized. It brought together leading early career scientists from across the world to discuss the importance of data sharing in scientific research, from a computational and applied statistics perspective, for enhancing the scientific and technological capacity of developing countries.

Because data handling raises issues of ethics and confidentiality, we need to find ways of protecting such data while promoting a spirit of sharing and publishing among research scientists, so as to cut the costs of replicating or duplicating data collection and analysis. It is therefore important to seek the right platforms for ensuring quality data handling and sharing, in the form of well-organized schools such as the ICTP CODATA school, networking and collaborations. Across the globe, and mainly in Africa, a lack of financial support for early career scientists to attend or participate in such research platforms and events remains a barrier. However, research funds from scientific institutions such as ICTP, Trieste and the Association of Commonwealth Universities assist financially challenged young scientists who regard internationally and locally organized research schools as platforms for learning and for developing their opportunities. It is through support from institutions such as ACU that I had access to the CODATA-RDA research school in my field and the chance to meet influential people who have shown interest in collaborating and sharing data for scientific research and policy analysis.

By working closely with leading academics and invited guests at the school, I was exposed to a number of data programming skills which are of genuine practical importance to my field of computational and applied mathematics in operations research. I have already begun using these skills in my academic research and when teaching undergraduate students at our institution, Chancellor College. For instance, tools such as RStudio and open source programming languages have helped me to successfully handle my fourth undergraduate course in Mathematics research.

Besides, as a pioneer of the newly established Mathematics and Statistics research group, the Fibonacci Research Group (FRG), in our department (Mathematical Sciences), I have found that the summer school has helped both me and the group to establish research links with other academic applied statisticians, links which can develop far beyond the problems posed to our research group by industry in Malawi. The leadership skills I learnt by observing how the moderators coordinated their work, together with skills in presenting research material and in scientific communication, will be used in coordinating research projects for the benefit of the entire group.

The summer school has also given me an opportunity to work on problems of genuine practical importance and to do good computational mathematics while designing new research areas that can lead to publications and new research collaborations. The skills acquired at the school will also give our research group an opportunity to apply its knowledge to significant practical problems and thereby stimulate awareness in industry of the power of data sharing in statistical modeling and scientific computing.

With these remarks, I encourage your office to continue supporting young African early career scientists through similar financial support so as to develop our continent, in particular developing countries such as Malawi. I also wish to propose the establishment of more scientific centres for data sharing and analysis, such as ICTP, in Africa, where more early career research scientists such as myself could get similar training more easily and at low cost. I attach the certificate of participation that I received from the ICTP CODATA-RDA School of Research Data Science as evidence of my full attendance and participation at the school.

PASTD – Task group report of SciDataCon2016 Activities

  1. Continuation of Task Group approved by CODATA General Assembly
    On behalf of CODATA PASTD, Dr. Xiang ZHOU attended the CODATA General Assembly in Denver on 11th September 2016. Dr. Zhou gave a 5-minute presentation on the CODATA PASTD Task Group’s activities over the past two years. He also presented the updated objectives, action plan, and expected cooperation with multiple stakeholders in developing and developed countries. The updated CODATA PASTD action plan is a response to the ‘Open Data in a Big Data World’ International Accord issued by ICSU, IAP, ISSC and TWAS. In the next two years, CODATA PASTD activities will pay more attention to open data policies, best practices and capacity building in low and middle income countries. CODATA PASTD was one of eight Task Groups approved by the General Assembly (from 15 proposals). The renewal of PASTD as a CODATA Task Group was in line with the CODATA Executive Committee’s recommendations and reflects its wide partnerships, past achievements and high priority activities, which are in line with the CODATA Strategic Plan.
  2. Key Issues and Action Plan for the Preservation and Access of Research Data in Developing Countries – Panel Discussion Session in SciDataCon 2016 in Denver
    At SciDataCon 2016, held as part of International Data Week in Denver, 11-17 September 2016, the CODATA PASTD Task Group organized a panel discussion to address key issues and an action plan for the preservation and access of research data in developing countries. The session was held in the afternoon of 13 September and chaired by Dr. Xiang Zhou, the co-chair of the PASTD Task Group. Dr. Zhou made a welcome speech and gave a brief introduction to the CODATA PASTD Task Group and to practices of preservation of and access to research data in developing countries. The session provided a forum for participants from many disciplines to exchange ideas about key issues in the preservation, access and sharing of research data in developing countries. The session consisted of seven invited talks covering issues of policy, technologies, capacity building and best practices, followed by an open discussion (Table 1).
  • Wim Hugo, from South African Environmental Observation Network (SAEON), presented the concept of a Network Data Centre for Africa.
  • Joseph Muliaro Wafula, from ICT Centre of Excellence and Open Data (iCEOD), Jomo Kenyatta University of Agriculture and Technology (JKUAT), shared the experiences of developing an open data policy and infrastructure taking JKUAT as a case study.
  • Daisy Selematsela from the Knowledge Management Corporate of the National Research Foundation, South Africa, presented developments and transition of open data access in Africa, and challenges and sustainability of Open Access.
  • Paul Uhlir, as a consultant in research data policy and management, and Scholar of the National Academy of Sciences, described recent developments in open data policies and discussed data sharing and data management principles in developing countries.
  • Kostiantyn Yefremov, Director of the World Data Center for Geoinformatics and Sustainable Development, National Technical University of Ukraine, described the development of the interdisciplinary research infrastructure in Ukraine.
  • Yunqiang ZHU, Professor of the Institute of Geographical Sciences and Natural Resources Research, Chinese Academy of Sciences (IGSNRR/CAS), discussed publishing and sharing research data for sciences and sustainability in developing countries, and the practice of the Global Change Research Data Publishing and Repository.
  • Maria Natalia Norori, from Universidad Latina de Costa Rica, analyzed the benefits, development and challenges of Open Data Empowerment. There is a consensus that open data is an important weapon to fight the educational barriers imposed by socioeconomic factors in developing countries.
  • Furthermore, participants discussed challenges, policies and actions for implementation of the “Nairobi Data Sharing Principles” in developing countries, especially in LMICs. Implementation guidelines for data sharing principles can help promote the infrastructures and the development of data sharing capacity and best practices in future years.

The following tasks are given priority in implementing the Task Group's objectives:

  • Increase awareness and consensus on Nairobi Guidelines / Open Data Principles in Developing Countries;
  • Workshop and training for francophone countries, proposed in Madagascar in 2017;
  • Special issue of the Journal of Data Science: best practices and showcases of implementing Open Data in a Big Data World in developing countries;
  • Enhancement of the online services of the Global Change Research Data Publishing and Repository.

Table 1. Agenda of PASTD Session in SciDataCon

Key Issues and Action Plan for the Preservation and Access of Research Data in Developing Countries

13 September 2016
Chaired by: Xiang ZHOU

No.  Topic                       Who                      Time
1    Welcome and Introductions   Chair                    5’
2    Invited Talks               Wim Hugo
                                 Joseph Muliaro Wafula
                                 Daisy Selematsela
                                 Paul Uhlir
                                 Kostiantyn Yefremov
                                 Yunqiang ZHU
                                 Maria Natalia Norori
3    Open Discussion                                      45’
4    Review of Actions           Chair                    5’

Humans of Data 5

“We’re at this turning point where archivists are working in the area of research data – it’s just so cool to feel like you’re at the cutting edge of something and you can facilitate that conversation.  Being an archivist and saying I work with research data can help expand people’s expectations of what archivists do and what we’re interested in.  People should consider that archivists are appropriate to data, but archivists should also consider a broader view of what they do. The things we work with can be data.  And we need to talk about terminology – we need to find ways of talking that make sense to archivists and also to the research data audience.  I love having those conversations across domains. When else would I talk to a physicist or biologist about what they do?”

Humans of Data 4

“So when I was a kid, obviously Star Trek was the thing, because it was our better selves in the 23rd century. Civil rights, women’s rights, all those issues that were happening at that time in the 1960s were simplified in that show. But the thing that got me was the computer. Spock would have this conversation: ‘Computer, what is this thing? What was the global temperature in 1934?’ And there was always an answer. My start with data was looking at how instruments recorded it. As I’ve started to get into managing people, writing code, I’ve realised that we’re the people in someone else’s past. If we don’t get it right, they will suffer. They’ll ask the question, and the computer won’t have an answer. These people are all trying to get to that better 23rd century. It’s slow progress, baby steps. But being able to make sense of the research results that we take now, consolidating that, is really important to me.”

Open Data as a Moving Target: What Does it Take to Allow Reuse?

By Irene Pasquetto

As we all know too well, making all scientific data technically and legally accessible to all researchers is an ambitious task complicated by constantly evolving social and technical barriers. It is fair to say that we are making progress in this direction. At SciDataCon 2016, we examined several concrete solutions that can facilitate openness of scientific data or, if you prefer, make sure data are FAIR (findable, accessible, interoperable and reusable).

However, it seems that the more we learn about how to make data open, the less we know about how exactly data will be reused by the scientific community, that is, by the researchers who generated the data and should have a primary interest in accessing it. Very few empirical studies exist on the extent to which open data are used and reused once deposited in open repositories.

The crux of the problem is that data take many forms, and are produced, managed and used by diverse communities for different purposes. Moreover, different stakeholders (publishers, data curators, digital librarians, funders, scientists etc.) hold competing points of view on the kind of policies, values, and infrastructural solutions necessary to make data open. During a session moderated by Christine Borgman (UCLA) and titled “How, When, and Why are Data Open? Competing Perspectives on Open Data in Science”, Matthew Mayernik (National Center for Atmospheric Research), Mark Parsons (National Snow and Ice Data Center) and Irene Pasquetto (UCLA – Center for Knowledge Infrastructures) presented on some of the challenges that make the use and reuse of “open data” such a complicated and heterogeneous process.

Mayernik argued that the integration of the Internet into research institutions has changed the kinds of accountabilities that apply to research data. On one hand, open data policies expect researchers to be accountable for creating data and metadata that support data sharing and reuse in a broad sense, in many cases, to any possible digital user in the world. On the other hand, providing accounts of data practices that satisfy every possible user is in most cases impossible.

In his talk, Parsons effectively showed that data access is an ongoing process, not a one-time event. Parsons and his team examined how the data repository's products and their curation have evolved over several decades in response to environmental events and increasing scientific and public demand. The products have evolved in conjunction with the needs of a changing and expanding designated user community. In other words, Parsons’ case study shows that it is difficult to predict the users of a data service because new and unexpected audiences (with specific needs) could emerge at any time. Parsons also argued that, for this reason, “data generators” may not be the best individuals to predict future uses of their own data.

Because open data users change over time, it is also necessary to build open repositories that provide data in formats flexible enough to allow different approaches to data analysis and integration, for different audiences. This was the point made by Pasquetto, whose case study is a consortium for data sharing in craniofacial research, with a focus on the subfield of developmental/evolutionary biology that recently adopted genomics approaches to knowledge discovery. Pasquetto found flexible data integration to be a necessary precursor to using and reusing data. “Data integration work” is the most contested and problematic task faced by the community, where data need to be integrated at two or more levels and these levels require extensive collaboration between engineers, biologists, and bioinformaticians.

Borgman also presented a paper on behalf of Ashley Sands, who recently graduated from the department of Information Studies at UCLA and is now senior program officer at the Institute of Museum and Library Services in Washington DC. This talk examined characteristics of openness in the collection, dissemination, and reuse of data in two astronomy sky survey case studies: the Sloan Digital Sky Survey (SDSS) and the Large Synoptic Survey Telescope (LSST). Discussion included how the SDSS and LSST data, and datasets derived from the projects by end users, become available for reuse. Sands found that the rate at which data are released, the populations to which the data are made open, the length of time data creators plan to make the data available, the scale at which these endeavors take place, and the stages of these two projects all have great impact on the extent to which data are then reused.

Moral of the story: open data is a fast-moving target. In order to enable reuse, data repositories had better start running.

Humans of Data 3

“I find it relaxing to work with data.  I’m a mathematician by training and much more into applied mathematics, so I find recursive formulas very relaxing and linear algebra is like a fun puzzle, like a crossword.  I like problem solving.  ‘Big data’ is an excellent field for problem solving.  I like finding elegant solutions to complex problems.  I approach problem solving slightly off-kilter from others – I would often get weird grades in school, but it also means that if people give me problems they’re struggling with, I could look at it and come up with something different from them.  This is my first data science meeting.  I’m enjoying the opportunity and being around mathematicians and database people and folks who get excited by data.  And I’m pleased that there are other women I can talk to.”

Humans of Data 2

“One of the coolest thing is starting out as a student in the research data management field, being early in my career, and then being able to interact with the same people over time. I feel like I’m kind of growing up as an individual. I feel I can say, hey, you guys made an impact on what I do, and now I can give back.”

Humans of Data 1

“I think you need to express yourself the way you feel you should, because what really matters at this conference is that we’re all interested in making data available, accessible and preserving it, and we shouldn’t feel that we have to sacrifice who we are in part or whole, in order to do our work.

I hear far more people who are complimentary about the way I dress than not, so it’s not like it’s problematic. But it shouldn’t matter anyway. We have to just keep being who we are, and the other people will catch up.”

Humans of Data

At events like International Data Week, much discussion has happened around the technical and legislative challenges and opportunities relating to research data. But in many presentations and group meetings, we have repeatedly heard that our human behaviour – our desires, ambitions, fears, traditions and habits – shape how effectively we create, manage, share and reuse research data assets, and how open we are to collaborating on research data infrastructure.  As many speakers have noted, the technical challenges are usually susceptible to scoping and tackling, but the really intricate work is the work of creating social change and new behaviours.

As an artist and a researcher, I’m passionate about digital curation, digital preservation and research data management, and how those skills are useful to everyone in contemporary society to one extent or another.  And I’m also passionate about the way that research data – and visual art – have so much potential to transform our lives, societies and the world around us.  As I’ve continued to attend data-related conferences, I’ve become fascinated with this human element. I also noted that the International Data Week crowd is a welcome mix of nationalities, genders, ages and ethnicities.  It’s critical that our conversations include people unlike ourselves, and there is so much to be gained from getting to know each other better in order to build the kinds of relationships that can help us make progress across communities, nationalities and disciplines.

To that end, I launched a project called ‘Humans of Data’. It’s a really simple idea – basically the same as the ‘Humans of New York’ project online, where there is a photo and a quote from each (unnamed) person.  I hope this helps to get a more personal, human conversation going amongst the amazing people I meet at data conferences all over the world, connecting with their lives as individuals and having them say something about what they’re passionate about when it comes to data-related issues.

I’ll be posting the ‘Humans of Data’ here on the CODATA blog as each photo and quote becomes available. If you’d like to view them as a group, please click the ‘Humans of Data’ category to group these posts together. If you’d like to participate, please email me at laura.molloy AT, or contact me via Twitter @LM_HATII.

Looking forward to our collaboration!

SciDataCon: Opening keynotes

It was a pleasure to start off the first full day of SciDataCon with a keynote from Elaine M Faustman, Professor and Director at the Institute for Risk Analysis and Risk Communication, University of Washington and member of the ICSU World Data System Scientific Committee.  Professor Faustman’s keynote talk, ‘Challenges and Opportunities with Citizen Science:  How a decade of experiences have shaped our forward paths’, introduced a welcome early focus on the importance of rigorous ethical approaches to ‘citizen science’ research projects. Looking back to the early roots of the knowledge practices we now call ‘science’, Faustman reminded us that the roots of these practices can be found in the work of European gentleman scientists and their cabinets of curiosities.   She also situated contemporary citizen science practice in the US legislative framework of the US citizen’s right to know, work which has been underpinned by standards and acts from the 1940s onwards.

Reflecting on over a decade of citizen science practice in the environment and public health domains, Faustman provided examples of projects where citizens are not only research subjects but are centrally influential in the work, to the point where they express ownership of the project alongside the university team.  Discussion focused on the importance of the ability of research participants to influence the direction and scope of the research project, to provide feedback on its progress, and to have access to the data accrued in order to be able, in the case of public health projects at least, to use it to guide their own decision-making.  The message of deep ethical engagement and building respectful relationships with participants set the scene for a day in which ethical issues reverberated.

The second keynote was by Simon Cox, Research Scientist, Environmental Informatics, CSIRO Land and Water, Clayton, Melbourne, Australia.  In his talk, ‘What does that symbol mean? – controlled vocabularies and vocabulary services’, Cox raised a very pragmatic point about the widespread problem of non-systematic use of symbols, and keywords, in data.  He demonstrated that we assume symbols and keywords have some sort of shared meaning, at least in a given community, but that the reality is much less systematic. Symbols and abbreviations with no widely used consistent meaning are often used by researchers when creating data. Popular terms describing volume can mean entirely different things in different countries.  And even symbols or terms describing a widely understood measurement, such as the metre, can be problematic to link to a common source: the International Bureau of Weights and Measures provides a definition, which can be found via a given URI. But the fact that this URI has changed regularly from year to year disrupts any expectation of a stable, enduring location for this definition.

Cox suggested a couple of actions to mitigate this situation. Firstly, a new CODATA task group on coordinating data standards will take this work forward. Secondly, the Global Agricultural Concept Scheme – GACS – is the result of three defining sources from agricultural research banding together to deduplicate their respective vocabularies and make them interoperable for agricultural researchers. Cox noted that the technical job is not large but that – in confluence with Faustman’s earlier message – the really big job is achieving the buy-in from the community in question.

So the pesky human dimension appears right at the start of International Data Week!  More information on the keynotes is at

Laura Molloy is a doctoral researcher at the Oxford Internet Institute and the Ruskin School of Art, University of Oxford. She is on Twitter at @LM_HATII.