The next generation of data scientists

This post was written by Sarah Jones. Sarah coordinates work on the DCC’s Data Management Planning tool – DMPonline – and undertakes research on data policy and data management planning. She has written several articles and book chapters on these topics, and co-edited Delivering Research Data Management Services: fundamentals of good practice.

Sarah is involved in several European e-infrastructure, coordination and open science projects, including FOSTER+, EOSC, OpenAIRE and EUDAT. She is also rapporteur on the European Commission’s FAIR Data Expert Group. Her work in a European context focuses primarily on training and data management planning to facilitate open science and compliance with Horizon 2020 requirements.

The last two weeks have seen the first CODATA/RDA Research Data Science School in South America. We started the initiative in 2015 and developed a curriculum offering broad-based, introductory data skills to Early Career Researchers, with a specific focus on those from Low- and Middle-Income Countries (LMIC). We then ran our inaugural School at the International Centre for Theoretical Physics (ICTP) in Trieste, Italy in August 2016.

From the start the Schools were a huge success, receiving hundreds of applications from researchers in a diverse range of countries and disciplines. We’ve continually iterated on the curriculum based on student feedback and developments in the field. The event in São Paulo was an important first step to branch out to regional schools and develop local hubs of expertise. We hope the School in South America will become an annual event and will shortly be inviting applications to host one in Africa in Autumn 2018 as we’ve had many requests from there as well.

For my own part, the School has become one of the regular events I look forward to the most. The students are so enthused and keen to take the learning back to their institutions and colleagues that you really feel you are making an impact. Kevin and I have amended the Research Data Management curriculum over time, adding elements on FAIR data and new RDM services. We’re also in discussion with Gail Clements, who runs Author Carpentry, and Louise Bezuidenhout, who teaches open science and ethics, about how we can combine these three topics into one joint module for Trieste 2018.

In São Paulo we were joined by Steve Diggs from Scripps, who put together an excellent data reuse lab. Students had to form mixed-skill teams and then review research papers for links to the underlying data. Donning their investigative deerstalkers, they then obtained the data and reproduced the results. It was fantastic to see the determination and ingenuity displayed across the teams. They brought great creativity and inventiveness to overcoming the various pitfalls they encountered, and the exercise drove home the message of why it’s so important to make your data FAIR.

It may surprise you all to learn that these Schools are an entirely volunteer effort. Hugh, Rob, Ciira and I give up our time to plan, coordinate and teach on the Schools, and this would not happen without the backing of our institutions. The host organisations (ICTP in Trieste and UNESP in São Paulo) invest a great deal of time and money to make the Schools run. They provide the venue, accommodation and catering, cover student travel, administer all the visas, and provide the most excellent local support when we’re in town running the Schools. On top of that, we receive many small donations from too many organisations to mention. These cover speaker travel and support the helpers.

This year we particularly want to thank Springer Nature and Wellcome Trust, whose support enabled the helpers’ participation and allowed us to run a weekend session to let this new cohort of students know how they can get involved. Oscar, Sara, Marcella and Silvia (pictured below) have all participated in previous Schools and are now bringing back their expertise to help others. At the weekend session, Sara explained to a packed room how different it is being a helper and how much it enriches your learning. Students approach the tasks differently, so you’re troubleshooting a really wide range of problems and learning much more about the technology by doing so.

The next two priorities are to increase the number of regions in which the Schools take place, and to move them onto a more sustainable footing that is less reliant on volunteer effort and sponsorship. In 2018 we hope to run three Schools. One will take place in Trieste on 6-17 August 2018, and we anticipate others in Africa in September/October and in Brazil in December. As part of the CODATA Task Force, we’ll be reaching out to funders to seek support for a central office, and exploring business models to sustain the Schools. One idea is to run Schools in the USA and Europe with a delegate fee that is reinvested in supporting the Schools for LMIC. We hope to trial this in 2019.

With this being the Season of Goodwill and people looking for opportunities to give back to the community, I would encourage you to think about what you could do for the Schools. Are you in a position to help us coordinate them, teach, host events, sponsor, or help us develop a robust business model? There’s a huge demand for the training and we need lots of different inputs to make it scale.

CODATA Working Group Co-Chairs

  • Sarah Jones, Digital Curation Centre, Scotland
  • Ciira Maina, Dedan Kimathi University of Technology, Kenya
  • Rob Quick, Indiana University, USA
  • Hugh Shanahan, Royal Holloway University of London, England

This blog post first appeared on the DCC website

Geo4SDGs – a conference connecting geospatial with development community

Dear colleagues,
Adopted in 2015, Agenda 2030 outlines a diverse set of 17 goals that touch several domains, yet are systemically interconnected and entwined with each other. Any shift or change in one segment can and will impact several others, which is why it’s crucial for multiple stakeholder communities to work collectively to succeed. What is required is coordination, communication, and cohesive policies and program implementation, built upon reliable systems, tools and technologies.
Geospatial technologies, with their ever-growing reach and impact, offer an ideal solution for creating infrastructure and systems that can integrate datasets from a diverse range of sources to provide smart analytics for decision making and effective communication. Acknowledging the potential value of geospatial technology for the Sustainable Development Goals, Geospatial World Forum will host a two-day international conference, Geo4SDGs: Addressing Agenda 2030, on January 18-19, 2018 at HICC, Hyderabad, India. The conference is supported by the United Nations Global Geospatial Information Management Expert Group, Group on Earth Observations, Google Earth Outreach, Radiant.Earth, ICIMOD, the Bill and Melinda Gates Foundation, Dutch Kadaster and World Resources Institute. Themed ‘Geo-knowledge conduit to SDGs success’, the conference aims to provide a platform for collaboration and liaison among government agencies, commercial sectors, multilateral and international development organizations and civil society by initiating detailed discussions. It aims to bridge the gap between the geospatial community, the IT community, policy makers and those pursuing the SDGs.
We are happy to share the list of panelists for Geo4SDGs. The distinguished plenary speakers and panelists will share their insights and experiences on applications of geospatial data, tools and technologies for monitoring and evaluating the SDGs. If you are interested in attending the conference, please contact Ms. Megha Datta at or call her at 9811049987.
We look forward to hearing from you.
Warm regards,
Ms. Megha Datta
Director – Sales and Business Development
Market Intelligence and Policy Advocacy Division
Geospatial Media and Communications Pvt. Ltd.
A-145, Sector – 63, NOIDA (U.P.) India – 201301
Tel: +91 120 4612500
Fax: +91 120 4612555
Mobile: +91 9811049987
Skype: megha.datta



Dear All,       


I take this opportunity to introduce CEPT University, Ahmedabad, a leading institution offering undergraduate, postgraduate and doctoral programmes in disciplines related to the natural and built environment: Architecture, Planning, Technology, Design and Management. The University engages in academic, research, consultancy and outreach activities, and periodically organizes national and international events for knowledge exchange, dissemination and networking for young students.

This December, CEPT University is organizing a ‘National Symposium on Industry Academia Collaboration for Geospatial Technologies – 2017’ on 15-16 December 2017. Details of the event are available at https://geomaticsindia-cept.org. The symposium will focus on innovative trends and a technology-driven approach through ideas shared by experts from industry and academia. It will also explore the concepts of geospatial technology and how it can help bring the world together through the participation of both academia and industry.


Experts from academia and industry will join hands to discuss emerging trends in the geospatial industry and the role of academia in reaching its goals. Discussions will cover a vision for the GIS industry and its trends, roadmaps for GIS, and better visibility of how academia and industry can work together for their mutual benefit. A special session is organized for industry to discuss its requirements with students and academics, and the corresponding changes needed in the academic system.

This event is open to the geospatial industry, academia, government and NGOs. It is expected to provide an excellent platform for discussing the needs, limitations, challenges and opportunities in establishing collaboration between the geospatial industry, academia, scientists and government. The University expects a large number of participants from the geospatial industry, academia and government. The two days of discussion will open up opportunities for the geospatial industry to interact with, share innovations with, and showcase products and upgrades to the core group of academics, students, government officials, scientists and other interested groups.

This symposium is intended to open the door to collaboration and to strengthen the ecosystem of geospatial technology.

I am very glad to personally invite you to participate in this two-day event. Please block the dates. We shall be happy to welcome you in Ahmedabad.

May I request you to kindly circulate this information to your colleagues.
Thanking you
With best regards.

Prof. Anjana Vyas, PhD
Executive Director,
Centre for Advanced Geomatics, CRDF

Humans of Data 23

“I’m excited that people are now starting to think about data sharing. For the last few years it’s been me, as the institutional data manager, going to people and saying, ‘You should make your data available!’  Now people are getting in touch and saying they want to do it, because they’re recognising they can get more stuff published that they can get recognition for.

It’s also good that we’re getting more than just the raw or aggregated data – we’re also getting the survey tools, the Stata code and the files for the processing scripts for how the data is analysed.  It’s exploding out into all the different stages of research.  If you’re thinking about reproducibility of research, you still only see tiny snapshots of that.  I’d like to do more about that: my frustration is that we don’t have software to document all stages of the research process.

A lot of those research outputs are useful but also ephemeral.  If you wanted to reapply a questionnaire, you’d have to do an update of it 2 or 3 years down the line.  Research approaches change, the language changes and so on.  But you could actually go back and do a comparison about how interviewing has changed over a specific time period – as long as we start managing those research outputs too, alongside the data and publications.”

Humans of Data 22

“In my previous life as an academic, I always liked interdisciplinary work: to come at things from a slightly sideways perspective. But in this area, I get to encounter more than most people do – collections, ideas, researchers, people, stories … I get to discover everything from every different area of knowledge, from lots of different perspectives.  The data itself is obviously really interesting but it’s what goes into the creation of that data, and what people then do with that data – that’s what’s really fascinating to me.

When people ask me, ‘What do you do?’, I’m still not sure how best to describe it.  Whenever someone asks, I give a different answer, but it doesn’t actually capture what the day-to-day work is about, which is the exchange of social and cultural knowledge.  I think that’s the most appealing thing to me.  There’s always something new to find out about, and this central thing that we call ‘data’ is a conduit into discovery of all kinds of stories and narratives.  It’s a window into lots of different worlds.”

Humans of Data 21

“I’m not a data scientist but I know how to read and fiddle with code. This is what drives me – I want to understand and know something practically, not just by reading about it but by getting first-hand experience in collecting data, doing things with it and manipulating it. I enjoy this and find it valuable. I do theory about data practice, so I’m interested in asking what data does to knowledge practices, but I’m looking at it as a philosopher rather than anything else. I’m interested in how data can be used to tell stories, but want to take this one step further. How do we use data to make arguments? I’m interested in how we can move to a critical way of looking at argumentation – how we can use data as evidence, to convince, to tell stories. I’m asking what is ‘good enough’ knowledge, what is ‘responsible’ knowledge, what is ‘valuable’ knowledge? What are the ethical considerations about data when we use it to make decisions?”

Humans of Data 20

“Still, I’m inspired by the fact that the field is cross-disciplinary.  To be able to talk about digital preservation in a holistic way you need data producers and data consumers including people from information sciences, library scientists and researchers.  With every domain we need to understand a whole new idea of how data is produced and consumed and the use cases for the value of data.  It never gets boring.  There will always be work.  And if I have a question about a file format or metadata problem I can ask colleagues in New Zealand or the States or Scotland or the Netherlands and they know what I’m talking about.  I love that.  To me it’s like a cool kids’ domain!”

Summary of Linked Open Data for Global Disaster Risk Research activity involving Dr Bapon Fakhruddin and Professor Virginia Murray

Dr Bapon Fakhruddin

The fourth Pacific Meteorological Council and second Pacific Meteorological Ministers Meeting (PMMM) were held in Honiara, Solomon Islands, 14-17 August, 2017.

Dr Bapon Fakhruddin’s presentation on end-to-end, impact-based multi-hazard early warning systems and disaster loss data collection for risk assessment, beginning with community ownership and engagement, was exceptionally well received.

Disaster Risk and Resilience Roundtable, 19 June 2017, Wellington, New Zealand

Professor Virginia Murray

The Global Platform disaster loss data working session reinvigorated a high-level roundtable, which followed a seminar by Elizabeth Longworth (formerly of the UN Office for Disaster Risk Reduction) on ‘Global experiences on managing disaster risk – rethinking NZ’s policy approach’. The roundtable emphasized the need to strengthen New Zealand’s risk governance system. There is a very strong business case for investing in disaster risk reduction. It has been estimated that an annual global investment of USD 6 billion in disaster risk management strategies would generate USD 360 billion worth of benefits in terms of reduced risk. On that basis, New Zealand might expect a return of 60 times every dollar spent on reducing disaster risk. In terms of creating shared value, investment in disaster risk management has the co-benefits of strengthening resilience, competitiveness and sustainability.
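The 60-fold figure is simply the ratio of the two global estimates quoted above. Applying it to New Zealand assumes that the same benefit-to-cost ratio carries over nationally, which the roundtable presents as an expectation rather than a measured result:

```latex
\frac{\text{USD } 360 \text{ billion in benefits}}{\text{USD } 6 \text{ billion invested per year}} = 60
```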

The estimates for direct losses are considered to be perhaps 50% under-reported due to the pervasive nature of smaller scale, localised and recurring disasters. It is concerning that, internationally, the mortality and economic losses from extensive disaster risk are trending upwards. For New Zealand and its Pacific Island neighbours, climate change will magnify disaster risk and increase the costs. With the New Zealand economy heavily reliant on the agricultural sector, it is particularly exposed to weather-related events.

In the same way that New Zealand’s approach to social investment requires improved data and analysis, so too does the production of NZ-based risk information and integrated databases. Greater awareness of the causes and consequences of disaster risk could strengthen accountability for disaster impacts.

A modern-day approach to risk governance also requires greater inclusiveness and transparency. New Zealand needs to pursue an ‘NZ-Inc’ approach. The nature of disaster risk necessitates a whole-of-government response. Dr Bapon Fakhruddin attended the roundtable as an expert.

Workshop on developing a disaster loss database for New Zealand, 28 September 2017

MCDEM (the Ministry of Civil Defence & Emergency Management) will hold an initial all-day workshop on 28 September to discuss all elements of the Loss Database Project. The 5th Global Platform for Disaster Risk Reduction (DRR) was held in Mexico on 22-26 May 2017. The Platform was hosted by the United Nations Office for Disaster Risk Reduction (UNISDR) and the Mexican government to support continued assessment of progress in implementing the Sendai Framework (SFDRR). The New Zealand delegation was led by the Special Envoy for Disaster Risk Management (Philip Gibson, MFAT), accompanied by one official from MFAT and three from MCDEM, plus a wider NZ Inc. delegation of 20 representing academia, NGOs, local government and private sector providers.

Following the Platform, a number of key pieces of work are in progress, or need to be considered to give effect to the Framework, put priorities into action and report on the Global Indicators. Of note, these are:

  • Finalising the National Disaster Resilience Strategy
  • Developing the concept for a National Platform for DRR
  • Developing a National Disaster Loss Database and routine disaster loss reporting
  • Project to develop better methods of pricing risk and forecasting losses

The first project on which MCDEM wishes to seek your engagement is the Loss Database. This has been considered in the past, but is now critical because of its significance for future Sendai reporting. Unlike previous reporting on the Hyogo Framework for Action, which focused on qualitative data on inputs and outputs, Sendai reporting focuses on outcomes, i.e. losses from disasters, and whether they show a downward trend.
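Because Sendai reporting asks whether losses are trending down, the loss database ultimately has to support a quantitative question. One simple way to check the direction of a trend, offered here only as an illustrative sketch and not as MCDEM’s or UNISDR’s method, is the slope of an ordinary least-squares line through annual loss totals. The function name `ols_slope` and the loss figures are hypothetical; a real analysis would need more care, for example normalising losses and testing whether a trend is statistically meaningful.

```python
from typing import Sequence


def ols_slope(years: Sequence[int], losses: Sequence[float]) -> float:
    """Slope of the ordinary least-squares line through (year, loss) pairs.

    A negative slope suggests a downward trend in losses. This is a
    hypothetical helper for illustration, not an official Sendai method.
    """
    n = len(years)
    if n != len(losses) or n < 2:
        raise ValueError("need at least two paired observations")
    mean_x = sum(years) / n
    mean_y = sum(losses) / n
    # Covariance-style numerator and variance-style denominator.
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(years, losses))
    sxx = sum((x - mean_x) ** 2 for x in years)
    return sxy / sxx


if __name__ == "__main__":
    # Made-up annual loss totals in arbitrary units, for illustration only.
    years = [2015, 2016, 2017, 2018]
    losses = [120.0, 110.0, 100.0, 90.0]
    slope = ols_slope(years, losses)
    print(f"slope = {slope:.1f} per year; downward trend: {slope < 0}")
```

A straight-line slope is deliberately the simplest possible choice: disaster losses are dominated by rare large events, so in practice a single bad year can swamp such a trend.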

ISCRAM Asia Pacific 2018 Conference, Wellington, New Zealand

Dr Bapon Fakhruddin and Professor Virginia Murray will chair a session on disaster data issues for situational awareness at the ISCRAM Asia Pacific Conference in late 2018.

Humans of Data 19

“Digital preservation is a perfect field because it unites two things I’m passionate about: humanities and IT. I can work on a framework to keep the data for future generations. It’s always been important to do that whether the data is analogue or not. Data presents evidence, evidence that’s subject to storytelling and interpretation. It opens up unlimited possibilities. If you want to understand how a community ticked at a certain time, literature gives you a representation of the time, of what moved people. Data that we create today can do the same thing.

Data can be literature, poetry, art or factual experimentation.  It’s not just an output of research; it’s an output of creativity and of our life today.  Sometimes we forget that.  
But we should spend more time talking about what works and what doesn’t work.  We need to not always invent new models, but apply a model and see what happens – to use models and tools to curate and treat our data, and then it’s very important to look at these tools critically.  And to improve them. There’s a lot of great output that has come out of projects but does anyone use it?  There’s a gap in implementation.  And funding’s becoming scarcer, so we need to find more effective ways to make tools sustainable and useable for the user communities.  It’s frustrating.”


ENERGIC-OD: International co-operation to promote FAIR GIS Open Data and the growth of European SMEs

Giuseppe Maio

This post was written by Giuseppe Maio and Jedrzej Czarnota. Giuseppe is a Research Assistant working on innovation at Trilateral Research. You can contact him at . Giuseppe’s Twitter handle is @pepmaio.

Jedrzej is a Research Analyst at Trilateral Research. He specialises in innovation management and technology development. You can contact Jedrzej at, and his Twitter is @jedczar.

The value of the open data business is growing very fast: the open data market is projected to be worth over 75 billion in 2020. Yet accessing this expanding market is not easy. Open data sources are difficult to find, not interoperable and hard to reuse, as argued in a recent Open Knowledge Foundation report.


ENERGIC-OD, a European Commission project, aims precisely to facilitate access to the open data market in the Geographic Information System (GIS) sector. The project built a pan-European Virtual Hub (pEVH) that simplifies access to and use of GIS open data in Europe; readers can view and use the pEVH online. The pEVH brokers a potentially unlimited number of geospatial open data sources, harmonising them and making them accessible through a single API, ready to be reused for various purposes. pEVH-brokered data is available under a freemium licence: the data is free to use, and users can pay for some extra features of the pEVH. The freemium model promotes knowledge exchange and the extraction of value from that exchange and from the services provided by the actors involved.
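The brokering idea can be illustrated in general terms. The Python sketch below is not the pEVH’s actual design or API: it assumes two hypothetical sources that describe features with different field names and coordinate layouts, harmonises them through per-source adapters into one common record type, and exposes them through a single query function. All names (`Feature`, `SOURCES`, `query`) and the sample records are made up for illustration.

```python
from dataclasses import dataclass
from typing import Callable, Dict, List, Tuple


# Hypothetical common record type into which every source is harmonised.
@dataclass(frozen=True)
class Feature:
    source: str
    name: str
    lon: float
    lat: float


# Made-up raw records from two sources with different schemas.
SOURCE_A = [{"title": "Harbour", "coords": (13.77, 45.65)}]  # coords as (lon, lat)
SOURCE_B = [{"label": "Coastline station", "lat": 43.1, "lng": 12.4}]


def adapt_a(rec: dict) -> Feature:
    # Source A stores a (lon, lat) tuple under "coords".
    lon, lat = rec["coords"]
    return Feature("A", rec["title"], lon, lat)


def adapt_b(rec: dict) -> Feature:
    # Source B stores latitude and longitude as separate fields.
    return Feature("B", rec["label"], rec["lng"], rec["lat"])


# Registry: each source contributes its records plus an adapter.
SOURCES: Dict[str, Tuple[List[dict], Callable[[dict], Feature]]] = {
    "A": (SOURCE_A, adapt_a),
    "B": (SOURCE_B, adapt_b),
}


def query(bbox: Tuple[float, float, float, float]) -> List[Feature]:
    """Single entry point: harmonise every registered source, then keep
    features inside the bounding box (min_lon, min_lat, max_lon, max_lat)."""
    min_lon, min_lat, max_lon, max_lat = bbox
    results: List[Feature] = []
    for records, adapt in SOURCES.values():
        for rec in records:
            feature = adapt(rec)
            if min_lon <= feature.lon <= max_lon and min_lat <= feature.lat <= max_lat:
                results.append(feature)
    return results


if __name__ == "__main__":
    for f in query((10.0, 40.0, 15.0, 46.0)):
        print(f)
```

Calling `query((10.0, 40.0, 15.0, 46.0))` returns one harmonised `Feature` from each source. The point of the adapter registry is that adding a new source only requires registering its records and an adapter, while callers keep using the same single entry point.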

ENERGIC-OD acts as a data facilitator by improving the quality of the open data available in the GIS sector: the pEVH was designed to ensure that data is aligned with the FAIR principles, which hold that data should be Findable, Accessible, Interoperable and Reusable. pEVH-brokered data is FAIR because the single website where the data is stored makes GIS open data sources much more findable than before; the single API adopted by ENERGIC-OD makes the data usable and interoperable; and the freemium model ensures that the data can be reused.

To demonstrate the viability of the pEVH, the ENERGIC-OD consortium developed 10 applications based on pEVH-brokered data. These applications range from an app promoting communication between citizens and land consolidation authorities to a coastline monitoring system that allows people to participate in the scientific observation of coastlines.


ENERGIC-OD is committed to enhancing innovation among SMEs and incentivising local economic development across Europe. These objectives appear achievable for three reasons.

Firstly, the FAIR principles characterising pEVH-brokered data make it easier for SMEs to use GIS data sources, as ENERGIC-OD lowers the entry barriers that prevent the use of such data.

Secondly, the main features of GIS make this branch of IT extremely suitable for business (Azaz 2011). These features are: 1) spatial imaging, namely GIS’s ability to convey information with a spatial dimension; 2) database management, GIS’s capability of storing, manipulating and providing data; 3) decision modelling, or GIS’s potential to provide intelligence supporting decision making; and 4) designing and planning, namely GIS’s potential as a design tool (Azaz 2011). Digital mapping, marketing, transportation and logistics, and design and engineering are some of the sectors that have successfully used GIS for business. GIS’s potential can be further exploited by coupling GIS systems with modelling tools, the so-called “intelligent GIS” (Birkin et al 1995). The retail sector has already used intelligent GIS, integrating shop data and spatial pattern data over time to design spatial interaction models and forecast maintenance costs as well as revenue streams (Altaweel 2016). An example of an ENERGIC-OD intelligent GIS app is the Natural Hazard Assessment for Agriculture application. Using satellite imagery, this app predicts yield reductions in specific crops based on statistical models, considering factors such as drought, humidity and frost.

Thirdly, small and medium enterprises are the greatest beneficiaries of the open data movement, as they are guaranteed free access to data they would not normally have access to, and they are more likely to take advantage of open data and become drivers of innovation (Verhulst and Caplan 2015). SMEs constitute the backbone of the European economy, and ENERGIC-OD thus acts as a facilitator for these businesses, enabling them, by putting the FAIR principles into practice, to tap more easily into the GIS open data market.

Initial market research conducted by Trilateral Research, a technology consultancy and member of the ENERGIC-OD consortium, confirms SMEs’ high interest in the pEVH. These enterprises are expected to drive innovation and economic growth across Europe in the coming years. ENERGIC-OD thus represents an example of international cooperation to promote FAIR GIS open data and the growth and development of European SMEs.


Altaweel, M. (2016). GIS and Small Business Planning. [online] GIS Lounge. Available at:  [Accessed 11 Sep. 2017].

Azaz, L. (2011). The use of Geographic Information System (GIS) in Business. International Conference on Humanities, Geography and Economics, pp.299-303.

Birkin, M., Clarke, G. and Clarke, M. (1995). GIS for Business and Service Planning. [online] Available at:  [Accessed 11 Sep. 2017].

Verhulst, S. and Caplan, R. (2015). Open Data: A Twenty-First-Century Asset for Small and Medium-Sized Enterprises. [online] Available at:  [Accessed 11 Sep. 2017].