Author Archives: codata_blog

Report to ICTP, Trieste CODATA-RDA Research Data Science Summer School (10th – 21st July, 2017)

This post was written by Shaily Gandhi, who is currently pursuing a PhD in Geomatics from CEPT University, India. Shaily recently attended the CODATA-RDA School of Research Data Science, hosted at ICTP, near Trieste, Italy – her participation was kindly supported by ICTP and Nature Publishing, via CODATA.

The CODATA-RDA School of Research Data Science was a great opportunity for me to work with around 45 students from 29 countries (mostly from lower and middle income countries) and from varied educational backgrounds. Such summer schools or short courses can be the best platforms for learning innovative ways of teaching as well as understanding the work done by different people in the same area. The summer school introduced me to various aspects of data science and gave me intensive hands-on training: it has given me the confidence to start working with concepts which I had only read about in books. Now I will be able to implement machine learning and artificial neural networks in my PhD study in Geomatics for developing predictive models.

The school uses the Software Carpentry / Data Carpentry approach of having the students provide daily feedback on pink or green stickers (which signify XXXX). This made each of us feel that our opinions counted. I am very thankful to the organizers, who were on their toes and worked long hours to make the summer school run smoothly. Working closely with leading academics in the field of data science was one of the most wonderful experiences for me: it not only taught me a great deal but also helped me improve my teaching skills. I observed many small things in their teaching which I would like to implement in my teaching in the coming semester.

Practical teaching and the use of sticky notes (this picture was taken during the Git session at the 2017 summer school in Trieste)

One of the things which caught my eye on the very first day was the use of pink and green sticky notes to indicate whether you are getting on well with the practical or need help. I will definitely use this in my teaching, because practicals become very difficult to handle with a large class, and if everyone is waving or calling out the environment becomes very noisy.

Women in Research Data Science

Apart from technical learning there was a wonderful experience of cultural exchange. One of the most interesting topics which I discussed with Gail Clement from the California Institute of Technology (who introduced us to Author Carpentry) was the loss of academic identity that can be experienced by women who change their name after getting married (and in some countries this change of name is obligatory). She explained that, according to the research, men’s research works are cited more than women’s: there are many reasons for this, and the loss of identity can contribute, as computer search mechanisms and bibliographic tools do not necessarily link a woman’s works from before and after a name change. This is one of the important reasons for a recognised and standardised researcher ID system: for women who have changed their names, having an ORCiD account will help keep all their academic work associated with one single researcher ID number. Gail also suggested that it would be better if female researchers could retain both last names, which could “help you build your identity and reputation in the professional world”. Many more interesting discussions about the lack of credit for work were also brought up. In few institutions are the people doing data analysis included as co-authors of the publication: Gail suggested that standard criteria should be developed and implemented, such that all contributors (including data analysts and data stewards) are credited and the credit for your contribution stays with you.

Working with Irma and Oscar on some super cool projects (from left to right)

I had a great learning experience working in groups with people from different countries. Throughout the school, we worked in different groups with different people, which gave us a lot of exposure to the varied situation of data science in different countries. In one project we worked together on the same file using Git, and in a second project we coded a neural network model in Python.

The Bring Your Own Data session offered good suggestions regarding my problems with data and the confusions which had been addressed by other students in the summer school working in the same area. I learned a lot about statistical analysis from other students, including Felix Anyiam (Data Analyst, University of Port Harcourt Teaching Hospital (UPTH)) and Ola Karra (Lecturer, Department of Statistics, University of Khartoum).

Friends who helped with statistics: Laba, Felix and Ola (from left to right)

This summer school gave us first-hand experience of many languages and command line interfaces: topics included DOS, R, Shell, GitHub, visualisation of data in the most beautiful ways, machine learning, artificial neural networks, other machine learning systems and recommender systems.

Working with GitHub was an excellent experience. I had been using Google Drive to work on shared presentations, but Git looks pretty cool and I would like to use it in my future work to share data and work in a shared environment.

It was great working on the research computation infrastructure, with all the participants working on different systems and learning how to submit a job and get it done using external resources. We were taught how to get access to supercomputers from different geographical locations: this enables researchers to keep going, as it allows you to work from any part of the world. Resources to run the processes can be allotted from different locations.
Finally, we also got a good insight into research data management, referencing systems and wonderful tips for publishing and licensing work.

Friends of Data Science

Map of Student participants:

 

I am very thankful to ICTP for accepting my application and supporting my stay in Trieste. I am very grateful to Nature Publishing, via CODATA, for funding my travel, which gave me the opportunity to attend this summer school on big data science.

Humans of Data 17

“Brené Brown, the social scientist, said that stories are data with a soul.  I think about that a lot in the work I do.  I’m passionate about it.  When I meet the most engaging researchers, they’re good storytellers. Data are ways to connect with stories – data are the underlying content that researchers are sharing through their stories. I’m keen on preserving those stories, sharing those stories, now and in the future.

Particularly now, we’re in an unfortunate situation in the United States where things we had taken for granted – trust and integrity of information – are being questioned.  And we’re seeing such an emerging problem with tribalism, where people in their bubbles only talk to each other.

Data are a way we can span between different communities, different tribes, different people. We do that already in the research space, I think, but I hope that by continuing our work in data, we can help to deal with this tribalism issue.”

Thifhelimbilu Mulabisana: My trip to Moscow, Russia for the School for Young Scientists “Methods of Comprehensive Assessment of Seismic Hazard”

Thifhelimbilu Mulabisana is a Junior Scientist in the Geophysics Division of the Council for Geoscience in South Africa. Her day-to-day work involves the recording, processing and analysis of seismological data. The organization manages a network of over 50 seismic stations around the country and these are continuously streaming data into her office for processing. Thifhelimbilu attended the CODATA International Training Workshop in Big Data for Science in July 2016.  And in July 2017 she was able to follow this by attending the School for Young Scientists “Methods of Comprehensive Assessment of Seismic Hazard”, organised by the CODATA member organisation for Russia, the Geophysical Centre of the Russian Academy of Sciences. This is the second of two blog posts in which you can read about the experiences of one young researcher from South Africa in training activities that took her from Beijing to Moscow and back.

Thifhelimbilu (right) with friends in Moscow: Xia from China, currently doing a PhD at the State University of Moscow (left), and Nguyen from the Institute of Geophysics, Vietnam (middle)

As a young scientist, I spend most of my time on the internet looking for articles to read so I can better understand seismology. This is how I came across a poster about the School for Young Scientists “Methods of Comprehensive Assessment of Seismic Hazard” (http://school2017.gcras.ru/). I was immediately drawn to find out more about the school. When I learned that it would be devoted to new methods recently developed for seismic hazard assessment, and to integrating the results obtained by these methods on the basis of systems analysis, I knew that I wanted to attend and improve my knowledge of seismology.

When I applied for the training I was worried about the cold weather in Russia. I recall that when I got the email stating that I had been accepted for the training course, I went onto Google and searched for what the weather is like in Russia. I am a South African currently based in Pretoria, where winter means temperatures are low only in the early morning hours and at night, unless there is a cold front, so you can understand my despair at cold weather!

Thifhelimbilu (right) with Natalia from Vladivostok University

Of course, the weather was not the only thing I was worried about; language was also in that basket. Therefore, a couple of weeks before the training I tried to learn a few Russian words that could get me by. This clearly proved a futile exercise when I landed at the airport and couldn’t read a word on the signs. I had to ask around to find my way to the train. I really appreciated the woman I met in the queue for passport control. She had been in Russia before, so she knew her way and showed me where I could get the train. From there I asked anyone I met, and the Russians were most friendly to me.

The programme of the school covered exactly the reason why I became a seismologist in the first place. When I first heard about seismology, I was eager to at least figure out how we can predict earthquakes, as they are by far the most dangerous natural hazard. Therefore, looking at the programme, I knew that I had to go there: I had to meet people who study, every day of their lives, exactly what I had always wanted to do (see the programme and presentations at http://school2017.gcras.ru/e.materials.html).

The school exceeded my expectations; the lectures were just the kind of smart I have been yearning for throughout my career as a seismologist. The work that they are doing is beyond what anyone can imagine. Can you imagine the day we can predict an earthquake? I bet disaster management teams in every country will be ecstatic about that discovery.

The topics that most caught my eye were the identification of earthquake-prone areas; calculating maximum magnitude using statistics, and how the accuracy of this is tainted by a lack of data; and the investigations into how to predict earthquakes with the highest possible confidence level. The biggest obstacle to conducting breakthrough research in seismology is lack of data. The biggest known causes of this are poor station coverage in most countries and limited data sharing. The problem of data sharing is one which CODATA is working towards improving.

Overall, this was a brilliant trip and I would like to extend my gratitude to the organisers of the school, and to the Russian Science Foundation (within the framework of the RSF Project “Application of systems analysis for estimation of seismic hazard in the regions of Russia”) and the Council for Geoscience for funding the school and my travel to Russia, respectively.

Group photograph from the School for Young Scientists “Methods of Comprehensive Assessment of Seismic Hazard”

Thifhelimbilu Mulabisana: My trip to Beijing, China to attend the CODATA International Training Workshop in Big Data for Science

Thifhelimbilu introduces herself at the Beijing Training Workshop

Thifhelimbilu Mulabisana is a Junior Scientist in the Geophysics Division of the Council for Geoscience in South Africa. Her day-to-day work involves the recording, processing and analysis of seismological data. The organization manages a network of over 50 seismic stations around the country and these are continuously streaming data into her office for processing. Thifhelimbilu attended the CODATA International Training Workshop in Big Data for Science in July 2016.  And in July 2017 she was able to follow this by attending the School for Young Scientists “Methods of Comprehensive Assessment of Seismic Hazard”, organised by the CODATA member organisation for Russia, the Geophysical Centre of the Russian Academy of Sciences. This is the first of two blog posts in which you can read about the experiences of one young researcher from South Africa in training activities that took her from Beijing to Moscow and back.

Two years before I went to the Beijing training workshop, one of my colleagues went to the same workshop and his feedback about the training was nothing but great. I then became eager to attend the training and so applied as soon as they advertised the course.

The training workshop was focused on promoting improved scientific and technical data management and use. This was exactly what I needed at the time as I was studying towards my MSc. My dissertation was focused on the earthquake catalogue of southern Africa and this was the biggest data I had ever worked with. It became more tedious and frustrating with time and I knew I needed to find better ways to deal with that amount of data.

Thifhelimbilu (right) and Nobubele (from CSIR in South Africa) at one of the social dinners

From the day I found out that I was going to China I was excited, as I had never been to Asia. The thrill of going to a country where the medium of instruction is not English was both challenging and nerve-racking (though the Training Workshop is taught in English).

When I arrived in Beijing, my expectations were exceeded. Except, of course, for the stares I got for being black and having long dreadlocks! I suppose people in this part of the world do not get to see a lot of dreadlocks, as some of them even went as far as trying to take pictures of me. There were those who tried to be a bit polite and asked, but some of them just went ahead and took the pictures. The whole experience felt like a violation to a certain degree, but mostly it taught me about the diversity we have as a human species.

Thifhelimbilu receives the participation certificate from Prof. LI Jianhui, Secretary General of CODATA China and a member of the CODATA Executive Committee

As most Chinese people neither speak nor understand English, and as much as I tried to learn the Chinese language using Google Translate, the language barrier was a huge obstacle every time I had to get food. This issue was so evident that, most of the time, I did not know exactly what I was eating! For the first few days this did not sit well with me, but as time went by I was only concerned about how the food tasted.

Day one of the training was blissful; I met brilliant young scientists from different fields. This encouraged me to do more for science and be better. Not forgetting meeting the lecturers, the giant scientists I had been longing to meet since I read the first pamphlet about the training course.

The real work then began and, as I had expected, topics such as interdisciplinary applications of open research data, data intensive research, data management policies, cloud computing, visualization, analytics and data infrastructure development in the Big Data Age were covered precisely and very well. The practical sessions had the most impact, ensuring that I understood the topics well enough, and I left every lesson confident that I would be able to do the same when I got back to my home country.

I can confidently say that this course helped me with my MSc studies, which I completed successfully. I am most grateful to the organisers, CODATA, the Chinese Academy of Sciences and the Council for Geoscience for their sponsorship.

The 2016 Beijing Training Workshop Students at the Great Wall of China

TWAS Fellowship for Research and Advanced Training: Deadline 1 October

TWAS Research and Advanced Training Fellowship: 2017 Call for Applications 

TWAS, the academy of sciences for the developing world, www.twas.org, is now accepting applications for the TWAS Research and Advanced Training Fellowship programme.

The fellowships are offered to scientists from developing countries and are tenable at centres of excellence in various developing countries.

Eligible fields include one or more of the following: agricultural and biological sciences, medical and health sciences, chemistry, engineering, astronomy, space and earth sciences, mathematics and physics.

Please see http://www.twas.org/opportunity/twas-fellowships-research-and-advanced-training for the latest information regarding the above programme, including eligibility criteria, guidelines, etc.

Women scientists are especially encouraged to apply. The closing date is 1 October 2017.

Governance of domain specific data and metadata standards to support FAIR Data

By: Xiaogang (Marshall) Ma

On May 26, 2017, I attended the Workshop on Research Data Management [http://www.iucr.org/resources/data/dddwg/new-orleans-workshop#gabb2] at the 2017 Annual Meeting of the American Crystallographic Association, New Orleans, LA, USA and gave a talk on Open Science, FAIR Data and Data Standards.

The workshop was organized by the International Union of Crystallography (IUCr)’s Diffraction Data Deposition Working Group (DDDWG), and was co-chaired by John R. Helliwell and Brian McMahon, who are the DDDWG chair and the IUCr CODATA representative, respectively. The workshop had two plenary sessions: (1) What every experimentalist needs to know about recording essential metadata of primary (raw) diffraction data and (2) Research Data Management policy mandates and requirements on Principal Investigators (PIs). It also included a technical session on high-data-rate/high-performance-computing issues of research data management for MX. The first plenary session was closely related to the efforts within DDDWG, and the second session covered broad topics on open science trends, open data mandates, best practices and success stories. The technical session demonstrated state-of-the-art progress from industry.

My 30-minute talk was in the second plenary session. The talk was originally intended to be given by Simon Hodson, CODATA executive director. Due to a travel schedule issue, he could not make it, but he helped by providing the main body of the presentation slides. For me this was also a nice opportunity to refresh my knowledge of open science, FAIR Data, data standards and CODATA’s many activities in relation to these issues. I especially enjoyed introducing a slide in which Simon put together the historical events in the policy push for Open Access, Open Data and Open Science. To explain the slide in detail I also did some background study. For example, the three B activities (Budapest, Berlin and Bethesda) during 2002-3 were well known for promoting Open Access. We can see the significant increase in the number of open access publications since then [https://en.wikipedia.org/wiki/Open_access]. Then, how about Open Data and the efforts ongoing now, such as FAIR Data? Can we foresee that after 10 or 15 years there will be positive results similar to Open Access? To achieve that, more effort is needed from all the stakeholders, including every one of us. Within CODATA I have been working together with Dr. Lesley Wyborn and other colleagues in a Task Group [http://www.codata.org/task-groups/coordinating-data-standards] that aims at surveying and coordinating data standard efforts amongst scientific unions.

During the past months, our Task Group has been contributing to efforts led by CODATA to broaden inter-union coordination and collaboration. Besides giving the talk, another role for me at the New Orleans workshop was to set up deeper connections between IUCr and CODATA. IUCr has done excellent work on data standards and open data. It is also one of the first scientific bodies to have endorsed the Science International Accord on Open Data in a Big Data World. IUCr also published a position paper [http://www.iucr.org/iucr/open-data] as a response to the accord. Prof. John Helliwell will be the IUCr representative at the Inter-Union Workshop on 21st Century Scientific and Technical Data – Developing a roadmap for data integration. The workshop is sponsored by CODATA’s new Commission on Data Standards for Science and will take place in Paris, France on 19-21 June 2017. The workshop’s purpose is to share details of our data and information activities, agree on good practice, and seek consensus about how unions and disciplinary groups can best work together in establishing a global network of scientific research data that is consistent with the four principles of FAIR Data – i.e., that data produced by research and for research should be Findable, Accessible, Interoperable and Reusable. Based on the outputs of the workshop, a substantially larger workshop or conference will take place in late 2017 or early 2018 to discuss the potential and scope of a broad coordinated effort across the scientific community and the establishment of an ICSU and CODATA Commission as part of a decadal initiative to promote the data standards necessary for inter-disciplinary research, including that which addresses the priority global challenges.

Humans of Data 16

“I wonder whether any ethics are applied in collection of samples of Ebola and HIV/AIDS in emergency situations.  When I talk to doctors about it, they are aware that some researchers from the developed world provide expertise and fund research in pandemic situations.  But there are issues on data collection ethics based on informed consent by subjects that deserve scrutiny, given the emergency situations and language barriers under which data is collected.  Are there Memoranda of Understanding between African governments and researchers under these conditions?  There is a need for transparency and openness.  Given the extensive ethical regulations for research on human subjects in the developed world, African countries – which are prone to pandemic disease problems – must engage in the discourse on ethics of data collected under the unique situations that they experience.

It’s the same in the humanities and social sciences: researchers come and take and go.  It is rare for research projects to include funding in the initial project proposal for reporting back to the subjects of the research.  In Botswana, there was a national scan for indigenous knowledge.  We were promised there would be a report back [to the community], but the [research team] never came back.  And then researchers are surprised that participants don’t trust them!

Colonialism was first about land resources.  Now, without open access, globalisation of research may become the next wave of colonisation.  Lower and middle income country researchers need to engage in open debates among themselves on the ownership of data, and how to develop collaboration from collection to analysis with a view to facilitating shared benefits and innovative re-use.  Only in this way will the issues of intellectual property rights be negotiated in an equal exchange.  All researchers – but especially Africa’s researchers – should reflect on the necessary policy and regulatory frameworks that should be negotiated with local institutions and national governments, as part of their intellectual contributions to evidence-based solutions and sustainable development.

Openness is about exposing your strengths and weaknesses.  No one should be intimidated that some have more money.  Others have ideas.”

Humans of Data 15

“I’m very passionate about open science.  From where I stand in the context of Africa, there’s so much data we create in government, universities or communities. But at the moment, data, which is the base for reports and provides evidence for government decisions, is not accessible to all except the researcher and specialist research reference committees.  The Botswana Government has a closed research culture, as does the local research community within the academic and private sector circles.  As a librarian, that has always been my concern.  When work is done in that way, you find that the resultant data is archived and owned by the funding authority.

The current system is dysfunctional due to a lack of regulatory mechanisms, appropriate follow-up processes and systems for creating national open databases. Without reliable databases for research reports, research data cannot be open or accessible.”

Humans of Data 14

“My background is in the political and social sciences, and I can connect with a lot of the ethical issues, the equality issues, making data sharing genuinely equal here and in other countries – that’s important to me.

I get a kick out of helping researchers. Working in a library context I was helping a student find an article they needed. Now I can sit down with a researcher and provide reassurance on their data management plan and that’s important too.

Having enough technical knowledge so that you can understand what’s going on, and also using liaison skills – I really like that combination.  There’s a whole community at the university that is interested in open data, there are all these people who are really excited about it.  I’m excited because there’s a community who is excited about the same things I am.  It’s good that everyone’s still working out solutions for data sharing; you want to help build these resources, and a culture change.

But in the social sciences and humanities, we need to recognise we actually have data.  There needs to be a default to open, which relies upon a change in culture and policy.  We can archive some of these datasets through liaison with young researchers and learning how researchers work.  Young researchers are going to be sustaining the effort, but this needs everyone’s participation.”

International Training workshop on Big Data for Science and Sustainability

Opening ceremony of training workshop

The CODATA PASTD – IGU joint action of the International Training Workshop on Big Data for Science and Sustainability in Developing Countries was successfully held from 17th to 19th March 2017 in Hyderabad, India. The training workshop was an academic event of the Xth IGU International Conference on “Urbanization, Health & Well Being and Sustainable Development Goals”. Supported by the International Geographical Union (IGU) and the Institute of Geographic Sciences and Natural Resources Research, Chinese Academy of Sciences (IGSNRR, CAS), the Hyderabad training workshop is one of CODATA PASTD’s three capacity building activities in 2017. Other training activities will be held in Madagascar in September and in China in November.

The training course introduces young scientists to the ideas of open data, data sharing and data publication.  The training also covers Big Data, data analysis and applications in order to develop skills as ‘data scientists’.  The three-day training workshop included lectures and hands-on practice, aimed at developing the skills and capacity necessary for preservation of and open access to research data in developing countries.

Prof. R.B. Singh, Vice President of the International Geographical Union (IGU), delivered an opening speech

Prof. R.B. Singh, Vice President of the International Geographical Union (IGU) and Co-Chair of the Strategy and Policy Sub-group of CODATA PASTD, and Yukio Himiyama, President of the International Geographical Union, attended the opening and closing ceremonies respectively. 56 students from 13 universities in India attended the training courses. Dr. Yunqiang Zhu, a CODATA PASTD member, Co-chair of the Capacity Building Sub-group of CODATA PASTD and professor at IGSNRR, CAS, and Dr. B. Srinagesh from Osmania University organized the training as co-chairs. Chinese scientists worked along with Indian colleagues to give courses on open Big Data discovery, data publication and sharing, the Indian Earth observation system, geospatial data interoperability, geospatial data infrastructure and data sharing principles.

Yukio Himiyama, President of the International Geographical Union (IGU), awarded a certificate to the trainees

Participation in the training workshop was active and enthusiastic, and students reported that the results were beneficial. Professor R.B. Singh and Dr. V. Raghavaswamy, Deputy Director of the National Remote Sensing Centre, India, expressed their hope that the PASTD training course will continue in future and cultivate a new generation of young data scientists with a growing awareness of developments in data science and the benefits of international cooperation.

The closing ceremony of the training workshop