Category Archives: DataTrieste

DataTrieste

Publishing an article in CODATA Data Science Journal

This article was first published by Ms. Neema Mduma at https://neylicious.github.io/ml/2019/05/11/paper.html. Neema is an alumna of the CODATA-RDA School of Research Data Science.

In early 2017, I was privileged to work as a researcher on the Dropwall project (by Rose Funja), which was one of the winning projects of the Data for Local Impact Innovation Challenge (DLIIC). The main focus of the project was to develop a tool to help fight dropout among secondary school girls. The findings from this project show a high rate of dropout among secondary school students, particularly girls, and agree with reports from other studies which show that school dropout is a big challenge in developing countries. Machine learning techniques for addressing this problem have gained much attention in recent years. However, most of the work has been carried out in developed countries; only a handful of studies in developing countries have applied machine learning to school dropout while taking into account the local context and the data imbalance problem. This motivated me to continue working (in my PhD) on school dropout using machine learning.

In August 2018, I attended the CODATA-RDA Research Data Science Summer School, which was held at the Abdus Salam International Centre for Theoretical Physics (ICTP) in Trieste, Italy. The aim was to build competence in data analysis and security for participants from all disciplines and backgrounds, from the Sciences to the Humanities. The level of engagement and interaction between participants and instructors was outstanding. The Executive Director of CODATA, Dr. Simon Hodson, introduced us to various opportunities such as the CODATA Data Science Journal, where I later managed to publish the breathtaking findings from the Dropwall project in a paper titled “A Survey of Machine Learning Approaches and Techniques for Student Dropout Prediction”.

My new experience in Italy at the CODATA-RDA Research Data Science Summer School

This post was written by Neema Mduma. Neema recently attended the CODATA-RDA School of Research Data Science, hosted at ICTP, near Trieste, Italy. Her participation was kindly supported by AfDB.

This post is a syndicated copy of the one at https://neylicious.github.io/ml/2018/10/03/italy.html

My PhD journey has been great so far, apart from the sleepless nights (totally worth it though!). Last year I attended different events in the USA. It was great exposure, and I enjoyed both the scientific and the social programmes. I was looking forward to finding new opportunities and travelling elsewhere to gain new experience and extend my existing network.

In June 2018 I received an invitation to attend the CODATA-RDA Research Data Science Summer School, which was held at the Abdus Salam International Centre for Theoretical Physics (ICTP) in Trieste, Italy. The summer school ran from 6th to 17th August 2018, with the aim of building competence in data analysis and security for participants from all disciplines and backgrounds, from the Sciences to the Humanities.

The level of engagement and interaction between participants and instructors in this summer school was outstanding, and helpers were always there to provide technical assistance. I was exposed to useful Machine Learning techniques that I will apply in my ongoing study. The Executive Director of CODATA, Simon Hodson, presented various opportunities to us, such as the CODATA journal and many others.

This summer school gave me an opportunity to extend my network with other academics and experts in the field of Machine Learning. Additionally, I had a chance to experience a new culture and explore new places like Rome, Venice and Ljubljana. Sadly, I was the only participant from Tanzania, so I encourage my fellow Tanzanians to apply for calls and seize opportunities in Data Science workshops and summer schools. Lastly, I would like to thank the organisers of the summer school for making it a great success, and the African Development Bank (AfDB) for the financial support.

Enriching my Learning by Helping Others at the CODATA-RDA Research Data Science Schools

Sara El Jadid has been a student and then a helper at the CODATA-RDA Research Data Science Schools #DataTrieste and #DataSaoPaulo. She has recently blogged about her experience on the Springer Nature Research Data Blog.
The CODATA-RDA School for Research Data Science is a valuable and very instructive initiative. The main goal is to provide foundational research data skills to early career researchers, prioritizing those from lower and middle income countries, but not excluding students from other parts of the world. …
 
I consider the experience gained by being involved with the CODATA-RDA Schools for Research Data Science as a very important and helpful step in my career as a young researcher. I am enrolling in a PhD in Bioinformatics – a contemporary and interdisciplinary field that needs strong skills in “research data science”. It’s also a field where you have to interact with researchers and scientists from diverse areas: biology, statistics, chemistry, physics and informatics, to mention a few.

My Journey Towards Open Science and the CODATA-RDA Research Data Science Schools

Marcela Alfaro Córdoba @Fichulina has been a student and then a helper at the CODATA-RDA Research Data Science Schools #DataTrieste and #DataSaoPaulo. She has recently blogged about her experience on the Springer Nature Research Data Blog.

CODATA-RDA Research Data Science Schools changed my career, making me a more responsible researcher but also an Open Science ambassador for the Central American area. I now aspire to be a young researcher who can teach Open and Data Science principles through my job at the University of Costa Rica and through the CODATA-RDA Schools, and who can serve as a mentor for other people who want to learn how to practice Open Science.

Report to ICTP, Trieste CODATA-RDA Research Data Science Summer School (1st – 12th August, 2016)

This post was written by Elias P.M. Mwakilama. Serving as the deputy head and Coordinator of Research, Seminar and Consultancies in the Mathematical Sciences Department at University of Malawi-Chancellor College, Elias Mwakilama is a young computational and applied mathematician in the field of operations research. Elias recently attended the CODATA-RDA School of Research Data Science, hosted at ICTP, near Trieste, Italy – his participation was kindly supported by ACU.

This report highlights key beneficial outcomes and recommendations based on the international CODATA-RDA Research Data Science summer school that I attended at ICTP, Trieste, Italy from 1st to 12th August, 2016, with financial support for travel and lodging from the Association of Commonwealth Universities (ACU).

First and foremost, I must sincerely thank the school organizing committee for accepting my application and for taking the initiative to seek funding from ACU. I also thank ACU for providing that funding. I do not take it for granted.

Overall, the summer school was well organized. It brought together leading early career scientists from across the world to discuss the importance of data sharing in scientific research, from a computational and applied statistics perspective, for enhancing the scientific and technological capacity of developing countries.

Given that issues of ethics and confidentiality exist in data handling, we need to find ways of protecting such data while promoting a spirit of sharing and publishing among research scientists, so as to cut the costs of replicating or duplicating data collection and analysis. It is therefore important to find the right platforms for ensuring quality data handling and sharing, in the form of well-organized schools such as the ICTP CODATA school, networking and collaborations. Across the globe, and mainly in Africa, the lack of financial support for leading early career scientists to attend or participate in such research platforms and events remains a barrier. However, research funds from scientific institutions such as ICTP, Trieste and the Association of Commonwealth Universities assist financially challenged young scientists who regard internationally and locally organized research schools as a platform for learning and for developing their opportunities. It is through support from institutions such as ACU that I had access to the CODATA-RDA research school in my field and the chance to meet influential people who have shown interest in collaborating and sharing data for scientific research and policy analysis.

By working closely with leading academics and invited guests at the school, I was exposed to a number of data programming skills which are of genuine practical importance to my field of computational and applied mathematics in operations research. I have already begun using these skills in my academic research and when teaching undergraduate students at our institution, Chancellor College. For instance, R (with RStudio) and other open source languages have helped me to successfully handle my fourth undergraduate course in Mathematics research.

Besides, as a pioneer of a newly established Mathematics and Statistics research group, the Fibonacci Research Group (FRG), in our department (Mathematical Sciences), I have found that the summer school helped both me and the group to establish research links with other academic applied statisticians, links which can develop far beyond the problems posed by industry in Malawi to our existing research group. The leadership skills I learnt by observing the way the moderators coordinated their work, along with skills in presenting research material and in scientific communication, will be used in coordinating research projects for the benefit of the entire group.

The summer school has also given me an opportunity to work on problems of genuine practical importance and to do good computational mathematics while designing new research areas that can lead to publications and new research collaborations. The skills acquired will also allow our research group to apply its knowledge to significant practical problems and so stimulate awareness in industry of the power of data sharing in statistical modeling and scientific computing.

With these remarks, I encourage your office to continue supporting young African early career scientists through similar financial support so as to develop our continent, in particular developing countries such as Malawi. I also wish to propose establishing more scientific centres for data sharing and analysis, like ICTP, in Africa, where more early career research scientists such as myself could get similar training more easily and at low cost. I attach the certificate of participation that I received from the ICTP CODATA-RDA School of Research Data Science as evidence of my full attendance and participation at the school.

CODATA-RDA School of Research Data Science

This post was written by Niharika Gujela, who has a B.Tech in IT from Delhi Technological University, India. Niharika recently attended the CODATA-RDA School of Research Data Science, hosted at ICTP, near Trieste, Italy – her participation was kindly supported by ICTP and TWAS.

This post is a syndicated copy of the one at https://thepursuitofweirdness.blogspot.in/2016/08/codata-rda-school-of-research-data.html

I recently attended the ‘CODATA-RDA School of Research Data Science’ at the International Centre for Theoretical Physics, Trieste, Italy. With participants from almost 16 developing countries and varied academic backgrounds, we had an amazing workshop, with hands-on training in specific tools and software.

Photos: Middle East and South-east Asia; Opinions from Africa; Ideas about Open Science from East and Australia; Reasons not to share data and counter-arguments.

The idea of Open Science and its principles was the key focus of our workshop. We discussed a lot of myths and stereotypes surrounding our individual ideas of Open Science and how different factors influence open sharing in different regions.

While access to an Internet connection is a major issue in Cuba, unwillingness to share one’s work before publishing it is common across all regions.

With hands-on practice, we learned about topics ranging from basics like Unix, R, SQL and Git to advanced ones like Neural Networks, High Performance Computing, Distributed Environments and Visualization.

A peek into one of the visualization ideas
Participants experimenting with visual ideas through pen and paper
We got lucky enough to grab the recent journal of ‘Open Data in a Big world’, too.

Apart from technical expertise, I met so many people and learned about new cultures and places thanks to the global immersion. It was a lot of learning. How much ‘where we are born’ can influence our lives amazed me. How subtly we become entitled to so many things without appreciating them enough!

A Saudi Arabian friend told me that there are still walls within the university classrooms to segregate boys and girls, while a Cuban guy shared that there is no internet there for the general public. Only because he’s a professor can he access the web, at 36 Kbps. I can’t even imagine either situation!

Somehow, they show how important, and how hard, open access and sharing of research is for some communities. And it’s a bigger and much needed goal!

My favorite success story from the workshop is about my roommate from India. She had no technical background at all and complained that her programmer colleagues at the office used to trick her by telling her how complex their work was. She gained huge confidence from the workshop and learned a lot of ‘know-how’ about Data Science.

All in all, it was an amazing experience with a lot of learning. I learned about international standards of Open Access and data sharing. I gained a huge community with which to keep the spirit of ‘Open Science’ high and spread it across our own local communities.

Thank you to all the organizers, directors and sponsors for making it possible.

Reviving Past Forestry Research Works in Nepal using Zenodo

This post was written by Shiva Khanal, Research Officer with the Department of Forest Research and Survey in Nepal. Shiva was one of the international scholars sponsored by GEO, the Group on Earth Observations, to attend the CODATA-RDA School of Research Data Science, hosted at ICTP, near Trieste, Italy.

The availability of research data and publications is one of the important challenges in Nepal. This is especially true of research results and datasets published or produced in earlier periods. Some past research was produced only in hard copy for distribution, so in many cases it is no longer available. Yet access to, and understanding of, past observations is of great significance for a wide range of research applications, for instance research on past environmental change.

One important source of information on Nepalese forestry is Banko Janakari (A Journal of Forestry Information for Nepal). This journal is the oldest and has been published continuously by the Department of Forest Research and Survey (www.dfrs.gov.np) since 1987. The issues so far have covered diverse themes of research on Nepalese forestry. There has been an initiative to make the journal available through NepJol (http://nepjol.info/index.php/BANKO). However, only a few issues are available there. The challenge is that many issues are hard to find, and even those available in hard copy would require a lot of scanning and related processing.

Keen to make those papers available to the public, I had been looking for options to digitize them and host them in an open access repository. During my recent participation in the research data science school (http://indico.ictp.it/event/7658/), one of the sessions, by Gail Clement (@repositorian), included information about a very interesting initiative called Zenodo (zenodo.org). I have now created a Zenodo community for the Department of Forest Research and Survey (https://zenodo.org/collection/user-dfrs). After running Optical Character Recognition (OCR) on the scanned text, I split out some individual papers from the first issue of the journal (Vol 1 No 1 Spring 1987) and uploaded them to Zenodo.

I am still exploring the functionalities of Zenodo, but so far I have found an easy-to-use interface, DOI assignment and an easy way to access research through persistent links. This will most likely provide an important platform for making all Banko Janakari papers open access in the near future. Further, there are various other publications that have become rare and are likely to be lost forever. It will be really important to put them in open digital repositories like this.

Map Showing #DataTrieste Attendee Distribution

This post was written by Shiva Khanal, Research Officer with the Department of Forest Research and Survey in Nepal. Shiva was one of the international scholars sponsored by GEO, the Group on Earth Observations, to attend the CODATA-RDA School of Research Data Science, hosted at ICTP, near Trieste, Italy.

The CODATA-RDA Research Data Science Summer School was held at the ICTP, Trieste, Italy from 1st to 12th August 2016. I was one of the candidates who received funding from GEO, the Group on Earth Observations, to attend this interesting event.

Looking at the big list of participants from around the world, as a “map enthusiast” the first thing I wanted to do was visualize their distribution. Interestingly, the summer school included a presentation by Andy South (@southmapr) with a demonstration of making world maps in R using the tmap package.

I obtained a list of participants with their countries in a spreadsheet from Simon Hodson. I extracted the countries field and made a map using the tmap package in R to show the distribution of participants and instructors in this summer school.

###########################################
# The R code to reproduce the map follows.
# Note: written against tmap 1.4-1. Later tmap releases (2.0 onwards) appear to
# have changed parts of this API: World became an sf object (World$name instead
# of World@data$name), tm_style_natural() became tm_style("natural") and
# save_tmap() became tmap_save().
###########################################
# read a vector of countries represented - 35 in total
# I have saved the list of countries as a csv in my Dropbox and the code below will read it.
# Just found a better option for later use: a Dropbox interface for R (https://github.com/karthik/rdrop2)
download.file("https://www.dropbox.com/s/56jim5t3gz67vzm/countries_represented_codata.csv?dl=1", "countries_list.csv")
codata_countries <- read.csv("countries_list.csv")$country
# install (if needed) and load the tmap package
if (!require("tmap")) {
  install.packages("tmap")
  library(tmap)
}
# now let's map the countries
data(World)
codata_map <- tm_shape(World) +
  tm_borders() +
  tm_fill("grey90", alpha = 0.2) +
  tm_grid(projection = "longlat", labels.size = .3, lwd = 0.5, col = "lightblue") +
  # second layer: only the countries represented at the school, filled in red
  tm_shape(World[World@data$name %in% codata_countries, ]) +
  tm_fill("red") +
  # country labels are proportional to area, so smaller countries get a small label or none
  tm_text("name", size = "AREA", col = "black") +
  tm_borders("grey20") +
  tm_layout("Countries represented in CODATA-RDA School of Research Data Science, 2016",
            inner.margins = c(0, 0, .1, 0), title.size = .9, title.position = c("center", "top")) +
  tm_style_natural(bg.color = "lightskyblue")
codata_map
# save the output map
save_tmap(codata_map, "Countries_Represented.png", width = 2000, height = 1200)
###########################################

A nice vignette for the tmap package is available here: https://cran.r-project.org/web/packages/tmap/vignettes/tmap-nutshell.html
Martijn Tennekes (2016). tmap: Thematic Maps. R package version 1.4-1. https://CRAN.R-project.org/package=tmap

Highlights and Wordclouds for #DataTrieste

This post comes from Sara El Jadid, a Ph.D. student at Ibn Tofail University, Morocco and a member of the H3ABioNet Network (http://www.h3abionet.org). Sara attended #datatrieste, the CODATA-RDA School of Research Data Science hosted at the International Centre for Theoretical Physics in Trieste, Italy, 1-12 August 2016.

Using some of the photos from #datatrieste <https://www.flickr.com/photos/145909074@N04/albums> I had the idea of preparing a short video that features some of the things we learnt during the two weeks at ICTP and all the good moments we shared.


I was also inspired by the idea of Shiva Khanal that was shared with us <http://codata.org/blog/2015/03/31/isi-codata-big-data-workshop-as-word-clouds/>, so I made two word clouds using the same technique in R. The first one represents a wordcloud of all the topics we learnt at the CODATA-RDA School of Research Data Science and the second one represents a wordcloud of all the names of participants and their nationalities.
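
For readers who want to try something similar, here is a minimal sketch of one way to build a word cloud in R. It is not the code used for the clouds in this post: the topics vector is an illustrative list, and using the optional wordcloud package for the plot is an assumption about tooling. The word counts themselves are computed with base R, so that part runs without any extra packages.

```r
# Minimal sketch, not the code used in this post.
# 'topics' is an illustrative list; replace it with your own terms.
topics <- c("Unix", "R", "SQL", "Git", "Open Science", "Neural Networks",
            "High Performance Computing", "Visualization", "Open Science", "R")

# Split each entry into lower-case words and drop empty strings
words <- tolower(unlist(strsplit(topics, "[^A-Za-z]+")))
words <- words[nchar(words) > 0]

# Count how often each word occurs, most frequent first
word_freq <- sort(table(words), decreasing = TRUE)
print(head(word_freq))

# Draw the cloud only if the optional 'wordcloud' package is installed
# (assumption: installed beforehand with install.packages("wordcloud"))
if (requireNamespace("wordcloud", quietly = TRUE)) {
  set.seed(42)  # word placement is random; a fixed seed gives a repeatable layout
  wordcloud::wordcloud(words = names(word_freq),
                       freq = as.vector(word_freq),
                       min.freq = 1,
                       random.order = FALSE)
}
```

Word size in the plot follows the counts in word_freq, so repeating a term in the input is what makes it stand out. For a cloud of participant names, it is probably better to skip the splitting step so that each full name stays together.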

I hope friends and colleagues from the #DataTrieste school will like this idea and will enjoy the wordclouds and the video.