This post was written by Shaily Gandhi, who is currently pursuing a PhD in Geomatics at CEPT University, India. Shaily attended the CODATA-RDA School of Research Data Science, hosted at ICTP, near Trieste, Italy.
Urban Data Science is a course that grew out of a collaboration formed at the CODATA Research Data workshop in Trieste in 2017. The course was hosted by CEPT University, Ahmedabad, India, from May 14 to May 19, 2018. Its purpose was to address the poor use of available open data in decision making, with a focus on urban issues. It was designed to get students started with the basic components of data science in a short span of six days. The aim was to give students insight into open urban datasets and into how they can be combined with other freely available data sources. These datasets allow a deeper understanding of cities and their problems. They also give students firmer control over possible bias, which helps them analyse those problems and propose solutions.
The course was carefully designed for bachelor's and master's students from diverse disciplines, such as planning, architecture, civil engineering and geomatics, with and without an IT background. The lessons on the basics of R were prepared using the Software Carpentry lessons Programming with R and R for Reproducible Scientific Analysis. These concepts were adopted and the lessons redesigned to focus on urban problems and analysis. The school began with setting a study objective, using techniques to develop a research concept and plan the study area. This was done keeping in mind the type of data available from open datasets for urban research (continuous, discrete, ordinal or nominal) and the different stages of statistical analysis that can be conducted in order to produce results. One of the course's key goals was to build knowledge of research methodologies and of statistical software for data analysis. The statistical software package R was used because it has become a powerful and useful tool for data cleaning, data management, statistical analysis and graphical visualization. Once mastered, it is user friendly and can reduce the time and effort required of researchers, students and professionals.
Because the students came from such varied backgrounds, keeping the whole class at the same pace required special attention. The course therefore used innovative teaching techniques, mixing theory and practical sessions with group work. Although the course ran intensively from 9 am to 7 pm, it was very motivating to see the students following the topics and keeping up with the instructor's pace. Feedback was collected from the students every day to help the tutors decide on class activities. The course was adjusted daily, adding more group activities and practical sessions based on that feedback. The students' continuous, constructive comments made the course more effective, as the tutors could achieve the desired outcomes by adapting their teaching methods to the students' needs. This effort to understand the students' capabilities was much appreciated.
The second aim of the course was to transform traditional teaching into a form in which students are actively and creatively involved in improving how the course is taught. In the process, they strengthen their skills in solving data-driven case studies in practice. By the end of the course it was a great pleasure to receive the students' case studies applying data science to urban studies. Some of the outstanding ones are listed here:

- An analysis of traffic violations in Montgomery County, based on Public Safety department data from the government portal data.montgomerycountymd.gov.
- A study of crime in the city of Chicago, in which one of the students used open data.
- A study monitoring trends in vehicles crossing the US border, which revealed interesting patterns.
- A study of air quality in major urban states of the US, which suggested that most of the country is affected by medium concentrations of particulate matter (PM).

Many other interesting topics were studied, and together they gave a very good insight into the students' understanding of data science. The students analysed and interpreted the spatial behaviour of urban data using geospatial as well as graph analysis.