
Webinar 9: Processing Textual Data with Python

The ninth webinar in the CODATA Connect Series on Research Skills took place on 4th February 2021. It was co-hosted with the CODATA/RDA Schools for Research Data Science Alumni Network.

Raphael Cobe, Research Associate, Advanced Institute for Artificial Intelligence/Sao Paulo State University

The recording is available below on Vimeo or in the CODATA GoToWebinar Channel.

  • Date: Thursday, 4th February 2021
    Time: 17:00 – 18:00 GMT
    Duration: 40 min session and 20 min Q&A (1 hour in total)
  • Registration Link: https://attendee.gotowebinar.com/register/8010107547369358863
  • This webinar is co-hosted with the CODATA/RDA Schools for Research Data Science Alumni Network
  • String data is becoming increasingly common, especially given the amount of textual data available on the web. Being able to manipulate, normalize, and clean textual data has therefore become a desirable skill. In this session we will cover basic operations on the string data type in Python, and how to use regular expressions to find patterns or remove unwanted data.
  • Name of the Speaker: Raphael Cobe
    Designation: Research Associate
    Affiliation: Advanced Institute for Artificial Intelligence/Sao Paulo State University
  • Raphael Mendes de Oliveira Cobe studied Computer Engineering at the Federal University of Rio Grande do Norte, then received his M.Sc. in Computer Science with an emphasis on Software Engineering and Natural Language Processing. He completed his Ph.D. in Computer Science at the Institute of Mathematics and Statistics of the University of São Paulo, with an emphasis on Artificial Intelligence. He has over 10 years of experience in software development. Since 2015 he has worked as a Research Associate at the UNESP Center for Scientific Computing on projects related to Machine Learning, High Performance Computing and Big Data. He is also part of the team that maintains the GridUnesp infrastructure.
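
As a rough illustration of the kind of string handling the session abstract describes, the sketch below combines basic `str` methods with regular expressions from Python's standard `re` module. It is not taken from the webinar itself: the function names `clean_text` and `find_hashtags`, the sample string, and the particular cleaning rules are assumptions chosen for the example.

```python
import re


def clean_text(raw):
    """Normalize a raw string (illustrative rules, not the speaker's)."""
    text = re.sub(r"<[^>]+>", " ", raw)        # drop HTML-like tags
    text = re.sub(r"https?://\S+", " ", text)  # drop URLs
    text = re.sub(r"[^\w\s]", " ", text)       # replace punctuation with spaces
    text = re.sub(r"\s+", " ", text)           # collapse runs of whitespace
    return text.strip().lower()                # trim the ends and lowercase


def find_hashtags(raw):
    """Return the words that follow a '#' character."""
    return re.findall(r"#(\w+)", raw)


# Hypothetical sample input, made up for this sketch.
sample = "  <p>Text mining is FUN!</p> See https://example.org/docs #NLP #Python  "

cleaned = clean_text(sample)
tags = find_hashtags(sample)
words = cleaned.split()  # basic string operation: split on whitespace

print(cleaned)
print(tags)
print(len(words))
```

The order of the substitutions matters in this sketch: tags and URLs are removed before punctuation, since stripping punctuation first would break the patterns that recognize them.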