Webinar 1: Processing Textual Data with Python

The first webinar in the CODATA Connect Series on Research Skills Enhancement took place on 4th February 2021. This webinar was co-hosted with the CODATA/RDA Schools for Research Data Science Alumni Network.

Raphael Cobe, Research Associate, Advanced Institute for Artificial Intelligence/Sao Paulo State University

The slides are available at: https://raphaelmcobe.github.io/text-processing-seminar/
The source code for the presentation, along with a few coding examples, is available in the GitHub repository: https://github.com/raphaelmcobe/text-processing-seminar

The recording is available on Vimeo or in the CODATA GoToWebinar Channel.

Date: Thursday, 4th February 2021
Time: 17:00 – 18:00 GMT
Duration: 40-minute session followed by 20 minutes of questions and answers (1 hour in total)

Registration Link: https://attendee.gotowebinar.com/register/8010107547369358863

String data is becoming more and more common, especially given the amount of textual data available on the web. Being able to manipulate, normalize, and clean textual data has therefore become a desirable skill. In this session we will talk about basic operations on the string data type in Python. We will also talk about how to use regular expressions to find patterns or remove unwanted data.
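
As a rough illustration of the kind of operations the session covers, the sketch below combines basic string methods with regular expressions to clean a piece of text and to find a simple pattern. It is not taken from the webinar materials: the sample text, the function names `clean_text` and `find_years`, and the specific patterns are assumptions chosen for illustration only.

```python
import re

# Hypothetical sample text, not taken from the webinar materials.
raw = "  Hello,   WORLD!! Visit https://example.org  for more <b>info</b>.  "


def clean_text(text: str) -> str:
    """Normalize a string: drop URLs and HTML-like tags, lowercase,
    replace punctuation with spaces, and collapse whitespace."""
    text = re.sub(r"https?://\S+", " ", text)  # remove URLs
    text = re.sub(r"<[^>]+>", " ", text)       # remove HTML-like tags
    text = text.lower()                        # basic string method
    text = re.sub(r"[^a-z0-9\s]", " ", text)   # punctuation -> space
    text = re.sub(r"\s+", " ", text)           # collapse runs of whitespace
    return text.strip()


def find_years(text: str) -> list[str]:
    """Return four-digit years from 1900 to 2099 found in the text."""
    return re.findall(r"\b(?:19|20)\d{2}\b", text)


print(clean_text(raw))
# hello world visit for more info

print(find_years("The webinar took place on 4th February 2021 at 17:00 GMT."))
# ['2021']
```

The order of the steps matters in this sketch: URLs and tags are removed before punctuation is stripped, since otherwise their fragments would survive as ordinary words.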

Name of the Speaker: Raphael Cobe
Designation: Research Associate
Affiliation: Advanced Institute for Artificial Intelligence/Sao Paulo State University

Raphael Mendes de Oliveira Cobe studied Computer Engineering at the Federal University of Rio Grande do Norte, then received his M.Sc. in Computer Science with emphasis on Software Engineering and Natural Language Processing. He then completed his Ph.D. in Computer Science at the Institute of Mathematics and Statistics of the University of São Paulo, with emphasis on Artificial Intelligence. He has over 10 years of experience in software development. He has worked as a Research Associate at the UNESP Center for Scientific Computing since 2015 on projects related to Machine Learning, High Performance Computing and Big Data. He is also part of the team that maintains the GridUnesp infrastructure.