CODATA Connect Webinar 4: The importance of data cleaning

The fourth webinar in this series took place on 5th August 2021.

The slides are available here: Download Presentation

The recording is available on Vimeo or in the CODATA GoToWebinar channel.

Date: 5th August 2021
Time: 10 am (UTC)
Duration: 1 hour (40-minute session followed by 20 minutes of questions and answers)

Registration link: https://attendee.gotowebinar.com/register/2137644534742391819

Data cleaning might seem dull, but it is one of the most important tasks you will have to do as a data science professional. Correcting or removing "dirty data" improves the reliability and value of your data, leading to better decision-making. Data cleaning involves detecting and removing (or correcting) errors and inconsistencies in a data set that arise from corruption, irrelevance or inaccurate entry of the data. Incomplete, inaccurate or irrelevant data is identified and then either replaced, modified or deleted.
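
As a rough illustration of the identify-then-replace/modify/delete steps described above, here is a minimal Python sketch. The records, the field names (id, country, age) and the valid age range of 0 to 120 are hypothetical assumptions made for this example; they are not taken from the webinar.

```python
# Minimal data-cleaning sketch. The records, field names and the 0-120 age
# range are hypothetical assumptions for illustration only.

records = [
    {"id": 1, "country": "Botswana ", "age": 34},
    {"id": 2, "country": "botswana", "age": None},    # incomplete: missing age
    {"id": 3, "country": "South Africa", "age": -5},  # inaccurate: impossible age
    {"id": 1, "country": "Botswana", "age": 34},      # duplicate of record 1 once normalised
]


def clean(rows, min_age=0, max_age=120):
    """Return a cleaned copy of rows; the input list is left unchanged."""
    seen = set()
    cleaned = []
    for row in rows:
        row = dict(row)  # work on a copy so the raw data is preserved
        # Modify: normalise inconsistent text (stray whitespace, capitalisation).
        row["country"] = row["country"].strip().title()
        # Replace: an out-of-range age is treated as missing (None).
        if row["age"] is not None and not (min_age <= row["age"] <= max_age):
            row["age"] = None
        # Delete: drop records that duplicate one already kept.
        key = tuple(sorted(row.items()))
        if key in seen:
            continue
        seen.add(key)
        cleaned.append(row)
    return cleaned


for row in clean(records):
    print(row)
# {'id': 1, 'country': 'Botswana', 'age': 34}
# {'id': 2, 'country': 'Botswana', 'age': None}
# {'id': 3, 'country': 'South Africa', 'age': None}
```

Flagging an impossible age as missing, rather than deleting the whole record, keeps that record's other valid fields, which is one way to limit the risk of losing valid data while cleaning.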

Incorrect or inconsistent data can lead to false conclusions. Data cleaning is therefore an important step in many data analyses. Wrong or poor-quality data can undermine your processes and analysis, and even an excellent algorithm can fail if it is fed poor data. However, data cleaning carries its own risks, including the loss of important information or valid data.

Data cleaning is also important because it improves the quality of your data and, in doing so, increases overall productivity. Cleaning removes outdated or incorrect information, leaving you with higher-quality data. As a result, you spend less time wading through outdated records and can make the most of your project hours.

Name of the Speaker: Simisani Ndaba
Designation: Teaching Assistant
Affiliation: University of Botswana

Simisani has worked in higher education as a Teaching Assistant in the Department of Computer Science at the University of Botswana since 2016. She holds a Master of Science in Computer Information Systems; her research applied information retrieval to authorship identification based on authors' writing styles, using the PAN shared tasks at CLEF. PAN is a series of scientific events and shared tasks on digital text forensics and stylometry. Before that, she worked as a Business Analyst at the Gauteng Department of Education in South Africa, focusing on data management and business intelligence. She also holds a Bachelor's degree in Business Information Systems and is due to complete a Postgraduate Diploma in Education, a teacher/trainer qualification, in October 2021. She is a member of Ladies in R Botswana, based at the University of Botswana, and an assistant with Health Informatics Africa.