Today's Course

Course Materials

Learning Objectives

What will we likely need to know how to do to produce a clean dataset?

fill in missing values or remove rows with missing values
break up columns that contain more than one chunk of data per cell into multiple columns
remove unnecessary white space from cells
standardize data (fix typos & inconsistencies; format dates; standardize data types)
merge duplicate rows or drop duplicates
remove unnecessary data (drop extraneous variables or observations)
check for data discrepancies
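The steps above can be sketched in pandas roughly as follows. This is a minimal illustration, not the notebook's actual code: the toy data, the column names, and the choice to fill a missing score with 0 are all assumptions made for the example.

```python
import pandas as pd

# Hypothetical toy data; every column name and value is invented for illustration.
raw = pd.DataFrame({
    "name": ["  Ann Lee", "Bob Ray ", "Bob Ray", "Cy Dunn", "Di Park"],
    "city_state": ["Austin, TX", "Boise, ID", "Boise, id", "Salem, OR", None],
    "joined": ["2021-01-05", "2021-02-10", "2021-02-10", "2021-03-15", "2021-04-20"],
    "score": ["10", "7", "7", None, "12"],
    "notes": ["x", "y", "y", "z", "w"],
})
df = raw.copy()

# Remove unnecessary white space from cells.
df["name"] = df["name"].str.strip()

# Break up a column holding two chunks of data into two columns.
df[["city", "state"]] = df["city_state"].str.split(", ", expand=True)

# Standardize data: fix inconsistent casing, convert data types, parse dates.
df["state"] = df["state"].str.upper()
df["score"] = pd.to_numeric(df["score"])
df["joined"] = pd.to_datetime(df["joined"])

# Remove unnecessary data (drop extraneous variables).
df = df.drop(columns=["city_state", "notes"])

# Drop duplicates; the two "Bob Ray" rows only match after the cleanup above.
df = df.drop_duplicates()

# Fill in missing values (0 is an assumed placeholder) or remove rows with missing values.
df["score"] = df["score"].fillna(0)
df = df.dropna(subset=["city"]).reset_index(drop=True)

# Check for data discrepancies: leftover gaps and out-of-range values.
missing_per_column = df.isna().sum()
scores_in_range = df["score"].between(0, 100).all()
print(df)
```

The order matters in this sketch: white space and casing are cleaned before duplicates are dropped, because rows that differ only in formatting are not detected as duplicates until they are standardized.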

Outside of this tutorial, the exact details of how to do each of these steps will differ. A couple of topics are mentioned only briefly: filling in missing values through value imputation and finding data outliers through statistical methods.
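For orientation only, here is a small sketch of what value imputation and a statistical outlier check can look like in pandas. The sample numbers are made up, and both the median fill and the 1.5 x IQR rule are assumed conventions chosen for the example, not necessarily the methods the notebook mentions.

```python
import pandas as pd

# Hypothetical measurements with one missing value and one suspicious value.
values = pd.Series([4.0, 5.0, None, 6.0, 5.0, 40.0])

# Value imputation: replace the missing entry with the median of the observed values.
filled = values.fillna(values.median())

# Outlier check using the interquartile range (IQR); 1.5 is a conventional multiplier.
q1 = filled.quantile(0.25)
q3 = filled.quantile(0.75)
iqr = q3 - q1
lower, upper = q1 - 1.5 * iqr, q3 + 1.5 * iqr

# Keep only the values that fall outside the computed bounds.
outliers = filled[(filled < lower) | (filled > upper)]
print(outliers)
```

A flagged value is a candidate for review, not automatically an error; whether to keep, correct, or drop it depends on the dataset.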

Data Cleaning with Python Video

The course will take approximately 2 hours to complete. You’re encouraged to take breaks as needed! Follow along in your Jupyter notebook with our Data Cleaning with Python and Pandas video:

Reminders

To close out of Jupyter, select ‘File’ -> ‘Shutdown’ and then close the browser window.

Additional Resources

Below is a list of resources mentioned in the Jupyter notebook:

Finally, I adapted information found in the following pages to help create this tutorial: