Today's Course

Course Materials

Learning Objectives

What will we likely need to know how to do in order to produce a clean dataset?

  • fill in missing values or remove rows with missing values
  • break up columns containing more than one chunk of data within cells into multiple columns
  • remove unnecessary white space from cells
  • standardize data (fix typos & inconsistencies; format dates; standardize data types)
  • merge duplicate rows or drop duplicates
  • remove unnecessary data (drop extraneous variables or observations)
  • check for data discrepancies
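As a rough preview, here is a minimal sketch of how several of these steps might look in pandas. The toy data, the column names, and the clean helper are all invented for illustration; they are not taken from the course notebook, and the right calls will depend on your own dataset.

```python
import pandas as pd

# Hypothetical toy data (invented for this sketch, not from the course notebook).
raw = pd.DataFrame({
    "name": ["  Ada Lovelace", "Grace Hopper ", "Grace Hopper ", "Alan Turing", None],
    "city_state": ["London, UK", "New York, NY", "New York, NY", "London, uk", "Paris, FR"],
    "joined": ["2021-01-05", "2021-02-05", "2021-02-05", "2021-03-17", "2021-04-01"],
    "notes": ["a", "b", "b", "c", "d"],
})


def clean(df):
    """Apply a few common cleaning steps to the toy data (illustrative helper)."""
    df = df.copy()

    # Remove rows with a missing value in the column we care about.
    df = df.dropna(subset=["name"])

    # Remove unnecessary white space from cells.
    df["name"] = df["name"].str.strip()

    # Break up a column holding two chunks of data into two columns.
    parts = df["city_state"].str.split(",", n=1, expand=True)
    df["city"] = parts[0].str.strip()
    # Standardize inconsistent capitalization ("uk" vs "UK").
    df["region"] = parts[1].str.strip().str.upper()

    # Standardize data types: parse date strings into datetimes.
    df["joined"] = pd.to_datetime(df["joined"], format="%Y-%m-%d")

    # Remove unnecessary data (extraneous columns).
    df = df.drop(columns=["city_state", "notes"])

    # Drop duplicate rows, which only match once the values are standardized.
    return df.drop_duplicates().reset_index(drop=True)


cleaned = clean(raw)
print(cleaned)
```

Note that the duplicates are dropped last: the two "Grace Hopper" rows only become exact duplicates after white space and capitalization have been standardized.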

Outside of this tutorial, the exact details of each step will differ from dataset to dataset, and a couple of topics are mentioned only briefly: filling in missing values through value imputation and finding data outliers through statistical methods.
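To give a flavor of those two briefly mentioned topics, here is a small sketch under stated assumptions: the values are made up, median imputation is just one of many possible imputation strategies, and the 1.5 × IQR rule is one common convention for flagging outliers, not necessarily the approach used in the video.

```python
import pandas as pd

# Hypothetical measurements (invented for this sketch); None marks a missing value.
ages = pd.Series([23, 25, None, 27, 24, 26, 95], name="age", dtype="float64")

# Value imputation: fill the missing entry with the median of the observed values.
filled = ages.fillna(ages.median())

# Outlier check using the interquartile range (IQR).
q1 = filled.quantile(0.25)
q3 = filled.quantile(0.75)
iqr = q3 - q1
is_outlier = (filled < q1 - 1.5 * iqr) | (filled > q3 + 1.5 * iqr)

# Flagged values deserve a closer look; they are not automatically errors.
outliers = filled[is_outlier]
print(outliers)
```

The median is used here rather than the mean because a single extreme value (such as 95) would pull the mean upward.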

Data Cleaning with Python Video

The course will take approximately 2 hours to complete. You’re encouraged to take breaks as needed! Follow along in your Jupyter notebook as you watch our data cleaning with Python and Pandas video:

Reminders

To close out of Jupyter, select ‘File’ -> ‘Shutdown’ and then close the browser window.

Additional Resources

Below is a list of resources that are mentioned in the Jupyter notebook:

Finally, I adapted information found in the following pages to help create this tutorial: