Programming in Python for Data Science

Welcome to Programming in Python for Data Science! This course is part of the Key Capabilities in Data Science program and will teach you how to conduct data analysis in Python. During the course, you will work with powerful Python packages made for data science, including Pandas for processing tabular data, Altair for data visualization, and NumPy for working with numerical data types. You will also learn about iteration, flow control, and the data types relevant to data exploration and analysis. You will leave this course capable of processing raw data into a format suitable for analysis, writing your own analysis functions, and deriving data-driven insights by creating interactive visualizations and summary tables.

Course prerequisites: None

Module 0: Welcome to Programming in Python for Data Science

Course introduction and summary of course learning outcomes

Module 1: Python & Pandas - An Unexpected Friendship

In this module, you will be introduced to dataframes, the Python package Pandas, and simple data manipulations and visualizations.

Module 2: Not So Scary Wrangling (Table Manipulation and Chaining)

In this module, you will learn how to import different types of files, perform more advanced table manipulations (modifying existing columns and creating new ones), and follow method chaining conventions (style, including multi-line chains).

Module 3: Tidy Data and Joining Dataframes

In this module, you will learn about tidy data and how to transform your dataset into a tidy format. You will also learn how to concatenate and join multiple dataframes.

Module 4: Python Without the "Eek" (Basic Python)

In this module, you will learn about basic Python data types and structures. You will explore which data types and structures are used to create a Pandas dataframe and why understanding column dtypes is important to data analysis.

Module 5: Making Choices and Repeating Iterations

In this module, you will learn how to write conditional statements and the fundamentals of writing code that efficiently repeats the same operations by following the DRY (Don't Repeat Yourself) principle.
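
As a small illustration of the ideas in this module, the sketch below combines a conditional with a loop so that the same logic is written once and applied to every value. It is not taken from the course materials; the variable names and temperature values are hypothetical.

```python
# Hypothetical example data; not taken from the course materials.
temperatures_c = [-5, 12, 25, 31]

labels = []
# One loop applies the same conditional to every value, instead of
# copying the if/elif/else block once per measurement (the DRY principle).
for temp_c in temperatures_c:
    if temp_c < 0:
        label = "freezing"
    elif temp_c < 20:
        label = "cool"
    else:
        label = "warm"
    labels.append(label)

print(labels)
```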

Module 6: Functions Fundamentals and Best Practices

In this module, you will build on the functions introduced in Module 5. This module covers how to develop good habits when writing functions, such as including docstrings, programming defensively, practicing test-driven development, and composing useful functions.
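
To give a sense of what these habits can look like in practice, here is a minimal sketch of a function with a docstring, defensive input checks, and a few assert-based tests. The function name `column_mean` and its behaviour are assumptions made for illustration, not part of the course materials.

```python
def column_mean(values):
    """Return the arithmetic mean of a non-empty list of numbers.

    Parameters
    ----------
    values : list of int or float
        The numbers to average.

    Raises
    ------
    TypeError
        If `values` is not a list.
    ValueError
        If `values` is empty.
    """
    # Defensive programming: fail early with a clear message.
    if not isinstance(values, list):
        raise TypeError("values must be a list")
    if len(values) == 0:
        raise ValueError("values must not be empty")
    return sum(values) / len(values)


# In test-driven development, small tests like these are written
# before the function body and rerun after every change.
assert column_mean([1, 2, 3]) == 2
assert column_mean([2.5]) == 2.5
```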

Module 7: Importing Files and the Coding Style Guide

In this module, you will learn how to import files and libraries from other directories and how to style your code for optimal readability.

Module 8: A Slice of NumPy and Advanced Data Wrangling

In this module, you will learn about NumPy arrays and more advanced wrangling techniques, such as working with date and string columns and identifying null values.

Module Closing Remarks

Well done on finishing Programming in Python for Data Science.

About this course

Learn the fundamentals of programming in Python, including how to clean, filter, arrange, aggregate and transform data, while writing human-readable code that follows best practices and a consistent coding style. You will gain the skills to clean, filter, manipulate (wrangle) and summarize data using Python libraries for more effective data analysis. The course gives an overview of the data structures, iteration, flow control and program design relevant to data exploration and analysis, along with fundamental programming concepts such as loops and conditionals that create a solid foundation in data science programming.

About the program

The University of British Columbia (UBC) is a comprehensive research-intensive university, consistently ranked among the 40 best universities in the world. The Key Capabilities in Data Science program was launched in September 2020 and is developed and taught by many of the same instructors as the UBC Master of Data Science program.