# Writing Efficient Code with pandas

- 4 hours
- 14 Videos
- 45 Exercises

## Course Description

The ability to work efficiently with big datasets and extract valuable information is indispensable for every aspiring data scientist. When working with a small amount of data, we often don’t realize how slow code execution can be. This course builds on your knowledge of Python and the pandas library and introduces you to efficient built-in pandas functions that perform tasks faster. These built-in functions let you tackle everything from the simplest tasks, like targeting specific entries and features in the data, to the most complex ones, like applying functions to groups of entries, typically much faster than plain Python loops. By the end of this course, you will be able to apply a function to data based on a feature value, iterate through big datasets rapidly, and manipulate data belonging to different groups efficiently. You will apply these methods to a variety of real-world datasets, such as poker hands and restaurant tips.

### 1 Selecting columns and rows efficiently

This chapter explains why efficient code matters and shows how to select specific and random rows and columns efficiently.

- The need for efficient coding I
- What does time.time() measure?
- Measuring time I
- Measuring time II
- Locate rows: .iloc[] and .loc[]
- Row selection: .loc[] vs .iloc[]
- Column selection: .iloc[] vs by name
- Select random rows
- Random row selection
- Random column selection
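
As a rough illustration of the topics in this chapter, the sketch below times row selection with `time.time()` and then draws random rows and columns with `.sample()`. The DataFrame, its column names, and the `timed` helper are hypothetical and made up for this example; the measured times depend on your machine and pandas version.

```python
import time

import numpy as np
import pandas as pd

# Hypothetical data: column names and sizes are invented for illustration.
rng = np.random.default_rng(0)
df = pd.DataFrame({
    "total_bill": rng.uniform(5, 50, size=1_000),
    "tip": rng.uniform(1, 10, size=1_000),
    "size": rng.integers(1, 7, size=1_000),
})


def timed(func):
    """Hypothetical helper: return (result, elapsed wall-clock seconds)."""
    start = time.time()
    result = func()
    return result, time.time() - start


rows = list(range(500))

# .loc[] selects by index label, .iloc[] by integer position.
# With the default RangeIndex both pick the same rows here.
by_label, t_loc = timed(lambda: df.loc[rows])
by_position, t_iloc = timed(lambda: df.iloc[rows])
print(f".loc[]: {t_loc:.6f} s, .iloc[]: {t_iloc:.6f} s")

# Column selection by position and by name.
cols_by_position = df.iloc[:, [0, 1]]
cols_by_name = df[["total_bill", "tip"]]

# Random rows and random columns with the built-in .sample() method.
random_rows = df.sample(n=100, random_state=1)
random_cols = df.sample(n=2, axis="columns", random_state=1)
```

A single `time.time()` measurement is noisy, so repeating the call several times and comparing averages gives a fairer picture.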

### 2 Replacing values in a DataFrame

This chapter shows how to use the .replace() method to replace one or multiple values using lists and dictionaries.

- Replace scalar values using .replace()
- Replacing scalar values I
- Replace scalar values II
- Replace values using lists
- Replace multiple values I
- Replace multiple values II
- Replace values using dictionaries
- Replace single values I
- Replace single values II
- Replace multiple values III
- Most efficient method for scalar replacement
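
The three flavors of replacement listed above can be sketched as follows. The `names` DataFrame and its values are hypothetical and exist only to show the call shapes of `.replace()`.

```python
import pandas as pd

# Hypothetical data, invented for illustration.
names = pd.DataFrame({
    "gender": ["FEMALE", "MALE", "F", "M", "FEMALE"],
    "city": ["NYC", "New York", "LA", "NYC", "Los Angeles"],
})

# Scalar replacement: one old value, one new value.
scalar = names["gender"].replace("FEMALE", "F")

# List replacement: old and new values are matched up by position.
paired = names["gender"].replace(["FEMALE", "MALE"], ["F", "M"])

# List replacement: several old values mapped to a single new value.
merged = names["city"].replace(["New York", "NYC"], "New York City")

# Dictionary replacement: {column: {old: new}} restricts each mapping to one column.
cleaned = names.replace({
    "gender": {"FEMALE": "F", "MALE": "M"},
    "city": {"NYC": "New York", "LA": "Los Angeles"},
})
```

Each call returns a new object and leaves `names` unchanged, so the result has to be assigned to keep it.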

### 3 Efficient iterating

This chapter presents different ways of iterating through a pandas DataFrame and explains why vectorization is the most efficient of them.

- Looping using the .iterrows() function
- Create a generator for a pandas DataFrame
- The iterrows() function for looping
- Looping using the .apply() function
- .apply() function in every cell
- .apply() for rows iteration
- Vectorization over pandas Series
- Why is vectorization in pandas so fast?
- pandas vectorization in action
- Vectorization with NumPy arrays using .values
- Best method of vectorization
- Vectorization methods for looping a DataFrame
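
The iteration styles covered in this chapter can be compared on one small task: summing five card ranks per row. This is a minimal sketch; the `hands` DataFrame and its column names are hypothetical stand-ins for a poker dataset.

```python
import numpy as np
import pandas as pd

# Hypothetical data: five card ranks per hand, invented for illustration.
rng = np.random.default_rng(0)
hands = pd.DataFrame(
    rng.integers(1, 14, size=(1_000, 5)),
    columns=["R1", "R2", "R3", "R4", "R5"],
)

# 1. .iterrows() yields (index, row Series) pairs one at a time.
totals_loop = []
for _, row in hands.iterrows():
    totals_loop.append(row.sum())

# 2. .apply() with axis=1 calls the function once per row.
totals_apply = hands.apply(lambda row: row.sum(), axis=1)

# 3. pandas vectorization: one call operates on whole columns at once.
totals_vectorized = hands.sum(axis=1)

# 4. NumPy vectorization: .values is an attribute (no parentheses) that
#    exposes the underlying array; .to_numpy() is the newer equivalent.
totals_numpy = hands.values.sum(axis=1)
```

All four produce the same totals; they differ in how much work happens in Python-level loops versus compiled array code, which is where the speed difference comes from.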

### 4 Data manipulation using .groupby()

This chapter describes the .groupby() method and how to use it to transform values while keeping the original shape of the data, replace missing values, and apply complex functions group-wise.

- Data transformation using .groupby().transform
- The min-max normalization using .transform()
- Transforming values to probabilities
- Validation of normalization
- When to use transform()?
- Missing value imputation using transform()
- Identifying missing values
- Missing value imputation
- Data filtration using the filter() function
- When to use filtration?
- Data filtration
- Congratulations!
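
The group-wise operations in this chapter can be sketched on a tiny example. The `tips` DataFrame below is hypothetical, and the threshold used in the filter is an arbitrary choice for illustration.

```python
import numpy as np
import pandas as pd

# Hypothetical data, invented for illustration.
tips = pd.DataFrame({
    "day": ["Thu", "Thu", "Thu", "Fri", "Fri", "Sat", "Sat"],
    "total_bill": [10.0, 20.0, np.nan, 8.0, 12.0, 30.0, 50.0],
})

by_day = tips.groupby("day")["total_bill"]

# Min-max normalization within each group. .transform() returns a result
# aligned to the original index, so it can be stored as a new column.
min_max = by_day.transform(lambda x: (x - x.min()) / (x.max() - x.min()))

# Missing value imputation: fill each gap with the mean of its own group.
imputed = by_day.transform(lambda x: x.fillna(x.mean()))

# Filtration: keep only the rows of groups whose mean bill exceeds 12.
kept = tips.groupby("day").filter(lambda g: g["total_bill"].mean() > 12)
```

Note the difference in output shape: `.transform()` returns one value per original row, while `.filter()` drops whole groups and returns the remaining rows.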