## Replace scalar values I

In this exercise, we will use the `.replace()` method to swap specific values in our dataset for other desired values.

We will apply the method to the `poker_hands` DataFrame. Remember that in `poker_hands`, columns `R1` to `R5` of each row hold the ranks of the five cards in a player's poker hand, from 1 (Ace) to 13 (King). The `Class` feature assigns each hand to a category, and the `Explanation` feature briefly describes that category.

The `poker_hands` DataFrame is already loaded for you, and you can explore the features `Class` and `Explanation`.

Instructions

1. Replace the `Class` value of every hand (row) listed as Class 1 (One Pair) with -2, and of every hand listed as Class 2 (Two Pairs) with -3.

In [2]:
# Import pandas
import pandas as pd

# Import dataset
poker_hands = pd.read_csv('poker_hands.csv')

In [3]:
# Replace Class 1 with -2
poker_hands['Class'].replace(1, -2, inplace=True)

# Replace Class 2 with -3
poker_hands['Class'].replace(2, -3, inplace=True)

print(poker_hands[['Class', 'Explanation']])

       Class      Explanation
0          9      Royal flush
1          9      Royal flush
2          9      Royal flush
3          9      Royal flush
4          9      Royal flush
...      ...              ...
25005      0  Nothing in hand
25006      0  Nothing in hand
25007      0  Nothing in hand
25008      0  Nothing in hand
25009      0  Nothing in hand

[25010 rows x 2 columns]
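
The two replacements above can also be written as one dictionary-based call whose result is assigned back to the column. The following is only a minimal sketch on a small made-up frame: `toy_hands` and its values are illustrative assumptions, not the real dataset. Assigning the result back is used here because newer pandas releases may warn when an `inplace` method is called on a selected column.

```python
import pandas as pd

# Toy stand-in for poker_hands (values are made up for illustration)
toy_hands = pd.DataFrame({
    'Class': [1, 2, 0, 1, 9],
    'Explanation': ['One pair', 'Two pairs', 'Nothing in hand',
                    'One pair', 'Royal flush'],
})

# Replace 1 -> -2 and 2 -> -3 in one pass, then assign the result back
toy_hands['Class'] = toy_hands['Class'].replace({1: -2, 2: -3})

print(toy_hands['Class'].tolist())  # [-2, -3, 0, -2, 9]
```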


## Replace scalar values II

As discussed in the video, there is an intuitive way to replace values in a `pandas` DataFrame: locate the position (row and column) in the DataFrame and assign the new value to it. The more `pandas`-idiomatic alternative is the `.replace()` method, which performs the same task.

You will be using the `names` DataFrame, which includes, among other things, the most popular names in the US by year, gender and ethnicity.

Your task is to relabel all babies classified as `'FEMALE'` as `'GIRL'` using the following methods:

- intuitive scalar replacement
- using the `.replace()` function

Instructions

1. Replace the gender of all babies classified as `'FEMALE'` with `'GIRL'` using the intuitive scalar replacement described above.
2. Replace the gender of all babies classified as `'FEMALE'` with `'GIRL'` using the `.replace()` function. Set `inplace` to `True` so that the original DataFrame is modified directly.
3. Which of the two methods used in the previous steps is more efficient when replacing a scalar value?

In [6]:
# Import dataset
names = pd.read_csv('names.csv')

# Import time
import time

In [8]:
start_time = time.time()

# Replace all the entries that have 'FEMALE' as a gender with 'GIRL'
# (a single .loc[] call with row mask and column label avoids chained assignment)
names.loc[names['Gender'] == 'FEMALE', 'Gender'] = 'GIRL'

print(f'Time using .loc[]: {time.time() - start_time} sec')

Time using .loc[]: 0.010024547576904297 sec

In [9]:
start_time = time.time()

# Replace all the entries that have 'FEMALE' as a gender with 'GIRL'
names['Gender'].replace('FEMALE', 'GIRL', inplace=True)

print(f'Time using .replace(): {time.time() - start_time} sec')

Time using .replace(): 0.002912759780883789 sec
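
A single `time.time()` difference can be noisy, so the comparison is easier to trust when each approach is repeated several times. The sketch below does this with the standard-library `timeit` module on synthetic data. The `gender` Series, its size, and the helper names `with_loc` and `with_replace` are assumptions made for illustration, and the timings will differ from machine to machine.

```python
import timeit

import numpy as np
import pandas as pd

# Synthetic stand-in for the Gender column (size and labels are assumptions)
rng = np.random.default_rng(0)
gender = pd.Series(rng.choice(['FEMALE', 'MALE'], size=100_000), name='Gender')

def with_loc():
    # Work on a copy so every repetition starts from the same data
    s = gender.copy()
    s.loc[s == 'FEMALE'] = 'GIRL'
    return s

def with_replace():
    # .replace() returns a new Series, so no explicit copy is needed
    return gender.replace('FEMALE', 'GIRL')

# Both approaches should produce identical results
same_result = with_loc().equals(with_replace())

# The best of several repeats is less noisy than one wall-clock difference
loc_time = min(timeit.repeat(with_loc, number=10, repeat=5))
replace_time = min(timeit.repeat(with_replace, number=10, repeat=5))
print(f'.loc[]: {loc_time:.4f} sec, .replace(): {replace_time:.4f} sec')
```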


## Replace multiple values I

In this exercise, you will apply the `.replace()` function to replace multiple values with one or more values. You will again use the `names` dataset, which contains, among other things, the most popular names in the US by year, gender and ethnicity.

Here, you want to replace all ethnicities classified as **black** or **white** non-Hispanic with a single non-Hispanic label. Remember, these ethnicities appear in the dataset as `['BLACK NON HISP', 'BLACK NON HISPANIC', 'WHITE NON HISP', 'WHITE NON HISPANIC']` and should be replaced with `'NON HISPANIC'`.

Instructions

1. Replace all the non-Hispanic ethnicities listed above with `'NON HISPANIC'` using the `.loc[]` indexer.
2. Replace all the non-Hispanic ethnicities listed above with `'NON HISPANIC'` using the `.replace()` function.

In [12]:
start_time = time.time()

# Replace all non-Hispanic ethnicities with 'NON HISPANIC'
# (a single .loc[] call with row mask and column label avoids chained assignment)
names.loc[(names['Ethnicity'] == 'BLACK NON HISP') |
          (names['Ethnicity'] == 'BLACK NON HISPANIC') |
          (names['Ethnicity'] == 'WHITE NON HISP') |
          (names['Ethnicity'] == 'WHITE NON HISPANIC'), 'Ethnicity'] = 'NON HISPANIC'

print(f'Time using .loc[]: {time.time() - start_time} sec')

Time using .loc[]: 0.015582084655761719 sec

In [13]:
start_time = time.time()

# Replace all non-Hispanic ethnicities with 'NON HISPANIC'
names['Ethnicity'].replace(['BLACK NON HISP', 'BLACK NON HISPANIC', 
                            'WHITE NON HISP', 'WHITE NON HISPANIC'], 'NON HISPANIC', inplace=True)

print(f'Time using .replace(): {time.time() - start_time} sec')

Time using .replace(): 0.0034189224243164062 sec
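
When `.loc[]` is the tool of choice, the four chained comparisons can be collapsed into one mask with `Series.isin()`. This is a minimal sketch on a toy frame; `toy_names` and `non_hispanic` are illustrative names, not part of the exercise.

```python
import pandas as pd

# Toy stand-in for the names DataFrame
toy_names = pd.DataFrame({
    'Ethnicity': ['BLACK NON HISP', 'HISPANIC', 'WHITE NON HISPANIC',
                  'ASIAN AND PACI', 'WHITE NON HISP'],
})

non_hispanic = ['BLACK NON HISP', 'BLACK NON HISPANIC',
                'WHITE NON HISP', 'WHITE NON HISPANIC']

# One boolean mask instead of four comparisons joined with |
mask = toy_names['Ethnicity'].isin(non_hispanic)
toy_names.loc[mask, 'Ethnicity'] = 'NON HISPANIC'

print(toy_names['Ethnicity'].tolist())
# ['NON HISPANIC', 'HISPANIC', 'NON HISPANIC', 'ASIAN AND PACI', 'NON HISPANIC']
```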


## Replace multiple values II

As discussed in the video, instead of using the `.replace()` function multiple times to replace multiple values, you can use lists to map the elements you want to replace one to one with those you want to replace them with.

As you have seen in our popular names dataset, some ethnicities appear under two different names. We want to standardize the naming of each ethnicity by replacing

- `'ASIAN AND PACI'` with `'ASIAN AND PACIFIC ISLANDER'`
- `'BLACK NON HISP'` with `'BLACK NON HISPANIC'`
- `'WHITE NON HISP'` with `'WHITE NON HISPANIC'`

In the DataFrame `names`, you are going to replace all the values on the left with the values on the right.

Instructions

1. Replace all the ethnicities by their respective alternative, as indicated above.

In [14]:
start_time = time.time()

# Replace ethnicities as instructed
names['Ethnicity'].replace(['ASIAN AND PACI', 'BLACK NON HISP', 'WHITE NON HISP'], 
                           ['ASIAN AND PACIFIC ISLANDER', 'BLACK NON HISPANIC', 'WHITE NON HISPANIC'], inplace=True)

print(f'Time using .replace(): {time.time() - start_time} sec')

Time using .replace(): 0.011382341384887695 sec
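
With two lists, `.replace()` pairs the elements by position, so both lists need the same length. The sketch below, on a made-up `ethnicity` Series, shows that the list form gives the same result as the equivalent dictionary; the variable names `old`, `new`, `by_lists` and `by_dict` are illustrative.

```python
import pandas as pd

# Toy stand-in for names['Ethnicity']
ethnicity = pd.Series(['ASIAN AND PACI', 'BLACK NON HISP',
                       'HISPANIC', 'WHITE NON HISP'])

old = ['ASIAN AND PACI', 'BLACK NON HISP', 'WHITE NON HISP']
new = ['ASIAN AND PACIFIC ISLANDER', 'BLACK NON HISPANIC', 'WHITE NON HISPANIC']

# Lists are matched by position: old[i] is replaced with new[i]
by_lists = ethnicity.replace(old, new)

# The same mapping written as a dictionary
by_dict = ethnicity.replace(dict(zip(old, new)))

print(by_lists.tolist())
# ['ASIAN AND PACIFIC ISLANDER', 'BLACK NON HISPANIC', 'HISPANIC', 'WHITE NON HISPANIC']
```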


## Replace single values I

In this exercise, we will use dictionaries to replace values, this time on a different dataset.

We will apply the method to the `poker_hands` DataFrame. Each row holds the ranks of 5 cards from a playing card deck, from 1 (Ace) to 13 (King) (features R1, R2, R3, R4, R5). The feature 'Class' assigns each row to a category (from 0 to 9), and the feature 'Explanation' gives a brief explanation of what each class represents.

The purpose of this exercise is to categorize the two types of flush in the game (`'Royal flush'` and `'Straight flush'`) under the `'Flush'` name.

Instructions

1. Replace every value of `'Royal flush'` or `'Straight flush'` in the 'Explanation' column with `'Flush'`.

In [24]:
# Replace Royal flush or Straight flush with Flush
poker_hands.replace({'Royal flush':'Flush', 'Straight flush':'Flush'}, inplace=True)
print(poker_hands['Explanation'].head())

0    Flush
1    Flush
2    Flush
3    Flush
4    Flush
Name: Explanation, dtype: object
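
A flat dictionary passed to `DataFrame.replace()` is applied to every column. To limit the replacement to one column, a nested dictionary can be used, with the column name as the outer key. The sketch below uses a toy frame; `toy_hands` and its hypothetical `Note` column exist only to show that other columns are left alone.

```python
import pandas as pd

toy_hands = pd.DataFrame({
    'Explanation': ['Royal flush', 'Straight flush', 'One pair'],
    'Note': ['Royal flush', 'n/a', 'n/a'],  # hypothetical extra column
})

# Outer key = column name, inner dict = old value -> new value
result = toy_hands.replace({'Explanation': {'Royal flush': 'Flush',
                                            'Straight flush': 'Flush'}})

print(result['Explanation'].tolist())  # ['Flush', 'Flush', 'One pair']
print(result['Note'].tolist())         # ['Royal flush', 'n/a', 'n/a']
```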


## Replace single values II

For this exercise, we will be using the `names` DataFrame. In this dataset, the column 'Rank' shows the ranking of each name by year. You will use dictionaries to replace the rank of the first-ranked name of every year with `'FIRST'`, the second with `'SECOND'` and the third with `'THIRD'`.

You will use dictionaries to replace one single value per key.

You can already see the first 5 names of the data, which correspond to the 5 most popular names for all the females belonging to the `'ASIAN AND PACIFIC ISLANDER'` ethnicity in 2011.

Instructions

1. Replace the ranks, indicated in numbers, by strings, following the pattern given above. Don't hesitate to explore your dataset in the Console after replacing values to see how it changed.

In [28]:
# Replace the number rank by a string
names['Rank'].replace({1:'FIRST', 2:'SECOND', 3:'THIRD'}, inplace=True)
names.head()

   Year of Birth  Gender                   Ethnicity  Child's First Name  Count    Rank
0           2011    GIRL  ASIAN AND PACIFIC ISLANDER              SOPHIA    119   FIRST
1           2011    GIRL  ASIAN AND PACIFIC ISLANDER               CHLOE    106  SECOND
2           2011    GIRL  ASIAN AND PACIFIC ISLANDER               EMILY     93   THIRD
3           2011    GIRL  ASIAN AND PACIFIC ISLANDER              OLIVIA     89       4
4           2011    GIRL  ASIAN AND PACIFIC ISLANDER                EMMA     75       5
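
After this replacement, the 'Rank' column holds a mix of strings and integers, so it no longer has an integer dtype. The sketch below illustrates this on a made-up `ranks` Series; the variable names are illustrative only.

```python
import pandas as pd

# Toy stand-in for names['Rank']
ranks = pd.Series([1, 2, 3, 4, 5], name='Rank')

# Only the keys present in the dictionary are replaced
labelled = ranks.replace({1: 'FIRST', 2: 'SECOND', 3: 'THIRD'})

print(labelled.tolist())  # ['FIRST', 'SECOND', 'THIRD', 4, 5]
print(ranks.dtype, '->', labelled.dtype)
```

Keeping this in mind helps when later steps expect numeric ranks, for example when sorting or filtering by rank.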


## Replace multiple values III

As you saw in the video, you can use dictionaries to replace multiple values with just one value, even from multiple columns. To show the usefulness of replacing with dictionaries, you will use the `names` dataset one more time.

In this dataset, the column 'Rank' shows which rank each name reached every year. You will change the rank of the first three ranked names of every year to `'MEDAL'` and those from 4th and 5th place to `'ALMOST MEDAL'`.

You can already see the first 5 names of the data, which correspond to the 5 most popular names for all the females belonging to the `'ASIAN AND PACIFIC ISLANDER'` ethnicity in 2011.

Instructions

1. Replace the rank of the first three ranked names of every year with `'MEDAL'`.
2. Replace the rank of the fourth and fifth ranked names of every year with `'ALMOST MEDAL'`.

In [32]:
# Replace the rank of the first three ranked names to 'MEDAL'
names.replace({'Rank': {1:'MEDAL', 2:'MEDAL', 3:'MEDAL'}}, inplace=True)

# Replace the rank of the 4th and 5th ranked names to 'ALMOST MEDAL'
names.replace({'Rank': {4:'ALMOST MEDAL', 5:'ALMOST MEDAL'}}, inplace=True)

names.head()

   Year of Birth  Gender                   Ethnicity  Child's First Name  Count          Rank
0           2011    GIRL  ASIAN AND PACIFIC ISLANDER              SOPHIA    119         MEDAL
1           2011    GIRL  ASIAN AND PACIFIC ISLANDER               CHLOE    106         MEDAL
2           2011    GIRL  ASIAN AND PACIFIC ISLANDER               EMILY     93         MEDAL
3           2011    GIRL  ASIAN AND PACIFIC ISLANDER              OLIVIA     89  ALMOST MEDAL
4           2011    GIRL  ASIAN AND PACIFIC ISLANDER                EMMA     75  ALMOST MEDAL
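
When many keys map to the same value, the inner dictionary can be built with `dict.fromkeys()` instead of typing each pair, and both groups can go into a single `.replace()` call. This is a sketch on a toy frame; `toy_names` and `rank_labels` are illustrative names.

```python
import pandas as pd

# Toy stand-in for the names DataFrame
toy_names = pd.DataFrame({'Rank': [1, 2, 3, 4, 5, 6]})

# Build {1: 'MEDAL', 2: 'MEDAL', 3: 'MEDAL', 4: 'ALMOST MEDAL', 5: 'ALMOST MEDAL'}
rank_labels = {**dict.fromkeys([1, 2, 3], 'MEDAL'),
               **dict.fromkeys([4, 5], 'ALMOST MEDAL')}

# Outer key restricts the replacement to the 'Rank' column
toy_names = toy_names.replace({'Rank': rank_labels})

print(toy_names['Rank'].tolist())
# ['MEDAL', 'MEDAL', 'MEDAL', 'ALMOST MEDAL', 'ALMOST MEDAL', 6]
```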


## Most efficient method for scalar replacement

If you want to replace a scalar value with another scalar value, which technique is the most efficient?