# **Chapter 2 - Replacing values in a DataFrame.ipynb**

## **Replace scalar values I**


In this exercise, we will replace scalar values in our dataset, one at a time, using the .replace() method.

We will apply these functions to the poker_hands DataFrame. In each row of the poker_hands DataFrame, columns R1 to R5 hold the ranks of the cards in a player's poker hand, from 1 (Ace) to 13 (King). The Class feature assigns each hand to a category, and the Explanation feature briefly describes each hand.

The poker_hands DataFrame is already loaded for you, and you can explore the features Class and Explanation.
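As a minimal sketch of scalar replacement, here is a small, made-up stand-in for the Class column (the values are illustrative and not taken from poker_hand.csv). The result is assigned back to the column instead of calling `.replace(..., inplace=True)` on `df['Class']`. Under pandas' Copy-on-Write mode, that in-place call on a selected column no longer updates the parent DataFrame.

```python
import pandas as pd

# Hypothetical toy data standing in for the poker_hands 'Class' column
hands = pd.DataFrame({'Class': [0, 1, 2, 1, 9]})

# Replace one scalar value per call and assign the result back to the column
hands['Class'] = hands['Class'].replace(1, -2)
hands['Class'] = hands['Class'].replace(2, -3)

print(hands['Class'].tolist())  # [0, -2, -3, -2, 9]
```

Each call operates on the output of the previous one, so order matters when a replacement value is also a later target. For example, replacing 1 with 2 and then 2 with 3 turns every original 1 into 3.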


In [2]:
import pandas as pd

poker_hands = pd.read_csv('poker_hand.csv')

In [3]:
poker_hands.tail()

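|  | S1 | R1 | S2 | R2 | S3 | R3 | S4 | R4 | S5 | R5 | Class |
|---|---|---|---|---|---|---|---|---|---|---|---|
| 25005 | 3 | 9 | 2 | 6 | 4 | 11 | 4 | 12 | 2 | 4 | 0 |
| 25006 | 4 | 1 | 4 | 10 | 3 | 13 | 3 | 4 | 1 | 10 | 1 |
| 25007 | 2 | 1 | 2 | 10 | 4 | 4 | 4 | 1 | 4 | 13 | 1 |
| 25008 | 2 | 12 | 4 | 3 | 1 | 10 | 1 | 12 | 4 | 9 | 1 |
| 25009 | 1 | 7 | 3 | 11 | 3 | 3 | 4 | 8 | 3 | 7 | 1 |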


In [4]:
# Replace Class 1 with -2
poker_hands['Class'] = poker_hands['Class'].replace(1, -2)
# Replace Class 2 with -3
poker_hands['Class'] = poker_hands['Class'].replace(2, -3)

print(poker_hands[['Class']])

       Class
0          9
1          9
2          9
3          9
4          9
...      ...
25005      0
25006     -2
25007     -2
25008     -2
25009     -2

[25010 rows x 1 columns]


## **Replace scalar values II**


As discussed in the video, pandas lets you replace values in a DataFrame in a very intuitive way: locate the position (row and column) of the value and assign the new value to it. The more pandas-idiomatic alternative is the .replace() method, which performs the same task.

You will be using the names DataFrame which includes, among others, the most popular names in the US by year, gender and ethnicity.

Your task is to replace the gender of every baby classified as FEMALE with GIRL, using each of the following methods:

* intuitive scalar replacement with .loc[]
* the .replace() method
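The two methods can be compared on a hypothetical miniature version of the names DataFrame (the rows are made up). The `.loc[]` version selects the rows and the column in a single `.loc[mask, column]` call. Chained indexing such as `df['Gender'].loc[mask] = ...` can trigger pandas' SettingWithCopyWarning.

```python
import pandas as pd

# Hypothetical miniature version of the names DataFrame
df = pd.DataFrame({'Gender': ['FEMALE', 'MALE', 'FEMALE'],
                   'Name': ['Sophia', 'Liam', 'Emma']})

# Method 1: boolean mask for the rows, column label, one .loc assignment
by_loc = df.copy()
by_loc.loc[by_loc['Gender'] == 'FEMALE', 'Gender'] = 'GIRL'

# Method 2: let .replace() find and substitute the value, then assign it back
by_replace = df.copy()
by_replace['Gender'] = by_replace['Gender'].replace('FEMALE', 'GIRL')

print(by_loc['Gender'].tolist())  # ['GIRL', 'MALE', 'GIRL']
print(by_loc.equals(by_replace))  # True
```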

In [5]:
names = pd.read_csv('Popular_Baby_Names.csv')

In [6]:
names.tail()

|  | Year of Birth | Gender | Ethnicity | Child's First Name | Count | Rank |
|---|---|---|---|---|---|---|
| 11340 | 2016 | FEMALE | BLACK NON HISPANIC | Saniyah | 10 | 43 |
| 11341 | 2016 | FEMALE | BLACK NON HISPANIC | Skye | 10 | 43 |
| 11342 | 2016 | FEMALE | BLACK NON HISPANIC | Tiana | 10 | 43 |
| 11343 | 2016 | FEMALE | BLACK NON HISPANIC | Violet | 10 | 43 |
| 11344 | 2016 | FEMALE | BLACK NON HISPANIC | Zahra | 10 | 43 |


In [7]:
import time

start_time = time.time()

# Replace all entries that have 'FEMALE' as gender with 'GIRL'.
# Selecting rows and column in one .loc call avoids the SettingWithCopyWarning
# that chained indexing (names['Gender'].loc[...] = ...) triggers.
names.loc[names['Gender'] == 'FEMALE', 'Gender'] = 'GIRL'

print("Time using .loc[]: {} sec".format(time.time() - start_time))

Time using .loc[]: 0.01046442985534668 sec


In [8]:
names.tail()

|  | Year of Birth | Gender | Ethnicity | Child's First Name | Count | Rank |
|---|---|---|---|---|---|---|
| 11340 | 2016 | GIRL | BLACK NON HISPANIC | Saniyah | 10 | 43 |
| 11341 | 2016 | GIRL | BLACK NON HISPANIC | Skye | 10 | 43 |
| 11342 | 2016 | GIRL | BLACK NON HISPANIC | Tiana | 10 | 43 |
| 11343 | 2016 | GIRL | BLACK NON HISPANIC | Violet | 10 | 43 |
| 11344 | 2016 | GIRL | BLACK NON HISPANIC | Zahra | 10 | 43 |


In [9]:
# Reload the original data so .replace() starts from the same state as .loc[]
names = pd.read_csv('Popular_Baby_Names.csv')

In [10]:
start_time = time.time()

# Replace all entries that have 'FEMALE' as gender with 'GIRL'
names['Gender'] = names['Gender'].replace('FEMALE', 'GIRL')

print("Time using .replace(): {} sec".format(time.time() - start_time))

Time using .replace(): 0.0018913745880126953 sec


In [11]:
names.tail()

|  | Year of Birth | Gender | Ethnicity | Child's First Name | Count | Rank |
|---|---|---|---|---|---|---|
| 11340 | 2016 | GIRL | BLACK NON HISPANIC | Saniyah | 10 | 43 |
| 11341 | 2016 | GIRL | BLACK NON HISPANIC | Skye | 10 | 43 |
| 11342 | 2016 | GIRL | BLACK NON HISPANIC | Tiana | 10 | 43 |
| 11343 | 2016 | GIRL | BLACK NON HISPANIC | Violet | 10 | 43 |
| 11344 | 2016 | GIRL | BLACK NON HISPANIC | Zahra | 10 | 43 |


## **Replace multiple values I**


In this exercise, you will use the .replace() method to replace multiple values with one or more values. You will again use the names dataset, which contains, among others, the most popular names in the US by year, gender and ethnicity.

Your task is to replace all ethnicities classified as Black or White non-Hispanic with 'NON HISPANIC'. In the dataset, these ethnicities appear as ['BLACK NON HISP', 'BLACK NON HISPANIC', 'WHITE NON HISP', 'WHITE NON HISPANIC'].
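Here is a minimal sketch of the many-to-one case on a hypothetical sample of the Ethnicity column. Passing a list as the first argument and a single value as the second makes every listed value map to that one value. On the `.loc[]` side, `.isin()` builds the same boolean mask as chaining `==` comparisons with `|`.

```python
import pandas as pd

# Hypothetical sample of the Ethnicity column
eth = pd.Series(['BLACK NON HISP', 'HISPANIC', 'WHITE NON HISPANIC',
                 'ASIAN AND PACIFIC ISLANDER', 'WHITE NON HISP'])
targets = ['BLACK NON HISP', 'BLACK NON HISPANIC', 'WHITE NON HISP', 'WHITE NON HISPANIC']

# .replace(): every value found in `targets` becomes 'NON HISPANIC'
replaced = eth.replace(targets, 'NON HISPANIC')

# .loc[] equivalent with a single .isin() mask
masked = eth.copy()
masked.loc[masked.isin(targets)] = 'NON HISPANIC'

print(replaced.tolist())
# ['NON HISPANIC', 'HISPANIC', 'NON HISPANIC', 'ASIAN AND PACIFIC ISLANDER', 'NON HISPANIC']
print(replaced.equals(masked))  # True
```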

In [19]:
import time
start_time = time.time()

# Replace all non-Hispanic ethnicities with 'NON HISPANIC'
names.loc[(names['Ethnicity'] == 'BLACK NON HISP') |
          (names['Ethnicity'] == 'BLACK NON HISPANIC') |
          (names['Ethnicity'] == 'WHITE NON HISP') |
          (names['Ethnicity'] == 'WHITE NON HISPANIC'), 'Ethnicity'] = 'NON HISPANIC'

print("Time using .loc[]: {} sec".format(time.time() - start_time))

Time using .loc[]: 0.00852060317993164 sec


In [20]:
start_time = time.time()

# Replace all non-Hispanic ethnicities with 'NON HISPANIC'
names['Ethnicity'] = names['Ethnicity'].replace(
    ['BLACK NON HISP', 'BLACK NON HISPANIC', 'WHITE NON HISP', 'WHITE NON HISPANIC'],
    'NON HISPANIC')

print("Time using .replace(): {} sec".format(time.time() - start_time))

Time using .replace(): 0.007716655731201172 sec


## **Replace multiple values II**


As discussed in the video, instead of calling .replace() once for each value, you can pass two lists: each element of the first list is replaced by the element at the same position in the second list.

As you have seen in our popular names dataset, some ethnicities appear under two different names. We want to standardize the naming of each ethnicity by replacing:

* 'ASIAN AND PACI' to 'ASIAN AND PACIFIC ISLANDER'
* 'BLACK NON HISP' to 'BLACK NON HISPANIC'
* 'WHITE NON HISP' to 'WHITE NON HISPANIC'

In the names DataFrame, you will replace each value on the left with the corresponding value on the right.
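Here is a minimal sketch of the one-to-one list mapping on a hypothetical sample of labels. Matching is on whole values, so 'BLACK NON HISP' does not touch an existing 'BLACK NON HISPANIC'. The two lists must have the same length, and pandas raises a `ValueError` when they do not.

```python
import pandas as pd

# Hypothetical sample mixing abbreviated and full ethnicity labels
eth = pd.Series(['ASIAN AND PACI', 'BLACK NON HISPANIC', 'WHITE NON HISP', 'HISPANIC'])

old = ['ASIAN AND PACI', 'BLACK NON HISP', 'WHITE NON HISP']
new = ['ASIAN AND PACIFIC ISLANDER', 'BLACK NON HISPANIC', 'WHITE NON HISPANIC']

# old[i] is replaced by new[i]; values not listed in `old` are left unchanged
standardized = eth.replace(old, new)
print(standardized.tolist())
# ['ASIAN AND PACIFIC ISLANDER', 'BLACK NON HISPANIC', 'WHITE NON HISPANIC', 'HISPANIC']

# Lists of different lengths are rejected
try:
    eth.replace(old, new[:2])
except ValueError as err:
    print('ValueError:', err)
```

The same mapping can also be written as a dictionary, `eth.replace(dict(zip(old, new)))`. This keeps each old value next to its replacement.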

In [22]:
start_time = time.time()

# Replace ethnicities as instructed
names['Ethnicity'] = names['Ethnicity'].replace(
    ['ASIAN AND PACI', 'BLACK NON HISP', 'WHITE NON HISP'],
    ['ASIAN AND PACIFIC ISLANDER', 'BLACK NON HISPANIC', 'WHITE NON HISPANIC'])

print("Time using .replace(): {} sec".format(time.time() - start_time))

Time using .replace(): 0.005660533905029297 sec


## **Replace single values I**

In [24]:
names = pd.read_csv('Popular_Baby_Names.csv')

In [25]:
names.head()

|  | Year of Birth | Gender | Ethnicity | Child's First Name | Count | Rank |
|---|---|---|---|---|---|---|
| 0 | 2011 | FEMALE | ASIAN AND PACIFIC ISLANDER | SOPHIA | 119 | 1 |
| 1 | 2011 | FEMALE | ASIAN AND PACIFIC ISLANDER | CHLOE | 106 | 2 |
| 2 | 2011 | FEMALE | ASIAN AND PACIFIC ISLANDER | EMILY | 93 | 3 |
| 3 | 2011 | FEMALE | ASIAN AND PACIFIC ISLANDER | OLIVIA | 89 | 4 |
| 4 | 2011 | FEMALE | ASIAN AND PACIFIC ISLANDER | EMMA | 75 | 5 |


In [29]:
names['Ethnicity'] = names['Ethnicity'].replace({'BLACK NON HISPANIC': 'NON HISP', 'WHITE NON HISPANIC': 'NON HISP'})
print(names['Ethnicity'].head(200))

0      ASIAN AND PACIFIC ISLANDER
1      ASIAN AND PACIFIC ISLANDER
2      ASIAN AND PACIFIC ISLANDER
3      ASIAN AND PACIFIC ISLANDER
4      ASIAN AND PACIFIC ISLANDER
                  ...            
195                      NON HISP
196                      NON HISP
197                      NON HISP
198                      NON HISP
199                      NON HISP
Name: Ethnicity, Length: 200, dtype: object


## **Replace single values II**


For this exercise, we will be using the names DataFrame. In this dataset, the column 'Rank' shows the ranking of each name by year. You will use a dictionary to replace the rank of each year's first-ranked name with 'FIRST', the second with 'SECOND' and the third with 'THIRD'.

The dictionary maps each value to find (the key) to a single replacement value.

You can already see the first 5 names of the data, which correspond to the 5 most popular names for all the females belonging to the 'ASIAN AND PACIFIC ISLANDER' ethnicity in 2011.
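Here is a minimal sketch of dictionary-based replacement on hypothetical rank values. Values that have a key are swapped for their dictionary value, and values without a key (4 and 5 here) stay as they are. Mixing strings into an integer column turns it into an `object` column, so numeric operations on Rank will not behave as before.

```python
import pandas as pd

# Hypothetical ranks for six names
rank = pd.Series([1, 2, 3, 4, 5, 1])

# Keys are the values to find; each maps to exactly one replacement
labelled = rank.replace({1: 'FIRST', 2: 'SECOND', 3: 'THIRD'})

print(labelled.tolist())  # ['FIRST', 'SECOND', 'THIRD', 4, 5, 'FIRST']
print(labelled.dtype)     # object
```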

In [30]:
# Replace the numeric rank with a string
names['Rank'] = names['Rank'].replace({1: 'FIRST', 2: 'SECOND', 3: 'THIRD'})
print(names.head())

   Year of Birth  Gender                   Ethnicity Child's First Name  \
0           2011  FEMALE  ASIAN AND PACIFIC ISLANDER             SOPHIA   
1           2011  FEMALE  ASIAN AND PACIFIC ISLANDER              CHLOE   
2           2011  FEMALE  ASIAN AND PACIFIC ISLANDER              EMILY   
3           2011  FEMALE  ASIAN AND PACIFIC ISLANDER             OLIVIA   
4           2011  FEMALE  ASIAN AND PACIFIC ISLANDER               EMMA   

   Count    Rank  
0    119   FIRST  
1    106  SECOND  
2     93   THIRD  
3     89       4  
4     75       5  


## **Replace multiple values III**


As you saw in the video, you can use dictionaries to replace multiple values with just one value, even from multiple columns. To show the usefulness of replacing with dictionaries, you will use the names dataset one more time.

In this dataset, the column 'Rank' shows which rank each name reached every year. You will change the rank of the first three ranked names of every year to 'MEDAL' and those from 4th and 5th place to 'ALMOST MEDAL'.

You can already see the first 5 names of the data, which correspond to the 5 most popular names for all the females belonging to the 'ASIAN AND PACIFIC ISLANDER' ethnicity in 2011.
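Here is a minimal sketch of the nested-dictionary form on a hypothetical frame where Count and Rank share some values. The outer key selects the column, and the inner dictionary maps old values to new ones in that column only. Count therefore keeps its 4 and 5. More columns can be targeted in the same call by adding more outer keys.

```python
import pandas as pd

# Hypothetical frame: the values 4 and 5 appear in both columns
df = pd.DataFrame({'Count': [119, 5, 4], 'Rank': [1, 4, 5]})

# {column: {old: new}} restricts the replacement to the 'Rank' column
medals = df.replace({'Rank': {1: 'MEDAL', 2: 'MEDAL', 3: 'MEDAL',
                              4: 'ALMOST MEDAL', 5: 'ALMOST MEDAL'}})

print(medals['Rank'].tolist())   # ['MEDAL', 'ALMOST MEDAL', 'ALMOST MEDAL']
print(medals['Count'].tolist())  # [119, 5, 4]
```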

In [31]:
# Reload the data: the previous exercise already turned ranks 1-3 into
# 'FIRST', 'SECOND' and 'THIRD', so 1, 2 and 3 would no longer match
names = pd.read_csv('Popular_Baby_Names.csv')

# Replace the rank of the first three ranked names with 'MEDAL'
names.replace({'Rank': {1:'MEDAL', 2:'MEDAL', 3:'MEDAL'}}, inplace=True)

# Replace the rank of the 4th and 5th ranked names with 'ALMOST MEDAL'
names.replace({'Rank': {4:'ALMOST MEDAL', 5:'ALMOST MEDAL'}}, inplace=True)
print(names.head())

   Year of Birth  Gender                   Ethnicity Child's First Name  \
0           2011  FEMALE  ASIAN AND PACIFIC ISLANDER             SOPHIA   
1           2011  FEMALE  ASIAN AND PACIFIC ISLANDER              CHLOE   
2           2011  FEMALE  ASIAN AND PACIFIC ISLANDER              EMILY   
3           2011  FEMALE  ASIAN AND PACIFIC ISLANDER             OLIVIA   
4           2011  FEMALE  ASIAN AND PACIFIC ISLANDER               EMMA   

   Count          Rank  
0    119         MEDAL  
1    106         MEDAL  
2     93         MEDAL  
3     89  ALMOST MEDAL  
4     75  ALMOST MEDAL  
