In [1]:
import pandas as pd

# Applying a function to a DataFrame or Series

In this notebook, we’ll explore two key pandas methods: `map` and `apply`. (`applymap`, the former name of `DataFrame.map`, is deprecated in recent pandas versions.)
Both methods apply functions to your data, but they behave differently depending on whether they are called on a Series or a DataFrame.

**Table of contents:**

- [The map method](#1.-The-map-method)
- [The apply method](#2.-The-apply-method)

For our first examples, we’ll use the Titanic dataset, which contains information about the passengers on the Titanic, including their age, gender, class, and whether they survived.

In [2]:
path = "https://raw.githubusercontent.com/um-perez-alvaro/Data-Science-Practice/master/Data/titanic.csv"
df = pd.read_csv(path)
df.head()

Unnamed: 0,PassengerId,Survived,Pclass,Name,Sex,Age,SibSp,Parch,Ticket,Fare,Cabin,Embarked
0,1,0,3,"Braund, Mr. Owen Harris",male,22.0,1,0,A/5 21171,7.25,,S
1,2,1,1,"Cumings, Mrs. John Bradley (Florence Briggs Th...",female,38.0,1,0,PC 17599,71.2833,C85,C
2,3,1,3,"Heikkinen, Miss. Laina",female,26.0,0,0,STON/O2. 3101282,7.925,,S
3,4,1,1,"Futrelle, Mrs. Jacques Heath (Lily May Peel)",female,35.0,1,0,113803,53.1,C123,S
4,5,0,3,"Allen, Mr. William Henry",male,35.0,0,0,373450,8.05,,S


## 1. The map method

The `map` method works as both a DataFrame and Series method.

- [Map as a Series method](#1.1.-Map-as-a-Series-method)
- [Map as a DataFrame method](#1.2.-Map-as-a-DataFrame-method)

### 1.1. Map as a Series method

When called on a Series, the `.map` method replaces each value with a new one.
It accepts a dictionary, a Series, or a function that defines the replacement.

Currently, the sex of each passenger is represented as a string: "male" or "female."

In [4]:
df.Sex.unique()

array(['male', 'female'], dtype=object)

Some machine learning methods we’ll cover later require numerical data, so let’s encode each passenger’s sex as an integer: 1 for females and 0 for males.

In [7]:
# Create a mapping dictionary to convert gender to numerical values
my_map = {'female' : 1, 'male' : 0}
my_map

{'female': 1, 'male': 0}

In [8]:
# map 'female' to 1 and 'male' to 0
df.Sex.map(my_map)

0      0
1      1
2      1
3      1
4      0
      ..
886    0
887    1
888    1
889    0
890    0
Name: Sex, Length: 891, dtype: int64

In [9]:
# Alternatively, create a function to map gender to numerical values
def my_map_fun(sex):
    if sex=='female':
        return 1
    else:
        return 0

In [10]:
df.Sex.map(my_map_fun)

0      0
1      1
2      1
3      1
4      0
      ..
886    0
887    1
888    1
889    0
890    0
Name: Sex, Length: 891, dtype: int64

In [11]:
# For those familiar with lambda functions, here’s an alternative
df.Sex.map(lambda sex: 1 if sex == 'female' else 0)

0      0
1      1
2      1
3      1
4      0
      ..
886    0
887    1
888    1
889    0
890    0
Name: Sex, Length: 891, dtype: int64
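
The dictionary and function versions agree here because every value is either "male" or "female". The sketch below, which uses a small made-up Series rather than the Titanic data, illustrates how the three kinds of mapping treat a value they were not written for. The "unknown" entry is an assumption added purely for illustration.

```python
import pandas as pd

# Toy data (hypothetical): the last value is not covered by the dictionary
sex = pd.Series(["male", "female", "unknown"])

# Dictionary lookup: values without a matching key become NaN
by_dict = sex.map({"female": 1, "male": 0})

# A Series works like a dictionary: its index supplies the keys
by_series = sex.map(pd.Series({"female": 1, "male": 0}))

# A function decides for itself what to do with unexpected values
by_function = sex.map(lambda s: 1 if s == "female" else 0)

print(by_dict.tolist())      # the last entry is NaN
print(by_function.tolist())  # the last entry is 0
```

Because NaN forces a floating-point result, a dictionary mapping with unmatched values returns floats instead of integers.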

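### 1.2. Map as a DataFrame method

When called on a DataFrame, the `map` method applies a function to each element of the DataFrame. `DataFrame.map` requires pandas 2.1 or later; in earlier versions the same operation is called `applymap`.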

Let’s say we want to compute the length of every string in the DataFrame. It’s not something we’d usually do, but it’s a good example to illustrate the process. :)

In [14]:
df.head(1)

Unnamed: 0,PassengerId,Survived,Pclass,Name,Sex,Age,SibSp,Parch,Ticket,Fare,Cabin,Embarked
0,1,0,3,"Braund, Mr. Owen Harris",male,22.0,1,0,A/5 21171,7.25,,S


In [29]:
# Select the columns with string data (e.g., 'Name', 'Ticket') 
string_columns = ['Name', 'Ticket']  
df[string_columns]

Unnamed: 0,Name,Ticket
0,"Braund, Mr. Owen Harris",A/5 21171
1,"Cumings, Mrs. John Bradley (Florence Briggs Th...",PC 17599
2,"Heikkinen, Miss. Laina",STON/O2. 3101282
3,"Futrelle, Mrs. Jacques Heath (Lily May Peel)",113803
4,"Allen, Mr. William Henry",373450
...,...,...
886,"Montvila, Rev. Juozas",211536
887,"Graham, Miss. Margaret Edith",112053
888,"Johnston, Miss. Catherine Helen ""Carrie""",W./C. 6607
889,"Behr, Mr. Karl Howell",111369


In [30]:
# Verify that the selected columns have the 'object' dtype, which pandas uses here for strings
df[string_columns].dtypes

Name      object
Ticket    object
dtype: object

In [32]:
# Select the columns with string data and compute the length of each string
df[string_columns].map(len)

Unnamed: 0,Name,Ticket
0,23,9
1,51,8
2,22,16
3,44,6
4,24,6
...,...,...
886,21,6
887,28,6
888,40,10
889,21,6
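
`Name` and `Ticket` work with `len` because neither column has missing values. A column such as `Cabin` does, and `len` raises an error on a missing value. The sketch below is one possible way to handle that, using a two-row toy DataFrame and the `na_action="ignore"` option. The `elementwise` variable is a hypothetical helper name that falls back to `applymap` on older pandas versions.

```python
import numpy as np
import pandas as pd

# Toy data modeled on two Titanic rows; the first Cabin value is missing
toy = pd.DataFrame({
    "Name": ["Braund, Mr. Owen Harris",
             "Futrelle, Mrs. Jacques Heath (Lily May Peel)"],
    "Cabin": [np.nan, "C123"],
})

# DataFrame.map exists in pandas 2.1+; older versions call it applymap
elementwise = toy.map if hasattr(toy, "map") else toy.applymap

# na_action="ignore" leaves missing values as they are instead of passing them to len
lengths = elementwise(len, na_action="ignore")
print(lengths)
```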


## 2. The apply method

`.apply` is both a Series method and a DataFrame method.

- [Apply as a Series method](#2.1.-Apply-as-a-Series-method)
- [Apply as a DataFrame method](#2.2.-Apply-as-a-DataFrame-method)

### 2.1. Apply as a Series method

`.apply` applies a function to each element of the Series.

**Example 1:** Calculate the length of the strings in the `Name` column

In [33]:
# Python 'len' (length) function
len('Javier')

6

In [35]:
# apply Python 'len' function
df['Name_length'] = df.Name.apply(len) 

In [36]:
df[['Name','Name_length']]

Unnamed: 0,Name,Name_length
0,"Braund, Mr. Owen Harris",23
1,"Cumings, Mrs. John Bradley (Florence Briggs Th...",51
2,"Heikkinen, Miss. Laina",22
3,"Futrelle, Mrs. Jacques Heath (Lily May Peel)",44
4,"Allen, Mr. William Henry",24
...,...,...
886,"Montvila, Rev. Juozas",21
887,"Graham, Miss. Margaret Edith",28
888,"Johnston, Miss. Catherine Helen ""Carrie""",40
889,"Behr, Mr. Karl Howell",21


In [37]:
# the map method also works 
df.Name.map(len)

0      23
1      51
2      22
3      44
4      24
       ..
886    21
887    28
888    40
889    21
890    19
Name: Name, Length: 891, dtype: int64

**Example 2:** Round up each element in the `Fare` column to the next integer

In [38]:
# import numpy
import numpy as np

In [39]:
# numpy 'ceil' function
np.ceil(2.3)

3.0

In [40]:
df['Fare_ceil'] = df.Fare.apply(np.ceil) # apply Numpy ceiling function

In [41]:
df[['Fare','Fare_ceil']]

Unnamed: 0,Fare,Fare_ceil
0,7.2500,8.0
1,71.2833,72.0
2,7.9250,8.0
3,53.1000,54.0
4,8.0500,9.0
...,...,...
886,13.0000,13.0
887,30.0000,30.0
888,23.4500,24.0
889,30.0000,30.0
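
NumPy functions such as `np.ceil` also accept a whole Series, so `apply` is optional in this example. The following sketch uses a handful of fares typed in by hand (taken from the first rows shown above) to compare the two forms.

```python
import numpy as np
import pandas as pd

# First five fares from the table above, entered by hand
fare = pd.Series([7.25, 71.2833, 7.925, 53.1, 8.05], name="Fare")

# Through apply, as in the notebook
with_apply = fare.apply(np.ceil)

# Calling the NumPy function on the Series directly; the result is still a Series
vectorized = np.ceil(fare)

print(vectorized.tolist())
print(vectorized.equals(with_apply))
```

On large columns the direct call is usually the faster of the two, since it avoids calling a Python function once per element.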


**Example 3:** Extract the last name of each person into its own column

In [43]:
# here is a name
df.Name[0]

'Braund, Mr. Owen Harris'

In [44]:
# Split the first entry of the 'Name' column at the comma and keep the first piece (the last name)
df.Name[0].split(',')[0]

'Braund'

In [45]:
# Create a function to extract the last name from a full name
def get_last_name(name):
    return name.split(',')[0]

In [21]:
# Check that the get_last_name function works
get_last_name(df.Name[0])

'Braund'

In [48]:
# Apply the get_last_name function to every name in the 'Name' column and store the result in a new 'Last_name' column
df['Last_name'] = df.Name.apply(get_last_name)
df[['Name','Last_name']]

Unnamed: 0,Name,Last_name
0,"Braund, Mr. Owen Harris",Braund
1,"Cumings, Mrs. John Bradley (Florence Briggs Th...",Cumings
2,"Heikkinen, Miss. Laina",Heikkinen
3,"Futrelle, Mrs. Jacques Heath (Lily May Peel)",Futrelle
4,"Allen, Mr. William Henry",Allen
...,...,...
886,"Montvila, Rev. Juozas",Montvila
887,"Graham, Miss. Margaret Edith",Graham
888,"Johnston, Miss. Catherine Helen ""Carrie""",Johnston
889,"Behr, Mr. Karl Howell",Behr


In [50]:
# alternatively, use a lambda function
df.Name.apply(lambda x:x.split(',')[0])

0         Braund
1        Cumings
2      Heikkinen
3       Futrelle
4          Allen
         ...    
886     Montvila
887       Graham
888     Johnston
889         Behr
890       Dooley
Name: Name, Length: 891, dtype: object

In [51]:
# Note that the map method can also be used to achieve the same result
df.Name.map(get_last_name)

0         Braund
1        Cumings
2      Heikkinen
3       Futrelle
4          Allen
         ...    
886     Montvila
887       Graham
888     Johnston
889         Behr
890       Dooley
Name: Name, Length: 891, dtype: object
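
For string columns, pandas also offers the `.str` accessor, which covers common string operations without writing a function. The sketch below is an alternative to the `apply` and `map` calls above, not a replacement for them. It uses three names copied from the table, and the variable names are illustrative.

```python
import pandas as pd

# Three names copied from the Titanic table above
names = pd.Series(
    ["Braund, Mr. Owen Harris", "Heikkinen, Miss. Laina", "Allen, Mr. William Henry"],
    name="Name",
)

# Split each name at the comma, then take the first piece of each split
last_names = names.str.split(",").str[0]

# Length of each string, equivalent to names.apply(len) when no values are missing
name_lengths = names.str.len()

print(last_names.tolist())
print(name_lengths.tolist())
```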

<div class="admonition note alert alert-info">
<p class="first admonition-title" style="font-weight: bold;">Note</p>
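    <p> On a Series, the <tt>map</tt> method can often be used in place of <tt>apply</tt>. The main differences: <tt>map</tt> also accepts a dictionary or a Series as the mapping, while <tt>apply</tt> can forward extra arguments to the function (through <tt>args</tt> and keyword arguments) and, on a DataFrame, works on whole rows or columns.</p>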
</div>
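
As a minimal sketch of that flexibility, the example below passes an extra positional argument and a keyword argument through `Series.apply`. The helper `get_name_part` and its parameters are hypothetical, introduced only for this illustration.

```python
import pandas as pd

# Two names copied from the Titanic table above
names = pd.Series(["Braund, Mr. Owen Harris", "Allen, Mr. William Henry"], name="Name")

def get_name_part(name, position, sep=","):
    """Split name at sep and return the piece at the given position (hypothetical helper)."""
    return name.split(sep)[position].strip()

# args supplies the extra positional argument; keyword arguments are forwarded as well
last_names = names.apply(get_name_part, args=(0,))
rest_of_name = names.apply(get_name_part, args=(1,), sep=",")

print(last_names.tolist())
print(rest_of_name.tolist())
```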

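### 2.2. Apply as a DataFrame method

The `apply` method applies a function along either axis of the DataFrame: with `axis=0` the function receives each column as a Series, and with `axis=1` it receives each row as a Series.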

For the final examples, we’ll use the drinks dataset, which contains alcohol consumption data for various countries.

In [52]:
# read a dataset of alcohol consumption into a DataFrame
path = 'https://raw.githubusercontent.com/um-perez-alvaro/Data-Science-Practice/master/Data/drinks.csv'
df = pd.read_csv(path, index_col='country')
df

country,beer_servings,spirit_servings,wine_servings,total_litres_of_pure_alcohol,continent
Afghanistan,0,0,0,0.0,Asia
Albania,89,132,54,4.9,Europe
Algeria,25,0,14,0.7,Africa
Andorra,245,138,312,12.4,Europe
Angola,217,57,45,5.9,Africa
...,...,...,...,...,...
Venezuela,333,100,3,7.7,South America
Vietnam,111,2,1,2.0,Asia
Yemen,6,0,0,0.1,Asia
Zambia,32,19,4,2.5,Africa


In [53]:
# Drop the 'continent' and 'total_litres_of_pure_alcohol' columns as they won't be needed in the following examples
df.drop(['continent','total_litres_of_pure_alcohol'], axis=1, inplace=True)
df.head()

country,beer_servings,spirit_servings,wine_servings
Afghanistan,0,0,0
Albania,89,132,54
Algeria,25,0,14
Andorra,245,138,312
Angola,217,57,45


**Example 1:** Apply the `max` function along columns (axis 0) to calculate the maximum value in each column.

In [55]:
df.apply(max, axis=0)

beer_servings      376
spirit_servings    438
wine_servings      370
dtype: int64

In this case, we don’t actually need `apply`, since DataFrames already have a built-in `max` method that calculates the maximum values directly.

In [59]:
# alternatively
df.max(axis=0)

beer_servings      376
spirit_servings    438
wine_servings      370
dtype: int64

**Example 2:** Apply the `max` function along rows (axis 1) to calculate the maximum value in each row.

In [60]:
df.apply(max,axis=1)

country
Afghanistan      0
Albania        132
Algeria         25
Andorra        312
Angola         217
              ... 
Venezuela      333
Vietnam        111
Yemen            6
Zambia          32
Zimbabwe        64
Length: 193, dtype: int64

In [61]:
# alternatively
df.max(axis=1)

country
Afghanistan      0
Albania        132
Algeria         25
Andorra        312
Angola         217
              ... 
Venezuela      333
Vietnam        111
Yemen            6
Zambia          32
Zimbabwe        64
Length: 193, dtype: int64
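
`apply` becomes more useful when no built-in method does the job. The sketch below applies a custom function along both axes of a three-row toy DataFrame copied from the table above. The helper `value_range` is hypothetical and only serves as an example of a user-defined function.

```python
import pandas as pd

# First three countries from the drinks table, entered by hand
drinks = pd.DataFrame(
    {
        "beer_servings": [0, 89, 25],
        "spirit_servings": [0, 132, 0],
        "wine_servings": [0, 54, 14],
    },
    index=pd.Index(["Afghanistan", "Albania", "Algeria"], name="country"),
)

def value_range(values):
    """Spread between the largest and smallest value (hypothetical helper)."""
    return values.max() - values.min()

# axis=0: the function receives one column at a time
per_column = drinks.apply(value_range, axis=0)

# axis=1: the function receives one row at a time
per_row = drinks.apply(value_range, axis=1)

print(per_column)
print(per_row)
```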

Let’s check the maximum value for the USA.

In [62]:
df.max(axis=1)['USA'] 

249

Is this value for beer, spirits, or wine?

**Example 3:** Use `np.argmax` to find the position of the column that holds the maximum value in each row.

In [63]:
df.apply(np.argmax,axis=1)

country
Afghanistan    0
Albania        1
Algeria        0
Andorra        2
Angola         0
              ..
Venezuela      0
Vietnam        0
Yemen          0
Zambia         0
Zimbabwe       0
Length: 193, dtype: int64

Now that we know which column contains the maximum value for each row, we can translate the integers 0, 1, and 2 into the corresponding categories: beer, spirits, and wine.

In [65]:
df.columns

Index(['beer_servings', 'spirit_servings', 'wine_servings'], dtype='object')

In [67]:
# Identify the column with the maximum value for each row and map the result to the corresponding column name
df.apply(np.argmax,axis=1).map({0:'beer_servings',
                                1:'spirit_servings',
                                2:'wine_servings'})

country
Afghanistan      beer_servings
Albania        spirit_servings
Algeria          beer_servings
Andorra          wine_servings
Angola           beer_servings
                    ...       
Venezuela        beer_servings
Vietnam          beer_servings
Yemen            beer_servings
Zambia           beer_servings
Zimbabwe         beer_servings
Length: 193, dtype: object

In [68]:
df.apply(np.argmax,axis=1).map({0:'beer_servings',
                                1:'spirit_servings',
                                2:'wine_servings'})['USA']

'beer_servings'
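
As with `max`, pandas has a built-in method for this task: `idxmax` returns the label of the maximum directly, so the `np.argmax` and `map` steps can be combined into one call. The sketch below shows it on three rows copied from the drinks table. When a row has ties, as in the all-zero Afghanistan row, the first column is reported.

```python
import pandas as pd

# Three countries from the drinks table, entered by hand
drinks = pd.DataFrame(
    {
        "beer_servings": [0, 89, 245],
        "spirit_servings": [0, 132, 138],
        "wine_servings": [0, 54, 312],
    },
    index=pd.Index(["Afghanistan", "Albania", "Andorra"], name="country"),
)

# Column label of the largest value in each row
top_drink = drinks.idxmax(axis=1)
print(top_drink)
```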