In [2]:
import pandas as pd

# Applying a function to a DataFrame or Series

**Table of contents:**

- [The map method](#1.-The-map-method)
- [The apply method](#2.-The-apply-method)
- [The applymap method](#3.-The-applymap-method)

In [3]:
url = "https://raw.githubusercontent.com/um-perez-alvaro/Data-Science-Practice/master/Data/titanic.csv"
titanic = pd.read_csv(url)
titanic.head()

Unnamed: 0,PassengerId,Survived,Pclass,Name,Sex,Age,SibSp,Parch,Ticket,Fare,Cabin,Embarked
0,1,0,3,"Braund, Mr. Owen Harris",male,22.0,1,0,A/5 21171,7.25,,S
1,2,1,1,"Cumings, Mrs. John Bradley (Florence Briggs Th...",female,38.0,1,0,PC 17599,71.2833,C85,C
2,3,1,3,"Heikkinen, Miss. Laina",female,26.0,0,0,STON/O2. 3101282,7.925,,S
3,4,1,1,"Futrelle, Mrs. Jacques Heath (Lily May Peel)",female,35.0,1,0,113803,53.1,C123,S
4,5,0,3,"Allen, Mr. William Henry",male,35.0,0,0,373450,8.05,,S


## 1. The map method

`.map` is a Series method. 
It replaces each value of a Series with a new value, looked up in a dictionary or computed by a function.

In [4]:
my_map = {'female':1,'male':0}
my_map

{'female': 1, 'male': 0}

In [5]:
titanic.Sex

0        male
1      female
2      female
3      female
4        male
        ...  
886      male
887    female
888    female
889      male
890      male
Name: Sex, Length: 891, dtype: object

In [8]:
# map 'female' to 1 and 'male' to 0
titanic.Sex.map(my_map)

0      0
1      1
2      1
3      1
4      0
      ..
886    0
887    1
888    1
889    0
890    0
Name: Sex, Length: 891, dtype: int64

In [9]:
# alternatively, use a function
def my_map_fun(sex):
    if sex=='female':
        return 1
    else:
        return 0

In [10]:
titanic.Sex.map(my_map_fun)

0      0
1      1
2      1
3      1
4      0
      ..
886    0
887    1
888    1
889    0
890    0
Name: Sex, Length: 891, dtype: int64
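The dictionary and the function are not fully interchangeable. With a dictionary, a value that has no matching key comes back as `NaN`, while the function above sends it to the `else` branch. A minimal sketch on made-up data (the `sexes` Series is hypothetical, not taken from the Titanic file):

```python
import pandas as pd

# hypothetical toy data: one missing value and one unexpected label
sexes = pd.Series(['male', 'female', None, 'unknown'])

my_map = {'female': 1, 'male': 0}

def my_map_fun(sex):
    if sex == 'female':
        return 1
    else:
        return 0

mapped_with_dict = sexes.map(my_map)     # values without a key become NaN
mapped_with_fun = sexes.map(my_map_fun)  # values without a match fall into the else branch

print(mapped_with_dict)
print(mapped_with_fun)
```

Which behavior is preferable depends on whether unexpected values should stay visible as missing or be folded into a default.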

## 2. The apply method

`.apply` is both a Series method and a DataFrame method.

- [Apply as a Series method](#2.1.-Apply-as-a-Series-method)
- [Apply as a DataFrame method](#2.2.-Apply-as-a-DataFrame-method)

### 2.1. Apply as a Series method

`.apply` applies a function to each element of the Series.

**Example 1:** calculate the length of the strings in the 'Name' column

In [11]:
# Python 'len' (length) function
len('Javier')

6

In [12]:
titanic['Name_length'] = titanic.Name.apply(len) # apply Python 'len' function

In [13]:
titanic[['Name','Name_length']]

Unnamed: 0,Name,Name_length
0,"Braund, Mr. Owen Harris",23
1,"Cumings, Mrs. John Bradley (Florence Briggs Th...",51
2,"Heikkinen, Miss. Laina",22
3,"Futrelle, Mrs. Jacques Heath (Lily May Peel)",44
4,"Allen, Mr. William Henry",24
...,...,...
886,"Montvila, Rev. Juozas",21
887,"Graham, Miss. Margaret Edith",28
888,"Johnston, Miss. Catherine Helen ""Carrie""",40
889,"Behr, Mr. Karl Howell",21


In [14]:
# the map method also works 
titanic.Name.map(len)

0      23
1      51
2      22
3      44
4      24
       ..
886    21
887    28
888    40
889    21
890    19
Name: Name, Length: 891, dtype: int64
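For string columns, pandas also offers the `.str` accessor, which may be a convenient alternative here. A minimal sketch, assuming a small hypothetical `names` Series that includes a missing value:

```python
import pandas as pd

# hypothetical toy data; the last entry is missing on purpose
names = pd.Series(['Braund, Mr. Owen Harris', 'Heikkinen, Miss. Laina', None])

# .str.len() leaves missing values as NaN,
# whereas Python's len raises a TypeError on a missing value
lengths = names.str.len()
print(lengths)
```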

**Example 2:** round up each element in the 'Fare' column to the next integer

In [15]:
# import numpy
import numpy as np

In [16]:
# numpy 'ceil' function
np.ceil(2.3)

3.0

In [17]:
titanic['Fare_ceil'] = titanic.Fare.apply(np.ceil) # apply Numpy ceiling function

In [18]:
titanic[['Fare','Fare_ceil']]

Unnamed: 0,Fare,Fare_ceil
0,7.2500,8.0
1,71.2833,72.0
2,7.9250,8.0
3,53.1000,54.0
4,8.0500,9.0
...,...,...
886,13.0000,13.0
887,30.0000,30.0
888,23.4500,24.0
889,30.0000,30.0
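NumPy functions such as `np.ceil` also accept a whole Series, so `apply` is not strictly needed for this example. A minimal sketch on a hypothetical `fares` Series:

```python
import numpy as np
import pandas as pd

# hypothetical toy data resembling the 'Fare' column
fares = pd.Series([7.25, 71.2833, 13.0])

ceil_apply = fares.apply(np.ceil)  # through apply
ceil_direct = np.ceil(fares)       # NumPy function called on the whole Series

print(ceil_direct)
print(ceil_apply.equals(ceil_direct))
```

Calling the NumPy function directly is usually the shorter spelling when such a function exists.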


**Example 3:** Extract the last name of each person into its own column

In [19]:
titanic.Name[0].split(',')[0]

'Braund'

In [20]:
def get_last_name(name):
    return name.split(',')[0]

In [21]:
get_last_name(titanic.Name[0])

'Braund'

In [22]:
titanic['Last_name'] = titanic.Name.apply(get_last_name)

In [23]:
titanic[['Name','Last_name']]

Unnamed: 0,Name,Last_name
0,"Braund, Mr. Owen Harris",Braund
1,"Cumings, Mrs. John Bradley (Florence Briggs Th...",Cumings
2,"Heikkinen, Miss. Laina",Heikkinen
3,"Futrelle, Mrs. Jacques Heath (Lily May Peel)",Futrelle
4,"Allen, Mr. William Henry",Allen
...,...,...
886,"Montvila, Rev. Juozas",Montvila
887,"Graham, Miss. Margaret Edith",Graham
888,"Johnston, Miss. Catherine Helen ""Carrie""",Johnston
889,"Behr, Mr. Karl Howell",Behr


In [24]:
# alternatively, use a lambda function
titanic['Last_name'] = titanic.Name.apply(lambda x:x.split(',')[0])
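The same split can probably be written without `apply`, using the `.str` accessor. A minimal sketch on a hypothetical two-row `names` Series:

```python
import pandas as pd

# hypothetical toy data in the same "Last, Title First" format
names = pd.Series(['Braund, Mr. Owen Harris', 'Allen, Mr. William Henry'])

# split each string on the comma, then take the first piece of each list
last_names = names.str.split(',').str[0]
print(last_names)
```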

<div class="admonition note alert alert-info">
<p class="first admonition-title" style="font-weight: bold;">Note</p>
    <p> <tt>map</tt> can be substituted for <tt>apply</tt> in many cases, but <tt>apply</tt> is more flexible and thus is recommended.</p>
</div>
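One example of that flexibility: `apply` forwards extra positional and keyword arguments to the function. A minimal sketch, where `get_name_part`, `sep`, and `position` are hypothetical names introduced only for illustration:

```python
import pandas as pd

names = pd.Series(['Braund, Mr. Owen Harris', 'Allen, Mr. William Henry'])

# hypothetical helper: split on `sep` and return the piece at `position`
def get_name_part(name, sep, position):
    return name.split(sep)[position].strip()

# extra arguments passed positionally through `args`
last_names = names.apply(get_name_part, args=(',', 0))

# extra arguments passed as keywords
rest_of_name = names.apply(get_name_part, sep=',', position=1)

print(last_names)
print(rest_of_name)
```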

### 2.2. Apply as a DataFrame method

``apply`` applies a function along either axis of the DataFrame: to each column (`axis=0`, the default) or to each row (`axis=1`).
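A minimal sketch of what each axis setting passes to the function, on hypothetical toy data (`toy` and `value_range` are illustrative names, not part of this notebook's datasets):

```python
import pandas as pd

# hypothetical toy data
toy = pd.DataFrame(
    {'beer': [10, 50], 'wine': [30, 20]},
    index=['A', 'B'],
)

def value_range(s):
    # s is a Series: one column when axis=0, one row when axis=1
    return s.max() - s.min()

per_column = toy.apply(value_range, axis=0)  # one result per column
per_row = toy.apply(value_range, axis=1)     # one result per row

print(per_column)
print(per_row)
```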

In [25]:
# read a dataset of alcohol consumption into a DataFrame
url = 'https://raw.githubusercontent.com/um-perez-alvaro/Data-Science-Practice/master/Data/drinks.csv'
drinks = pd.read_csv(url, index_col='country')
drinks.drop(['continent','total_litres_of_pure_alcohol'], axis=1, inplace=True)
drinks.head()

country,beer_servings,spirit_servings,wine_servings
Afghanistan,0,0,0
Albania,89,132,54
Algeria,25,0,14
Andorra,245,138,312
Angola,217,57,45


**Example 1:** apply the ``max`` function along axis 0 to calculate the maximum value in each column

In [26]:
drinks.apply(max, axis=0) # 'max' applied to each column (axis=0)

beer_servings      376
spirit_servings    438
wine_servings      370
dtype: int64

In [27]:
# alternatively
drinks.max(axis=0)

beer_servings      376
spirit_servings    438
wine_servings      370
dtype: int64

**Example 2:** apply the `max` function along axis 1 to calculate the maximum value in each row

In [28]:
drinks.apply(max,axis=1)

country
Afghanistan      0
Albania        132
Algeria         25
Andorra        312
Angola         217
              ... 
Venezuela      333
Vietnam        111
Yemen            6
Zambia          32
Zimbabwe        64
Length: 193, dtype: int64

In [29]:
# alternatively
drinks.max(axis=1)

country
Afghanistan      0
Albania        132
Algeria         25
Andorra        312
Angola         217
              ... 
Venezuela      333
Vietnam        111
Yemen            6
Zambia          32
Zimbabwe        64
Length: 193, dtype: int64

In [30]:
drinks.max(axis=1)['USA'] # is it beer, spirits, or wine?

249

**Example 3:** use `np.argmax` to calculate which column has the maximum value for each row

In [31]:
drinks.apply(np.argmax,axis=1)

country
Afghanistan    0
Albania        1
Algeria        0
Andorra        2
Angola         0
              ..
Venezuela      0
Vietnam        0
Yemen          0
Zambia         0
Zimbabwe       0
Length: 193, dtype: int64

In [32]:
drinks.apply(np.argmax,axis=1)['Spain']

0

In [33]:
# to get the name of the column that holds the maximum
drinks.apply(np.argmax,axis=1).map({0:'beer_servings',
                                        1:'spirit_servings',
                                        2:'wine_servings'})

country
Afghanistan      beer_servings
Albania        spirit_servings
Algeria          beer_servings
Andorra          wine_servings
Angola           beer_servings
                    ...       
Venezuela        beer_servings
Vietnam          beer_servings
Yemen            beer_servings
Zambia           beer_servings
Zimbabwe         beer_servings
Length: 193, dtype: object

In [34]:
drinks.apply(np.argmax,axis=1).map({0:'beer_servings',
                                        1:'spirit_servings',
                                        2:'wine_servings'})['Spain']

'beer_servings'
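pandas also has a built-in `idxmax` method that returns the column label directly, which may save the extra `map` step. A minimal sketch on a hypothetical `toy` frame built from the first rows shown earlier:

```python
import pandas as pd

# toy frame reusing three rows from the drinks.head() output
toy = pd.DataFrame(
    {'beer_servings': [0, 89, 245],
     'spirit_servings': [0, 132, 138],
     'wine_servings': [0, 54, 312]},
    index=['Afghanistan', 'Albania', 'Andorra'],
)

# label of the column holding the maximum in each row;
# on ties (Afghanistan) the first column is returned
top_drink = toy.idxmax(axis=1)
print(top_drink)
```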

## 3. The applymap method

`applymap` is a DataFrame method. It applies a function to every element of the DataFrame.

In [35]:
drinks.head(5)

country,beer_servings,spirit_servings,wine_servings
Afghanistan,0,0,0
Albania,89,132,54
Algeria,25,0,14
Andorra,245,138,312
Angola,217,57,45


In [38]:
# convert every DataFrame element into a string
silly_df = drinks.applymap(str)

In [40]:
silly_df.iloc[0,0]

'0'

In [41]:
drinks.iloc[0,0]

0
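In recent pandas releases, `DataFrame.applymap` is deprecated in favor of `DataFrame.map` (available from pandas 2.1). A minimal sketch that works with either, assuming a small hypothetical `toy` frame:

```python
import pandas as pd

# hypothetical toy data
toy = pd.DataFrame({'beer_servings': [0, 89], 'wine_servings': [0, 54]})

# use DataFrame.map where it exists (pandas 2.1 and later),
# otherwise fall back to applymap on older versions
if hasattr(pd.DataFrame, 'map'):
    as_strings = toy.map(str)
else:
    as_strings = toy.applymap(str)

print(as_strings.iloc[0, 0])
```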