<a href="https://colab.research.google.com/github/werowe/HypatiaAcademy/blob/master/pandas/2024_04_23_pandas_map.ipynb" target="_parent"><img src="https://colab.research.google.com/assets/colab-badge.svg" alt="Open In Colab"/></a>

# Pandas Apply and Map Operation

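* **apply** runs a function over every element of a Series, or over every column (the default) or every row of a DataFrame

* **map** runs a function over every element of a Series, and it also accepts a dict or Series as a lookup table. On a DataFrame, the elementwise equivalent is `DataFrame.map` (pandas 2.1 and later; older versions call it `applymap`)

* **axis** tells `DataFrame.apply` what to pass to the function: `axis=1` passes each row, `axis=0` (the default) passes each column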
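The three bullets above can be illustrated with a tiny numeric example. This is a minimal sketch on made-up data (`s` and `df` here are throwaway names, not the notebook's `df`). `Series.apply` and `Series.map` call the function once per element, and `map` can also take a dict. `DataFrame.apply` passes whole columns or whole rows, depending on `axis`.

```python
import pandas as pd

s = pd.Series([1, 2, 3])
df = pd.DataFrame({"a": [1, 2, 3], "b": [10, 20, 30]})

# Series.apply and Series.map both call the function once per element
doubled_apply = s.apply(lambda x: x * 2).tolist()
doubled_map = s.map(lambda x: x * 2).tolist()

# map also accepts a dict and uses it as a lookup table
words = s.map({1: "one", 2: "two", 3: "three"}).tolist()

# DataFrame.apply hands the function a whole column (axis=0) or a whole row (axis=1)
col_max = df.apply(lambda col: col.max(), axis=0).tolist()  # one value per column
row_max = df.apply(lambda row: row.max(), axis=1).tolist()  # one value per row

print(doubled_apply, doubled_map, words)  # [2, 4, 6] [2, 4, 6] ['one', 'two', 'three']
print(col_max, row_max)                   # [3, 30] [10, 20, 30]
```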

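> **In-place Behavior**
>
> Note that apply and map have no **inplace** argument because they do not modify the original. They return a new Series or DataFrame, so assign the result back (for example `df = df.apply(func, axis=1)`). If the original does change, it is because the function itself mutated the row it was given, and pandas documents that as unsupported.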
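Here is a small sketch of the return-a-new-object behavior, using a made-up one-column DataFrame. The original stays untouched until the result is assigned back.

```python
import pandas as pd

df = pd.DataFrame({"name": ["a", "b", "c"]})

# map builds and returns a new Series; df itself is not modified
upper = df["name"].map(str.upper)
unchanged = df["name"].tolist()
print(unchanged)        # ['a', 'b', 'c']
print(upper.tolist())   # ['A', 'B', 'C']

# to keep the result, assign it back explicitly
df["name"] = upper
print(df["name"].tolist())  # ['A', 'B', 'C']
```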


In [144]:
# make a small DataFrame of five students indexed by a random studentId

import numpy as np
import pandas as pd


def makedata():

  df = pd.DataFrame({
       "studentId": np.random.randint(1000,2000,5),
       "name": np.array(["a", "b", "c", "d", "e"])
    })

  df.set_index("studentId",inplace=True)

  return df

df=makedata()
df

          name
studentId
1408         a
1544         b
1006         c
1306         d
1116         e
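`np.random.randint` can draw the same ID twice, which would give duplicate index labels, and it produces different IDs on every run. If you want unique, repeatable IDs, one possible variant is below. `makedata_seeded` and the seed `42` are hypothetical choices, not part of the original notebook.

```python
import numpy as np
import pandas as pd

def makedata_seeded(seed=42):
    # hypothetical variant of makedata(); the seed value is an arbitrary choice
    rng = np.random.default_rng(seed)
    df = pd.DataFrame({
        # choice(..., replace=False) guarantees five distinct IDs in [1000, 2000)
        "studentId": rng.choice(np.arange(1000, 2000), size=5, replace=False),
        "name": ["a", "b", "c", "d", "e"],
    })
    # set_index returns a new DataFrame, so return it instead of using inplace=True
    return df.set_index("studentId")

df = makedata_seeded()
print(df)
```

Calling `makedata_seeded()` again with the same seed returns the same IDs.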


In [145]:
# writing functions.  the first is for a series.  the second is for a dataframe


# This is our series function
# map and apply call it once per element of the series, so it receives a single value:
# a scalar (here one string), not the whole series
def upperSeries(s):

  return s.upper()

# this is our row function, meaning one that works with the whole dataframe
# with df.apply(..., axis=1) it receives one row at a time
# each row arrives as a series indexed by the column names; here there is only one column, 'name'
# caution: these functions assign into the row they are given, which pandas does not support
def lowerRow(row):

  row['name']=row['name'].lower()

  return row


def upperRow(row):

  row['name']=row['name'].upper()

  return row

upperSeries("tamara")

'TAMARA'
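`lowerRow` and `upperRow` assign into the row they are handed. Pandas documents this as unsupported for functions passed to `apply`. The sketch below shows two non-mutating alternatives on a made-up three-row DataFrame: return a scalar per row, or copy the row before changing it. `upper_row_copy` is a hypothetical name.

```python
import pandas as pd

df = pd.DataFrame({"name": ["a", "b", "c"]}, index=[1408, 1544, 1006])

# returning a scalar: apply(axis=1) collects one value per row into a Series
upper_names = df.apply(lambda row: row["name"].upper(), axis=1)

# returning a row: copy it first so the Series pandas passed in is left alone
def upper_row_copy(row):
    new_row = row.copy()
    new_row["name"] = new_row["name"].upper()
    return new_row

upper_df = df.apply(upper_row_copy, axis=1)

print(upper_names.tolist())       # ['A', 'B', 'C']
print(upper_df["name"].tolist())  # ['A', 'B', 'C']
print(df["name"].tolist())        # ['a', 'b', 'c']  (original untouched)
```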

In [146]:
df

Unnamed: 0_level_0,name
studentId,Unnamed: 1_level_1
1408,a
1544,b
1006,c
1306,d
1116,e


In [147]:

# axis=1 passes each row to the function; axis=0 (the default) passes each column
df.apply(upperRow,axis=1)


          name
studentId
1408         A
1544         B
1006         C
1306         D
1116         E


In [148]:
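# df looks changed only because upperRow modified the rows apply passed to it (a side effect, not in-place behavior)
df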

          name
studentId
1408         A
1544         B
1006         C
1306         D
1116         E


In [149]:
# apply has no inplace argument: it returns a new DataFrame and leaves the original alone.
# the change seen in df above came from upperRow mutating its rows, which pandas does not support,
# so assign the result back instead of relying on that side effect


df = df.apply(lowerRow,axis=1)
df

          name
studentId
1408         a
1544         b
1006         c
1306         d
1116         e


In [150]:
df['name'].map(upperSeries)


studentId
1408    A
1544    B
1006    C
1306    D
1116    E
Name: name, dtype: object
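Unlike `apply`, `Series.map` also accepts a dict (or another Series) as a lookup table. Its `na_action="ignore"` option skips missing values. A short sketch with made-up values:

```python
import pandas as pd

names = pd.Series(["a", "b", "z", None], name="name")

# a dict acts as a lookup table; 'z' and None have no key, so they become NaN
full = names.map({"a": "Alice", "b": "Bob"})
print(full.tolist())

# na_action="ignore" leaves missing values alone instead of passing them to the function
upper = names.map(str.upper, na_action="ignore")
print(upper.tolist())
```

Without `na_action="ignore"`, the missing value would be passed to `str.upper` and raise a `TypeError`.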

In [151]:
# use apply instead of map

df['name'].apply(upperSeries)


studentId
1408    A
1544    B
1006    C
1306    D
1116    E
Name: name, dtype: object
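For string columns, pandas also offers the vectorized `.str` accessor. It gives the same result as applying `upperSeries`, without writing a function. Here is a minimal comparison, assuming the column holds only strings:

```python
import pandas as pd

names = pd.Series(["a", "b", "c"], index=[1408, 1544, 1006], name="name")

via_apply = names.apply(lambda s: s.upper())  # one Python function call per element
via_str = names.str.upper()                   # pandas' vectorized string method

print(via_apply.tolist() == via_str.tolist())  # True
```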

In [152]:
# instead of using def you can use lambda,
# which defines a small in-line function without a name (an anonymous function)

upperL = lambda x : x.upper()

# the same function written with def
def upperD(s):
  return s.upper()

upperL("tamara")


'TAMARA'

In [153]:
# but the whole point of lambda is to put it in-line, to make the code more compact

df['name'].apply(lambda x : x.upper())


studentId
1408    A
1544    B
1006    C
1306    D
1116    E
Name: name, dtype: object
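Lambdas are just as handy with `DataFrame.apply(..., axis=1)`, where each row arrives as a Series keyed by column name. A sketch on a hypothetical two-column table:

```python
import pandas as pd

# hypothetical two-column table for illustration
people = pd.DataFrame({"first": ["Ada", "Alan"], "last": ["Lovelace", "Turing"]})

# with axis=1 each row arrives as a Series indexed by column name
full_names = people.apply(lambda row: row["first"] + " " + row["last"], axis=1)
print(full_names.tolist())  # ['Ada Lovelace', 'Alan Turing']
```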

# Axis Explained

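The easiest way to understand axes is to make a DataFrame of numbers and sum it in both directions: (1) down the rows with `axis=0`, which gives one total per column, and (2) across the columns with `axis=1`, which gives one total per row.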

In [154]:
# axis=0: add down the rows, giving one total per column

df = pd.DataFrame({

      "a" : [1,2,3,4],
      "b" : [5,6,7,8]
})

df.sum(axis=0)




a    10
b    26
dtype: int64
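Pandas also accepts axis names, which some readers find easier to remember: `"index"` is the same as `0` and `"columns"` is the same as `1`. A quick check on the same numbers:

```python
import pandas as pd

df = pd.DataFrame({"a": [1, 2, 3, 4], "b": [5, 6, 7, 8]})

# axis also accepts names: "index" means 0 and "columns" means 1
by_index = df.sum(axis="index")
by_columns = df.sum(axis="columns")

print(by_index.equals(df.sum(axis=0)))    # True
print(by_columns.equals(df.sum(axis=1)))  # True
```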

In [155]:
# axis=1: add across the columns, giving one total per row

df.sum(axis=1)

0     6
1     8
2    10
3    12
dtype: int64
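To tie this back to `apply`: with the same numbers, a function passed to `DataFrame.apply` sees the same slices that `sum` collapses. A minimal sketch that reproduces both results:

```python
import pandas as pd

df = pd.DataFrame({"a": [1, 2, 3, 4], "b": [5, 6, 7, 8]})

# axis=0: the lambda receives each column in turn, giving one total per column
per_column = df.apply(lambda col: col.sum(), axis=0)
# axis=1: the lambda receives each row in turn, giving one total per row
per_row = df.apply(lambda row: row.sum(), axis=1)

print(per_column.tolist())  # [10, 26]
print(per_row.tolist())     # [6, 8, 10, 12]
```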