# Applying Functions to DataFrames

## Applied Review

### Functions

- A function is a block of Python code bundled under a single name so it can be reused.

- Functions take *inputs* and produce *outputs*.

- Using functions well means
    - Naming your functions meaningfully
    - Creating docstrings for your functions
    - Commenting throughout -- as always!

- Functions allow their arguments to be passed in by *order* or by *name*.

- It's possible to create default values for function arguments using the `argument=<value>` syntax in the function definition.
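
As a quick refresher, the sketch below shows positional arguments, keyword arguments, and a default value side by side. The `greet` function and its parameter names are hypothetical, chosen only for illustration.

```python
def greet(name, greeting='Hello'):
    '''Build a short greeting; `greeting` defaults to 'Hello'.'''
    return greeting + ', ' + name + '!'

# Arguments passed by order (positional)
by_order = greet('Ada', 'Welcome')

# Arguments passed by name (keyword), in any order
by_name = greet(greeting='Welcome', name='Ada')

# Omitting `greeting` falls back to the default value
with_default = greet('Ada')

print(by_order)      # Welcome, Ada!
print(by_name)       # Welcome, Ada!
print(with_default)  # Hello, Ada!
```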

## The `Series.apply` Method

Pandas Series objects have a method called `apply` that *applies* a function to elements of the Series.

Let's define a simple function that returns its input plus one.

In [1]:
import pandas as pd

In [2]:
def add_one(x):
    '''Adds 1 to the input.'''
    return x + 1

And then try applying it to a Series, `s`.

In [3]:
s = pd.Series([3, 2, 3, 9])
s

0    3
1    2
2    3
3    9
dtype: int64

In [4]:
s.apply(add_one)

0     4
1     3
2     4
3    10
dtype: int64

What happened? Each value in `s` was passed into the `add_one` function, and the results were collected into a new Series. The original `s` is left unchanged.
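
One way to picture this is to call the function on each value by hand. The sketch below is only a mental model of the result, not a description of how pandas implements `apply` internally; the `by_hand` name is an illustrative assumption.

```python
import pandas as pd

def add_one(x):
    '''Adds 1 to the input.'''
    return x + 1

s = pd.Series([3, 2, 3, 9])

# Call add_one on every value by hand, keeping the original index
by_hand = pd.Series([add_one(value) for value in s], index=s.index)

# The manual version matches what apply returns
print(by_hand.equals(s.apply(add_one)))  # True

# apply returns a new Series; s still holds its original values
print(s.tolist())  # [3, 2, 3, 9]
```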

In [5]:
def make_it_a_sentence(x):
    '''Builds a sentence that mentions the input.'''
    string = 'The best number is ' + str(x)
    return string

In [6]:
s.apply(make_it_a_sentence)

0    The best number is 3
1    The best number is 2
2    The best number is 3
3    The best number is 9
dtype: object

We can apply any function that takes one argument and returns one result.
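
That includes functions we did not write ourselves. As a small sketch, the example below applies a built-in function and an inline `lambda`; the variable names are illustrative assumptions.

```python
import pandas as pd

s = pd.Series([3, 2, 3, 9])

# A built-in function that takes one argument
as_floats = s.apply(float)

# An anonymous (lambda) function defined inline
doubled = s.apply(lambda x: x * 2)

print(as_floats.tolist())  # [3.0, 2.0, 3.0, 9.0]
print(doubled.tolist())    # [6, 4, 6, 18]
```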

Of course, we may encounter errors if we expect the wrong kind of argument -- e.g. if we write a function for numbers but apply it to a Series of strings.

In [7]:
def add_one(x):
    '''Adds 1 to the input.'''
    return x + 1

In [8]:
s2 = pd.Series(['a', 'e', 'i', 'o', 'u'])
s2.apply(add_one)

TypeError: can only concatenate str (not "int") to str
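
If you want a clearer message than the one above, one possible approach is to check the Series before applying the function. The sketch below assumes a hypothetical wrapper named `safe_add_one` and uses `is_numeric_dtype` from `pandas.api.types`; it is one option among several, not a required pattern.

```python
import pandas as pd
from pandas.api.types import is_numeric_dtype

def add_one(x):
    '''Adds 1 to the input.'''
    return x + 1

def safe_add_one(series):
    '''Hypothetical helper: apply add_one only to numeric Series.'''
    if not is_numeric_dtype(series):
        raise TypeError('add_one expects a numeric Series, got dtype ' + str(series.dtype))
    return series.apply(add_one)

numbers = pd.Series([3, 2, 3, 9])
letters = pd.Series(['a', 'e', 'i', 'o', 'u'])

print(safe_add_one(numbers).tolist())  # [4, 3, 4, 10]

# The non-numeric Series is rejected before add_one ever runs
try:
    safe_add_one(letters)
except TypeError as error:
    print('Caught:', error)
```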

Our functions can perform more complex logic.

For example, perhaps you want to store the sign of each element (positive, negative, or zero).

In [9]:
def sign(x):
    '''Reduce an input number to its sign (1, -1, or 0)'''
    if x > 0:
        return 1
    elif x < 0:
        return -1
    else:
        # In this case x must be equal to 0
        return 0

In [10]:
s3 = pd.Series([13, -83, -64, 0, 4, -34])

In [11]:
s3

0    13
1   -83
2   -64
3     0
4     4
5   -34
dtype: int64

In [12]:
s3.apply(sign)

0    1
1   -1
2   -1
3    0
4    1
5   -1
dtype: int64

<font class="your_turn">
Your Turn
</font>

1. Make a Series that is filled with the letters of your name. Enter the letters in lowercase.
2. Write a function that transforms an input letter to uppercase.
3. Apply your new function to the Series.

### Additional Arguments to `.apply`

What if the function we pass to `apply` requires multiple arguments?

For example, the built-in `pow` function requires two arguments.

In [13]:
pow(2, 3)

8

How would we raise all elements of our Series to the third power?

`apply` takes an argument called `args` for this purpose. Additional positional arguments are passed into it as a sequence: pandas documents a tuple, and a list like the one below also works.

In [14]:
s

0    3
1    2
2    3
3    9
dtype: int64

In [15]:
# Apply pow(x, 3) to each x in s
s.apply(pow, args=[3])

0     27
1      8
2     27
3    729
dtype: int64

This is essentially just a more concise version of:

In [16]:
def raise_to_3(x):
    return pow(x, 3)

s.apply(raise_to_3)

0     27
1      8
2     27
3    729
dtype: int64
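
Besides `args`, `Series.apply` also forwards extra keyword arguments to the function. The sketch below uses a hypothetical `scale` function to show both styles; its name and parameters are assumptions made for illustration.

```python
import pandas as pd

def scale(x, factor, offset=0):
    '''Hypothetical helper: multiply x by factor, then add offset.'''
    return x * factor + offset

s = pd.Series([3, 2, 3, 9])

# Extra positional arguments go in args
positional = s.apply(scale, args=(10,))

# Extra keyword arguments are forwarded to the function by name
keyword = s.apply(scale, factor=10, offset=1)

print(positional.tolist())  # [30, 20, 30, 90]
print(keyword.tolist())     # [31, 21, 31, 91]
```

Keyword arguments can be easier to read when the function takes several parameters, as long as their names do not clash with `apply`'s own parameters (such as `args`).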

## The `DataFrame.apply` Method

Like Series, DataFrames have an `apply` method.

But in this case, `apply` applies a function to each **row** or **column** of the DataFrame, not each element.

Remember that DataFrame columns and rows are Series -- so the input to the function will be a Series!
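
To check this for yourself, you can apply a function that simply reports what it was given. The sketch below uses a hypothetical `describe_input` helper and the `axis=1` argument, which this lesson covers in the section on rows.

```python
import pandas as pd

data = [(1, 7, -7), (2, 1, -4), (3, 5, 2), (6, 6, -1)]
df = pd.DataFrame(data, columns=['digit_one', 'digit_two', 'digit_three'])

def describe_input(values):
    '''Hypothetical helper: report the type and length of what apply passes in.'''
    return type(values).__name__ + ' of length ' + str(len(values))

# Default behavior: the function receives one column at a time (4 values each)
by_column = df.apply(describe_input)
print(by_column)

# axis=1: the function receives one row at a time (3 values each)
by_row = df.apply(describe_input, axis=1)
print(by_row)
```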

### Applying Functions to Columns

In [17]:
def maximum(column):
    '''Calculates the maximum value in a column'''
    return column.max()

In [18]:
data = [(1, 7, -7), (2, 1, -4), (3, 5, 2), (6, 6, -1)]
df = pd.DataFrame(data, columns=['digit_one', 'digit_two', 'digit_three'])

In [19]:
df

   digit_one  digit_two  digit_three
0          1          7           -7
1          2          1           -4
2          3          5            2
3          6          6           -1


In [20]:
df.apply(maximum)

digit_one      6
digit_two      7
digit_three    2
dtype: int64

What happened here?

- The maximum function was applied to each column.

- The result of each column was a scalar (a single number).

- All the results were combined into a single Series.
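
The same pattern works for any function that reduces a column to a single value. As a sketch, the hypothetical `value_range` function below computes the spread between each column's largest and smallest values.

```python
import pandas as pd

data = [(1, 7, -7), (2, 1, -4), (3, 5, 2), (6, 6, -1)]
df = pd.DataFrame(data, columns=['digit_one', 'digit_two', 'digit_three'])

def value_range(column):
    '''Hypothetical helper: largest value minus smallest value in a column.'''
    return column.max() - column.min()

# One scalar per column, combined into a Series indexed by column name
ranges = df.apply(value_range)
print(ranges.tolist())  # [5, 6, 9]
```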

### Applying Functions to Rows

The `apply` method is very useful for rows, because individual elements of the row can be accessed using bracket syntax.

By default, `apply` works on columns, but we can switch to rows with the `axis=1` argument.

In [21]:
def formula(row):
    '''Applies a custom formula `a + (b / c)`'''
    result = row['digit_one'] + row['digit_two'] / row['digit_three']
    return result

In [22]:
df

   digit_one  digit_two  digit_three
0          1          7           -7
1          2          1           -4
2          3          5            2
3          6          6           -1


In [23]:
df.apply(formula, axis=1)

0    0.00
1    1.75
2    5.50
3    0.00
dtype: float64

- The `formula` function was applied to each row and returned a scalar for each.

- Those scalars were combined into a Series, which is our result.
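
For simple arithmetic like this, the same result can also be written with whole-column operations. The sketch below compares the two approaches; the `by_apply` and `by_columns` names are illustrative assumptions. Row-wise `apply` tends to be most helpful when the logic cannot easily be expressed with column operations.

```python
import pandas as pd

data = [(1, 7, -7), (2, 1, -4), (3, 5, 2), (6, 6, -1)]
df = pd.DataFrame(data, columns=['digit_one', 'digit_two', 'digit_three'])

def formula(row):
    '''Applies a custom formula `a + (b / c)`'''
    return row['digit_one'] + row['digit_two'] / row['digit_three']

# Row by row: formula is called once per row
by_apply = df.apply(formula, axis=1)

# Whole columns at once: the same arithmetic written with column operations
by_columns = df['digit_one'] + df['digit_two'] / df['digit_three']

print(by_apply.tolist())    # [0.0, 1.75, 5.5, 0.0]
print(by_columns.tolist())  # [0.0, 1.75, 5.5, 0.0]
```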

### Applying Functions that Return Series

Our last two examples have applied a function to a DataFrame to produce a Series.
But what if we wanted to get a DataFrame back instead?

If the applied function returns a Series, all of the resulting Series will be combined back into a DataFrame.

Let's take a look using an updated version of our `add_one` function.

In [24]:
def add_one(col):
    # Add one to each element of the input column
    new_col = col + 1
    # Return the updated column
    return new_col

In [25]:
df

   digit_one  digit_two  digit_three
0          1          7           -7
1          2          1           -4
2          3          5            2
3          6          6           -1


In [26]:
df.apply(add_one)

   digit_one  digit_two  digit_three
0          2          8           -6
1          3          2           -3
2          4          6            3
3          7          7            0


<font class="your_turn">
Your Turn
</font>

1. What two types of Pandas objects support the `apply` method? How are their respective `apply` methods different?
2. Create a Series with the values `[3, 1, -4, 4, -9]`. Apply the built-in Python function `abs` to your Series.
3. Given the below DataFrame, write and apply a custom function to it to create a Series that looks like this:

```
0    Looking North means West is to your left
1    Looking East means North is to your left
2    Looking South means East is to your left
3    Looking West means South is to your left
dtype: object
```

In [27]:
directions = [('North', 'West'),
              ('East', 'North'),
              ('South', 'East'),
              ('West', 'South')]
directions = pd.DataFrame(directions, columns=['facing', 'leftward'])

## Questions
Are there any questions before we move on?