In [1]:
import pandas as pd
import matplotlib
path_data = '../../../assets/data/'
matplotlib.use('Agg')
%matplotlib inline
import matplotlib.pyplot as plots
plots.style.use('fivethirtyeight')
import numpy as np

# Applying a Function to a Column

In data science, we often need to perform complex transformations and calculations on our data. So far we have seen some examples of creating new columns by applying functions to existing columns or to other arrays. All of those functions took arrays as their arguments. But frequently we will want to transform the entries in a column with a function that doesn't take an array as its argument.


For example, it might take just one number as its argument, as in the function `categorize_score` defined below.

In [11]:
# Convert a single numeric score to a letter grade ('A' through 'F')
def categorize_score(score):
    if score >= 90:
        return 'A'
    elif score >= 80:
        return 'B'
    elif score >= 70:
        return 'C'
    elif score >= 60:
        return 'D'
    else:
        return 'F'

In [12]:
categorize_score(17)

'F'

In [13]:
categorize_score(97)

'A'

In [14]:
categorize_score(76)

'C'
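To see why a one-number function cannot simply be handed a whole column, here is a minimal sketch. The Series `scores` is a hypothetical stand-in for a column of scores, and the function is repeated so the cell runs on its own. The comparison `scores >= 90` produces a whole Series of True/False values, which an `if` statement cannot reduce to a single decision, so pandas raises a `ValueError`.

```python
import pandas as pd

def categorize_score(score):
    """Return the letter grade for a single numeric score."""
    if score >= 90:
        return 'A'
    elif score >= 80:
        return 'B'
    elif score >= 70:
        return 'C'
    elif score >= 60:
        return 'D'
    else:
        return 'F'

# Hypothetical column of scores, used only for this illustration.
scores = pd.Series([85, 92, 78, 64])

try:
    categorize_score(scores)
    whole_column_worked = True
except ValueError as err:
    # `if` needs one True/False, but `scores >= 90` holds one per row.
    whole_column_worked = False
    print("ValueError:", err)
```

This is the gap that the rest of the section fills: we need a way to have the function called once per entry rather than once on the whole column.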

The function `categorize_score` returns the letter grade for a given score. Let's consider a DataFrame containing information about students and their scores on a particular exam. To use this function on many scores at once, we need to be able to *refer* to the function itself, without actually calling it. Analogously, we might show a cake recipe to a chef and ask her to use it to bake 6 cakes.  In that scenario, we are not using the recipe to bake any cakes ourselves; our role is merely to refer the chef to the recipe.  Similarly, we can ask pandas to call `categorize_score` on each of the numbers in a column.

First, we create the DataFrame `students` with the columns `Name` and `Score`.

In [15]:
students = pd.DataFrame({
    'Name': ['Alice', 'Bob', 'Charlie', 'David'],
    'Score': [85, 92, 78, 64]
})
students

|   | Name | Score |
|---|------|-------|
| 0 | Alice | 85 |
| 1 | Bob | 92 |
| 2 | Charlie | 78 |
| 3 | David | 64 |


## `apply`

To convert each of the scores to its letter grade, we will use a new pandas method. The `apply` method of a column (a Series) calls a function on each element of that column, forming a new Series of return values. To indicate which function to call, just name it (without quotation marks or parentheses). The name of the column of input values is a string that must still appear within quotation marks, inside the square brackets that select the column.

In [19]:
students['Score'].apply(categorize_score)

0    B
1    A
2    C
3    D
Name: Score, dtype: object

What we have done here is `apply` the function `categorize_score` to each value in the `Score` column of the DataFrame `students`. The output is a Series of the corresponding return values of the function. For example, 85 became 'B', 92 became 'A', and so on.

This Series, which has the same length as the original `Score` column of `students`, can be used as the values in a new column called `Letter Grade` alongside the existing `Name` and `Score` columns.
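One way to picture what `apply` does here is as a loop that calls the function once per entry. The sketch below rebuilds the same `students` data and compares the result of `apply` with an explicit list comprehension. It illustrates the behavior only and makes no claim about how pandas implements `apply` internally.

```python
import pandas as pd

def categorize_score(score):
    """Return the letter grade for a single numeric score."""
    if score >= 90:
        return 'A'
    elif score >= 80:
        return 'B'
    elif score >= 70:
        return 'C'
    elif score >= 60:
        return 'D'
    else:
        return 'F'

students = pd.DataFrame({
    'Name': ['Alice', 'Bob', 'Charlie', 'David'],
    'Score': [85, 92, 78, 64]
})

# Pass the function by name; pandas calls it once for each score.
grades_by_apply = students['Score'].apply(categorize_score)

# The same calls written out by hand, one per entry in the column.
grades_by_loop = [categorize_score(score) for score in students['Score']]

print(list(grades_by_apply))  # ['B', 'A', 'C', 'D']
print(grades_by_loop)         # ['B', 'A', 'C', 'D']
```

The two results hold the same grades. The difference is that `apply` returns a Series that keeps the row labels of the original column, while the list comprehension returns a plain Python list.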

In [20]:
students['Letter Grade'] = students['Score'].apply(categorize_score)
students

|   | Name | Score | Letter Grade |
|---|------|-------|--------------|
| 0 | Alice | 85 | B |
| 1 | Bob | 92 | A |
| 2 | Charlie | 78 | C |
| 3 | David | 64 | D |


## Functions as Values
We've seen that Python has many kinds of values.  For example, `6` is a number value, `"cake"` is a text value, `pd.DataFrame()` is an empty DataFrame, and `students` is a name for a DataFrame value (since we defined it above).

In Python, every function, including `categorize_score`, is also a value. It helps to think about recipes again. A recipe for cake is a real thing, distinct from cakes or ingredients, and you can give it a name like "Ani's cake recipe." When we defined `categorize_score` with a `def` statement, we actually did two separate things: we created a function that converts a score to a letter grade, and we gave it the name `categorize_score`.

We can refer to any function by writing its name, without the parentheses or arguments necessary to actually call it. We did this when we called `apply` above.  When we write a function's name by itself as the last line in a cell, Python produces a text representation of the function, just like it would print out a number or a string value.

In [15]:
categorize_score

<function __main__.categorize_score(score)>

Notice that we did not write `"categorize_score"` with quotes (which is just a piece of text), or `categorize_score()` (which is a function call, and an invalid one at that).  We simply wrote `categorize_score` to refer to the function.
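The three forms can be compared directly in a small sketch. The variable names `as_text` and `call_error` are introduced here purely for illustration, and the function is repeated so the cell runs on its own.

```python
def categorize_score(score):
    """Return the letter grade for a single numeric score."""
    if score >= 90:
        return 'A'
    elif score >= 80:
        return 'B'
    elif score >= 70:
        return 'C'
    elif score >= 60:
        return 'D'
    else:
        return 'F'

# 1. With quotes: just a piece of text that happens to spell the name.
as_text = "categorize_score"
print(callable(as_text))            # False

# 2. The bare name: a reference to the function value itself.
print(callable(categorize_score))   # True

# 3. With empty parentheses: a call that is missing its argument.
call_error = None
try:
    categorize_score()
except TypeError as err:
    call_error = err
    print("TypeError:", err)
```

Only the bare name is something that `apply` can later call on each score.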

Just like we can define new names for other values, we can define new names for functions.  For example, suppose we want to refer to our function as `categorize` instead of `categorize_score`.  We can just write this:

In [16]:
categorize = categorize_score

Now `categorize` is a name for a function.  It's the same function as `categorize_score`, so the printed value is exactly the same.

In [17]:
categorize

<function __main__.categorize_score(score)>
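A short sketch can confirm that the two names share one function rather than pointing at two copies. It repeats the definition and the assignment so that it runs on its own, then uses Python's `is` operator and the function's `__name__` attribute.

```python
def categorize_score(score):
    """Return the letter grade for a single numeric score."""
    if score >= 90:
        return 'A'
    elif score >= 80:
        return 'B'
    elif score >= 70:
        return 'C'
    elif score >= 60:
        return 'D'
    else:
        return 'F'

# A second name for the same function value; nothing is copied.
categorize = categorize_score

print(categorize is categorize_score)  # True: both names refer to one object
print(categorize.__name__)             # categorize_score: the name given by def
print(categorize(97))                  # A
```

Because the function remembers the name from its `def` statement, the text representation still says `categorize_score` even when we reach it through the name `categorize`.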

Let us see another application of `apply`.