# Functions!

Wednesday, June 14 2023

Notebook Author: Susanna Lange, PhD

<img src="https://github.com/SusannaLange/Data_118_images/blob/main/DSSI_images/function_machine.png?raw=true" width="800">


In [None]:
#import the basics!

import numpy as np
import pandas as pd

## Things we know

 DataFrames!
 
 - Merge
 
 - GroupBy 
 
 - pivot_table
 
 - Visualization
 
 - Conditionals and Iteration
 

## Goals: 

    
- User defined functions

- How to apply them to DataFrames

### <mark style="background:Thistle;color:black"> But first...Code Comprehension: Multiple Choice: </mark>


What does the following code output?

```python
x = 0
a = 0
b = -5
if a > 0:
    if b < 0: 
        x = x + 5 
    elif a > 5:
        x = x + 4
    else:
        x = x + 3
else:
    x = x + 2
print(x)
```

A. 3

B. 0

C. 2

D. 4


## We have seen built-in functions

| Built-in Python Functions     | Description |
| ----------- | ----------- |
| print(...)      | Print function: Displays the given inputs (and returns `None`). |
| max(...)      | Maximum function: Returns the maximum of the given inputs. |
| min(...)   | Minimum function: Returns the minimum of the given inputs.  |
| abs(...)     | Absolute value function: Returns the absolute value of the given input. |
| round(...)   | Rounding function: Returns the rounded input. |
| len(...)   | Length function: Returns the length of given input. |
| type(...)   | Data Type function: Returns the datatype of the input. |

Recall...

There's a help function!

In [None]:
help(abs)

In [None]:
help(round)

We may want to repeat some process multiple times but there is no built-in function to rely on... 


## We can write our own functions!

Replacing multiple lines of code with a function allows seamless reuse of a process or computation.

The general format for defining a function is given below.

```python 

def function_name(input_arguments):
    """ Documentation on what your function does """
    
    body of function
    
    return output
```


- The `def` keyword indicates defining a function

- function_name: We can name our function however we please


- input_arguments: We decide how many values our function takes as input


- docstring: We document the key characteristics of our function in a string, typically written in triple quotes


- body of function: We perform some computation in the indented body


- output: The output is returned


<img src="https://github.com/SusannaLange/Data_118_images/blob/main/DSSI_images/function_photo.png?raw=true" width="800">

Photo Source: Data8 Textbook

## Let's explore this function

In [None]:
def double(x):
    """Double the input x"""
    
    y = 2*x
    
    return y   

Note that we need the return statement; without it, the function returns `None`!

The return statement

 
 - Immediately terminates the function 
 
 
 - Allows for the output to be stored
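
Both points about `return` can be seen in a small sketch. The function `first_negative` and its input list are made up for illustration and are not part of the notebook's data.

```python
def first_negative(values):
    """Return the first negative number in values, or None if there is none."""
    for v in values:
        if v < 0:
            return v  # the function stops here; later items are never checked
    return None

# The returned value can be stored in a variable and reused
result = first_negative([3, -1, -7])
print(result)
```

Here the loop never reaches `-7`, because the function ends as soon as `return v` runs for `-1`.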
 

We haven't really computed anything yet, we've just **defined** the new function

We've defined this function with the intent of a numerical input...but we can input whatever we want.

In [None]:
double(3)

In [None]:
double(4.5)

In [None]:
double('data')

## We can even double an expression or an array

In [None]:
double(np.array([1,2,3]))

In [None]:
double(2*3)

### Assert

To ensure a user passes the correct (or expected) datatype into our function, we can use an `assert` statement.

An `assert` statement declares that a condition must be true; if it is not, Python stops and raises an `AssertionError`.

This is useful because the failed condition appears in the error traceback.

In [None]:
def double(x):
    """Double the input x"""
    assert type(x) == int or type(x) == float
    y = 2*x
    
    return y 

In [None]:
double(5)

In [None]:
double(4.9)

In [None]:
double('data')
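
An `assert` can also carry a custom message, written after a comma. The sketch below is one possible variation, not the notebook's version: the name `double_checked` is hypothetical, and it swaps the `type(x) == ...` comparison for `isinstance`, which also accepts booleans.

```python
def double_checked(x):
    """Double the input x, which must be an int or a float."""
    # The text after the comma is shown when the condition is False
    assert isinstance(x, (int, float)), "x must be an int or a float"
    return 2 * x

try:
    double_checked('data')
except AssertionError as err:
    print("AssertionError:", err)
```

Catching the error with `try`/`except` is only done here so the cell finishes running; in practice you would usually let the error stop the program.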

### Importance of docstring!



Documentation on what your function does 


It can contain: 

 - arguments
 
 
 - function’s purpose
 
 
 - information about return values
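
As a rough illustration, a docstring covering all three items might look like the sketch below. The function name `double_documented` and the section labels are just one common layout, not a required format.

```python
def double_documented(x):
    """Double a number.

    Arguments:
        x: the number (int or float) to double.

    Returns:
        The value 2 * x.
    """
    return 2 * x

# The docstring is stored on the function and is what help() displays
print(double_documented.__doc__)
```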

In [None]:
help(double)

## Return vs print Discussion

<mark style="background:Thistle;color:black"> What's the difference between print and return?! </mark>

Answer here

Consider the following two functions:

In [None]:
def convert_temp(fahrenheit):
    """This function converts fahrenheit to celsius"""
    
    celsius = 5/9*(fahrenheit-32)
    print("Hi, I'm in the function and celsius =", celsius)
    
    return np.round(celsius,decimals=2)

In [None]:
def convert_temp2(fahrenheit):
    """This function converts fahrenheit to celsius"""
    
    celsius = 5/9*(fahrenheit-32)
    print("Hi, I'm in the function and celsius =", celsius)
    
    print(np.round(celsius,decimals=2))

What is the difference between 'convert_temp' and 'convert_temp2'?
Let's investigate.

In [None]:
temp = convert_temp(68)

In [None]:
temp2 = convert_temp2(68)

In [None]:
print(temp)

In [None]:
print(temp2)

Answer here

**Important:** Variables defined inside function bodies are not visible outside the function





```python 
def convert_temp(fahrenheit):
    """This function converts fahrenheit to celsius"""
    
    celsius = 5/9*(fahrenheit-32)
    print("Hi, I'm in the function and celsius =", celsius)
    
    return np.round(celsius,decimals=2)
```

### <code style="background:Thistle;color:black"> Code Comprehension: What will happen if we execute the following code:</code>

```python 
def convert_temp(fahrenheit):
    """This function converts fahrenheit to celsius"""
    
    celsius = 5/9*(fahrenheit-32)
    print("Hi, I'm in the function and celsius =", celsius)
    
    return np.round(celsius,decimals=2)


celsius
```

Answer: ???

The rules for where a variable is defined and visible are called *scoping* in programming languages.

In general, you want to be thoughtful with your variable and function naming

In [None]:
#In fact, you can reuse the name outside of the function and it is a different variable
celsius = "Hi, I'm a string!"

print('The result of the function is:', convert_temp(68))
print()
print('The value of celsius outside the function is:', celsius)
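
To see scoping from the other direction, here is a minimal sketch using a hypothetical function `to_celsius` whose local variable is named `local_celsius` (chosen so it cannot be confused with any `celsius` defined outside a function). Asking for that name outside the function raises a `NameError`.

```python
def to_celsius(fahrenheit):
    """Convert fahrenheit to celsius."""
    local_celsius = 5 / 9 * (fahrenheit - 32)  # exists only inside the function
    return local_celsius

to_celsius(68)

try:
    print(local_celsius)  # this name was never defined outside the function
except NameError as err:
    print("NameError:", err)
```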

### Now that we've defined our own function...

In [None]:
help(convert_temp)

We can call the help (built-in function) on any function we create!!! It gives us the docstring we wrote.

### Let's try another example. Here we write a function that takes two values as input (two arguments).

In [None]:
def register_class(Student_ID, class_name):
    """Takes Student_ID and class_name as input,
    Returns a message about registration"""
    
    message = "Thank you student with ID:" + str(Student_ID) + " for registering for " + str(class_name)
    
    return message

In [None]:
register_class(12345, 'Calculus')

In [None]:
register_class(12345, 'Introduction to Statistics')

### Note order does matter when calling a function!!

In [None]:
register_class('Calculus', 12345)

We can also call the function by being explicit about the arguments

In [None]:
register_class(Student_ID = 12345, class_name = "Calculus")

If we do this, we can switch around the order of the arguments.

In [None]:
register_class(class_name = "Calculus", Student_ID = 12345)

## Making default arguments in our functions!

How do we give an argument a default value in our own functions?

By assigning a value to the argument in the function definition.

In [None]:
def register_class(Student_ID, class_name = 'Calculus'):
    """Takes Student_ID and class_name as input,
    Returns a message about registration"""
    
    message = "Thank you student with ID:" + str(Student_ID) + " for registering for " + class_name
    
    return message

Now we can call this function with one or two arguments.

In [None]:
register_class(12345)

In [None]:
register_class(12345, 'math')

Calling the help function on your function allows you to see the default settings

In [None]:
help(register_class)

### <mark style="background-color: Thistle"> Code comprehension - Multiple Choice</mark>

What happens when we run the following code:
    
```python 
def fun1(name, age=20):
    print(name, age)


fun1('Ellie', 25)

```

A:  Ellie 25

B:  Ellie 20

C:  nothing happens, there is no return statement

D:  Ellie, 25

E:  Ellie, 20

### <mark style="background-color: Thistle"> Code comprehension - Multiple Choice</mark>


What is the value of ```output``` when we run the following code:
    
```python 
def fun1(name, age=20):
    print(name, age)


output = fun1('Ellie', 25)


```

A:  Ellie 25

B:  Ellie 20

C:  None

### <mark style="background-color: Thistle">Working with Functions: Activity!</mark>

1. Write a function that takes a number and outputs if that number is even or odd

In [None]:
#code here

2. Write a function that takes a list as input of length n (assume n >= 2) and swaps the first and last elements.

In [None]:
#code here

3. (Optional - Challenge) Discuss in your groups:
 What does the following code output:
    
```python 

def outer_fun(a, b):
    def inner_fun(c, d):
        return c + d
    return inner_fun(a, b)

output = outer_fun(5, 10)
print(output)

```

## Now we can build our own functions, but how do we apply them to DataFrames?

We use the Affordable Housing data:

In [None]:
import pandas as pd

Housing_df = pd.read_csv("https://raw.githubusercontent.com/SusannaLange/Data_118_images/main/DSSI_images/Affordable_Rental_Housing_Developments.csv")
Housing_df.head(5)

<code style="background:Thistle;color:black"> Let's define a function that considers the 'Units' column, and returns 'yes' if more than 50 units are available and 'no' if not. </code>

First step: Ignore the DataFrame...we want to do this for a single entry.

In [None]:
def over_50(units_numbers):
    return ???
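
One possible way to fill in the blank is sketched below, assuming the input is a single number. This is only one of several reasonable solutions; the docstring and the `if`/`else` layout are choices made for this sketch.

```python
def over_50(units_numbers):
    """Return 'yes' if units_numbers is greater than 50, otherwise 'no'."""
    if units_numbers > 50:
        return 'yes'
    else:
        return 'no'

# Try it on single values before applying it to a column
print(over_50(51), over_50(50))
```

Note that exactly 50 units gives 'no', since the condition asks for strictly more than 50.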

Second step: We can now apply this function to each entry of a DataFrame column!

### Recall, there is a DataFrame method that applies a function to a column of your DataFrame.

It is the ```.apply()``` or ```.map()``` method.


 - ```.apply()```  useful when applying a function along an axis of the DataFrame or on Series.


 - ```.map()```  useful when substituting each value in a Series with another value.
 

You provide the function name and the column on which to apply the function

In [None]:
Housing_df.Units.apply(over_50)

Bracket notation works too. Which one you use is a matter of preference.

In [None]:
Housing_df['Units'].apply(over_50)

The result is a new Series with new information. It is often good practice to retain the original data when you modify it for further analysis (so you don't lose data that may be useful in the future and you keep a record of what you did).

So you can create a new column with this new information!

In [None]:
Housing_df['Over 50 units available'] = Housing_df.Units.apply(over_50)
Housing_df

With ```.apply()``` you can apply functions across columns of a DataFrame.


**Specify an axis along which the function is applied:**


0 or ‘index’: apply function to each column (Default).


1 or ‘columns’: apply function to each row.
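
The two axis settings are easier to compare on a tiny table. The DataFrame `toy_df` below is made up for illustration and has nothing to do with the housing data.

```python
import numpy as np
import pandas as pd

toy_df = pd.DataFrame({"a": [1, 5, 3], "b": [4, 2, 6]})

# axis=0: the function receives each column, so we get one result per column
col_min = toy_df.apply(np.min, axis=0)

# axis=1: the function receives each row, so we get one result per row
row_min = toy_df.apply(np.min, axis=1)

print(col_min.tolist())
print(row_min.tolist())
```

With two columns and three rows, `col_min` has two entries and `row_min` has three.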

In [None]:
#here we want the minimum within the row of the two columns below
Housing_df[["X Coordinate", "Y Coordinate"]].apply(np.min, axis=1) #axis=1 is across columns (default axis=0 index)

The above uses a built-in function ```np.min```, but we could create our own function here too.

In [None]:
def min_over_two_rows(row): #with axis=1, apply passes in one row of the DataFrame at a time
    '''Input: a single row of the DataFrame
    Output: the minimum of the row's "X Coordinate" and "Y Coordinate" values'''
    
    return min(row["X Coordinate"], row["Y Coordinate"])


#both of the below do the same thing
option1 = Housing_df[["X Coordinate", "Y Coordinate"]].apply(min_over_two_rows, axis=1) #axis=1 is across columns 
option1

option2 = Housing_df.apply(min_over_two_rows, axis=1)
option2

### What about the ```.map()``` method?

This works with a function or dictionary

In [None]:
Housing_df['Over 50 units available via map'] = Housing_df.Units.map(over_50)
Housing_df
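
The example above passes a function to `.map()`. As a sketch of the dictionary option, the Series `answers` and the lookup dictionary below are invented for illustration.

```python
import pandas as pd

answers = pd.Series(['yes', 'no', 'yes', 'maybe'])

# Each value is looked up in the dictionary and replaced by its match
codes = answers.map({'yes': 1, 'no': 0})
print(codes)
```

Values that do not appear in the dictionary (here 'maybe') become `NaN`, so it is worth checking for missing values after mapping with a dictionary.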

### General recap:

 - The .map() works for series or single columns of a DataFrame

 - The .apply() works for a single column or across multiple columns

```python
df['new column name'] = df['column of interest'].map(our_function)


df['new column name'] = df['column of interest'].apply(our_function)
```

### <mark style="background-color: Thistle">Working with DataFrame Functions: Activity!</mark>

Consider the mental health data below. This dataset was collected through a Google Forms survey of university students in order to examine their current academic situation and mental health.
    

In [None]:
Mental_health_df = pd.read_csv("https://raw.githubusercontent.com/SusannaLange/Data_118_images/main/DSSI_images/Student%20Mental%20health.csv")
Mental_health_df.head(5)  

1. Notice that the column 'Your current year of Study' has inconsistent values: 'year 1' vs 'Year 1'. Print all unique entries for this column.

In [None]:
#code here

2. Create (or use a built-in) function that makes the case the same for all entries and apply it to column 'Your current year of Study'.

In [None]:
#code here

3. We want to determine how many students answered 'Yes' to at least one of 'Do you have Depression?', 'Do you have Anxiety?', or 'Do you have Panic attack?'. Create a new column called "Identified as having Depression, Anxiety, or Panic Attacks", then sum over this column to relay this information.

In [None]:
#code here