# 3.2 Functions

1. [General Info](#general)
2. [Namespaces, Scope, Local Functions](#namespace)
3. [Returning Multiple Values](#returnMulti)
4. [Functions Are Objects](#funcObj)
5. [Anonymous (lambda) Functions](#lambda)
6. [Generators](#generator)
7. [Errors and Exception Handling](#errors)

<a name="general"></a>
# General

Functions are declared with `def`  

The `return` keyword is optional, but its presence causes immediate return of the following expression.  

If `return` is not included, then `None` is returned.

So pretty similar to R here. If you want the function to return something, you have to have it return. If you just want it to print something, you don't need return.  
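
A minimal sketch of the two cases above. The function names `print_only` and `add_one` are made up for illustration.

```python
def print_only(x):
    # No return statement, so calling this evaluates to None
    print(f"x is {x}")

def add_one(x):
    return x + 1
    print("never reached")  # return exits immediately, so this line never runs

result_none = print_only(3)
result_value = add_one(3)

print(result_none)   # None
print(result_value)  # 4
```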

You can use positional and/or keyword arguments. Parameters defined without a default value are required. Parameters defined with a default value (usually passed as keyword arguments) are optional.  

Keyword arguments have to follow positional arguments, both in the definition and in the call.  
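
A short sketch of how this plays out. `describe` and its parameters are hypothetical and exist only for this example.

```python
def describe(name, greeting="Hello", punctuation="!"):
    # name has no default, so it is required;
    # greeting and punctuation have defaults, so they are optional
    return f"{greeting}, {name}{punctuation}"

only_required = describe("Ada")
by_keyword = describe("Ada", punctuation="?")
# Keyword arguments can be given in any order, as long as they come last
reordered = describe("Ada", punctuation=".", greeting="Hi")

print(only_required)  # Hello, Ada!
print(by_keyword)     # Hello, Ada?
print(reordered)      # Hi, Ada.
```

Writing `describe(greeting="Hi", "Ada")` is a `SyntaxError`, because a positional argument follows a keyword argument.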

<a name="namespace"></a>
# Namespaces, Scope, and Local Functions

Follows essentially the same logic as R: a function has access to its own variables and to global variables. A function defined within another function has access to its own variables, its parent's, and the globals.  

Any variable created within the function that isn't explicitly declared `global` or `nonlocal` is local, and is discarded when the function completes (unless it's returned).

Generally you don't want to use `global` though.

In [21]:
a = None
def bindAVariable():
    global a
    a = []

print(a)
bindAVariable()
print(a)

None
[]
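
For the function-within-a-function case, here is a minimal sketch using `nonlocal`. The names `make_counter` and `increment` are assumptions for illustration.

```python
def make_counter():
    count = 0  # local to make_counter

    def increment():
        nonlocal count  # rebind the parent function's variable, not a new local
        count += 1
        return count

    return increment

counter = make_counter()
first = counter()
second = counter()

print(first, second)  # 1 2
```

Without the `nonlocal` line, `count += 1` would raise an `UnboundLocalError`, since assignment inside `increment` would make `count` a new local variable.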


<a name="returnMulti"></a>
# Returning Multiple Values

We can use tuple unpacking (see 3.1 - Tuples) to return multiple values out of a function.

We can also return things as a dictionary (this is essentially what I do in R when I return lists of outputs).

In [22]:
def returnTuple():
    a = 5
    b = 6
    c = 7
    return a, b, c

### Unpack the tuple
a, b, c = returnTuple()

print(a)
print(b)
print(c)

### Don't unpack
myTuple = returnTuple()
myTuple

5
6
7


(5, 6, 7)

In [23]:
def returnDict():
    a = 5
    b = 6
    c = 7
    return {"a" : a, "b" : b, "c" : c}

returnDict()

{'a': 5, 'b': 6, 'c': 7}

<a name="funcObj"></a>
# Functions Are Objects

I think this is the same as in R. Functions can be stored in lists and passed as arguments like any other value, so the moral of the story here is to make your functions as granular as possible.  

Consider the example below. We have a list of strings that we need to clean up.  

The first method is just a function that iterates over the list and performs each cleaning operation in turn.  

The second method defines a separate function for the regex-based cleaning and then combines that with some built-ins into a list of functions. This list of functions is provided as an argument to our main cleaning function.

In [24]:
import re

myStates_l = ["   Alabama ", "Georgia!", "Georgia", "georgia", "FlOrIda", "south   carolina##", "West virginia?"]

### Define function to clean these strings
def cleanStrings1(strings):
    result = []
    for value in strings:
        value = value.strip()                  # strip whitespace
        value = re.sub("[!#?]", "", value)     # remove punctuation
        value = value.title()                  # built in that: "Converts the first character of each word to upper case"
        result.append(value)
    return result

cleanStates1_l = cleanStrings1(myStates_l)
print(cleanStates1_l)

### Define punctuation removal as a function
def removePunctuation(value):
    return re.sub("[!#?]", "", value)

### List of cleaning functions
cleanOperations = [str.strip, removePunctuation, str.title]

### Define new function to clean strings
def cleanStrings2(strings, ops):
    result = []
    for value in strings:
        for func in ops:               # Loop over list of fxns and apply them instead of doing each separately
            value = func(value)
        result.append(value)
    return result

cleanStates2_l = cleanStrings2(myStates_l, cleanOperations)
print(cleanStates2_l)

['Alabama', 'Georgia', 'Georgia', 'Georgia', 'Florida', 'South   Carolina', 'West Virginia']
['Alabama', 'Georgia', 'Georgia', 'Georgia', 'Florida', 'South   Carolina', 'West Virginia']


<a name="lambda"></a>
# Anonymous (Lambda) Functions

These are functions that consist of a single expression, and the value of that expression is the return value.  

They're declared using the `lambda` keyword.  

Below, `short_function` and `equiv_anon` do the same thing.

```python
def short_function(x):
    return x * 2

equiv_anon = lambda x: x * 2
```

These are also in R and I didn't use them a ton there...I think I prefer having things as obvious as possible.

Main idea here is that you can write a function in-line (say as an argument to a different function) without having to fully declare it.

In [25]:
### Define a function that takes another function as an argument
### Here we just apply f() for every element in some_list
def apply_to_list(some_list, f):
    return [f(x) for x in some_list]

ints = [4, 0, 1, 5, 6]

### If we want to apply a function to the list `ints`, we can declare one using `def`, or we can just use a lambda fxn:
apply_to_list(some_list = ints, 
              f = lambda x: x * 2)

[8, 0, 2, 10, 12]

The below example is a little more complicated.  

We want to sort these strings by the number of unique (i.e. different) characters in each one.

The `.sort` method has a `key` argument: a function that is called on each element, and whose return value is what gets sorted on. 

In [26]:
strings = ["foo", "card", "bar", "aaaa", "abab"]

strings.sort(key = lambda x: len(set(x)))
print(strings)

['aaaa', 'foo', 'abab', 'bar', 'card']


<a name="generator"></a>
# Generators

Many objects in Python support iteration by means of the *iterator protocol*.  

**An iterator is any object that will yield objects to the Python interpreter when used in a context like a for loop.** Most methods that accept a list or list-like object will also accept an iterator.  
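
A small sketch of the protocol done by hand with the built-ins `iter()` and `next()`. The variable names are made up for the example.

```python
some_dict = {"a": 1, "b": 2, "c": 3}

# iter() asks the object for an iterator; a for loop does this behind the scenes
dict_iterator = iter(some_dict)

first_key = next(dict_iterator)   # each next() call hands back one object
second_key = next(dict_iterator)

# list() accepts an iterator and consumes whatever is left, here the final key
remaining_keys = list(dict_iterator)

print(first_key, second_key, remaining_keys)  # a b ['c']
```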

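Generators are a convenient way to construct new iterable objects.  

**Generators can return a sequence of multiple values by pausing and resuming execution each time the generator is used.** In other words, each time a value is requested the function runs until it hits a `yield`, hands back that value, and pauses there with its local variables intact until the next value is requested.

When writing a function, use `yield` instead of `return` to make a generator.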

In [27]:
### Define a generator function
def squares(n=10):
    print(f"Generating squares from 1 to {n ** 2}")
    for i in range(1, n + 1):
        yield i ** 2
        
### Execute the generator
myGen = squares()
print(myGen)

### Access (request) elements of the generator
for x in myGen:
    print(x, end=" ")

<generator object squares at 0x109e34970>
Generating squares from 1 to 100
1 4 9 16 25 36 49 64 81 100 

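Generators are most useful when the sequence is large, or when you may not need every value, since values are produced one at a time instead of all being held in memory at once.  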
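
To make the pausing and resuming visible, here is a sketch that pulls values one at a time with `next()` instead of a for loop. `count_up_to` is a hypothetical generator written for this example.

```python
def count_up_to(n):
    for i in range(1, n + 1):
        print(f"about to yield {i}")
        yield i  # execution pauses here until the next value is requested

gen = count_up_to(2)   # nothing is printed yet: the body has not started running
first = next(gen)      # runs until the first yield
second = next(gen)     # resumes after the first yield and runs to the second

try:
    next(gen)          # the loop is finished, so the generator is exhausted
    exhausted = False
except StopIteration:
    exhausted = True

print(first, second, exhausted)  # 1 2 True
```

A for loop does the same thing, calling `next()` repeatedly and stopping quietly when `StopIteration` is raised.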

## Generator Expressions

This is the generator analogue of a list comprehension: same format and everything, just with `()` instead of `[]`.

In [28]:
### Verbose version
def _make_gen():
    for x in range(100):
        yield x ** 2

myVerboseGen = _make_gen()

### Generator expression version
myExprGen = (x ** 2 for x in range(100))

print(myVerboseGen)
print(myExprGen)

<generator object _make_gen at 0x109e34a50>
<generator object <genexpr> at 0x109e34ac0>


One use of generator expressions is as function arguments. This saves time and space because you don't have to define the generator separately and then pass it to the function.  

The main benefit relative to a list comprehension is memory: a generator produces one value at a time, so the full list is never built. Speed is often about the same.

In [29]:
### List comprehension
print(sum([x ** 2 for x in range(100000000)]))

333333328333333350000000


In [30]:
### Generator expression
print(sum((x ** 2 for x in range(100000000))))

333333328333333350000000


The example above doesn't show this directly: both versions print the same total, and the difference is in how much memory is used while computing it.
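
One rough way to see the memory difference is `sys.getsizeof`. This is only a sketch: `getsizeof` reports the size of the container itself (not the integers inside the list), and `n` is kept small here so it runs quickly.

```python
import sys

n = 100_000  # smaller than the example above so this runs quickly

squares_list = [x ** 2 for x in range(n)]  # builds every element up front
squares_gen = (x ** 2 for x in range(n))   # builds nothing until iterated

list_bytes = sys.getsizeof(squares_list)   # grows with n
gen_bytes = sys.getsizeof(squares_gen)     # small and independent of n

# Summing either one gives the same total (this consumes the generator)
same_total = sum(squares_list) == sum(squares_gen)

print(list_bytes > gen_bytes)  # True
print(same_total)              # True
```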

## itertools module

`itertools` is a module in the standard library. It has a collection of generators for many common data algorithms.  

Their uses may not be immediately apparent, but probably good to keep these in mind as I'm sure they'll end up being useful.

[Official docs](https://docs.python.org/3/library/itertools.html)

<img src="./myImages/table3.2_itertools.png" width="500"/>

Below is an example of `groupby`, which "takes any sequence and a function and groups consecutive elements in the sequence by the return value of the function"

So for `names`, it groups Alan and Adam because they're consecutive and both start with A. Same with Wes and Will. Albert isn't in the A group because it's not consecutive.


In [31]:
import itertools

# Function to return first letter of a string:
def firstLetter(x):
    return x[0]

# Function to return last letter of a string:
def lastLetter(x):
    return x[-1]

# List of names
names = ["Alan", "Adam", "Wes", "Will", "Albert", "Steven"]
names2 = ["Alan", "Adam", "Wes", "Will", "Albert", "Matt", "Steven", "Sven"]

# Use `group` as the loop variable so the `names` list isn't overwritten
for letter, group in itertools.groupby(names, firstLetter):
    print(letter, list(group)) # group is an iterator
print('\n')
for letter, group in itertools.groupby(names2, firstLetter):
    print(letter, list(group)) # group is an iterator
print('\n')
for letter, group in itertools.groupby(names2, lastLetter):
    print(letter, list(group)) # group is an iterator


A ['Alan', 'Adam']
W ['Wes', 'Will']
A ['Albert']
S ['Steven']


A ['Alan', 'Adam']
W ['Wes', 'Will']
A ['Albert']
M ['Matt']
S ['Steven', 'Sven']


n ['Alan']
m ['Adam']
s ['Wes']
l ['Will']
t ['Albert', 'Matt']
n ['Steven', 'Sven']
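
Since `groupby` only groups consecutive elements, a common pattern is to sort by the same key first so that each key shows up once. A minimal sketch, with `first_letter` and `names_list` as stand-in names for this example:

```python
import itertools

def first_letter(x):
    return x[0]

names_list = ["Alan", "Adam", "Wes", "Will", "Albert", "Steven"]

# Sorting by the same key makes equal keys adjacent
sorted_names = sorted(names_list, key=first_letter)

grouped = {letter: list(group)
           for letter, group in itertools.groupby(sorted_names, first_letter)}

print(grouped)
# {'A': ['Alan', 'Adam', 'Albert'], 'S': ['Steven'], 'W': ['Wes', 'Will']}
```

Now Albert lands in the A group with Alan and Adam.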


<a name="errors"></a>
# Errors and Exception Handling

Use `try`/`except` for the most part.

Example - `float()` raises a `ValueError` if the provided value can't be converted to a float:

In [32]:
float("1.2345")
float("someString")

ValueError: could not convert string to float: 'someString'

Make a new version of the `float()` function that returns the input if it can't be converted to a float:

In [None]:
import warnings

def tryFloat(x):
    try:
        return float(x)
    except Exception:  # catches every error type, not just ValueError
        warnings.warn("Unable to convert to float. Returning x.\n")
        return x

tryFloat("1.2345")
tryFloat("someString")




'someString'

`ValueError` is not the only exception that can be raised by `float()`. Providing a tuple will give a `TypeError`:

In [None]:
float((1,2))

TypeError: float() argument must be a string or a real number, not 'tuple'

Our original function will return the tuple:

In [None]:
tryFloat((1,2))




(1, 2)

We may want to return the input only when it's the right type but can't be parsed (a `ValueError`), and still raise an error if it's the wrong type entirely. Name that exception on the `except` line:

In [None]:
def newTryFloat(x):
    try:
        return float(x)
    except ValueError:
        warnings.warn("Unable to convert to float. Returning x.\n")
        return x
    
newTryFloat((1,2))

TypeError: float() argument must be a string or a real number, not 'tuple'

In [None]:
newTryFloat("someString")




'someString'

Use `finally` to execute code at the end, regardless of whether or not the `try` code succeeds.
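
A sketch of `finally` added to the same idea. `try_float_logged` and its `log` argument are hypothetical, invented just to show when the `finally` block runs.

```python
def try_float_logged(x, log):
    try:
        return float(x)
    except ValueError:
        return x
    finally:
        # Runs on success, on the handled ValueError, and on an unhandled TypeError
        log.append(f"attempted {x!r}")

log = []
ok = try_float_logged("1.2345", log)
fallback = try_float_logged("someString", log)

try:
    try_float_logged((1, 2), log)  # TypeError is not caught inside the function
except TypeError:
    type_error_raised = True
else:
    type_error_raised = False

print(ok, fallback, type_error_raised)  # 1.2345 someString True
print(len(log))                         # 3
```

All three calls add an entry to `log`, including the one that raised.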