<h1><center> PPOLS564: Foundations of Data Science </center></h1>
<h3><center> Lecture 7 <br><br><font color='grey'> Functions and Comprehensions </font></center></h3>

# Function Basics

- **`def`**: keyword for defining a function
    + `def` + some_name + `()` + `:` to set up a function header
    
    
- **Arguments**: things we want to feed into the function and do something to. 
    
    
- **`return`**: keyword for returning a specific value from a function

In [None]:
def square(x):
    y = x*x
    return y

In [None]:
square(10)

### Docstrings 

Docstrings are strings that occur as the first statement within a named function block.

```python
def function_name(input):
    '''
    Your docstring goes here.
    '''
    |
    |
    | Function block
    |
    |
    return something 

```

**The goal of the docstring is to tell us what the function _does_.** We can request a function's docstring using the `help()` function.

In [None]:
def paste(string_one,string_two):
    '''
    This is a useless function that pastes two strings together 
    '''
    return string_one + " "  + string_two

paste("public","policy")

In [None]:
help(paste)

### Conventions of writing docstrings

[PEP-257](https://www.python.org/dev/peps/pep-0257/) says that "The docstring for a function or method should summarize its behavior and document its arguments, return value(s), side effects, exceptions raised, and restrictions." Google offers a more useful [style guide](https://github.com/google/styleguide/blob/gh-pages/pyguide.md#38-comments-and-docstrings)
on how to set up a docstring.

Generally speaking, it should look something like this:

```python
def function_name(x,y,z):
    '''Quick description of what the function does.
    
    A more detailed description, if need be. 
    
    Arguments:
        list of all the arguments and what they need to be 
        or what their default values are.
        x: 
        y: 
        z: 
        
    Returns:
        Short description regarding what the function returns
        
    Raises:
        All the types of errors that the function raises
        
        TypeError: 
        ValueError:
    '''
    |
    |
    | Function block
    |
    |
    return something

```
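To make the template concrete, here is a minimal sketch of a function documented in that layout. The function `safe_divide` and its behavior are hypothetical and exist only to show each docstring section filled in.

```python
def safe_divide(numerator, denominator=1.0):
    '''Divide one number by another.

    Hypothetical example used only to illustrate the docstring layout.

    Arguments:
        numerator: number to be divided.
        denominator: number to divide by (default 1.0).

    Returns:
        The quotient as a float.

    Raises:
        TypeError: if either input is not an int or a float.
        ValueError: if denominator is zero.
    '''
    for value in (numerator, denominator):
        if not isinstance(value, (int, float)):
            raise TypeError("inputs must be numbers")
    if denominator == 0:
        raise ValueError("denominator must not be zero")
    return numerator / denominator

print(safe_divide(10, 4))
# The first line of the docstring is the quick description
print(safe_divide.__doc__.splitlines()[0])
```

Calling `help(safe_divide)` would print the whole docstring, including the `Arguments`, `Returns`, and `Raises` sections.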

## Arguments

Arguments are all the input values that lie inside the parentheses. 

```python 
def fun(argument_1,argument_2):
```

---
We can supply **default values** to one or all arguments; in doing so, we've specified a default argument.

```python 
def fun(argument_1 = "default 1",argument_2 = "default 2"):
```

---

```python
def fun(a,b=""):
```

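- argument `a` is called a **positional argument**. It has no default, so we provide a value to it by matching its position in the call.
- argument `b` is called a **keyword argument** because we give it a default value. We can leave it out or set it by name, e.g. `fun("x", b="y")`.

Note that `*args` and `**kwargs` are a separate syntax for collecting an arbitrary number of extra positional and keyword arguments.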

---
Keyword arguments must **_come after_** positional arguments, or Python will raise a `SyntaxError`.

In [None]:
def my_func(a,b=''):
    return a + b
my_func("cat","dog")

In [None]:
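# This definition fails with a SyntaxError: an argument without a default cannot follow one that has a default
def my_func(a='',b):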
    return a + b
my_func("cat","dog")
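The sketch below shows the different ways of calling a function that has one required argument and one default argument, followed by the separate `*args`/`**kwargs` syntax. The names `greet` and `collect` are hypothetical and used only for illustration.

```python
def greet(name, greeting="Hello"):
    # name is required; greeting falls back to its default when omitted
    return greeting + ", " + name

print(greet("Ada"))                      # positional only, default greeting is used
print(greet("Ada", "Welcome"))           # both values supplied by position
print(greet("Ada", greeting="Welcome"))  # second value supplied by keyword
print(greet(greeting="Hi", name="Ada"))  # all by keyword, so order no longer matters

def collect(*args, **kwargs):
    # *args gathers extra positional arguments into a tuple,
    # **kwargs gathers extra keyword arguments into a dictionary
    return args, kwargs

print(collect(1, 2, x=3))
```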

## Returning Multiple Values

In [None]:
def added_list(a,b,c,d):
    return [a,a+b,a+b+c,a+b+c+d]

added_list(1,2,3,4)

In [None]:
def added_tuple(a,b,c,d):
    return (a,a+b,a+b+c,a+b+c+d)

added_tuple(1,2,3,4)

In [None]:
def added_dict(a,b,c,d):
    return {"position 1": a,"position 2": a+b,
            "position 3": a+b+c, "position 4":a+b+c+d}

added_dict(1,2,3,4)
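When a function returns a list or tuple, the caller can unpack the result into separate names in a single assignment. The following is a small sketch of that pattern; `min_and_max` is a hypothetical helper added for illustration.

```python
def added_tuple(a, b, c, d):
    return (a, a+b, a+b+c, a+b+c+d)

# Unpack the four returned values into four names
first, second, third, fourth = added_tuple(1, 2, 3, 4)
print(first, fourth)

def min_and_max(values):
    # The comma builds a tuple, so parentheses are optional here
    return min(values), max(values)

lowest, highest = min_and_max([3, 1, 4])
print(lowest, highest)
```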

### Never use mutable values as defaults

Let's [visualize](https://goo.gl/u38EPx) why this is the case.

In [None]:
def my_func(a = []):
    a.append('x')
    return a

my_func()
my_func()
my_func()

To get around this, we use an immutable value such as `None` as the placeholder and create the mutable object inside the function.

In [None]:
def my_func(a = None):
    if a is None:
        a = []
    a.append('x')
    return a

my_func()
my_func()
my_func()
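One way to see what is going on is to inspect the function's stored defaults. Python evaluates default values once, when the function is defined, and keeps them in the function's `__defaults__` attribute. The sketch below uses the hypothetical names `risky` and `safe` to contrast the two patterns.

```python
def risky(a=[]):
    a.append('x')
    return a

def safe(a=None):
    if a is None:
        a = []          # a fresh list is created on every call
    a.append('x')
    return a

print(risky.__defaults__)   # the default list starts out empty
risky()
risky()
print(risky.__defaults__)   # the same list has grown across calls

safe()
safe()
print(safe.__defaults__)    # still (None,), nothing accumulates
```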

## When to use a function?

Whenever you repeat a chunk of code or some process more than once, you should wrap it in a function. When writing functions we should think about two things:

1. Can the function generalize to other types of data or problems? If not, why? Is there a way that one might be able to write the function so that it is more general?
2. Have I documented the inputs the function needs and the outputs it returns in a docstring?
    + You might be the only one who ever sees this function, but remember documentation is just as important for some future you as it is for some other person. 

# Applied Example 
Let's take the example from last time and build some functions that ease the cleaning process.

In [None]:
import csv 
import copy

def get_data(file_path):
    '''
    Reads in csv data and appends it to a list
    
    Arguments:
        - file_path: file path to file on a computer read in as a string.
    '''
    data = []
    with open(file_path) as f:
        for row in csv.DictReader(f):
            data.append(row) 
    return data

In [None]:
our_data = get_data("nes_2018_age-voted.csv")
our_data[20:40]

In [None]:
def clean_data(data=None):
    '''
    Takes csv.DictReader data from the get_data() function and
    scans through each entry to see if it's a digit. If the entry
    is a digit, then the function converts it to an integer. If a data entry 
    is missing ('NA'), then the function converts it to None.
    '''
    function_data = copy.deepcopy(data) # Work on a deep copy so the input data is left unchanged
    for row in function_data:
        
        # Clean the vote column
        if isinstance(row['voted'], str):
            if row['voted'].isdigit():
                row['voted'] = int(row['voted'])
            elif row['voted'] == 'NA':
                row['voted'] = None

        # Clean the age column
        if isinstance(row['age'], str):
            if row['age'].isdigit():
                row['age'] = int(row['age'])
            elif row['age'] == 'NA':
                row['age'] = None
        
    return function_data
    

In [None]:
our_data_cleaned = clean_data(our_data)
our_data_cleaned[20:40]

Note how we repeated code in the `clean_data` function? This is exactly the kind of thing that functions help us avoid. Let's wrap that code in its own function and call it from within the existing function.

In [None]:
def scrub(row,variable=''):
    '''
    Cleans the specified variable in the given row, in place. If the
    entry is a string of digits, it is converted to type int. If it
    is missing ('NA'), it is converted to None.
    '''
    if isinstance(row[variable], str):
        if row[variable].isdigit():
            row[variable] = int(row[variable])
        elif row[variable] == 'NA':
            row[variable] = None

                
def clean_data(data=None):
    '''
    Takes csv.DictReader data from the get_data() function and
    scans through each entry to see if it's a digit. If the entry
    is a digit, then the function converts it to an integer. If a data entry 
    is missing, then the function converts it to None.
    '''
    
    function_data = copy.deepcopy(data) # Let's make a deep copy of the data
    
    for row in function_data:
        
        # Clean the vote column
        scrub(row,variable='voted')

        # Clean the age column
        scrub(row,variable='age')
        
    return function_data
    

All together now

In [None]:
our_data = get_data("nes_2018_age-voted.csv")
our_data_cleaned = clean_data(data=our_data)
our_data_cleaned[20:40]
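Returning to the earlier question of whether a function can generalize, one possible next step is to pass the column names in as an argument instead of hard-coding `'voted'` and `'age'`. The sketch below is one way this might look. The names `scrub_value` and `clean_columns` are hypothetical, and the two-row `sample` is toy data standing in for the CSV file.

```python
import copy

def scrub_value(value):
    '''Convert a digit string to int and 'NA' to None; return anything else unchanged.'''
    if isinstance(value, str):
        if value.isdigit():
            return int(value)
        if value == 'NA':
            return None
    return value

def clean_columns(data, variables):
    '''Return a cleaned deep copy of data, scrubbing only the named columns.'''
    cleaned = copy.deepcopy(data)
    for row in cleaned:
        for variable in variables:
            row[variable] = scrub_value(row[variable])
    return cleaned

# Toy rows in the same shape that csv.DictReader produces
sample = [{'age': '34', 'voted': '1'}, {'age': 'NA', 'voted': '0'}]
cleaned = clean_columns(sample, ['age', 'voted'])
print(cleaned)
print(sample)   # the original rows are left untouched
```

Because the column names are an argument, the same function can be reused on a dataset with different variables.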

----

# Comprehensions

Comprehensions provide a readable and effective way of applying an expression to each item in an iterable series of items.

The general form of the comprehension:

![](Figures/listComprehensions.jpg)

See [here](https://python-3-patterns-idioms-test.readthedocs.io/en/latest/Comprehensions.html) for more details.

## List Comprehensions

Inside the list literal `[]` (square brackets), we write a `for` loop that builds the list.

In [None]:
words = "This is such a long course".split()
words

In [None]:
[len(word) for word in words]
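For comparison, here is a sketch of the same computation written as an ordinary `for` loop next to the comprehension. The variable names `lengths_loop` and `lengths_comp` are introduced here only for illustration.

```python
words = "This is such a long course".split()

# The long way: create an empty list and append inside a loop
lengths_loop = []
for word in words:
    lengths_loop.append(len(word))

# The comprehension: expression first, then the for clause
lengths_comp = [len(word) for word in words]

print(lengths_comp)
print(lengths_loop == lengths_comp)
```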

## Set Comprehensions
_(Introduced in Python 3.0; also backported to Python 2.7)_

Inside curly braces `{}`, we write a `for` loop that builds the set.

> Recall the difference between a set and dictionary is whether there is a key:value pair within the curly brackets. When there is a key:value pair within the brackets, it's a dictionary. When there are only values, it's a set. Use `type()` if you are unsure.

In [None]:
{len(word) for word in words}

## Dictionary Comprehensions
_(Introduced in Python 3.0; also backported to Python 2.7)_

Inside curly braces `{}`, with a `key: value` pair as the expression, we write a `for` loop that builds the dictionary.

In [None]:
# Create two lists: one full of values and another of equal length full of keys.
list_of_values = [1,2,3,4,5]
list_of_keys = ['a','b','c','d','e']
length_of_the_lists = len(list_of_values)

{list_of_keys[i]:list_of_values[i] for i in range(length_of_the_lists)}
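An alternative that avoids indexing is to pair the two lists with the built-in `zip()`. This is a sketch of the same result, with `paired` and `same` as illustrative names.

```python
list_of_values = [1, 2, 3, 4, 5]
list_of_keys = ['a', 'b', 'c', 'd', 'e']

# zip() yields (key, value) pairs, which we unpack in the for clause
paired = {key: value for key, value in zip(list_of_keys, list_of_values)}

# When no transformation is needed, dict() accepts the pairs directly
same = dict(zip(list_of_keys, list_of_values))

print(paired)
print(paired == same)
```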

## `if` statements in comprehensions

In [None]:
# Quickly produce a series of numbers
[i for i in range(10)]

In [None]:
[i for i in range(10) if i > 5 ]

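A trailing `else` isn't valid after the filtering `if` in a comprehension, so the code below raises a `SyntaxError`. (A conditional expression of the form `x if condition else y` is allowed, but it has to come before the `for`.)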

In [None]:
[i  for i in range(10) if i > 5 else "hello"]
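If an `else` branch is needed, one option is to move the condition to the front of the comprehension as a conditional expression. The sketch below contrasts filtering with replacing; `filtered` and `labelled` are illustrative names.

```python
# Filtering: the trailing if drops items that fail the test
filtered = [i for i in range(10) if i > 5]

# Replacing: a conditional expression before the for keeps every item
# and chooses which value to emit for each one
labelled = [i if i > 5 else "hello" for i in range(10)]

print(filtered)
print(labelled)
```

The first list has four items, while the second always has ten, one for each value produced by `range(10)`.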

## Nested comprehensions

In [None]:
[i for i in range(5)]

In [None]:
[j for j in range(-5,0)]

In [None]:
[[i,j] for i in range(5) for j in range(-5,0)]
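The `for` clauses in a nested comprehension are read left to right, in the same order as nested `for` loops written out by hand. A sketch of the equivalence, with `pairs_loop` and `pairs_comp` as illustrative names:

```python
# Written out as nested loops: the first for clause is the outer loop
pairs_loop = []
for i in range(5):
    for j in range(-5, 0):
        pairs_loop.append([i, j])

# The same thing as a single comprehension
pairs_comp = [[i, j] for i in range(5) for j in range(-5, 0)]

print(pairs_loop == pairs_comp)
print(len(pairs_comp))
```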

## Speed Boost

Comprehensions not only make our code more concise, they are also often faster than the equivalent `for` loop that appends to a list.

In [None]:
%%timeit
container = []
for i in range(1000):
    container.append(i)

In [None]:
%%timeit
container = [i for i in range(1000)]

In this comparison the comprehension takes roughly half the time, though the exact speedup depends on the Python version and the machine.
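The `%%timeit` magic only works inside IPython or Jupyter. Outside a notebook, a similar comparison can be sketched with the standard library's `timeit` module. The function names and the choice of `number=1000` repetitions are assumptions made for this example, and the timings will differ from run to run.

```python
import timeit

def with_loop():
    container = []
    for i in range(1000):
        container.append(i)
    return container

def with_comprehension():
    return [i for i in range(1000)]

# Each call below runs the function 1000 times and returns total seconds
loop_time = timeit.timeit(with_loop, number=1000)
comp_time = timeit.timeit(with_comprehension, number=1000)

print(f"for loop:      {loop_time:.4f} s")
print(f"comprehension: {comp_time:.4f} s")
```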

## Internal Scope

In [None]:
# Say we create a string object letter
letter = 'z'

# Then we use letter as a placeholder
letters = ['a','b','c','d']
for letter in letters:
    print(letter)

In [None]:
letter # letter got overwritten!

Now let's do the same thing with a comprehension.

In [None]:
letter = 'z'
[letter for letter in letters]

In [None]:
letter

What this means is that a list comprehension keeps its loop variable in its own internal scope, so it behaves more predictably and causes fewer issues than a `for` loop when we reuse an existing name as the placeholder.

# Can comprehensions make manipulating data in Python easier?

Recall the data structure from the assignment. Let's try to work through it using just comprehensions.

In [None]:
# Raw world bank data as a nested list.
country_data = [
    ['Country Name', '2006', '2007', '2008', '2009', '2010', '2011', '2012','2013', '2014', '2015', '2016'],
    ['Afghanistan', '', '73.4', '70.8', '68.2', '65.7', '63.3', '61', '58.8', '56.8', '54.9', '53.2'],
    ['Belize', '17.5', '17.2', '16.9', '16.5', '16.1', '15.6', '15.1', '', '14', '13.4', '12.8'],
    ['Germany', '3.8', '3.7', '3.6', '3.6', '3.5', '3.4', '3.3', '3.3', '3.3', '3.3', '3.2'],
    ['Greece', '3.6', '3.5', '3.3', '3.3', '3.2', '3.2', '3.2', '3.2', '3.2', '3.1', '3.1'],
    ['Iceland', '2.3', '2.2', '2.1', '2.1', '2', '', '1.8', '1.7', '1.7', '1.7', '1.6'],
    ['Nigeria', '93.2', '90', '87', '83.9', '81.1', '78.3', '75.7', '73.3', '71', '69', '66.9'],
    ['Thailand', '15', '', '13.8', '13.3', '12.8', '12.4', '12', '11.6', '11.2', '10.8', '10.5'],
    ['United States', '6.7', '6.6', '6.5', '6.4', '6.2', '6.1', '6', '5.9', '5.8', '5.7', '5.6'],
    ['Venezuela, RB', '15.4', '15', '14.8', '14.7', '14.7', '14.7', '14.7', '14.6', '', '14.3', '14'],
    ['Ethiopia','66.4', '62.9', '59.5', '56.5', '', '51.1', '48.6', '46.4', '44.4', '42.6', '41']
]

In [None]:
years = [int(year) for year in country_data[0] if year != 'Country Name']
years

In [None]:
countries = [country[0] for country in country_data[1:]]
countries

In [None]:
keys = [(country,year) for country in countries for year in years]
keys

In [None]:
def clean(x):
    '''
    Converts a string from country_data to None (if it is empty) or to a float.
    '''
    if x == '':
        y = None
    else:
        y = float(x)
    
    return y
        
print(type(clean('')))        
print(type(clean('4')))        

In [None]:
values = [clean(entry) for row in country_data[1:] for entry in row[1:]]
values

Put it all together 

In [None]:
len(keys)==len(values)

In [None]:
converted_data = {keys[i]:values[i] for i in range(len(keys))}
converted_data

----

**What does this look like all in one go?**

In [None]:
keys = [(country,int(year)) for country in countries for year in country_data[0] if year != 'Country Name']
values = [clean(entry) for row in country_data[1:] for entry in row[1:]]
converted_data = {keys[i]:values[i] for i in range(len(keys))}

# Call a key
converted_data[('Afghanistan', 2007)]
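A possible variation is to build the dictionary in a single comprehension by zipping each row with the years, which keeps keys and values paired without relying on two separately built lists having the same order. The sketch below uses a three-row subset of `country_data` named `sample` so that it runs on its own; `converted` is an illustrative name.

```python
# A small subset of country_data: header row plus two countries, two years
sample = [
    ['Country Name', '2006', '2007'],
    ['Afghanistan', '', '73.4'],
    ['Belize', '17.5', '17.2'],
]

years = [int(year) for year in sample[0][1:]]

# For every row, pair each year with the entry in the matching column
converted = {
    (row[0], year): (float(entry) if entry != '' else None)
    for row in sample[1:]
    for year, entry in zip(years, row[1:])
}

print(converted[('Afghanistan', 2007)])
print(converted[('Afghanistan', 2006)])   # missing entry becomes None
```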