# Agenda

- Assignment feedback
- Handling data
- Generators
- Requests
- Multiprocessing

## Iterators in Python

https://wiki.python.org/moin/Iterator  

### Iterators implement 2 methods:
__iter__() and __next__()  

Iterators minimize memory use sinse each element is lazy loaded.   
Generators are Iterators that can be easily written with a single function (rather than creating a class implementing the Iterator interface)

In [1]:
import random

class RandomIterable:
    """implementation of the iterator protocol with __next__ and __iter__ methods.
    __iter__() returns an iterator (normally the object itself)"""

    def __iter__(self):
        return self
    def __next__(self):
        if random.choice(["go", "go", "go", "go", "stop"]) == "stop":
            raise StopIteration  # signals "the end"
        return 1

In [21]:
[x for x in RandomIterable()]

[1, 1, 1, 1, 1]

In [28]:
iterable = RandomIterable() ## create an instance of iterable
my_iterator = iter(iterable)
try:
    element1 = next(my_iterator)
    element2 = next(my_iterator)
    print(element1,element2) # might throw a StopIterationException before running twice
except StopIteration as e:
    print(type(e))


1 1


### iterators in python

In [4]:
# map
map?
num1_lst = [10,20,30,40]
num2_lst = [4,3,2,1]
add_func = lambda x,y:x+y

result = map(add_func, num1_lst, num2_lst)
print(type(result))
print(result)
print(list(result))

<class 'map'>
<map object at 0x7f14c26afcd0>
[14, 23, 32, 41]


#  An intro to generators

https://wiki.python.org/moin/Generators

In [33]:
def firstn(n):
    """Our first generator that lazy loads each requested element"""
    num = 0
    while num < n:
        yield num
        num += 1

#[x for x in firstn(10)]
fn = firstn(10)
print(next(fn))
print(next(fn))

0
1


In [59]:
lst = list(firstn(10)) # now all elements are loaded in memory, so defies the purpose a bit.
lst

[0, 1, 2, 3, 4, 5, 6, 7, 8, 9]

## Generators in Pandas

In [32]:
import pandas as pd
df = pd.read_csv('data/befkbhalderstatkode.csv') 

In [33]:
# show content of the dataframe
df.head()

Unnamed: 0,AAR,BYDEL,ALDER,STATKODE,PERSONER
0,2015,1,0,5100,614
1,2015,1,0,5104,2
2,2015,1,0,5106,1
3,2015,1,0,5110,1
4,2015,1,0,5120,4


In [34]:
# get a generator from the dataframe to iterate over DataFrame rows as (index, Series) pairs.
g = df.iterrows()
print(type(g))
for idx,row in g:
    if idx < 4:
        print(row,'\n')

<class 'generator'>
AAR         2015
BYDEL          1
ALDER          0
STATKODE    5100
PERSONER     614
Name: 0, dtype: int64 

AAR         2015
BYDEL          1
ALDER          0
STATKODE    5104
PERSONER       2
Name: 1, dtype: int64 

AAR         2015
BYDEL          1
ALDER          0
STATKODE    5106
PERSONER       1
Name: 2, dtype: int64 

AAR         2015
BYDEL          1
ALDER          0
STATKODE    5110
PERSONER       1
Name: 3, dtype: int64 



In [35]:
# using iteritems() instead of iterrows(): Iterates over the DataFrame columns, returning a tuple with the column name and the content as a Series
for x in df.iteritems():
    # print(x)
    pass
print('Printing tuple with 2 values: first value=columnname of first column, second series object with 0-indexed values of the first column')
first_column = next(df.iteritems())
print('column name: {}'.format(first_column[0]),'\n', first_column[1]) 

Printing tuple with 2 values: first value=columnname of first column, second series object with 0-indexed values of the first column
column name: AAR 
 0         2015
1         2015
2         2015
3         2015
4         2015
          ... 
542512    1992
542513    1992
542514    1992
542515    1992
542516    1992
Name: AAR, Length: 542517, dtype: int64


## Using iterators/generators

In [70]:
def check_prime(number):
    for divisor in range(2, int(number ** 0.5) + 1):
        if number % divisor == 0:
            return False
        return True

In [71]:

# import getsizeof from sys module 
from sys import getsizeof 
  
list_comprehension = [i for i in range(100000) if check_prime(i)] 
generator_expression = (i for i in range(100000) if check_prime(i))  
  
#gives size for list comprehension 
x = getsizeof(list_comprehension)  
y = getsizeof(generator_expression)  
print('size of the list in memorey:\t\t{}'.format(x))  
print('size of the generator in memorey:\t{}'.format(y))  


size of the list in memorey:		406504
size of the generator in memorey:	128


#### Put the following into a module: read_print.py and run it

```python 
import os
from memory_profiler import profile

@profile
def read_linewise(path):
    with open(path) as fp:
        for line in fp:
            yield line

@profile
def read_complete(path):
    with open(path) as fp:
        return fp.readlines()

@profile
def print_file_contents_linewise():
    for line in read_linewise('moby_dick.txt'):
        print(line, end='')


if __name__ == '__main__':
    if not os.path.isfile('moby_dick.txt'):
        os.system('wget -O moby_dick.txt http://www.gutenberg.org/files/2701/2701-0.txt')
    print_file_contents_linewise()
    print('\n---------------------')
    read_complete('moby_dick.txt')
```

## profiling the 2 methods
we can see in the third column that line 13 increments the load on memory with 3.1 MB because all data is read at once compared to first profile where the profiled method uses a generator to lazy load each line.

~~~bash
Filename: read_print.py

Line #    Mem usage    Increment   Line Contents
================================================
    15     38.6 MiB     38.6 MiB   @profile
    16                             def print_file_contents():
    17     39.3 MiB      0.3 MiB       for line in read_linewise('moby_dick.txt'):
    18     39.3 MiB      0.0 MiB           print(line, end='')



---------------------
Filename: read_print.py

Line #    Mem usage    Increment   Line Contents
================================================
    10     39.3 MiB     39.3 MiB   @profile
    11                             def read_complete(path):
    12     39.3 MiB      0.0 MiB       with open(path) as fp:
    13     42.4 MiB      3.1 MiB           return fp.readlines()
    

#### The Increment column is where we can see the added load on memory from particular code line



## Exercise create generator
Create a generator function that can take a list of names as parameter and return each name. 
Get approved unisex names here: 

`wget -O unisex_navne.xls https://ast.dk/_namesdb/export/names?format=xls&gendermask=4`

In [19]:
!wget -O unisex_navne.xls https://ast.dk/_namesdb/export/names?format=xls&gendermask=4

## The `%timeit` magic
Calculate execution time of a Python statement or expression
Runs the code 10.000 times and collect statistics (mean + standard deviation)

In [5]:
import time
def waiting():
    time.sleep(1)
    
%timeit waiting()

1 s ± 353 µs per loop (mean ± std. dev. of 7 runs, 1 loop each)


In [39]:
# line magic function
%timeit sum(range(0, 1000)) 

10.9 µs ± 691 ns per loop (mean ± std. dev. of 7 runs, 100000 loops each)


In [40]:
%timeit sum(list(range(0,1000)))

18.4 µs ± 264 ns per loop (mean ± std. dev. of 7 runs, 100000 loops each)


In [41]:
%%timeit #must be first line in cell
#cell magic function

sum(range(0, 100)) 
sum(range(0, 100)) 

1.88 µs ± 60.7 ns per loop (mean ± std. dev. of 7 runs, 1000000 loops each)


## Exercise python modules:

1. make 2 files, a main file and a module file, 'called test_my_module.py' and 'get_names.py' respectively.
2. in the module file write a function with a generator, that can serve one name at a time (like you created in the last lesson)
3. execute the function in the module file and test run it from cli with: `python get_names.py`
4. in the main file implement a function that can take a number and return that many names (using the module you made).
5. make sure that test_my_module.py can be run directly and that when running test_my_module, no top level code from get_names will run.