## Generators
______________________
https://www.python.org/dev/peps/pep-0255/  
https://realpython.com/introduction-to-python-generators/  
https://stackabuse.com/python-generators/  
https://wiki.python.org/moin/Generators  
https://www.youtube.com/watch?v=1t_NUJFh33Y  
https://www.youtube.com/watch?v=vKH4jIben70  
Intro to Python (Deitel and Deitel, 2020)  
NICS data for example:
https://github.com/BuzzFeedNews/nics-firearm-background-checks/blob/master/data/nics-firearm-background-checks.csv

###### Generators are (in my inexperienced Googling/reading) essentially a way to minimize the amount of memory used, and sometimes speed things up, when working with large data sets. A generator expression is like a list comprehension, but produces values on demand. This is called lazy evaluation. Instead of processing the whole list or file and building the entire result up front, it evaluates one item at a time and hands each one back when asked.
Deitel and Deitel (2020)
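
###### A rough way to see the memory difference is to compare the size of the two objects. This is only a sketch: sys.getsizeof measures the container itself (not the integers inside it), and the size of the range is an arbitrary choice for illustration.

```python
import sys

numbers = range(100_000)

squares_list = [x ** 2 for x in numbers]  # builds every value up front
squares_gen = (x ** 2 for x in numbers)   # builds nothing until asked

# getsizeof reports the size of the list or generator object itself
print("list comprehension:", sys.getsizeof(squares_list), "bytes")
print("generator expression:", sys.getsizeof(squares_gen), "bytes")
```

###### The generator stays the same small size no matter how many values it will eventually produce, while the list grows with the data.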

In [1]:
# using a list comprehension to print the values

numbers = [10, 3, 7, 1, 9, 4, 2, 8, 5, 6]

for value in [x ** 2 for x in numbers]: # note the brackets
    print(value, end = ' ')

100 9 49 1 81 16 4 64 25 36 

In [2]:
# using a generator expression to print the values

numbers = [10, 3, 7, 1, 9, 4, 2, 8, 5, 6]

for value in (x ** 2 for x in numbers): # generators use parentheses
    print(value, end = ' ')

100 9 49 1 81 16 4 64 25 36 

###### Doesn't look much different, I know.  When put into a for loop, the loop makes the calls for you and gets back every value - just like a normal list comprehension.  Let's look at it a different way.

In [3]:
sq_numbers = (x ** 2 for x in numbers) # we'll assign the generator expression to a variable

In [4]:
sq_numbers # now we see how it's a little different - calling sq_numbers shows us that it has created a generator object

<generator object <genexpr> at 0x000001D5B9DFDBA0>

In [5]:
next(sq_numbers) # to iterate through the generator we use the next() function

100

In [6]:
next(sq_numbers) # continuing to call the generator will iterate through subsequent values, one at a time

9
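
###### One consequence worth seeing for yourself: a generator can only be walked through once. The sketch below rebuilds the same sq_numbers generator and shows that whatever has been pulled out with next() is gone for anything that uses the generator afterwards.

```python
numbers = [10, 3, 7, 1, 9, 4, 2, 8, 5, 6]
sq_numbers = (x ** 2 for x in numbers)

first = next(sq_numbers)      # 100
second = next(sq_numbers)     # 9
remaining = list(sq_numbers)  # everything after the first two values
leftover = list(sq_numbers)   # the generator is exhausted, so this is empty

print(first, second)
print(remaining)
print(leftover)
```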

###### Generators can also be written as functions, shockingly known as generator functions.  This introduces us to the yield statement.  Whereas return hands back a value and ends the function, yield hands back a value and "remembers" where you left off, then picks up from that spot the next time you call next().

In [7]:
# create an infinite sequence

def inf_seq():
    num = 0
    while True: # infinite because it's always True
        yield num # return (loses state) vs yield (maintains state)
        num += 1        

In [8]:
all_the_numbers = inf_seq()  # assign to a variable

In [9]:
print(all_the_numbers) # print to see the created generator object

<generator object inf_seq at 0x000001D5B9E8DAC0>


In [10]:
next(all_the_numbers) # use next to start using the generator - yields first value then waits until called again

0

In [11]:
next(all_the_numbers) # when called again yields the next value in the sequence

1

In [12]:
next(all_the_numbers) # and so on...

2

In [13]:
# and unlike return, more than one yield can run in a single function

def inf_seq2():
    num = 0
    while True:
        yield num
        num += 1 # note that we can also do operations after the yield, since the function hasn't been exhausted as
                 # happens with return - the operation just occurs on the next call of next(), and then...
        yield "taking a break" # execution pauses again at the second yield

In [14]:
more_numbers = inf_seq2() # assign to variable

In [15]:
next(more_numbers) # first call gives back the first yield

0

In [16]:
next(more_numbers) # second call gives back the second yield

'taking a break'

In [17]:
next(more_numbers) # the third call goes back to the first yield, but returns the next value in the list

1

In [18]:
next(more_numbers) # then we see the second yield again

'taking a break'

In [19]:
next(more_numbers) # and then back to the first yield, but with the next value, etc.

2
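
###### To make the pause-and-resume behavior visible, here is a small sketch that logs when each part of the function body actually runs. The events list and the traced function are made up for this illustration.

```python
events = []  # hypothetical log so we can see when each part of the body runs

def traced():
    events.append("started")
    yield 1
    events.append("resumed after first yield")
    yield 2

gen = traced()
before = list(events)  # still empty: creating the generator runs none of the body
first = next(gen)      # runs the body up to the first yield, then pauses
second = next(gen)     # resumes right after the first yield

print(before, first, second)
print(events)
```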

In [20]:
# and what if the loop isn't infinite?

def finite_num():
    nums = [1,2,3] # a very not infinite list
    for num in nums:
        yield num

In [21]:
some_nums = finite_num() # assign to variable

In [22]:
next(some_nums)

1

In [23]:
next(some_nums)

2

In [24]:
next(some_nums)

3

In [25]:
next(some_nums) # raises StopIteration to signal that the end of the list has been reached and the generator is done

StopIteration: 
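
###### In practice you rarely see StopIteration yourself. A for loop handles it for you, and next() accepts a default value to return instead of raising. A minimal sketch, reusing the same finite_num idea:

```python
def finite_num():
    for num in [1, 2, 3]:
        yield num

# a for loop catches StopIteration behind the scenes and simply ends
collected = []
for num in finite_num():
    collected.append(num)

# next() returns the default (None here) once the generator is done
some_nums = finite_num()
values = [next(some_nums, None) for _ in range(4)]

print(collected)
print(values)
```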

In [26]:
list_of_numbers = list(finite_num()) # you can also pass the generator to list()

In [27]:
print(list_of_numbers) # and printing the list shows every yielded value, collected all at once

[1, 2, 3]


###### Another useful way to use generators is to read in large files a piece at a time (note, the example I have below isn't terribly large - but gives you an idea of how it pulls things in).

In [28]:
import csv

In [29]:
# so, this is the normal file reading method...will just give you back the first line of the file

def normal_file_read(file):
    with open(file) as opened_file:
        for line in opened_file:
            return line 

In [30]:
print(normal_file_read('nics-firearm-background-checks.csv'))

month,state,permit,permit_recheck,handgun,long_gun,other,multiple,admin,prepawn_handgun,prepawn_long_gun,prepawn_other,redemption_handgun,redemption_long_gun,redemption_other,returned_handgun,returned_long_gun,returned_other,rentals_handgun,rentals_long_gun,private_sale_handgun,private_sale_long_gun,private_sale_other,return_to_seller_handgun,return_to_seller_long_gun,return_to_seller_other,totals



In [31]:
# using a generator (with the same code, except swap yield for return) and you can iterate and view each line as you go
# to get more than the first row using return, you'd need to read the whole file into a list, using up a lot of memory

def read_large_file(filename): # takes the path to the file, not an open file object
    with open(filename) as open_file:
        for line in open_file:
            yield line

In [32]:
file = read_large_file('nics-firearm-background-checks.csv') # assign to variable

In [33]:
next(file)

'month,state,permit,permit_recheck,handgun,long_gun,other,multiple,admin,prepawn_handgun,prepawn_long_gun,prepawn_other,redemption_handgun,redemption_long_gun,redemption_other,returned_handgun,returned_long_gun,returned_other,rentals_handgun,rentals_long_gun,private_sale_handgun,private_sale_long_gun,private_sale_other,return_to_seller_handgun,return_to_seller_long_gun,return_to_seller_other,totals\n'

In [34]:
next(file)

'2021-02,Alabama,31803,512,20970,16026,1548,959,0,18,14,2,1966,791,7,19,0,0,0,0,35,27,6,2,5,0,74710\n'

In [35]:
next(file)

'2021-02,Alaska,222,1,3055,2564,375,189,0,3,0,0,122,79,1,31,15,0,0,0,7,3,0,0,0,0,6667\n'

In [36]:
next(file)

'2021-02,Arizona,9290,1159,20530,9991,1754,1114,0,10,3,3,1059,404,1,132,9,1,0,0,19,15,2,0,0,0,45496\n'

In [37]:
# we can also return more than one line at a time, when we iterate through
# https://www.kite.com/python/answers/how-to-append-elements-to-a-list-while-iterating-over-the-list-in-python
        
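def read_large_file(filename, bin_len): # bin_len = how many lines you want to see
    with open(filename, 'r') as open_file:
        while True: # so it will keep looping through the file
            group = [] # create an empty list
            for _ in range(bin_len): # find the group of lines
                line = next(open_file, None) # None at end of file, instead of a StopIteration escaping the generator
                if line is None:
                    break
                group.append(line) # append them to the list - check out the link above
            if not group: # nothing left to read, so end the generator cleanly
                return
            yield group # the last group may be shorter than bin_len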

In [38]:
file = read_large_file('nics-firearm-background-checks.csv', 5) # assign to variable

In [39]:
next(file) # call the generator

['month,state,permit,permit_recheck,handgun,long_gun,other,multiple,admin,prepawn_handgun,prepawn_long_gun,prepawn_other,redemption_handgun,redemption_long_gun,redemption_other,returned_handgun,returned_long_gun,returned_other,rentals_handgun,rentals_long_gun,private_sale_handgun,private_sale_long_gun,private_sale_other,return_to_seller_handgun,return_to_seller_long_gun,return_to_seller_other,totals\n',
 '2021-02,Alabama,31803,512,20970,16026,1548,959,0,18,14,2,1966,791,7,19,0,0,0,0,35,27,6,2,5,0,74710\n',
 '2021-02,Alaska,222,1,3055,2564,375,189,0,3,0,0,122,79,1,31,15,0,0,0,7,3,0,0,0,0,6667\n',
 '2021-02,Arizona,9290,1159,20530,9991,1754,1114,0,10,3,3,1059,404,1,132,9,1,0,0,19,15,2,0,0,0,45496\n',
 '2021-02,Arkansas,3123,1152,7068,5289,448,390,8,7,13,2,771,623,4,0,0,0,0,0,12,6,0,0,0,0,18916\n']

In [40]:
next(file) # the next call has looped through and found the next five

['2021-02,California,16025,11953,40789,27915,6499,0,0,2,0,0,508,275,6,1669,743,71,0,0,9126,3416,709,60,18,0,119784\n',
 '2021-02,Colorado,10891,6,21729,13839,1960,1760,0,0,0,0,0,0,0,247,53,0,0,0,0,0,0,0,0,0,50485\n',
 '2021-02,Connecticut,9792,476,6046,1996,1438,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,19748\n',
 '2021-02,Delaware,353,0,2775,1563,158,153,0,0,0,0,11,6,0,103,0,0,0,0,74,39,8,1,1,0,5245\n',
 '2021-02,District of Columbia,538,1,304,4,0,2,0,0,0,0,0,0,0,0,0,71,0,0,0,0,0,0,0,0,920\n']

In [41]:
next(file) # and so on

['2021-02,Florida,31875,0,66307,26720,5116,2894,1,10,5,0,3162,776,8,1079,100,10,0,0,411,279,65,22,28,2,138870\n',
 '2021-02,Georgia,37243,0,21523,10215,1122,727,0,18,8,3,1495,705,15,14,0,0,0,0,18,12,2,0,0,0,73120\n',
 '2021-02,Guam,0,0,190,70,34,7,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,301\n',
 '2021-02,Hawaii,1504,0,0,0,0,0,0,0,0,0,0,0,0,1,2,0,0,0,0,0,0,0,0,0,1507\n',
 '2021-02,Idaho,6846,0,6973,5631,762,385,0,2,3,1,261,242,1,63,16,1,0,0,18,16,2,0,0,0,21223\n']
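
###### The same batching idea can also be sketched with itertools.islice, which takes up to a given number of items from any iterator and simply comes back short (or empty) at the end. The function name read_in_batches and the fake file contents are made up for this illustration; io.StringIO stands in for an open file.

```python
import io
from itertools import islice

def read_in_batches(lines, batch_size):
    """Yield lists of up to batch_size items from any iterable of lines."""
    iterator = iter(lines)
    while True:
        batch = list(islice(iterator, batch_size))
        if not batch:  # an empty batch means the source is exhausted
            return
        yield batch

# io.StringIO behaves like an open text file; these rows are placeholders
fake_file = io.StringIO("a\nb\nc\nd\ne\n")
batches = list(read_in_batches(fake_file, 2))
print(batches)
```

###### Note that the final batch holds whatever is left over, so it can be shorter than batch_size.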

###### Being able to iterate through a file little by little and process it as you go saves memory.  Chaining generators together like this creates a 'data pipeline'.

https://realpython.com/introduction-to-python-generators/#using-advanced-generator-methods

In [42]:
file_name = "nics-firearm-background-checks.csv" # assign file name
lines = (line for line in open(file_name)) # create generator expression to read through file
list_lines = (s.rstrip().split(",") for s in lines) # another expression to make each line into a list
header = next(list_lines) # assign the first next() call to a variable, which should be the column headers

In [43]:
print(header)

['month', 'state', 'permit', 'permit_recheck', 'handgun', 'long_gun', 'other', 'multiple', 'admin', 'prepawn_handgun', 'prepawn_long_gun', 'prepawn_other', 'redemption_handgun', 'redemption_long_gun', 'redemption_other', 'returned_handgun', 'returned_long_gun', 'returned_other', 'rentals_handgun', 'rentals_long_gun', 'private_sale_handgun', 'private_sale_long_gun', 'private_sale_other', 'return_to_seller_handgun', 'return_to_seller_long_gun', 'return_to_seller_other', 'totals']


In [44]:
# now you can start making dictionaries from the data

file_dicts = (dict(zip(header, data)) for data in list_lines)

In [45]:
print(file_dicts) # just checking the generator is good to go

<generator object <genexpr> at 0x000001D5B9F65040>


In [46]:
next(file_dicts) # iterate and take a look at the dictionaries

{'month': '2021-02',
 'state': 'Alabama',
 'permit': '31803',
 'permit_recheck': '512',
 'handgun': '20970',
 'long_gun': '16026',
 'other': '1548',
 'multiple': '959',
 'admin': '0',
 'prepawn_handgun': '18',
 'prepawn_long_gun': '14',
 'prepawn_other': '2',
 'redemption_handgun': '1966',
 'redemption_long_gun': '791',
 'redemption_other': '7',
 'returned_handgun': '19',
 'returned_long_gun': '0',
 'returned_other': '0',
 'rentals_handgun': '0',
 'rentals_long_gun': '0',
 'private_sale_handgun': '35',
 'private_sale_long_gun': '27',
 'private_sale_other': '6',
 'return_to_seller_handgun': '2',
 'return_to_seller_long_gun': '5',
 'return_to_seller_other': '0',
 'totals': '74710'}

In [47]:
# do some processing/evaluating - here we are getting the total number of permits filed for the state of Alabama
# this generator will loop through the dictionaries and find any permit values for Alabama

monthly_total = (
    int(file_dict['permit'])
    for file_dict in file_dicts
    if file_dict["state"] == 'Alabama'
)

In [50]:
next(monthly_total) # peruse the permit values for Alabama in the dictionaries

30694

In [51]:
state_total = sum(monthly_total) # this runs the generator and sums the permit values as it iterates 
                                 # as opposed to pulling all the values and saving into memory first

In [52]:
print('Alabama total permits:', state_total) # total permits for Alabama in the csv file

Alabama total permits: 2094920
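
###### The csv module imported earlier offers a similar lazy pipeline: csv.DictReader reads the header for you and yields one dictionary per row as you iterate. This is a sketch rather than a drop-in replacement. The rows below are made-up placeholders (not real NICS figures), and io.StringIO stands in for the opened file.

```python
import csv
import io

# made-up rows standing in for the CSV file (not real NICS figures)
sample = io.StringIO(
    "month,state,permit\n"
    "2021-02,Alabama,10\n"
    "2021-02,Alaska,5\n"
    "2021-01,Alabama,20\n"
)

reader = csv.DictReader(sample)  # lazily yields one dict per data row
alabama_total = sum(
    int(row["permit"]) for row in reader if row["state"] == "Alabama"
)
print("Alabama total permits:", alabama_total)
```

###### Unlike splitting on commas by hand, the csv module also copes with quoted fields that contain commas.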


###### You have to be careful: each generator picks up from whatever values are "left" in the generators feeding it.  Running next(file_dicts) to view the first actual row in the file pulled out a record from Alabama, which was then not counted in the permit tally.  Likewise, "perusing" the permits with next(monthly_total) pulled out another Alabama value, so state_total came out smaller by both amounts.
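
###### One way to sidestep this (a sketch, not the only option) is to wrap the pipeline in a function so that every call builds fresh generators. The function name alabama_permits and the sample rows are made up for illustration; with a real file you would hand it a newly opened file each time.

```python
# made-up rows standing in for the lines of the CSV file (not real NICS figures)
SAMPLE_LINES = [
    "month,state,permit\n",
    "2021-02,Alabama,10\n",
    "2021-02,Alaska,5\n",
    "2021-01,Alabama,20\n",
]

def alabama_permits(lines):
    """Build a fresh generator pipeline over lines each time it is called."""
    rows = (line.rstrip().split(",") for line in lines)
    header = next(rows)
    dicts = (dict(zip(header, row)) for row in rows)
    return (int(d["permit"]) for d in dicts if d["state"] == "Alabama")

peek = next(alabama_permits(SAMPLE_LINES))  # look at the first value...
total = sum(alabama_permits(SAMPLE_LINES))  # ...without shrinking the total

print(peek, total)
```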

###### Some of the YouTube tutorials I noted initially go through examples (that may or may not look familiar) and include processing times and memory measurements for return vs yield and list comprehensions vs generator expressions.  The time difference isn't as profound (from what I've seen in my searching), but the memory savings are huge.
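
###### If you want to check the memory claim yourself, the standard library's tracemalloc module can report the peak memory allocated while some code runs. This is a rough sketch: the helper name peak_bytes and the value of n are arbitrary choices, and the exact numbers will vary by machine and Python version.

```python
import tracemalloc

def peak_bytes(func):
    """Return the peak memory (in bytes) traced while func() runs."""
    tracemalloc.start()
    func()
    _, peak = tracemalloc.get_traced_memory()
    tracemalloc.stop()
    return peak

n = 200_000
list_peak = peak_bytes(lambda: sum([x ** 2 for x in range(n)]))  # whole list in memory
gen_peak = peak_bytes(lambda: sum(x ** 2 for x in range(n)))     # one value at a time

print("list comprehension peak:", list_peak, "bytes")
print("generator expression peak:", gen_peak, "bytes")
```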