
# Generators

A generator is a function that can be paused and resumed while maintaining its state between those stops and starts. You can think of generators as "resumable functions". See the quick tutorial in the Python docs: https://docs.python.org/3/howto/functional.html#generators

Typically, when you call a function, you lose that function's local variables after it reaches the `return` statement. Generators allow you to return a value, *suspend* execution of the function, and then *resume* it later with all of the locals still intact!

To create a generator, make a function that has the keyword `yield` in it. `yield` is _like_ `return` in that it gives back the value immediately to the right of it. However, instead of stopping the function completely and discarding the locals, it temporarily suspends the execution of the function so that it can be continued later. Execution is controlled via the iterator protocol. From the docs:

> When you call a generator function, it doesn’t return a single value; instead it returns a generator object that supports the iterator protocol.

So, when you call a generator function, you immediately get a generator object back, but the function body itself is _not yet_ executed. The generator object behaves like an iterator: it has a `__next__` method, so you can pass it to the built-in `next` function, just as you would an iterator obtained from an iterable. Calling `next` controls the function's execution: it starts or resumes the function and runs it until a `yield` is encountered, at which point a value is returned and execution is suspended again.
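
As a minimal sketch of the point above, the following uses a hypothetical `countdown` generator function (not part of the original notebook) together with the standard library's `inspect.getgeneratorstate` to observe that the body does not start running until `next` is called:

```python
import inspect

def countdown(n):
    # Hypothetical example generator: yields n, n-1, ..., 1.
    while n > 0:
        yield n
        n -= 1

gen = countdown(3)

# Calling the function only builds the generator object; the body has not started.
state_before = inspect.getgeneratorstate(gen)

# A generator object is its own iterator.
is_own_iterator = iter(gen) is gen

# next() runs the body up to the first yield and hands back the yielded value.
first = next(gen)
state_after = inspect.getgeneratorstate(gen)

print(state_before, is_own_iterator, first, state_after)
```

The state strings come from `inspect`; the generator moves from "created" to "suspended" after the first `next` call.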




In [None]:
def f():
    print('print 1')
    yield 'return 1'
    print('print 2')
    yield 'return 2'
    print('print 3')
    yield 'return 3'

In [None]:
# note that f is not executed (nothing is printed out yet!)
gen_obj = f()

In [None]:
# calling next starts/resumes function execution until yield is encountered
next(gen_obj)

print 1


'return 1'

In [None]:
next(gen_obj)

print 2


'return 2'

In [None]:
next(gen_obj)

print 3


'return 3'

In [None]:
try:
  next(gen_obj)
except StopIteration:
  print("No more iterations available.")


No more iterations available.


This means that generators can be looped over!


In [None]:
for val in f():
    print(val)

print 1
return 1
print 2
return 2
print 3
return 3
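
Roughly speaking, a `for` loop over a generator is doing the `next` / `StopIteration` dance shown earlier on your behalf. The sketch below spells that out by hand; the generator `f` is redefined here without its `print` calls so the block runs on its own, and `collected` is just an illustrative name:

```python
def f():
    # Same shape as the generator defined earlier, minus the print calls.
    yield 'return 1'
    yield 'return 2'
    yield 'return 3'

collected = []
it = iter(f())                # for a generator, iter() returns the generator itself
while True:
    try:
        val = next(it)        # resume the function until the next yield
    except StopIteration:     # raised once the function body finishes
        break
    collected.append(val)

print(collected)
```

The `while` loop gathers the same values that `for val in f()` would visit.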


This seems similar to creating a class and implementing `__iter__` and `__next__`, and it is: generators are a simple way of getting an object that supports the iterator protocol. There is no need to define a whole new class with two methods; just write a function. Let's write some code that allows us to loop over the letters of the alphabet without creating a string of the entire alphabet beforehand.

In [None]:
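class Alphabet:
    START, STOP = 65, 91  # code points for 'A' and one past 'Z'

    def __init__(self):
        self.i = Alphabet.START

    def __iter__(self):
        # the object is its own iterator
        return self

    def __next__(self):
        # stop before producing a character past 'Z'
        if self.i >= Alphabet.STOP:
            raise StopIteration
        ch = chr(self.i)
        self.i += 1
        return ch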

In [None]:
for letter in Alphabet():
    print(letter," ",end="")
print()    


A  B  C  D  E  F  G  H  I  J  K  L  M  N  O  P  Q  R  S  T  U  V  W  X  Y  Z  


In [None]:
# less code with a generator
def alphabet():
    aRange=(65,91)
    for i in range(*aRange):  
        yield chr(i)

In [None]:
for letter in alphabet():
    print(letter," ",end="")
print()    

A  B  C  D  E  F  G  H  I  J  K  L  M  N  O  P  Q  R  S  T  U  V  W  X  Y  Z  


In [None]:
# each call to alphabet() creates a fresh generator, so both loops print A through Z
for letter in alphabet():
    print(letter)
for letter in alphabet():
    print(letter)


In [None]:
def infinite_abc():
    START, STOP = 65,67
    i = START
    while True:
        if i > STOP:
            i = START
        yield chr(i)
        i += 1

In [None]:
myABC = infinite_abc()

In [None]:
i=0
for c in myABC:
  print(c," ",end="")
  i+=1
  if (i%20==0): print()
  if (i==200): break

A  B  C  A  B  C  A  B  C  A  B  C  A  B  C  A  B  C  A  B  
C  A  B  C  A  B  C  A  B  C  A  B  C  A  B  C  A  B  C  A  
B  C  A  B  C  A  B  C  A  B  C  A  B  C  A  B  C  A  B  C  
A  B  C  A  B  C  A  B  C  A  B  C  A  B  C  A  B  C  A  B  
C  A  B  C  A  B  C  A  B  C  A  B  C  A  B  C  A  B  C  A  
B  C  A  B  C  A  B  C  A  B  C  A  B  C  A  B  C  A  B  C  
A  B  C  A  B  C  A  B  C  A  B  C  A  B  C  A  B  C  A  B  
C  A  B  C  A  B  C  A  B  C  A  B  C  A  B  C  A  B  C  A  
B  C  A  B  C  A  B  C  A  B  C  A  B  C  A  B  C  A  B  C  
A  B  C  A  B  C  A  B  C  A  B  C  A  B  C  A  B  C  A  B  
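
Because `infinite_abc` never finishes on its own, the loop above needs a counter and a `break`. One alternative, sketched here, is to let `itertools.islice` cap how many values are taken; the generator is repeated so the block is self-contained, and `first_seven` is an illustrative name:

```python
import itertools

def infinite_abc():
    # Same generator as above, repeated so this block runs on its own.
    START, STOP = 65, 67
    i = START
    while True:
        if i > STOP:
            i = START
        yield chr(i)
        i += 1

# islice stops after 7 items, so the infinite generator is never exhausted.
first_seven = list(itertools.islice(infinite_abc(), 7))
print(first_seven)
```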


Ed Exercise:

Write a generator that yields the next Fibonacci number each time `next` is called on it. The first two Fibonacci numbers are 0 and 1, and each subsequent number is the sum of the previous two. The first six calls should give 0, 1, 1, 2, 3, and 5.

In [None]:
def fib():
  x, y = 0, 1
  while True:
    yield x
    x, y = y, x+y
  

In [None]:
import itertools
print(list(itertools.islice(fib(),20)))


[0, 1, 1, 2, 3, 5, 8, 13, 21, 34, 55, 89, 144, 233, 377, 610, 987, 1597, 2584, 4181]


You can also create a generator object using a generator expression. How are lists and generator expressions different, though?

https://stackoverflow.com/questions/20535342/lazy-evaluation-in-python

> A list stores all elements when it is created. A generator generates the next element when it is needed.
> A list can be iterated over as much as you need, a generator can only be iterated over exactly once.
> A list can get elements by index, a generator cannot -- it only generates values once, from start to end.

https://stackoverflow.com/questions/2776829/difference-between-pythons-generators-and-iterators

> You can use the Iterator protocol directly when you need to extend a Python object as an object that can be iterated over.
> However, in the vast majority of cases, you are best suited to use yield to define a function that returns a Generator Iterator or consider Generator Expressions.
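
The differences quoted above can be checked directly. The following is a small sketch with illustrative variable names, comparing a list comprehension and a generator expression over the same five squares:

```python
squares_list = [i ** 2 for i in range(5)]
squares_gen = (i ** 2 for i in range(5))

# A list can be iterated repeatedly and indexed.
first_pass_list = sum(squares_list)
second_pass_list = sum(squares_list)
third_item = squares_list[2]

# A generator is consumed as it is iterated, so a second pass sees nothing.
first_pass_gen = sum(squares_gen)
second_pass_gen = sum(squares_gen)

# A generator does not support indexing.
try:
    squares_gen[2]
    indexable = True
except TypeError:
    indexable = False

print(first_pass_list, second_pass_list, third_item)
print(first_pass_gen, second_pass_gen, indexable)
```

To iterate again, build a new generator object by evaluating the generator expression (or calling the generator function) a second time.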

In [None]:
import sys
help(sys.getsizeof)

Help on built-in function getsizeof in module sys:

getsizeof(...)
    getsizeof(object, default) -> int
    
    Return the size of object in bytes.



In [None]:
lc = [i ** 2 for i in range(10000)]
ge = (i ** 2 for i in range(10000))

In [None]:
sys.getsizeof(lc)

87632

In [None]:
sys.getsizeof(ge)

128

In [None]:
ge = (i ** 2 for i in range(10000))
i=1
for n in ge:
    print(f"{n:5d} ",end="")
    if (i%10==0): print()
    i+=1
    if n > 5000:
        break

    0     1     4     9    16    25    36    49    64    81 
  100   121   144   169   196   225   256   289   324   361 
  400   441   484   529   576   625   676   729   784   841 
  900   961  1024  1089  1156  1225  1296  1369  1444  1521 
 1600  1681  1764  1849  1936  2025  2116  2209  2304  2401 
 2500  2601  2704  2809  2916  3025  3136  3249  3364  3481 
 3600  3721  3844  3969  4096  4225  4356  4489  4624  4761 
 4900  5041 

In [None]:
sys.getsizeof(ge)

128

Ed Exercise:

Write some code to find all of the squares between 1 and 10000 that end with a 1. Use a generator expression. Do not use a loop except for the loop that is in the generator expression.

In [None]:
ge=(i**2 for i in range(100))
print(list(filter(lambda x:x%10==1,ge)))

[1, 81, 121, 361, 441, 841, 961, 1521, 1681, 2401, 2601, 3481, 3721, 4761, 5041, 6241, 6561, 7921, 8281, 9801]


Ed Exercise:

Write a generator that advances the time by one second, keeping track of hours, minutes, and seconds. Hours should range from 0 to 23. It should yield a tuple of hours, minutes, and seconds at each iteration (each `yield`).

In [None]:
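def militaryTimer():
  # days is tracked in addition to the hours, minutes, and seconds the exercise asks for
  hours=0
  mins=0
  secs=0
  days=0
  while True:
    yield days, hours, mins, secs
    # advance one second, then carry into minutes, hours, and days as needed
    secs+=1
    if secs==60:
      mins+=1
      secs=0
    if mins==60:
      hours+=1
      mins=0
    if hours==24:
      hours=0
      days+=1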


In [None]:
myTimer=militaryTimer()
for i in range(1000000):
  days, hours, mins, secs = next(myTimer)
print(days,hours,mins,secs)  

11 13 46 39
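
As a sanity check on that output, the same result can be derived with plain arithmetic. The first value yielded is the zero time, so the millionth value corresponds to 999,999 elapsed seconds. This sketch uses `divmod` and is independent of the generator:

```python
# The first yield reports 0 days, 0:00:00, so the millionth yield
# corresponds to 999,999 elapsed seconds.
elapsed = 1_000_000 - 1

mins, secs = divmod(elapsed, 60)   # total minutes, leftover seconds
hours, mins = divmod(mins, 60)     # total hours, leftover minutes
days, hours = divmod(hours, 24)    # total days, leftover hours

print(days, hours, mins, secs)
```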
