# Python Generators

Handle large datasets.
Hide a little bit of state without the overhead of a class.
Streamy pipelines.
And more...

## Functions that behave like iterators 

Functions that keep on giving. Use them in `for` loops.

In [1]:
def function():
    """A standard function."""
    return [42]

function()  # normal return value: list with a number inside

[42]

In [2]:
def generator():
    """Including a `yield` statement makes a generator"""
    yield 42
    
generator()  # returns generator object!

<generator object generator at 0x7f10a84a6270>

In [3]:
for i in function():  # iterate over returned list
    print('f', i)

f 42


In [4]:
for i in generator():  # iterate over returned generator object 
    print('g', i)

g 42


*No need to build and pass around a list!*

In [5]:
def generator():
    """A loop inside makes more sense."""
    for i in range(10):
        if i % 2:
            yield i

In [6]:
for i in generator():
    print('g', i)

g 1
g 3
g 5
g 7
g 9


## Generator expressions

Simple iteration without need to create intermediate lists.

In [7]:
# List comprehension
[x*x for x in range(10) if x % 2]

[1, 9, 25, 49, 81]

In [8]:
# Generator expression
(x*x for x in range(10) if x % 2)

<generator object <genexpr> at 0x7f10a84a6900>

Use it in place as an argument, wherever an iterable is expected:

In [9]:
import random
set(random.random() for _ in range(5))  # you can leave out parentheses if it's the only argument

{0.13664296261502762,
 0.5447603832229232,
 0.6394686999707228,
 0.7160268497050302,
 0.9750825215882083}
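
A small sketch of the same idea with other built-ins that accept any iterable. The variable names are only for illustration. Note that the extra parentheses come back as soon as the call takes a second argument.

```python
# Sum of odd squares below 10, without building an intermediate list
total = sum(x * x for x in range(10) if x % 2)
print(total)  # 165

# any() stops pulling values as soon as it sees a truthy one
has_big = any(x * x > 50 for x in range(10))
print(has_big)  # True

# With more than one argument, the generator expression needs its own parentheses
biggest = max((x * x for x in range(10) if x % 2), default=0)
print(biggest)  # 81
```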

## Lazy evaluation

Functions that keep on living. Code only runs when it has to.

In [10]:
def generator():
    print("Hi!")
    yield 42
    print("Done!")

In [11]:
generator()  # Output?

<generator object generator at 0x7f10a84b5190>

No code ran inside the generator!

In [12]:
list(generator())  # Exhaust generator to create a list. Output?

Hi!
Done!


[42]

Now all the code ran.

In [13]:
g = generator()
next(g)  # Get one more value (the first in this case). 

Hi!


42

It only ran until the `yield` statement.

**Execution is synchronous**

When you need the next value, execution jumps back into the function where it left off (the `yield` statement) and proceeds to the next `yield` or the end of the function. (Or an exception happens.)
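
To watch the jumping back and forth step by step, here is a minimal sketch with a hypothetical `two_steps` generator (not part of the cells above). Running off the end of the function raises `StopIteration`, which is the signal a `for` loop uses to stop.

```python
def two_steps():
    """Hypothetical generator with two yields, for illustration only."""
    print("start")
    yield 1
    print("between")
    yield 2
    print("end")

g = two_steps()
first = next(g)    # prints "start", then pauses at the first yield
second = next(g)   # resumes, prints "between", pauses at the second yield

try:
    next(g)        # resumes, prints "end", then the function returns
except StopIteration:
    finished = True  # a for loop catches this for you

print(first, second, finished)  # 1 2 True
```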

In [14]:
def fibo():
    """Generate endless Fibonacci sequence."""
    a = 0  # let's keep some state
    yield a   # can have more than one yield statement
    b = 1
    yield b
    while True:  # endless loop!
        c = a + b
        yield c
        a, b = b, c        

In [15]:
from itertools import takewhile  # itertools has handy tools for dealing with generators
list(takewhile(lambda x: x<10, fibo()))

[0, 1, 1, 2, 3, 5, 8]

**Execution is on-demand**

If you never need another value, execution does not resume. You can write endless loops and only run them as often as needed.

**Data is produced on-demand**

No need to collect things into data structures and keep them lying around. You can get and process one piece of data at a time.
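
A short sketch of pulling just a few values from an endless source, using `itertools.count` and `itertools.islice` from the standard library. The `squares` name is only an example.

```python
from itertools import count, islice

# An endless generator expression: nothing is computed yet
squares = (n * n for n in count(1))

# Pull exactly five values; the generator pauses after that
first_five = list(islice(squares, 5))
print(first_five)  # [1, 4, 9, 16, 25]

# The generator remembers where it stopped
following = next(squares)
print(following)  # 36
```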

## What for?

- stream processing, consumer pulls
  - reading HTTP body that can arrive in chunks
  - database result sets
  
- don't need or want to keep all data in memory
  - one-off iteration
  - process gigantic CSV file
  
- endless results / unknown how many needed
  - counter
    
- building block for context managers...

- coroutines for async processing...
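
As a rough sketch of the stream-processing case, here is a tiny pipeline of chained generators. The function names (`read_rows`, `only_valid`, `amounts`) and the sample data are made up for illustration, and `io.StringIO` stands in for a file that would be too big to load at once. Each row travels through the whole pipeline before the next line is read.

```python
import io

def read_rows(lines):
    """Split each comma-separated line into fields, one row at a time."""
    for line in lines:
        yield line.rstrip("\n").split(",")

def only_valid(rows):
    """Pass on only rows that have a name and a numeric amount."""
    for row in rows:
        if len(row) == 2 and row[1].isdigit():
            yield row

def amounts(rows):
    """Extract the amount column as an integer."""
    for _name, amount in rows:
        yield int(amount)

# Assumption: a small in-memory stand-in for a gigantic CSV file
data = io.StringIO("alice,10\nbob,oops\ncarol,32\n")

# The consumer (sum) pulls values through the pipeline one by one
total = sum(amounts(only_valid(read_rows(data))))
print(total)  # 42
```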


## Gotchas

### Usable only once

In [16]:
g = (c for c in 'Hello World!' if c.isupper())
print(list(g))

['H', 'W']


In [17]:
print(list(g))

[]


If something else exhausts your generator before you get to it, the data will no longer be there for you. There will be no error, just no data. Not a common problem, until you start looking inside generators while debugging. :)
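
Two common ways around this, sketched with a hypothetical helper `uppercase_letters`: either create a fresh generator for each pass, or collect the values into a list once when the data is small enough to keep around.

```python
def uppercase_letters(text):
    """Hypothetical helper: returns a new generator on every call."""
    return (c for c in text if c.isupper())

g = uppercase_letters('Hello World!')
first_pass = list(g)
second_pass = list(g)  # g is exhausted, so this is empty

# Option 1: ask for a fresh generator whenever you need another pass
fresh = list(uppercase_letters('Hello World!'))

# Option 2: materialize once and reuse the list (costs memory)
cached = list(uppercase_letters('Hello World!'))
reused = [c.lower() for c in cached] + [c.lower() for c in cached]

print(first_pass, second_pass, fresh, reused)
```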

### Cleanup non-deterministic

In [18]:
def read_lines(filename):
    try:
        with open(filename) as f:
            print('--- file opened')
            for line in f:
                yield line.rstrip()  # remove trailing whitespace
    finally:
        # we use `finally` so the print also runs for a prematurely closed generator:
        print('--- file closed')

In [19]:
reader = read_lines('Python Generators.ipynb')
for i, l in enumerate(reader):
    if i > 5:
        break
    print(i, l)

--- file opened
0 {
1  "cells": [
2   {
3    "cell_type": "markdown",
4    "metadata": {},
5    "source": [


**File is still open!** We did not run the generator to the end, so it did not close the file.

In [20]:
del reader  # Python closes generators on garbage collection (CPython does that when the last reference is dropped)

--- file closed


Whatever cleanup you rely on, when it happens now depends on the calling code and the flow of execution. (And on your Python implementation. Anybody using PyPy?)

In [21]:
from contextlib import closing
with closing(read_lines('Python Generators.ipynb')) as reader:
    # You can manually close generators, or use a special context manager that
    # makes sure a generator is closed on leaving context
    print(next(reader))

--- file opened
{
--- file closed


In [22]:
# Beware exceptions that get raised on cleanup
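
A minimal sketch of two ways cleanup can bite, using hypothetical generators `stubborn` and `failing_cleanup` that are not part of the cells above. Closing a generator raises `GeneratorExit` at the paused `yield`. If the generator yields again during cleanup, `close()` raises `RuntimeError`, and any other exception raised during cleanup propagates to whoever called `close()`.

```python
def stubborn():
    """Hypothetical generator that yields again while being closed."""
    try:
        yield 1
    finally:
        yield 2  # yielding during cleanup: close() will complain

g = stubborn()
next(g)
try:
    g.close()
except RuntimeError as exc:
    message = str(exc)
print(message)


def failing_cleanup():
    """Hypothetical generator whose cleanup code raises."""
    try:
        yield 1
    finally:
        raise ValueError("cleanup failed")

g2 = failing_cleanup()
next(g2)
try:
    g2.close()  # the ValueError surfaces here, at the call to close()
except ValueError as exc:
    cleanup_error = str(exc)
print(cleanup_error)
```

When the close happens implicitly during garbage collection instead, there is no caller to receive the exception, so it is only reported, not raised.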

More about cleanup inside generators and their closing behavior: https://amir.rachum.com/blog/2017/03/03/generator-cleanup/