# Python Generators

- Handle large datasets without loading everything into memory.
- Keep a little bit of state without the overhead of a class.
- Build streaming pipelines.
- And more...

## Functions that behave like iterators

Functions that keep on giving. Use them in `for` loops.

In [1]:
def function():
    """A standard function."""
    return [42]

function()

[42]

In [2]:
def generator():
    """Including a `yield` statement makes a generator"""
    yield 42
    
generator()

<generator object generator at 0x7f81747b00b0>

In [3]:
for i in function():
    print('f', i)

f 42


In [4]:
for i in generator():
    print('g', i)

g 42


*No need to build and pass around a list!*
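A rough way to see the difference, assuming CPython: the sketch below compares the shallow size of a fully built list with the size of a generator object. The helper names `odd_numbers_list` and `odd_numbers_gen` are made up for this illustration.

```python
import sys


def odd_numbers_list(n):
    """Build the whole result list before returning it."""
    return [i for i in range(n) if i % 2]


def odd_numbers_gen(n):
    """Yield one odd number at a time; nothing is stored."""
    for i in range(n):
        if i % 2:
            yield i


# getsizeof reports the shallow size of the object itself (not its items),
# which is already enough to show the trend.
list_size = sys.getsizeof(odd_numbers_list(100_000))
gen_size = sys.getsizeof(odd_numbers_gen(100_000))
print(gen_size < list_size)
```

The list grows with `n`; the generator object stays the same small size no matter how many values it will eventually produce.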

In [5]:
def generator():
    """A loop inside makes more sense."""
    for i in range(10):
        if i % 2:
            yield i

In [6]:
for i in generator():
    print('g', i)

g 1
g 3
g 5
g 7
g 9


## Generator expressions

Simple iteration without the need to create intermediate lists.

In [7]:
# List comprehension
[x*x for x in range(10) if x % 2]

[1, 9, 25, 49, 81]

In [8]:
# Generator expression (same values as above, but produced lazily)
(x*x for x in range(10) if x % 2)

<generator object <genexpr> at 0x7f81747b0890>

Use it in place as an argument:

In [9]:
import random
set(random.random() for _ in range(5))

{0.21838191941094898,
 0.29516564416569946,
 0.3696599439642291,
 0.4893992600144058,
 0.5610619269068557}
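A few more built-ins that accept a generator expression directly, as a small sketch. The word list is made-up sample data.

```python
# Aggregate without building a list first.
total = sum(x * x for x in range(10) if x % 2)   # 1 + 9 + 25 + 49 + 81

# any() stops pulling values as soon as it finds a match.
has_big = any(x > 8 for x in range(10))

# Sample data, invented for this example.
words = ["spam", "eggs", "bacon"]
longest = max(len(w) for w in words)

# With more than one argument the expression needs its own parentheses.
descending = sorted((len(w) for w in words), reverse=True)

print(total, has_big, longest, descending)
```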

## Lazy evaluation

Functions that keep on living. Code only runs when it has to.

In [10]:
def generator():
    print("Hi!")
    yield 42
    print("Done!")

In [11]:
generator()  # Output?

<generator object generator at 0x7f81747b0190>

In [12]:
list(generator())  # Output?

Hi!
Done!


[42]

In [13]:
g = generator()

In [14]:
next(g)

Hi!


42
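What happens on the next call can be sketched as follows: the code after the `yield` runs, and because the function body then ends, `next()` raises `StopIteration`. The variable names here are only for illustration.

```python
def generator():
    print("Hi!")
    yield 42
    print("Done!")


g = generator()
first = next(g)            # runs up to the yield: prints "Hi!"

try:
    next(g)                # resumes after the yield: prints "Done!"
    exhausted = False
except StopIteration:
    exhausted = True       # the body finished without another yield

# A default value avoids the exception on an exhausted generator.
sentinel = next(g, None)
```

A `for` loop handles `StopIteration` for you, which is why you rarely see it in everyday code.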

In [15]:
def fibo():
    a = 0
    yield a
    b = 1
    yield b
    while True:  # endless loop!
        c = a + b
        yield c
        a, b = b, c
        

In [16]:
from itertools import takewhile  # itertools has handy tools for dealing with generators
list(takewhile(lambda x: x<10, fibo()))

[0, 1, 1, 2, 3, 5, 8]
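If you want a fixed number of items rather than a condition, `itertools.islice` is one option. The sketch below uses a more compact `fibo()` that should produce the same sequence as the one above.

```python
from itertools import islice


def fibo():
    """Endless Fibonacci sequence, written compactly."""
    a, b = 0, 1
    while True:
        yield a
        a, b = b, a + b


# Take exactly ten values; the endless loop is never run further than needed.
first_ten = list(islice(fibo(), 10))
print(first_ten)
```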

## What for?

- stream processing where the consumer pulls the data
  - reading an HTTP body that can arrive in chunks
  - database result sets

- data you don't need or want to keep in memory all at once
  - processing a gigantic CSV file

- endless results, or an unknown number of results needed
  - counter

- building block for context managers...

- coroutines for async processing...
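
The stream processing and CSV use cases above could look roughly like this. It is a minimal sketch: the stage names (`parse_rows`, `only_large`, `names`), the column names, and the in-memory sample data are all invented, and a real file object would work in place of the `StringIO`.

```python
import csv
import io


def parse_rows(lines):
    """Turn an iterable of CSV lines into dicts, one row at a time."""
    yield from csv.DictReader(lines)


def only_large(rows, threshold):
    """Pass through rows whose 'size' column is at least `threshold`."""
    for row in rows:
        if int(row["size"]) >= threshold:
            yield row


def names(rows):
    """Extract the 'name' column."""
    for row in rows:
        yield row["name"]


# Stand-in for a gigantic file; only one row is in flight at any time.
data = io.StringIO("name,size\na.txt,10\nb.bin,5000\nc.log,700\n")

pipeline = names(only_large(parse_rows(data), threshold=500))
result = list(pipeline)   # nothing is read until the consumer pulls
print(result)
```

Each stage pulls from the one before it, so memory use stays flat regardless of input size.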


## Gotchas

### Usable only once

In [17]:
g = (c for c in 'Hello World!' if c.isupper())
print(list(g))

['H', 'W']


In [18]:
print(list(g))

[]
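One way around this, sketched here with a hypothetical helper `uppercase_letters`, is to wrap the expression in a function and ask for a fresh generator each time. If the data is small, storing it in a list works too.

```python
def uppercase_letters(text):
    """Return a new generator on every call."""
    return (c for c in text if c.isupper())


first = list(uppercase_letters('Hello World!'))
second = list(uppercase_letters('Hello World!'))  # fresh generator, same result

# By contrast, a single generator object is empty after one pass.
g = uppercase_letters('Hello World!')
once = list(g)
again = list(g)
print(first, second, once, again)
```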


### Cleanup is non-deterministic

In [19]:
def read_lines(filename):
    try:
        with open(filename) as f:
            print('--- file opened')
            for line in f:
                yield line.rstrip()  # remove trailing whitespace
    finally:
        print('--- file closed')

In [20]:
reader = read_lines('Python Generators.ipynb')
for i, l in enumerate(reader):
    if i > 5:
        break
    print(i, l)

--- file opened
0 {
1  "cells": [
2   {
3    "cell_type": "markdown",
4    "metadata": {},
5    "source": [


In [21]:
del reader  # Generators are closed when garbage-collected (CPython does this as soon as the last reference is dropped)

--- file closed


In [22]:
from contextlib import closing
with closing(read_lines('Python Generators.ipynb')) as reader:
    print(next(reader))

--- file opened
{
--- file closed


In [23]:
# Beware exceptions that get raised on cleanup
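
Two cases worth knowing, as a small sketch with made-up generator names. Closing a generator raises `GeneratorExit` inside it at the paused `yield`. If the generator yields again instead of finishing, `close()` raises `RuntimeError`. If the cleanup code itself raises, that exception comes out of `close()`.

```python
def stubborn():
    try:
        yield 1
    except GeneratorExit:
        yield 2  # not allowed: yielding while being closed


def failing_cleanup():
    try:
        yield 1
    finally:
        raise ValueError("cleanup failed")


g = stubborn()
next(g)
try:
    g.close()
    ignored_exit = False
except RuntimeError:
    ignored_exit = True        # close() complains about the extra yield

g = failing_cleanup()
next(g)
try:
    g.close()
    cleanup_error = None
except ValueError as exc:
    cleanup_error = str(exc)   # exception from the finally block propagates

print(ignored_exit, cleanup_error)
```

When the generator is closed by garbage collection rather than by an explicit `close()`, there is no caller to receive such an exception, which is one more reason to close generators explicitly.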

Also beware: https://amir.rachum.com/blog/2017/03/03/generator-cleanup/