# Generators

In Python there are often many different ways to accomplish a single task.  So far you are familiar with many of Python's primitive data types - specifically for this discussion we will focus on data types that can be iterated over.  

A list is a perfect example of a type that can be iterated over.  When Python iterates through a list there are a few major things to note about how it does it:

 - The list is fully loaded into memory
 - Each individual item in the list is exposed to the scope as the variable name you provided.
 
List iteration for review:

In [1]:
for name in ["JP", "Adam", "Peter"]:
    print(name)

JP
Adam
Peter


A generator is different from a list in that it is written like a function, but with `yield` where you would expect a `return`.  Do not be fooled though, this is not a typical function that will return a value and exit forever.  Instead, calling the generator function returns a generator object, and each `yield` statement hands one value back to whoever is iterating over it.

In [2]:
def generator_function(values):
    for val in values:
        yield val ** val

In [3]:
for result in generator_function([1, 2, 3, 4, 5, 6, 7]):  # each value is raised to its own power
    print(result)

1
4
27
256
3125
46656
823543


How this works may still be unclear - that's because it's iterating through the sequence `lazily`.  This means instead of moving through a sequence all at once the generator will iterate and calculate the value only when it's asked to - waiting until the very last minute before it has to do any work.  Hence the `lazy` term.  This is more clearly demonstrated below.

In [4]:
def generator_func():
    yield "JP"
    yield "Peter"
    yield "Adam"

In [5]:
print(next(generator_func()))

JP


The above code snippet more clearly demonstrates what is going on inside of a generator object.  When you call the function, none of its body runs; you just get a generator object back.  It's only when you iterate over the generator that the code actually executes, one `yield` at a time. Your own usage of `next` may vary in projects but it serves a good purpose to demonstrate what's going on here.
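To make the lazy behaviour described above visible, here is a minimal sketch. The `noisy_generator` function and the `events` list are hypothetical names made up for this illustration; the list just records when each part of the generator body runs.

```python
events = []  # hypothetical log of when each part of the generator body runs

def noisy_generator():
    events.append("started")   # runs on the first next() call, not at call time
    yield "JP"
    events.append("resumed")   # runs on the second next() call
    yield "Peter"

gen = noisy_generator()             # only creates the generator object
events_after_call = list(events)    # still empty: the body has not run

first = next(gen)                   # runs the body up to the first yield
events_after_first = list(events)

second = next(gen)                  # resumes right after the first yield
events_after_second = list(events)

print(events_after_call)    # []
print(first)                # JP
print(events_after_first)   # ['started']
print(second)               # Peter
print(events_after_second)  # ['started', 'resumed']
```

Note that `gen` is stored in a variable here. Calling `next(noisy_generator())` repeatedly would build a fresh generator every time and always give back the first value.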

A smart use for a generator is file I/O. The file you read can be any size, and if all you need is access to the file contents "down the line", without a generator you would have to read the whole file in and pass all of that memory around.  With a generator you can create a reference to the file that doesn't get read until it's absolutely necessary.

In [7]:
for name in generator_func():
    print(name)

JP
Peter
Adam


In [9]:
foo = range(10)
print(foo)

range(0, 10)


In [26]:
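def file_contents(file_path):
    # The with block closes the file once the generator is exhausted or closed
    with open(file_path) as file_handle:
        for line in file_handle:
            yield line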

def do_something_two(file_generator):
    # Now that we are here, let's start reading the file.
    # As this iterates it will read a line from the file into memory and then
    # discard it once it's no longer needed
    for line in file_generator:
        print(line)
        break

def do_something_one(file_generator):
    # other stuff happens here
    do_something_two(file_generator)  # Just hand the file reference to its final destination
        
def main_function():    
    file_generator = file_contents("pg15325.txt")  # haven't read the file yet
    do_something_one(file_generator)  # Invoke the function above, pass it file reference


main_function()

﻿The Project Gutenberg EBook of The Great Round World and What Is Going On



A final note on generators.  Python makes an extra effort for us as programmers so that we can treat a generator the same way we would treat any other iterable, such as a list.  This makes refactoring memory-intensive code to use a lighter-weight generator pretty painless. Just make sure you've got adequate test coverage for such a refactor.
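As a rough sketch of that kind of refactor, the example below swaps a list-building function for a generator without touching the calling code. The names `squares_list`, `squares_generator` and `total` are hypothetical and exist only for this illustration.

```python
def squares_list(values):
    # Eager version: builds the whole list in memory before returning it
    return [val ** 2 for val in values]

def squares_generator(values):
    # Lazy version: produces one result at a time, only when asked
    for val in values:
        yield val ** 2

def total(iterable):
    # The caller only needs something it can loop over
    result = 0
    for item in iterable:
        result += item
    return result

eager_total = total(squares_list(range(5)))
lazy_total = total(squares_generator(range(5)))
print(eager_total, lazy_total)  # 30 30
```

The two are not interchangeable everywhere: a generator can only be consumed once, and it does not support `len()` or indexing. Callers that rely on those are exactly what the test coverage should catch.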

# Named Tuple

Named tuples (from here on referred to by their syntactically correct name `namedtuple`) are possibly the quickest way to convert a primitive data structure into a Python object-like instance.  `namedtuple` generates a hefty amount of boilerplate code for you, and all you really pass in is a name, keys, and values. A `namedtuple` can be especially useful if you're working with something like a JSON API that passes you a flat string of data that you need to convert into something useful.  Let's see an example: 


In [10]:
import json
from collections import namedtuple

In [11]:
class Person:
    def __init__(self, name, email, zip_code):
        self.name = name
        self.email = email
        self.zip_code = zip_code

In [13]:
bob = Person("Bob", "bob@bob.com", 89104)
bob

<__main__.Person at 0x106616c50>

In [14]:
Person = namedtuple("Person", "name email zip")

In [15]:
sally = Person(name="Sally", email="sally@sally.com", zip=89104)

In [16]:
sally

Person(name='Sally', email='sally@sally.com', zip=89104)

### What is happening here?

"Did you just make a class?"

"That's not how we've defined classes before?!"

If you thought either of these or any variation on this theme then you are not lost - you're seeing what you need to see.  `namedtuple` is actually what is called a `Factory`.  A `Factory` is a function that can return a class on the fly without having to define the class structure explicitly.

"It can return a class?"

Exactly!  I used the term `boilerplate` in the section above and what I meant by that was just "tedious code that might be duplicated that I don't want to write".  Using a `Factory` to generate a class for you automatically can make it easier to interact with 3rd party APIs that may change their data structure.
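To give a feel for the boilerplate you get for free, here is a short sketch that reuses the `Person` definition from earlier. It only touches documented `namedtuple` features (`_fields`, `_replace`, and ordinary tuple behaviour); the variable names `moved` and `is_immutable` are made up for this example.

```python
from collections import namedtuple

Person = namedtuple("Person", "name email zip")
sally = Person(name="Sally", email="sally@sally.com", zip=89104)

print(Person._fields)         # ('name', 'email', 'zip')
print(sally.name, sally[0])   # Sally Sally  (attribute and index access)
print(sally == ("Sally", "sally@sally.com", 89104))  # True: it is still a tuple

# Instances are immutable, so _replace returns a new object instead of editing in place
moved = sally._replace(zip=89101)
print(moved.zip, sally.zip)   # 89101 89104

try:
    sally.name = "Sal"        # assigning to a field is not allowed
    is_immutable = False
except AttributeError:
    is_immutable = True
print(is_immutable)           # True
```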

A `namedtuple` can also let your program define the fields of an object `on the fly`, for example from the keys of a JSON payload. See below:

In [18]:
JSON_STRING = '{"name": "garfield", "grade": "sophomore"}'
json_dict = json.loads(JSON_STRING)

In [19]:
print(json_dict)

{'grade': 'sophomore', 'name': 'garfield'}


In [20]:
Person = namedtuple("Person", json_dict.keys())

In [22]:
Person

__main__.Person

In [23]:
person = Person(*json_dict.values())

In [24]:
print(*json_dict.values())

sophomore garfield


In [25]:
person

Person(grade='sophomore', name='garfield')
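The example above relies on `keys()` and `values()` coming back in the same order. As an alternative sketch, keyword unpacking matches each value to its field by name, so the order of the dictionary no longer matters:

```python
import json
from collections import namedtuple

JSON_STRING = '{"name": "garfield", "grade": "sophomore"}'
json_dict = json.loads(JSON_STRING)

Person = namedtuple("Person", json_dict.keys())

# ** unpacks the dictionary into keyword arguments: name="garfield", grade="sophomore"
person = Person(**json_dict)
print(person.name, person.grade)  # garfield sophomore
```

This only works when every JSON key is a valid Python identifier. If it is not, `namedtuple` raises a `ValueError`, unless you pass `rename=True`, which swaps invalid names for positional ones such as `_0`.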