### Iterators and Generators ###

Observe how the `for` loop works with different kinds of objects in Python.

In [1]:
for n in [1,2,3,4,5]:
    print(n)

1
2
3
4
5


In [2]:
for c in "string":
    print(c)

s
t
r
i
n
g


In [3]:
for key in {'x':1,'y':2, 'z':3}:
    print(key)

x
y
z


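Iterating over a dictionary yields its keys. Since Python 3.7, dictionaries preserve insertion order, so the keys appear in the order they were added; in older versions the order was arbitrary.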

In [4]:
%%file nums.txt
one
two
three
four

Overwriting nums.txt


In [5]:
for line in open("nums.txt"):
    print(repr(line))

'one\n'
'two\n'
'three\n'
'four'


### The iteration protocol ###

In [6]:
items = [1, 2, 3]

In [7]:
itr = iter(items)

In [8]:
itr

<list_iterator at 0x7f12d0301080>

In [9]:
next(itr)

1

In [10]:
next(itr)

2

In [11]:
next(itr)

3

In [12]:
next(itr)

StopIteration: 
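
The cells above are roughly what a `for` loop does behind the scenes: call `iter()` once, then call `next()` until `StopIteration` is raised. A minimal sketch of that idea, assuming a helper named `manual_for` that is made up for illustration and is not part of these notes:

```python
def manual_for(iterable):
    """Collect items the way a for loop would, using iter() and next()."""
    collected = []
    itr = iter(iterable)          # ask the iterable for an iterator
    while True:
        try:
            item = next(itr)      # fetch the next item
        except StopIteration:     # raised when the iterator is exhausted
            break
        collected.append(item)
    return collected

print(manual_for([1, 2, 3]))   # [1, 2, 3]
print(manual_for("abc"))       # ['a', 'b', 'c']
```

The same loop works for lists, strings, dictionaries and files because each of them supports `iter()`.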

### Generators ###

In [13]:
def squares(numbers):
    for n in numbers:
        yield n*n

In [14]:
sq5 = squares(range(1, 6))

In [15]:
sq5

<generator object squares at 0x7f12d02e1620>

In [16]:
for i in sq5:
    print(i)

1
4
9
16
25


Let's examine how generators work by adding some print statements.

In [17]:
def squares(numbers):
    print("Begin squares")
    for n in numbers:
        print("Computing square of ", n)
        yield n*n
    print("Finish squares")

In [18]:
sq4 = squares([1,2,3,4])

In [19]:
sq4

<generator object squares at 0x7f12d02e1888>

In [20]:
next(sq4)

Begin squares
Computing square of  1


1

In [21]:
next(sq4)

Computing square of  2


4

In [22]:
next(sq4)

Computing square of  3


9

In [23]:
next(sq4)

Computing square of  4


16

In [24]:
next(sq4)

Finish squares


StopIteration: 

In [25]:
for x in squares([1, 2, 3, 4]):
    print(x)

Begin squares
Computing square of  1
1
Computing square of  2
4
Computing square of  3
9
Computing square of  4
16
Finish squares


Q: Is it possible to have return inside a generator function?

In [26]:
def f():
    for i in range(10000):
        if i == 13:
            return
        yield i*i

In [27]:
for s in f():
    print(s)

0
1
4
9
16
25
36
49
64
81
100
121
144
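
So yes: a `return` inside a generator function simply ends the iteration early. One further detail, sketched here with a made-up function `first_squares` (not part of these notes): in Python 3 a value given to `return` is not yielded, it is attached to the `StopIteration` exception instead.

```python
def first_squares(limit):
    """Yield squares of 0..limit-1, then return a summary string."""
    for i in range(limit):
        yield i * i
    return "done"   # not yielded; carried by StopIteration.value

gen = first_squares(3)
values = [next(gen), next(gen), next(gen)]

try:
    next(gen)                 # the generator is finished now
except StopIteration as exc:
    result = exc.value        # the value passed to return

print(values)   # [0, 1, 4]
print(result)   # done
```

A plain `for` loop ignores that value, which is why it rarely shows up in everyday code.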


**Problem:** Write a generator `countdown` that takes a number n as argument and generates all numbers from n down to 0.
```
>>> for i in countdown(3):
...     print(i)
3
2
1
0
```
Use a `while` loop to implement this.

**Bonus Problem:** Write a generator `triangular` that takes a number n as argument and generates the first n triangular numbers. The nth triangular number is the sum of the first n natural numbers.
```
>>> for t in triangular(5):
...     print(t)
1
3
6
10
15
```

**Bonus Problem:** Write a generator `fib_generator` that takes a number n as argument and generates all Fibonacci numbers less than n.
```
>>> for f in fib_generator(40):
...     print(f)
1
2
3
5
8
13
21
34
```


In [28]:
def countdown(n):
    while n >= 0:
        yield n
        n -= 1

In [29]:
for i in countdown(3):
    print(i)

3
2
1
0


In [30]:
def fib_generator(n):
    prev,current = 1, 1
    while current < n:
        yield current
        prev, current = current, prev + current

In [31]:
for f in fib_generator(40):
    print(f)

1
2
3
5
8
13
21
34


In [32]:
def triangular(n):
    for i in range(1, n+1):
        yield sum(range(1, i+1))

In [33]:
for t in triangular(5):
    print(t)

1
3
6
10
15
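
The solution above recomputes every sum from scratch. As an alternative sketch (the name `triangular_running` is hypothetical), a generator can keep a running total instead, because its local variables survive from one `next()` call to the next:

```python
def triangular_running(n):
    """Yield the first n triangular numbers using a running total."""
    total = 0
    for i in range(1, n + 1):
        total += i      # state is preserved between yields
        yield total

print(list(triangular_running(5)))   # [1, 3, 6, 10, 15]
```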


### Generator Expressions ###

In [34]:
[n*n for n in range(10)]  # list comprehension

[0, 1, 4, 9, 16, 25, 36, 49, 64, 81]

In [35]:
(n*n for n in range(10))  # generator expression

<generator object <genexpr> at 0x7f12d02e1a40>

In [36]:
sum((x*x for x in range(1000000)))

333332833333500000

In [37]:
sum(x*x for x in range(1000000)) 

333332833333500000

When a generator expression is the only argument to a function, the extra pair of parentheses can be omitted.

### What is the advantage of using generators/iterators? ###
- Evaluation is lazy, so huge (even infinite) data can be processed one item at a time without holding everything in memory.
- You can build lazy pipelines for processing data.
- All the complexity is encapsulated in the generator function, while the user sees only the clean iterator protocol.
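
A small sketch of the laziness point, using only the standard `sys` module. The variable names are made up for illustration, and note that `sys.getsizeof` measures only the container object itself, so treat the comparison as a rough indication rather than a precise memory measurement:

```python
import sys

squares_list = [n * n for n in range(100_000)]   # all values built right now
squares_gen = (n * n for n in range(100_000))    # nothing computed yet

# The list object is far larger than the generator object.
print(sys.getsizeof(squares_list) > sys.getsizeof(squares_gen))   # True

# Values are produced one at a time, only when asked for.
print(next(squares_gen))   # 0
print(next(squares_gen))   # 1
```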

### Example: Building data pipelines ###

In [38]:
import os

def find(root):
    for path, dirnames, filenames in os.walk(root):
        for f in filenames:
            yield os.path.join(path, f)

In [39]:
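from itertools import islice

def take(n, seq):
    # islice stops cleanly when seq has fewer than n items; calling next()
    # inside a generator expression raises RuntimeError in Python 3.7+
    return list(islice(seq, n))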

In [40]:
def integers():
    """
    Generate the infinite sequence of natural numbers.
    """
    i = 1
    while True:
        yield i
        i += 1

def squares(numbers):
    return (n*n for n in numbers)

In [41]:
take(10, squares(integers()))

[1, 4, 9, 16, 25, 36, 49, 64, 81, 100]

In [42]:
def grep(pattern, seq):
    return (x for x in seq if pattern in x)

In [43]:
files = find(".")
pyfiles = grep(".py", files)
print(take(10, pyfiles))

['./scopetest.py', './anothermodule.py', './module.py', './trace.py', './fib.py', './cmdline.py', './fib1.py', './sq.py', './memoize.py', './sum.py']


In [44]:
def count(seq):
    i = 0
    for x in seq:
        i = i+1
    return i

In [45]:
count(range(10))

10

In [46]:
def count(seq):
    return sum(1 for x in seq) 

In [47]:
count(range(100))

100

In [48]:
files = find(".")
pyfiles = grep(".py", files)
print(count(pyfiles)) # count .py files under the current directory

10


In [49]:
def readlines(filenames):
    """Return an iterator over the lines in all the files specified."""
    for f in filenames:
        # the with block closes each file once its lines are consumed
        with open(f) as fileobj:
            for line in fileobj:
                yield line

How many lines of Python code have we written in this course?

In [50]:
files = find(".")
pyfiles = grep(".py", files)
lines = readlines(pyfiles)
print(count(lines))


154


How many Python functions have we written in this course?

In [51]:
files = find(".")
pyfiles = grep(".py", files)
lines = readlines(pyfiles)
functions = grep("def ", lines)
print(count(functions))

19
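
Notice that every cell above rebuilds the pipeline starting from `find(".")`. That is because a generator can be consumed only once. A self-contained sketch of this, using an in-memory list of lines (the sample `source_lines` is made up for illustration) in place of real files:

```python
def grep(pattern, seq):
    return (x for x in seq if pattern in x)

def count(seq):
    return sum(1 for x in seq)

# Hypothetical sample data standing in for lines read from files.
source_lines = [
    "import os\n",
    "def find(root):\n",
    "    pass\n",
    "def take(n, seq):\n",
    "    pass\n",
]

functions = grep("def ", source_lines)
print(count(functions))   # 2
print(count(functions))   # 0, the generator is already exhausted
```

To count again, build a fresh pipeline with another call to `grep`.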


**Problem:** Write a function `get_paragraphs` to split the given text into paragraphs.
The function should take a sequence of lines as argument and return a sequence of paragraphs.

For sample input, see http://anandology.com/tmp/pg1342.txt

Once the function is there, we should be able to find:
- The number of paragraphs
- The longest paragraph

In [52]:
def get_paragraphs(lines):
    paragraph = []
    for line in lines:
        if line.strip() != "":
            paragraph.append(line)
        elif paragraph:
            yield "".join(paragraph)
            paragraph = []
    if paragraph:
        yield "".join(paragraph)

In [53]:
lines = ["A1\n", "A2\n", "\n", "B1\n", "\n", "C1\n", "C2\n"]

In [54]:
get_paragraphs(lines)

<generator object get_paragraphs at 0x7f12d02912b0>

In [55]:
list(get_paragraphs(lines))

['A1\nA2\n', 'B1\n', 'C1\nC2\n']
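
With the function in place, the two questions from the problem statement can be answered in a line each. A minimal sketch, repeating `get_paragraphs` so the block runs on its own; `sample_lines` is a made-up input, not the text at the URL above:

```python
def get_paragraphs(lines):
    paragraph = []
    for line in lines:
        if line.strip() != "":
            paragraph.append(line)
        elif paragraph:
            yield "".join(paragraph)
            paragraph = []
    if paragraph:
        yield "".join(paragraph)

# Hypothetical sample input: three paragraphs separated by blank lines.
sample_lines = ["A1\n", "\n", "B1\n", "B2\n", "B3\n", "\n", "C1\n"]

# Materialize the generator once, since we need to scan it twice.
paragraphs = list(get_paragraphs(sample_lines))

print(len(paragraphs))                  # 3
print(repr(max(paragraphs, key=len)))   # 'B1\nB2\nB3\n'
```

For a real file, passing `open(filename)` as `lines` should work the same way, since a file object iterates over its lines.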