In the following, pay close attention to these four definitions, which often get confused:
- iterator
- iterable
- generator function
- generator object
For these examples, first create a file with numbers 100 to 1:
with open("nums.txt", "w") as f:
    for i in range(100, 0, -1):
        f.write(str(i) + "\n")
When you call open(...) to read a file, you get back a file object. File objects are iterators, meaning you can call next on them:
with open("nums.txt") as f:
    print(next(f))
    print(next(f))
    print(next(f))
A list object IS NOT an iterator, as Python will quickly complain if you try running the following (TypeError: 'list' object is not an iterator):
nums = [100, 99, 98, 97]
print(next(nums))
print(next(nums))
print(next(nums))
However, a list object IS an iterable, meaning we can call iter to get a different object that is an iterator.
nums = [100, 99, 98, 97]
it = iter(nums)
print(next(it))
print(next(it))
print(next(it))
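As a small sketch of two consequences of this distinction (the names it1 and it2 are only illustrative): each call to iter on a list gives a fresh, independent iterator, and an iterator that has run out of values raises StopIteration when next is called on it.

```python
nums = [100, 99, 98, 97]

it1 = iter(nums)
it2 = iter(nums)          # a second, independent iterator over the same list

first = next(it1)         # 100
second = next(it1)        # 99
other_first = next(it2)   # 100 again, because it2 tracks its own position

# Use up the remaining values of it1, then ask for one more.
next(it1)                 # 98
next(it1)                 # 97
try:
    next(it1)
    exhausted = False
except StopIteration:
    exhausted = True      # no values left, so next raises StopIteration

print(first, second, other_first, exhausted)
```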
When we see a for loop (for x in SOME_OBJECT:), it does two things automatically:
- call iter(SOME_OBJECT) to get an iterator object from SOME_OBJECT (this implies SOME_OBJECT must be an iterable)
- repeatedly call next on the iterator object, placing each value in x, until there are no more values
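The two steps can be sketched by hand with a while loop. This is only an approximation of what Python does for us, and the seen list is an extra added here to record the values.

```python
nums = [100, 99, 98, 97]

# Roughly what "for x in nums: print(x)" does behind the scenes.
seen = []
it = iter(nums)            # step 1: get an iterator from the iterable
while True:
    try:
        x = next(it)       # step 2: ask the iterator for the next value
    except StopIteration:
        break              # no more values, so the loop ends
    seen.append(x)
    print(x)
```

The StopIteration exception is how an iterator signals that it has no more values; a real for loop catches it for us.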
You'll sometimes read about "generators" -- we avoid that simple phrasing in this course because it is often used ambiguously to refer to either generator functions or generator objects.
You've probably written functions like this many times (replacing XXXX, YYYY, and ZZZZ with something else):
def f():
    some_list = []
    for XXXX in YYYY:
        ...
        some_list.append(ZZZZ)
    return some_list
You might elsewhere call such a function to get values over which to loop (for x in f():).
Generator functions are a useful alternative in such use cases; g is very similar to f above:
def g():
    for XXXX in YYYY:
        ...
        yield ZZZZ
g is a generator function because it has a yield statement (instead of the append we saw earlier). Even though there is no return statement anymore, g will return a generator object when called, which is a good alternative to a list in some scenarios:
- for x in g(): still works because generator objects are both iterators and iterables
- g()[3] won't work anymore because you can't index into generator objects
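To make these points concrete, here is a minimal sketch using a hypothetical generator function named countdown (not part of the examples above). It checks that calling the function returns a generator object, that the object is its own iterator, and that indexing fails.

```python
def countdown(n):
    # A generator function: calling it returns a generator object.
    for i in range(n, 0, -1):
        yield i

gen = countdown(3)
kind = type(gen).__name__             # "generator"
is_own_iterator = iter(gen) is gen    # True: a generator object is its own iterator
first = next(gen)                     # 3
rest = list(gen)                      # [2, 1], the values not yet consumed

try:
    countdown(3)[0]
    indexable = True
except TypeError:
    indexable = False                 # generator objects don't support indexing

print(kind, is_own_iterator, first, rest, indexable)
```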
As long as we don't need indexing, generators have several advantages:
- the code is often a little shorter (and maybe more intuitive, once you get used to the idea)
- if some_list in f is too big to fit in memory (RAM!), the generator alternative will save us because that approach never has all the ZZZZ values in memory at once
- even more extreme, if you want to produce an infinite number of results, generators will still work
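As a rough illustration of the memory point, the sketch below fills in the f and g templates with squares (squares_list and squares_gen are made-up names) and compares object sizes with sys.getsizeof. Note that getsizeof only measures the list container, not the integers it refers to, so it understates the list's real footprint; exact byte counts vary by Python version.

```python
import sys

def squares_list(n):
    # list-building version, in the style of f
    result = []
    for i in range(n):
        result.append(i * i)
    return result

def squares_gen(n):
    # generator function version, in the style of g
    for i in range(n):
        yield i * i

n = 100_000
as_list = squares_list(n)   # all n values exist in memory now
as_gen = squares_gen(n)     # no values computed yet

list_bytes = sys.getsizeof(as_list)
gen_bytes = sys.getsizeof(as_gen)
print(list_bytes, gen_bytes)

# Both produce the same values, so the totals match.
same_total = sum(as_list) == sum(as_gen)
print(same_total)
```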
The cool thing about the following generator function is that it will work even if the entire nums.txt (which is in storage) is too big to fit in memory all at once. It's not unusual for storage space to be hundreds of times bigger than memory space, so this is an important technique when working with big datasets.
def rolling_sum():
    total = 0
    with open("nums.txt") as f:
        for line in f:
            total += int(line)
            yield total

for x in rolling_sum():
    print(x)
Here's an example of a generator function that produces an infinite number of results. If you run this one, you'll need to click "Interrupt" under the "Kernel" menu in Jupyter if you don't want it to run forever.
def even_nums():
    x = 0
    while True:
        yield x
        x += 2

for x in even_nums():
    print(x)
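If you only want the first few values from an infinite generator object, one option is itertools.islice from the standard library, which stops after a given number of items. A minimal sketch, repeating even_nums so the block runs on its own:

```python
from itertools import islice

def even_nums():
    x = 0
    while True:
        yield x
        x += 2

# Take only the first 5 values; the rest are never computed.
first_five = list(islice(even_nums(), 5))
print(first_five)  # [0, 2, 4, 6, 8]
```

This works because the generator object only computes a value when something asks for the next one.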