All new Python programmers stumble upon generators at some point. This is one of the more complex topics in Python. In this article, I will explain what generators are, where they are used, and, most importantly, how to use them.
A brief lecture about evaluating expressions in programming languages
Before we can dive into generators in Python, we first must go a bit theoretical, outside of the bounds of Python (or any specific programming language). Let's say we have the piece of code written in pseudo-code:
function square(x):
return x ^ 2
results = [square(2), square(3), square(4)]
How would our program compute the values inside of result
? When our program sees a function being called, it instantly calls the function, evaluating the result (the same applies to any expression, for that matter). This seems the most logical way to do an evaluation—calculate the result of an expression instantly at the moment the expression is made. This is what we call "eager" evaluation (which can also be seen as "strict evaluation").
Going back to Python for a moment, let's say we want to iterate over this list and print each of the items on the screen. If we rewrite the code above into Python, it would look something like this:
def square(x):
print(f'Called square with value {x}')
return x ** 2
results = [square(2), square(3), square(4)]
for result in results:
print(result)
The output of this snippet is:
Called square with value 2
Called square with value 3
Called square with value 4
4
9
16
Which is all good, everything works as expected. But what if we wanted to delay the evaluation of square
and have it evaluated by the time we need to print it's value, instead of evaluating it when we create the results
list ?
Let me expand on that - we still want to be able to have an iterable (be it a list or something else), which we can define at point A in the code. We just want to declare that we will want some values evaluated, but we don't want them evaluated until point B in the code.
This method of evaluating an expression - not instantly, but at the point we need it - is called lazy evaluation. This evaluation strategy is widely used in functional programming languages like Haskell and also in Python.
Iterator
Before going into yield
and generators, I want to give a quick refresher on what an iterator
is. An iterator is an object which contains a stream of data. It has two important methods - the __iter__
and __next__
. The __iter__
method returns the iterator itself, while __next__
returns the next time in the stream. Once __next__
is called, the current item is "consumed", so it cannot be returned by __next__
again. When all of the elements of the iterator have been consumed, a StopIteration
exception is raised.
Calling __iter__
on a list will return an iterator for that list.
numbers = [4, 24, 59]
iterator = numbers.__iter__()
print(next(iterator))
print(next(iterator))
print(next(iterator))
4
24
59
If we call next
again, the StopIteration
exception will be raised
print(next(iterator))
Traceback (most recent call last):
File "<stdin>", line 1, in <module>
StopIteration
Custom generator
Now that we know what an iterator is, we can see that our lazy evaluation should behave in a similar way - we should get an object, which will return an item only once, raise StopIteration
when there are no more items in the stream, and do the evaluation when requested.
Forgetting about what Python already has - with this description, we can actually write our own lazy-evaluated iterator.
from typing import Callable
class LazyEvaluatedIterator:
def __init__(self, expression: Callable, args: list):
self.__expression = expression
self.__args = args
def __iter__(self):
return self
def __next__(self):
return self.__expression(*self.__args)
We can use it, with one minor modification - we must call the next
method on our new iterator, to call the actual evaluation of the iterator.
results = [LazyEvaluatedIterator(square, [2]),
LazyEvaluatedIterator(square, [3]),
LazyEvaluatedIterator(square, [4])]
for result in results:
print(next(result))
Called square with value 2
4
Called square with value 3
9
Called square with value 4
16
Comparing this output to the previous output, we can see that we delayed the calling of the square
function.
Yield and simple generators
But how can we implement this in Python? Can we modify our existing square
function to turn it into a generator?
Obviously, yes - we just have to replace the return
keyword with the yield
keyword. If we call a function that contains yield
, the function will return a generator. With that generator, which is also an iterator, we can call next
to get the current item in the generator (in our case, the squared number).
def square(x):
print(f'Called square with value {x}')
yield x ** 2
results = [square(2), square(3), square(4)]
for result in results:
print(next(result))
Called square with value 2
4
Called square with value 3
9
Called square with value 4
16
We have achieved the same effect as with our custom generator - delaying the evaluation of the expression until next
is called. Job accomplished.
But, this generator only evaluates a single item - can we have a generator that yields multiple items ? Let's try to put a few yield
statements into a function, and see what happens.
def multiple_yields():
counter = 1
print("First step")
yield counter
counter += 1
print("Second step")
yield counter
counter += 1
print("Third step")
yield counter
counter += 1
print("Fourth step")
generator = multiple_yields()
print("Calling the generator")
print(next(generator))
print(next(generator))
print(next(generator))
print(next(generator))
Calling the generator
First step
1
Second step
2
Third step
3
Fourth step
Traceback (most recent call last):
File "/tmp/python_playground/temp.py", line 22, in <module>
print(next(generator))
StopIteration
Many things are happening in this code, and I want to highlight a few things. From the output, we can deduct that the yield
statement "pauses" the function's execution until that point - essentially splitting the function into four pieces (I've formatted the code to make this visible). We can also see that the execution of the function starts once we do the first next
call (Notice how "Calling the generator" is before "First step"). On the last next
call, the code continues from the previous yield
onwards. All the code is executed (hence the "Fourth step"); however, as there is no yield
until the end of the function, a StopIteration
exception is raised.
Fun fact - for
loops in Python actually do two things - call iter
on the iterable, and call next
on the resulting object, storing the result in the variable that we put after the for
keyword. (The implications of this being, that we can write our own for
loop - but I will leave it for a separate article. If you don't want to miss it, subscribe to my newsletter, so you can get a notification once I release it). This also means that we can put our generator in a for
loop, and explore it that way.
generator = multiple_yields()
print("Calling the generator")
for item in generator:
print(item)
Calling the generator
First step
1
Second step
2
Third step
3
Fourth step
The only difference, is that we don't see the StopIteration
exception.
Now, we saw that we can have multiple yield
statements in a generator - but how about putting an yield
into a for
loop? Continuing on the squared theme, let's create a generator that yields the squares of the numbers between 1 and n
(where n
will be an argument passed to the generator function).
def squares(n: int):
for x in range(1, n+1):
yield x, x ** 2
five_squares = squares(5)
for number, square in five_squares:
print(f'The square of {number} is {square}')
The square of 1 is 1
The square of 2 is 4
The square of 3 is 9
The square of 4 is 16
The square of 5 is 25
One more example - let's create a generator that iterates over a 2D grid, left to right, top to bottom, starting from (0, 0). We will pass the bounds of the grid to the generator function.
def grid_2d(bounds: tuple[int, int]):
bound_x, bound_y = bounds
for x in range(bound_x):
for y in range(bound_y):
yield (x, y)
grid_3_by_3 = grid_2d((3, 3))
print(list(grid_3_by_3))
[(0, 0), (0, 1), (0, 2), (1, 0), (1, 1), (1, 2), (2, 0), (2, 1), (2, 2)]
Infinite generators?
So far, we've had generators that contain either a fixed amount of yield
expressions, or a yield
in a for-loop - but can we have an infinite generator? One that can return items forever (or at least until we run out of memory or just stop the program)? Let's give it a try, by writing a generator that returns the next Fibonacci number:
def fibonacci():
previous = 0
current = 1
while True:
next_number = previous + current
previous, current = current, next_number
yield next_number
fib = fibonacci()
print(next(fib))
print(next(fib))
print(next(fib))
Executing the code above will give us the following result:
1
2
3
But okay, what will happen if we put this generator in a for-loop and print the result?
for num in fibonacci():
print(num)
If you execute this code, your screen will be flooded with numbers as the code generates bigger and bigger Fibonacci numbers. The code will continue to do it until you either interrupt it (Ctrl+C) or run out of resources (or turn off your PC). If you try to convert the generator to a list via list(fibonacci())
, the code will seemingly hang, while it tries to fill all available memory with Fibonacci numbers. Again, you can interrupt it or wait until Python runs out of memory.
Okay, considering that we can't use the entire contents of an infinite generator, why do they exist? Well, we don't have to use all the items (remember, due to lazy evaluation, the items won't be calculated until we need them). For example, we can write a function that returns the first Fibonacci number which is also a square of a number (technically speaking, the first Fibonacci, that is also a square is 1, but we will ignore it).
import math
def fibonacci():
previous = 0
current = 1
while True:
next_number = previous + current
previous, current = current, next_number
yield next_number
def __is_square(number):
return int(math.sqrt(number)) ** 2 == number
def get_first_fibonacci_square(fibonaccis):
next(fibonaccis)
for number in fibonaccis:
if __is_square(number):
return number
fibonaccis = fibonacci()
first_square = get_first_fibonacci_square(fibonaccis)
The result of this code will be 144 (as we skipped 1). Here, we stop going through the Fibonacci numbers once we see the first square number, so we are not affected by the fact that our generator is infinite. We can also create another infinite generator from this one, that returns only the even ones.
def even_fibonacci(fibonaccis):
for number in fibonaccis:
if number % 2 == 0:
yield number
fibonaccis = fibonacci()
even_fibonaccis = even_fibonacci(fibonaccis)
print(next(even_fibonaccis))
print(next(even_fibonaccis))
print(next(even_fibonaccis))
Here, we will get 2
, 8
,and 34
, as they are the first three even Fibonacci numbers.
The point of this is to show that we can work with an infinite stream of data - something widely used today. We can write our code so that we focus on the current item, without needing to evaluate all of the items. An excellent example of this is working with big files - we can have files that are multiple gigabytes in size and won't fit in memory, so we can't open them and get all of the content at once. In this situation, we take advantage of the lazy evaluation and focus only on the content we actually need.
Generator expressions
So far, we've looked at generators created using the yield
keyword. There is actually another way to write generators, using "generator expressions". As the name suggests, these are a single expression that returns a generator.
The syntax goes as follows: (f(item) for item in items if p(item))
. You may notice that it's similar to list comprehensions (if you want to learn about list comprehensions, you can take a look here). The main difference syntactically is the brackets - list comprehensions use []
, while generator expressions use ()
. The other big difference is the way list comprehensions and generators are evaluated - list comprehensions are eagerly evaluated, while generator expressions are lazily evaluated.
Let's create a generator using generator expressions. As this is a single expression, we can have one expression inside - it could be a function call or just a simple expression. We can take our squares example above and turn it into a generator expression:
def squares(n: int):
for x in range(1, n+1):
yield x, x ** 2
five_squares = squares(5)
for number, square in five_squares:
print(f'The square of {number} is {square}')
The first step would be to create a generator that returns the first five squares:
five_squares = ((x, x ** 2) for x in range(1, 6))
for number, square in five_squares:
print(f'The square of {number} is {square}')
The square of 1 is 1
The square of 2 is 4
The square of 3 is 9
The square of 4 is 16
The square of 5 is 25
Now, it's important to note that generator expressions don't accept arguments directly. We can bypass this by either defining the expression in a function or a lambda:
def squares(n: int):
return ((x, x ** 2) for x in range(1, n + 1))
five_squares = squares(5)
for number, square in five_squares:
print(f'The square of {number} is {square}')
squares = lambda n: ((x, x ** 2) for x in range(1, n + 1))
five_squares = squares(5)
for number, square in five_squares:
print(f'The square of {number} is {square}')
Both snippets return the same result:
The square of 1 is 1
The square of 2 is 4
The square of 3 is 9
The square of 4 is 16
The square of 5 is 25
Examples
Let's look at some examples of generators and their usage in the real world.
The most used generator in Python is range
. As the name suggests, range
generates the numbers in a given range. range
uses generators, so we don't load all the numbers in memory.
Another example is os.walk
. os.walk
goes through the directory, top-down (or bottom-up), and yields tuples of the current directory path, the names of directories in the current directory, and the file names of the files under the current directory.
Given the following directory structure:
.
├── file1.txt
├── first
│ ├── fifth
│ ├── file2.txt
│ └── second
│ ├── file3.txt
│ └── file4.txt
├── fourth
├── second
│ └── sixth
│ └── seventh
└── third
└── file5.txt
8 directories, 5 files
Looking at what os.walk
returns, we see it's a generator object:
import os
walker = os.walk('/tmp/example')
print(walker)
<generator object _walk at 0x7fc26b3fa960>
We can call next
on it to get the contents of the starting directory.
print(next(walker))
('/tmp/example', ['third', 'fourth', 'first', 'second'], ['file1.txt'])
We see the resulting tuple - the current directory, the directories in it, and the files in the directory. To view all files and directories, we can loop through the generator.
walker = os.walk('/tmp/example')
for dirname, dirnames, filenames in walker:
print(f'{dirname} contains the following directories: {dirnames}, and the following files: {filenames}')
/tmp/example contains the following directories: ['third', 'fourth', 'first', 'second'], and the following files: ['file1.txt']
/tmp/example/third contains the following directories: [], and the following files: ['file5.txt']
/tmp/example/fourth contains the following directories: [], and the following files: []
/tmp/example/first contains the following directories: ['fifth', 'second'], and the following files: ['file2.txt']
/tmp/example/first/fifth contains the following directories: [], and the following files: []
/tmp/example/first/second contains the following directories: [], and the following files: ['file3.txt', 'file4.txt']
/tmp/example/second contains the following directories: ['sixth'], and the following files: []
/tmp/example/second/sixth contains the following directories: ['seventh'], and the following files: []
/tmp/example/second/sixth/seventh contains the following directories: [], and the following files: []
Conclusion
Lazy evaluation is a powerful but often underutilized tool in programming languages. It gives programmers the ability to delay the evaluation of a given item until it's actually needed. With this, we can work with "infinite" collections without the need to evaluate every single item. Python allows lazy evaluation via generators, either using the yield
keyword or using generator expressions.
I hope you found this article helpful! If you did, make sure to leave a like. If you want to read more Python, click here. If you want to read something shorter, click here. As always, happy coding!