# 02_04: Comprehensions

Course by Michele Vallisneri

In Python, especially when you're dealing with data, there are many cases where you want to iterate over a list or a dict, performing an operation on every element, and then collect all the results in a new list or dict. You can certainly do that with a loop. For instance, picking up the example from the last video, let's compute the first 10 squares, starting with an empty list and adding elements in the body of the loop with append.

In [16]:
import math
import collections

import numpy as np
import pandas as pd
import matplotlib.pyplot as pp

%matplotlib inline   

In [17]:
#starting with an empty list and
#adding elements in the body of the loop with append
squares = []
for i in range(1, 11):
    squares.append(i**2)

In [18]:
squares

[1, 4, 9, 16, 25, 36, 49, 64, 81, 100]

This works, but we can do better. We can be more Pythonic, that is, we can respect Python's specific style and spirit.

Python offers a great feature, comprehensions, that lets us write shorter, more readable code that achieves the same effect as the loop. In fact, a comprehension is a compressed version of the loop.

Let's go through the steps to write one. 

It's a list we want, so we start with brackets. Next, we write the loop. Then we go back to the beginning of the expression and write the computation whose results we want to collect: in this case, taking the square.

In [6]:
# list of the squares from 1^2 to 10^2; note power is ** in Python
squares = [i**2 for i in range(1, 11)]

In [7]:
squares

[1, 4, 9, 16, 25, 36, 49, 64, 81, 100]

********************

Note:

Another example taken from https://docs.python.org/3/tutorial/datastructures.html
    

In [1]:
squares = []
for x in range (10):
    squares.append(x**2)
    
print(squares)

[0, 1, 4, 9, 16, 25, 36, 49, 64, 81]


***************

The result is the same, but we managed to write it in a very readable and efficient way.

We can also filter the list of elements that we are creating by adding an if clause.
For instance, we may want to collect only the squares that are divisible by four, which we can test with the modulus operator (%).

In [19]:
# list of the squares from 1^2 to 10^2, including only those divisible by 4
squares_by_four = [i**2 for i in range(1, 11) if i**2 % 4 == 0]

In [20]:
squares_by_four

[4, 16, 36, 64, 100]
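
One point that is easy to mix up: an if clause at the end of a comprehension filters elements out, whereas a conditional expression (`a if cond else b`) at the front keeps every element and only changes its value. The sketch below is purely illustrative; the names `kept` and `replaced` are made up for this note.

```python
# Filtering: the trailing `if` drops elements that fail the test.
kept = [i**2 for i in range(1, 11) if i**2 % 4 == 0]

# Conditional expression: every element is kept, but the value changes.
replaced = [i**2 if i**2 % 4 == 0 else 0 for i in range(1, 11)]

print(kept)      # [4, 16, 36, 64, 100]
print(replaced)  # [0, 4, 0, 16, 0, 36, 0, 64, 0, 100]
```

The filtered list is shorter than the input range, while the second list always has one entry per input value.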

******************

Note:

In [3]:
squares =[x**2 for x in range(10)]
print(squares)

[0, 1, 4, 9, 16, 25, 36, 49, 64, 81]


A list comprehension consists of brackets containing an expression followed by a 'for' clause, then zero or more 'for' or 'if' clauses. The result will be a new list resulting from evaluating the expression in the context of the 'for' and 'if' clauses which follow it. For example, this listcomp combines the elements of two lists if they are not equal:

In [4]:
[(x, y) for x in [1,2,3] for y in [3,1,4] if x != y]

[(1, 3), (1, 4), (2, 3), (2, 1), (2, 4), (3, 1), (3, 4)]

and it's equivalent to: 

In [6]:
combs = []
for x in [1,2,3]:
    for y in [3,1,4]:
        if x != y:
            combs.append((x, y))
                
combs

[(1, 3), (1, 4), (2, 3), (2, 1), (2, 4), (3, 1), (3, 4)]

*****************

In Python 3, comprehensions largely replace the map and filter built-in functions, which are important in so-called functional languages but did not really belong in Python. The syntax for dictionary comprehensions is also rather intuitive.
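
Before turning to dictionaries, the comparison with map and filter can be made concrete. The following is a minimal sketch, not part of the original course notebook; the variable names are chosen here for illustration only.

```python
numbers = range(1, 11)

# Functional style: filter keeps the even numbers, map squares them.
# In Python 3 both return lazy iterators, so list() is needed to see the values.
even_squares_functional = list(
    map(lambda n: n**2, filter(lambda n: n % 2 == 0, numbers))
)

# Comprehension style: the same computation in one expression.
even_squares_comprehension = [n**2 for n in numbers if n % 2 == 0]

print(even_squares_functional)     # [4, 16, 36, 64, 100]
print(even_squares_comprehension)  # [4, 16, 36, 64, 100]
```

Both forms give the same list; the comprehension simply avoids the lambdas and the explicit list() call.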

For instance, let's create a dictionary that will get us the square of an integer from the integer itself.

It's a dictionary, so we need braces. The loop part is the same: for variable in iterable. But now, instead of a single list item, we need to write a key: value pair. We can also add an if clause if we want.

In [11]:
# dict of the squares from 1^2 to 10^2, where the keys are the unsquared numbers

squares_dict = {i: i**2 for i in range(1, 11)}

In [21]:
squares_dict

{1: 1, 2: 4, 3: 9, 4: 16, 5: 25, 6: 36, 7: 49, 8: 64, 9: 81, 10: 100}

Here is the resulting dict. Dict comprehensions are sometimes used to transpose an existing dict, that is, to swap its keys and values.

Going back to our capitals, which we wrote as a dictionary indexed by country, we can get the countries indexed by capital. In the comprehension, we loop over the dict items, so we get tuples of country and capital, and we invert them by writing capital: country.

In [13]:
capitals_by_country = {'United States': 'Washington, DC', 'France': 'Paris', 'Italy': 'Rome'}

In [14]:
countries_by_capital = {capital: country for country, capital in capitals_by_country.items()}

In [15]:
countries_by_capital

{'Washington, DC': 'United States', 'Paris': 'France', 'Rome': 'Italy'}
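
The if clause mentioned earlier for dict comprehensions works just as it does for lists. The sketch below shows that, along with one caveat worth knowing when inverting a dict. The names `even_squares_dict`, `colors`, and `fruit_by_color`, and the fruit data itself, are hypothetical and only serve the illustration.

```python
squares_dict = {i: i**2 for i in range(1, 11)}

# Keep only the entries whose key is even.
even_squares_dict = {i: sq for i, sq in squares_dict.items() if i % 2 == 0}
print(even_squares_dict)  # {2: 4, 4: 16, 6: 36, 8: 64, 10: 100}

# Caveat when inverting: if two keys share a value, they collapse
# into a single entry, and the key seen last wins.
colors = {'apple': 'red', 'cherry': 'red', 'banana': 'yellow'}
fruit_by_color = {color: fruit for fruit, color in colors.items()}
print(fruit_by_color)  # {'red': 'cherry', 'yellow': 'banana'}
```

The capitals example above inverts cleanly because every capital appears only once.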

Sometimes you see what look like naked comprehensions, without the brackets. Those are in fact generator expressions, which are useful when you want to generate a sequence and consume the elements one by one without ever storing them in a list or a dict. For instance, to take the sum of the first 10 squares, we may write the interior part of our comprehension without the brackets and feed it directly to sum. Doing this saves memory, and can save time, since no intermediate list is built; that matters if you deal with large amounts of data.

In [22]:
# sum the squares from 1^2 to 10^2
sum(i**2 for i in range(1, 11))

385
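
To make the one-at-a-time behavior visible, here is a small sketch comparing a list comprehension with the corresponding generator expression. It assumes CPython for the size comparison (sys.getsizeof reports implementation-specific numbers), and the variable names are made up for this note.

```python
import sys

# A list comprehension builds every element up front...
squares_list = [i**2 for i in range(1, 1001)]

# ...while a generator expression produces them one at a time, on demand.
squares_gen = (i**2 for i in range(1, 1001))

# The generator object stays small regardless of the sequence length.
print(sys.getsizeof(squares_list) > sys.getsizeof(squares_gen))  # True

# Elements are consumed one by one.
print(next(squares_gen))  # 1
print(next(squares_gen))  # 4

# sum() consumes whatever is left (3**2 through 1000**2).
remaining = sum(squares_gen)

# A generator can only be iterated once: it is now exhausted.
print(next(squares_gen, None))  # None
```

If the values are needed more than once, a list is the better choice; the generator form pays off when each value is used once and then discarded.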

In [23]:
counting = []

for i in range(1, 11):
    for j in range(1, i+1):
        counting.append(j)

In [24]:
print(counting)

[1, 1, 2, 1, 2, 3, 1, 2, 3, 4, 1, 2, 3, 4, 5, 1, 2, 3, 4, 5, 6, 1, 2, 3, 4, 5, 6, 7, 1, 2, 3, 4, 5, 6, 7, 8, 1, 2, 3, 4, 5, 6, 7, 8, 9, 1, 2, 3, 4, 5, 6, 7, 8, 9, 10]


In fact, the built-in range, which we used earlier to demonstrate loops, does something very similar: it never builds a list, but supplies new values to the loop one at a time.

If you don't currently use comprehensions, I'm sure that if you try them you'll quickly become addicted, and you'll start doing all sorts of acrobatics, such as nested-loop comprehensions. For instance, look at the nested loop above, which creates the list 1, 1 2, 1 2 3, 1 2 3 4, and so on. We can do the same with a nested comprehension just by writing the two loops in sequence, in the same order.

Comprehensions are incredibly useful for manipulating lists, dicts, and data. You should be comfortable both reading and writing them.

In [25]:
# nested comprehension
counting = [j for i in range(1, 11) for j in range(1, i+1)]

In [7]:
print(counting)

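[1, 1, 2, 1, 2, 3, 1, 2, 3, 4, 1, 2, 3, 4, 5, 1, 2, 3, 4, 5, 6, 1, 2, 3, 4, 5, 6, 7, 1, 2, 3, 4, 5, 6, 7, 8, 1, 2, 3, 4, 5, 6, 7, 8, 9, 1, 2, 3, 4, 5, 6, 7, 8, 9, 10]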
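
As a side check on the earlier remark that range never builds a list, the following sketch inspects a very large range. The size comparison assumes CPython, and the name `big` is made up for this note.

```python
import sys

# range describes the sequence without materializing it as a list.
big = range(1, 10**9)

print(len(big))  # 999999999
print(big[5])    # 6

# Only start, stop, and step are stored, so the object stays tiny.
print(sys.getsizeof(big) < 1000)  # True

# Values are produced only as the loop asks for them.
first_three = []
for i in big:
    if i > 3:
        break
    first_three.append(i)

print(first_three)  # [1, 2, 3]
```

Calling list(big) here would try to allocate about a billion integers, which is exactly what range avoids.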

## Some notes taken from the internet

Note from https://www.geeksforgeeks.org/nested-list-comprehensions-in-python/

    List Comprehensions are one of the most amazing features of Python. It is a smart and concise way of creating lists by iterating over an iterable object. Nested List Comprehensions are nothing but a list comprehension within another list comprehension which is quite similar to nested for loops.

Let’s take a look at some examples to understand what nested list comprehensions can do:
    

## Example 1

In [29]:
# Example 1:
# I want to create a matrix that looks like the one below:

matrix = [[0, 1, 2, 3, 4],
          [0, 1, 2, 3, 4],
          [0, 1, 2, 3, 4],
          [0, 1, 2, 3, 4],
          [0, 1, 2, 3, 4]]
print(matrix)

[[0, 1, 2, 3, 4], [0, 1, 2, 3, 4], [0, 1, 2, 3, 4], [0, 1, 2, 3, 4], [0, 1, 2, 3, 4]]


The code below uses nested for loops for the given task:

In [28]:
matrix = []
for i in range (5):
    #Append an empty sublist inside
    matrix.append([])
    
    for j in range(5):
        matrix[i].append(j)
print(matrix)

[[0, 1, 2, 3, 4], [0, 1, 2, 3, 4], [0, 1, 2, 3, 4], [0, 1, 2, 3, 4], [0, 1, 2, 3, 4]]


The same output can be achieved using nested list comprehension in just one line:
    

In [30]:
#Nested list comprehension
matrix=[[j for j in range(5)] for i in range (5)]

print(matrix)

[[0, 1, 2, 3, 4], [0, 1, 2, 3, 4], [0, 1, 2, 3, 4], [0, 1, 2, 3, 4], [0, 1, 2, 3, 4]]


### Explanation

 The syntax of the above program is shown below:

[expression for i in range(5)] –> this means: evaluate the expression and append its output to the list for each value of i from 0 to 4.

For example:- [i for i in range(5)] –> In this case, the output of the expression
is simply the variable i itself and hence we append its output to the list while i
iterates from 0 to 4.

Thus the output would be –> [0, 1, 2, 3, 4]

But in our case, the expression itself is a list comprehension. Hence we need to first
solve the expression and then append its output to the list.

expression = [j for j in range(5)] –> The output of this expression is same as the
example discussed above.

Hence expression = [0, 1, 2, 3, 4].

Now we just simply append this output until variable i iterates from 0 to 4 which would
be total 5 iterations. Hence the final output would just be a list of the output of the
above expression repeated 5 times.

Output: [[0, 1, 2, 3, 4], [0, 1, 2, 3, 4], [0, 1, 2, 3, 4], [0, 1, 2, 3, 4], [0, 1, 2, 3, 4]]
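
One consequence of the explanation above is that the inner comprehension is evaluated again for every value of i, so each row is a separate list object. The sketch below contrasts this with list repetition, which is a different technique that copies references instead; the name `aliased` is made up for this note.

```python
# Nested comprehension: the inner list is rebuilt for every i,
# so each row is an independent list object.
matrix = [[j for j in range(5)] for i in range(5)]
matrix[0][0] = 99
print(matrix[1][0])  # 0, the other rows are unaffected

# List repetition copies references to the SAME inner list.
aliased = [[0, 1, 2, 3, 4]] * 5
aliased[0][0] = 99
print(aliased[1][0])  # 99, every row is the same object
```

This is one reason the nested comprehension is usually preferred when the rows will be modified later.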

## Example 2

In [32]:
#Suppose I want to flatten a given 2-D list:

matrix = [[1, 2, 3], [4, 5, 6], [7, 8, 9]]

# Expected output: flatten_matrix = [1, 2, 3, 4, 5, 6, 7, 8, 9]

This can be done using nested for loops as follows:

In [33]:
# 2-D List
matrix = [[1, 2, 3], [4, 5], [6, 7, 8, 9]]

flatten_matrix = []

for sublist in matrix:
    for val in sublist:
        flatten_matrix.append(val)
        
print(flatten_matrix)

[1, 2, 3, 4, 5, 6, 7, 8, 9]


Again, this can be done using a nested list comprehension, as shown below:
    

In [34]:
# 2-D List 

matrix = [[1, 2, 3], [4, 5], [6, 7, 8, 9]]

#Nested List Comprehension to flatten a given 2-D

flatten_matrix = [val for sublist in matrix for val in sublist]

print(flatten_matrix)

[1, 2, 3, 4, 5, 6, 7, 8, 9]


### Explanation

In this case, we need to loop over each element in the given 2-D list and append it
to another list. For better understanding, we can divide the list comprehension into
three parts:
    
    flatten_matrix = [val
                  for sublist in matrix
                  for val in sublist]
    
The first line suggests what we want to append to the list. The second line is the
outer loop and the third line is the inner loop.

‘for sublist in matrix’ returns the sublists inside the matrix one by one which would be:

[1, 2, 3], [4, 5], [6, 7, 8, 9]

‘for val in sublist’ returns all the values inside the sublist.

Hence if sublist = [1, 2, 3], ‘for val in sublist’ –> gives 1, 2, 3 as output one by one.

For every such val, we get the output as val and we append it to the list.
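
For comparison, the standard library offers another way to flatten one level of nesting: itertools.chain.from_iterable. This is not part of the quoted note; it is shown here only as a cross-check that the nested comprehension does what the explanation says. The variable names are made up for this sketch.

```python
from itertools import chain

matrix = [[1, 2, 3], [4, 5], [6, 7, 8, 9]]

# Nested comprehension, as in the example above.
flat_comprehension = [val for sublist in matrix for val in sublist]

# Standard-library alternative: yields the items of each sublist in turn.
flat_chain = list(chain.from_iterable(matrix))

print(flat_comprehension)                # [1, 2, 3, 4, 5, 6, 7, 8, 9]
print(flat_comprehension == flat_chain)  # True
```

The comprehension has the advantage that a filter or a transformation of val can be added in the same expression.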
        

    

## Example 3

    Suppose I want to flatten a given 2-D list and only include those strings whose lengths are less than 6:

    planets = [['Mercury', 'Venus', 'Earth'], ['Mars', 'Jupiter', 'Saturn'], ['Uranus', 'Neptune', 'Pluto']]


    Expected Output: flatten_planets = ['Venus', 'Earth', 'Mars', 'Pluto']

This can be done using an if condition inside a nested for loop which is shown below:

In [36]:
#2-D List of planets

planets =[['Mercury', 'Venus', 'Earth'], ['Mars', 'Jupiter', 'Saturn'], ['Uranus', 'Neptune', 'Pluto']]

flatten_planets= []

for sublist in planets:
    for planet in sublist:
        
        if len(planet) <6:
            flatten_planets.append(planet)

print (flatten_planets)


['Venus', 'Earth', 'Mars', 'Pluto']


### Explanation

This example is quite similar to the previous example but in this example, we just
need an extra if condition to check if the length of a particular planet is less than
6 or not.

This can be divided into 4 parts as follows:

In [39]:
flatten_planets = [planet 
                   for sublist in planets 
                   for planet in sublist 
                   if len(planet) < 6] 
print(flatten_planets)

['Venus', 'Earth', 'Mars', 'Pluto']
