## List Comprehensions and Dictionary Comprehensions


### Programming for Data Science
### Last Updated: Jan 15, 2023
---  

### PREREQUISITES
- variables
- lists
- `for` loop
- `if` statement

### SOURCES 
- https://www.pythonforbeginners.com/basics/list-comprehensions-in-python




### OBJECTIVES
- Explain the benefit of list comprehensions
- Illustrate the use of list comprehensions
- Explain the benefit of dict comprehensions
- Illustrate the use of dict comprehensions
 


### CONCEPTS

- list comprehension
- dict comprehension
- iterators


---

### Introducing List Comprehensions

List comprehensions provide a concise way to iterate over a list (or any other iterable) and build a new list.

Consider this task: check if each integer in a list is odd.  
Without list comprehensions, you might do this:

**Check if Odd**

In [2]:
vals = [1,5,6,8,12,15]
is_odd = []

for val in vals:   
    if val % 2 == 1: # if remainder is one, val is odd
        is_odd.append(True)
    else:       # else it's not odd
        is_odd.append(False)

is_odd

[True, True, False, False, False, True]

The code loops over each value in the list, checks the condition, and appends to a new list.  
The code works, but it's lengthy compared to a list comprehension.  
The approach takes extra time to write and understand.  

Let's solve with a list comprehension:

In [3]:
is_odd = [val % 2 == 1 for val in vals]
is_odd

[True, True, False, False, False, True]

Much shorter and, once you understand the syntax, quicker to interpret.

Now let's discuss the syntax.

**Syntax summary for the list comprehension:**

[<span style="color:blue">(expression: what to do with each element)</span> <span style="color:red">(the **for** loop with one or more arbitrary variables)</span> (zero or more conditional statements)]

For the example above:

[<span style="color:blue">val % 2 == 1</span> <span style="color:red">for val in vals</span>]

Note: 
- there are no conditional statements
- `val` is a placeholder 
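
To make the optional condition concrete, here is a small sketch that uses all three parts of the syntax and checks the result against the equivalent loop. The names `nums`, `odd_squares`, and `odd_squares_loop` are illustrative and are not used elsewhere in this notebook.

```python
nums = [1, 5, 6, 8, 12, 15]

# expression: num ** 2 | for loop: for num in nums | condition: if num % 2 == 1
odd_squares = [num ** 2 for num in nums if num % 2 == 1]

# the same logic written as an explicit loop
odd_squares_loop = []
for num in nums:
    if num % 2 == 1:          # keep only odd values
        odd_squares_loop.append(num ** 2)

print(odd_squares)                       # [1, 25, 225]
print(odd_squares == odd_squares_loop)   # True
```

The condition filters which elements are kept, while the expression decides what is stored for each kept element.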

### More Examples

**Stop Word Remover**

Create a list of words and a list of stop words.  
Filter out the stop words (words considered unimportant).

In [4]:
stop_words = ['a','am','an','i','the','of']
words      = ['i','am','not','a','fan','of','the','film']

clean_words = [wd for wd in words if wd not in stop_words]
clean_words

['not', 'fan', 'film']

Applying the color-coding to the list comprehension:

[<span style="color:blue"> wd </span> <span style="color:red"> for wd in words </span> if wd not in stop_words]

- the expression is very simple: **wd**. It keeps the word as-is when the condition is met
- the condition does the work: if the word isn't in the list of stop words, keep it
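
As a possible variation, the membership test can run against a `set` instead of a list, which tends to be faster when the stop word list is long. This is a minimal sketch; `stop_set` is an illustrative name that is not part of the original example.

```python
stop_words = ['a', 'am', 'an', 'i', 'the', 'of']
words = ['i', 'am', 'not', 'a', 'fan', 'of', 'the', 'film']

# convert once, then reuse for every membership test
stop_set = set(stop_words)

clean_words = [wd for wd in words if wd not in stop_set]
print(clean_words)   # ['not', 'fan', 'film']
```

The result is the same as before because the comprehension still walks `words` in order; only the lookup structure changes.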

**Select Tokens Containing Units**

Given a list of measurements, retain the elements containing mmHg (millimeters of mercury).

In [5]:
units = 'mmHg'
measures = ['20', '115mmHg', '5mg', '10 mg', '7.5dl', '120 mmHg']

meas_mmhg = [meas for meas in measures if units in meas]
meas_mmhg   

['115mmHg', '120 mmHg']

*Filtering on two conditions*

In [6]:
units1 = 'mmHg'
units2 = 'dl'

meas_mmhg_dl = [meas for meas in measures if units1 in meas or units2 in meas]
meas_mmhg_dl

['115mmHg', '7.5dl', '120 mmHg']
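
When the number of units grows, chaining `or` becomes unwieldy. One alternative, sketched here under the assumption that the units are collected in a tuple, is to use the built-in `any()` inside the condition. The names `target_units` and `meas_any` are illustrative.

```python
measures = ['20', '115mmHg', '5mg', '10 mg', '7.5dl', '120 mmHg']
target_units = ('mmHg', 'dl')   # illustrative: the units we want to keep

# keep a measurement if at least one target unit appears in it
meas_any = [meas for meas in measures
            if any(unit in meas for unit in target_units)]
print(meas_any)   # ['115mmHg', '7.5dl', '120 mmHg']
```

Adding another unit then only requires extending `target_units`, not rewriting the condition.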

---

### TRY FOR YOURSELF (UNGRADED EXERCISES)

1) Write and test a list comprehension that takes a list of values and returns a list of their cubes.

In [None]:
vals = [1,2,3,4]
cubes = [val**3 for val in vals]
cubes


2) Write and test a list comprehension that takes a list of strings and returns only the strings that consist entirely of digits, such as '12'.  
Hint: `isdigit()` might help.

In [None]:
strs = ['99', 'red balloons', 'floating', 'in the', 'summer sky.', '16', 'candles']

strs_w_num = [st for st in strs if st.isdigit()]
strs_w_num

### Introducing Dictionary Comprehensions

Analogously to list comprehensions, `dictionary comprehensions` provide a concise method for iterating over a dictionary to create a new dictionary.

This is common when data is structured as key-value pairs, and we'd like to filter the dict.

In [None]:
# various deep learning models and their depths

model_arch = {'cnn_1':'15 layers', 'cnn_2':'20 layers', 'rnn': '10 layers'}

In [None]:
# create a new dict containing only key-value pairs where the key contains 'cnn'

cnns = {key:model_arch[key] for key in model_arch.keys() if 'cnn' in key}
cnns

We build the key-value pairs using `key:model_arch[key]`, where `key` indexes into the dict `model_arch` to retrieve the matching value.
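
An equivalent way to write this filter, shown here as a sketch rather than as the notebook's own approach, is to loop over `model_arch.items()`, which yields each key together with its value and avoids the extra lookup. The loop variables `name` and `depth` and the result name `cnns_items` are illustrative.

```python
model_arch = {'cnn_1': '15 layers', 'cnn_2': '20 layers', 'rnn': '10 layers'}

# .items() yields (key, value) pairs, unpacked here into name and depth
cnns_items = {name: depth for name, depth in model_arch.items() if 'cnn' in name}
print(cnns_items)   # {'cnn_1': '15 layers', 'cnn_2': '20 layers'}
```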

### TRY FOR YOURSELF (UNGRADED EXERCISES)

3) Given the dict `grid`, use a dict comprehension to retain only key-value pairs where the key contains 'max':

In [None]:
grid = {'max_depth':[5,10], 'ntrees':[100,200,300],'regularization':['l1','l2'],'max_iter':[10,20]}

In [None]:
grid_max = {key:grid[key] for key in grid.keys() if 'max' in key}
grid_max

4) Given the dict `letter_to_idx`, which maps some letters to index values,  
use a dict comprehension to create a reversed dict, `idx_to_letter`,  
mapping the index values to the letters.  
(This is a common NLP task.)

In [None]:
letter_to_idx = {'a':0, 'b':1, 'c':2}

In [None]:
idx_to_letter = {letter_to_idx[k]: k for k in letter_to_idx.keys()}
idx_to_letter

---