# Comprehensions

In [1]:
# import statements
import math
import csv
from statistics import mean

## Review of lambdas
- lambda functions are a way to create a function reference without writing a separate def statement
- lambdas are simple functions with:
    - any number of parameters
    - a single expression as the function body (its value is returned)
- lambdas are useful abstractions for:
    - mathematical functions
    - lookup operations
- lambdas are often passed to functions such as sorted, which apply them to each value in a collection (for example, a list)
- Syntax: *lambda* parameters: expression
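
As a small illustration of the syntax above, the sketch below writes the same function twice: once with def and once as a lambda. The names `add_def` and `add_lambda` are made up for this example.

```python
# A regular function definition ...
def add_def(x, y):
    return x + y

# ... and the equivalent lambda: parameters before the colon, one expression after it
add_lambda = lambda x, y: x + y

print(add_def(2, 3))     # 5
print(add_lambda(2, 3))  # 5
```

Both names refer to function objects, so either one can be passed wherever a function reference is expected.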

### Let's sort the menu in different ways
- to custom sort a dictionary, first convert the dict to a list of tuples
- recall that you can use the items method (available only on dictionaries) to get those tuples

In [2]:
menu = { 
        'broccoli': 4.99,
        'orange': 1.19,
        'pie': 3.95, 
        'donut': 1.25,    
        'muffin': 2.25,
        'cookie': 0.79,  
        'milk':1.65, 
        'bread': 5.99}  
menu

{'broccoli': 4.99,
 'orange': 1.19,
 'pie': 3.95,
 'donut': 1.25,
 'muffin': 2.25,
 'cookie': 0.79,
 'milk': 1.65,
 'bread': 5.99}

In [3]:
menu.items()

dict_items([('broccoli', 4.99), ('orange', 1.19), ('pie', 3.95), ('donut', 1.25), ('muffin', 2.25), ('cookie', 0.79), ('milk', 1.65), ('bread', 5.99)])

### Sort menu using item names (keys)
- let's first solve this using an extract function
- recall that the extract function receives one inner item of the outer data structure at a time
    - outer data structure is list
    - inner data structure is tuple

In [4]:
def extract(menu_tuple):
    return menu_tuple[0]

In [5]:
sorted(menu.items(), key = extract)

[('bread', 5.99),
 ('broccoli', 4.99),
 ('cookie', 0.79),
 ('donut', 1.25),
 ('milk', 1.65),
 ('muffin', 2.25),
 ('orange', 1.19),
 ('pie', 3.95)]

In [6]:
dict(sorted(menu.items(), key = extract))

{'bread': 5.99,
 'broccoli': 4.99,
 'cookie': 0.79,
 'donut': 1.25,
 'milk': 1.65,
 'muffin': 2.25,
 'orange': 1.19,
 'pie': 3.95}

### Now let's solve the same problem using lambdas
- if you are having trouble thinking through the lambda solution directly:
    - write an extract function
    - then abstract it to a lambda

In [7]:
dict(sorted(menu.items(), key = lambda menu_tuple: menu_tuple[0]))

{'bread': 5.99,
 'broccoli': 4.99,
 'cookie': 0.79,
 'donut': 1.25,
 'milk': 1.65,
 'muffin': 2.25,
 'orange': 1.19,
 'pie': 3.95}

### Sort menu using prices (values)

In [8]:
dict(sorted(menu.items(), key = lambda menu_tuple: menu_tuple[1]))

{'cookie': 0.79,
 'orange': 1.19,
 'donut': 1.25,
 'milk': 1.65,
 'muffin': 2.25,
 'pie': 3.95,
 'broccoli': 4.99,
 'bread': 5.99}

### Sort menu using length of item names (keys)

In [9]:
dict(sorted(menu.items(), key = lambda menu_tuple: len(menu_tuple[0])))

{'pie': 3.95,
 'milk': 1.65,
 'donut': 1.25,
 'bread': 5.99,
 'orange': 1.19,
 'muffin': 2.25,
 'cookie': 0.79,
 'broccoli': 4.99}

### Sort menu using decreasing order of prices - v1

In [10]:
dict(sorted(menu.items(), key = lambda menu_tuple: menu_tuple[1], reverse = True))

{'bread': 5.99,
 'broccoli': 4.99,
 'pie': 3.95,
 'muffin': 2.25,
 'milk': 1.65,
 'donut': 1.25,
 'orange': 1.19,
 'cookie': 0.79}

### Sort menu using decreasing order of prices - v2

In [11]:
dict(sorted(menu.items(), key = lambda menu_tuple: -menu_tuple[1]))

{'bread': 5.99,
 'broccoli': 4.99,
 'pie': 3.95,
 'muffin': 2.25,
 'milk': 1.65,
 'donut': 1.25,
 'orange': 1.19,
 'cookie': 0.79}
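
The two versions above give the same result for prices, but they are not interchangeable in general. The sketch below, using a shortened menu made up for this example, suggests why: negation only works on numeric keys, while `reverse = True` works for any sortable key.

```python
menu = {'pie': 3.95, 'donut': 1.25, 'milk': 1.65}

# v1 style: reverse = True works for any sortable key, including strings
by_name_desc = sorted(menu.items(), key=lambda menu_tuple: menu_tuple[0], reverse=True)
print(by_name_desc)

# v2 style: negation is only defined for numbers, so a string key raises TypeError
try:
    sorted(menu.items(), key=lambda menu_tuple: -menu_tuple[0])
    negation_worked = True
except TypeError:
    negation_worked = False
print(negation_worked)  # False
```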

### Iterable

- What is an iterable? Anything that you can iterate over with a for loop is called an iterable.
- Examples of iterables:
    - `list`, `str`, `tuple`, `range()` (any sequence)
    - `dict`
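
To make the list above concrete, here is a minimal sketch that runs the same for loop over one example of each type. The variable names `iterables` and `seen` are chosen only for this illustration.

```python
iterables = [
    ["a", "b"],        # list
    "ab",              # str
    ("a", "b"),        # tuple
    range(2),          # range
    {"a": 1, "b": 2},  # dict: a for loop visits the keys
]

seen = {}  # Key: type name; Value: list of items the for loop produced
for iterable in iterables:
    items = []
    for val in iterable:
        items.append(val)
    seen[type(iterable).__name__] = items
    print(type(iterable).__name__, items)
```

Note that iterating over a dict yields its keys; use the items method when you also need the values.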

## List comprehensions

- concise way of generating a new list by transforming and/or filtering the items of an existing list (or any other iterable)
- short syntax: easier to read, but more difficult to debug

<pre>
new_list = [expression for val in iterable if conditional_expression]
</pre>
- iterable: reference to any iterable object instance
- conditional_expression: filters the values in the original list based on a specific requirement
- expression: can simply be val or some other transformation of val
- enclosing [ ] represents new list

Best approach:
- write for clause first
- if condition expression next
- expression in front of for clause last
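
The three steps above can be sketched as follows. The `words` list and the "longer than 3 letters" requirement are made up for this illustration.

```python
words = ["apple", "fig", "banana", "kiwi"]

# Step 1: write the for clause first (this simply copies the list)
step1 = [val for val in words]

# Step 2: add the if condition (keep words longer than 3 letters)
step2 = [val for val in words if len(val) > 3]

# Step 3: finally, change the expression in front of the for clause
step3 = [val.upper() for val in words if len(val) > 3]

print(step1)  # ['apple', 'fig', 'banana', 'kiwi']
print(step2)  # ['apple', 'banana', 'kiwi']
print(step3)  # ['APPLE', 'BANANA', 'KIWI']
```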

### Which animals are in all caps?

In [12]:
# Recap: retain animals in all caps
animals = ["lion", "badger", "RHINO", "GIRAFFE"]
caps_animals = []
print("Original:", animals)

for val in animals:
    if val.upper() == val: # Do we want to keep the current animal?
        caps_animals.append(val)
        
print("New list:", caps_animals)

Original: ['lion', 'badger', 'RHINO', 'GIRAFFE']
New list: ['RHINO', 'GIRAFFE']


### Now let's solve the same problem using list comprehension
<pre>
new_list = [expression for val in iterable if conditional_expression]
</pre>
For the below example:
- iterable: animals variable (storing reference to a list object instance)
- conditional_expression: val.upper() == val
- expression: val itself

In [13]:
# List comprehension version
print("Original:", animals)

caps_animals = [val for val in animals if val.upper() == val]
print("New list:", caps_animals)

Original: ['lion', 'badger', 'RHINO', 'GIRAFFE']
New list: ['RHINO', 'GIRAFFE']


### Why is it tougher to debug?
- a comprehension has no loop body in which to place print function calls
- you need to decompose each part and test it separately
- recommended to write the comprehension with a simpler example first
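
One possible way to decompose a comprehension, sketched below, is to temporarily rewrite it as a loop so that each part can be printed. The names `keep` and `debug_result` are introduced only for this example.

```python
animals = ["lion", "badger", "RHINO", "GIRAFFE"]

# Comprehension under test
caps_animals = [val for val in animals if val.upper() == val]

# Equivalent loop, with a print for each step
debug_result = []
for val in animals:
    keep = val.upper() == val  # the conditional_expression, tested on its own
    print(val, "->", val.upper(), "keep?", keep)
    if keep:
        debug_result.append(val)  # the expression (here, val itself)

print(debug_result == caps_animals)  # True
```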

### Other than a badger, what animals can you see at Henry Vilas Zoo?

In [14]:
print("Original:", animals)

non_badger_zoo_animals = [val for val in animals if val.upper() != "BADGER"]
print("New list:", non_badger_zoo_animals)

Original: ['lion', 'badger', 'RHINO', 'GIRAFFE']
New list: ['lion', 'RHINO', 'GIRAFFE']


### Can we convert all of the animals to all caps?
- if clause is optional

In [15]:
print("Original:", animals)

all_caps_animals = [val.upper() for val in animals]
print("New list:", all_caps_animals)

Original: ['lion', 'badger', 'RHINO', 'GIRAFFE']
New list: ['LION', 'BADGER', 'RHINO', 'GIRAFFE']


### Can we generate a list to store length of each animal name?

In [16]:
print("Original:", animals)

animals_name_length = [len(val) for val in animals]
print("New list:", animals_name_length)

Original: ['lion', 'badger', 'RHINO', 'GIRAFFE']
New list: [4, 6, 5, 7]


### Using if ... else ... in a list comprehension
- syntax changes slightly for if ... else ...

<pre>
new_list = [expression if conditional_expression else alternate_expression for val in iterable ]
</pre>

- when an item satisfies the if condition, the else clause is not evaluated
    - expression is the item in the new list when the if condition is satisfied
- when an item does not satisfy the if condition, the else clause is evaluated
    - alternate_expression is the item in the new list when the if condition is not satisfied
    
- if ... else ... must come before for; it chooses a value for every item, whereas an if clause after for filters items out
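
The difference between the two placements can be seen side by side in the sketch below. The `numbers` list and the variable names are made up for this illustration.

```python
numbers = [1, 2, 3, 4, 5, 6]

# if after for: filters, so the new list can be shorter
evens_only = [val for val in numbers if val % 2 == 0]

# if ... else ... before for: chooses a value for every item, so the length is unchanged
labels = ["even" if val % 2 == 0 else "odd" for val in numbers]

# both together: filter first (val > 2), then choose a value for each remaining item
labels_above_2 = ["even" if val % 2 == 0 else "odd" for val in numbers if val > 2]

print(evens_only)      # [2, 4, 6]
print(labels)          # ['odd', 'even', 'odd', 'even', 'odd', 'even']
print(labels_above_2)  # ['odd', 'even', 'odd', 'even']
```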

### What if we only care about the badger? Replace non-badger animals with "some animal".

In [17]:
animals = ["lion", "badger", "RHINO", "GIRAFFE"]
print("Original:", animals)

non_badger_zoo_animals = [val if val.upper() == "BADGER" else "some animal" \
                          for val in animals]
print("New list:", non_badger_zoo_animals)

Original: ['lion', 'badger', 'RHINO', 'GIRAFFE']
New list: ['some animal', 'badger', 'some animal', 'some animal']


## Dict comprehensions
- Version 1:
<pre>
{expression for val in iterable if condition}
</pre>
- expression has the form <pre>key: val</pre>
<br/>
- Version 2 (call the dict function, passing a list comprehension as the argument):
<pre>dict([expression for val in iterable if condition])</pre>
- expression has the form <pre>(key, val)</pre>

### Create a dict to map number to its square (for numbers 1 to 5)

In [18]:
squares_dict = dict()
for val in range(1, 6):
    squares_dict[val] = val * val
print(squares_dict)

{1: 1, 2: 4, 3: 9, 4: 16, 5: 25}


### Dict comprehension --- version 1

In [19]:
square_dict = {val: val * val for val in range(1, 6)}
print(square_dict)

{1: 1, 2: 4, 3: 9, 4: 16, 5: 25}


### Dict comprehension --- version 2

In [20]:
square_dict = dict([(val, val * val) for val in range(1, 6)])
print(square_dict)

{1: 1, 2: 4, 3: 9, 4: 16, 5: 25}
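
Neither version above uses the optional if condition. As a small sketch, the same two forms can filter the keys, here keeping only even numbers (the variable names are chosen for this example):

```python
# Version 1: dict comprehension with a filter
even_squares_v1 = {val: val * val for val in range(1, 6) if val % 2 == 0}

# Version 2: dict() around a list comprehension of (key, value) tuples
even_squares_v2 = dict([(val, val * val) for val in range(1, 6) if val % 2 == 0])

print(even_squares_v1)                     # {2: 4, 4: 16}
print(even_squares_v1 == even_squares_v2)  # True
```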


### Tuple unpacking
- you can directly specify variables to unpack the items inside a tuple

In [21]:
scores_dict = {"Bob": "32", "Cindy" : "45", "Alice": "39", "Unknown": "None"}

for tuple_item in scores_dict.items():
    print(tuple_item)
    
print("--------------------")

for key, val in scores_dict.items():
    print(key, val)

('Bob', '32')
('Cindy', '45')
('Alice', '39')
('Unknown', 'None')
--------------------
Bob 32
Cindy 45
Alice 39
Unknown None


### From square_dict, let's generate cube_dict

In [22]:
# math.isqrt returns an exact integer square root (Python 3.8+), avoiding float rounding
cube_dict = {key: math.isqrt(val) ** 3 for key, val in square_dict.items()}
print(cube_dict)

{1: 1, 2: 8, 3: 27, 4: 64, 5: 125}


### Convert Madison *F temperature to *C
- <pre>C = 5 / 9 * (F - 32)</pre>

In [23]:
madison_fahrenheit = {'Nov': 28,'Dec': 20, 'Jan': 10,'Feb': 14}
print("Original:", madison_fahrenheit)

madison_celsius = {key: int(5 / 9 * (val - 32)) for key, val in madison_fahrenheit.items()}
print("New dict:", madison_celsius)

Original: {'Nov': 28, 'Dec': 20, 'Jan': 10, 'Feb': 14}
New dict: {'Nov': -2, 'Dec': -6, 'Jan': -12, 'Feb': -10}
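
Note that int truncates toward zero rather than rounding to the nearest integer. The sketch below compares the two on the same data; whether truncation or rounding is preferable depends on the use case, and the names `truncated` and `rounded` are introduced only for this example.

```python
madison_fahrenheit = {'Nov': 28, 'Dec': 20, 'Jan': 10, 'Feb': 14}

# int() drops the fractional part: -6.67 becomes -6
truncated = {key: int(5 / 9 * (val - 32)) for key, val in madison_fahrenheit.items()}

# round() picks the nearest integer: -6.67 becomes -7
rounded = {key: round(5 / 9 * (val - 32)) for key, val in madison_fahrenheit.items()}

print("Truncated:", truncated)  # {'Nov': -2, 'Dec': -6, 'Jan': -12, 'Feb': -10}
print("Rounded:  ", rounded)    # {'Nov': -2, 'Dec': -7, 'Jan': -12, 'Feb': -10}
```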


### Convert type of values in a dictionary

In [24]:
scores_dict = {"Bob": "32", "Cindy" : "45", "Alice": "39", "Unknown": "None"}
print("Original:", scores_dict)

updated_scores_dict = {key: int(val) if val.isdigit() else None for key, val in scores_dict.items()}
print("New dict:", updated_scores_dict)

Original: {'Bob': '32', 'Cindy': '45', 'Alice': '39', 'Unknown': 'None'}
New dict: {'Bob': 32, 'Cindy': 45, 'Alice': 39, 'Unknown': None}


### Create a dictionary to map each player to their max score

In [25]:
scores_dict = {"Bob": [18, 72, 61, 5, 83], 
               "Cindy" : [27, 11, 55, 73, 87], 
               "Alice": [16, 33, 42, 89, 90], 
               "Meena": [39, 93, 9, 3, 55]}

{player: max(scores) for player, scores in scores_dict.items()}

{'Bob': 83, 'Cindy': 87, 'Alice': 90, 'Meena': 93}

## Practice problems - sorted + lambda

### Use sorted and lambda function to sort this list of dictionaries based on the score, from low to high

In [26]:
scores = [  {"name": "Bob", "score": 32} ,
            {"name": "Cindy", "score" : 45}, 
            {"name": "Alice", "score": 39}
     ]

sorted(scores, key = lambda d: d["score"])

[{'name': 'Bob', 'score': 32},
 {'name': 'Alice', 'score': 39},
 {'name': 'Cindy', 'score': 45}]

### Now, modify the lambda function part alone to sort the list of dictionaries based on the score, from high to low

In [27]:
sorted(scores, key = lambda d: -d["score"])

[{'name': 'Cindy', 'score': 45},
 {'name': 'Alice', 'score': 39},
 {'name': 'Bob', 'score': 32}]

### Now, go back to the previous lambda function definition and use sorted parameters to sort the list of dictionaries based on the score, from high to low

In [28]:
sorted(scores, key = lambda d: d["score"], reverse = True)

[{'name': 'Cindy', 'score': 45},
 {'name': 'Alice', 'score': 39},
 {'name': 'Bob', 'score': 32}]

## Student Information Survey dataset analysis

In [29]:
def median(items):
    # sort a copy so that the caller's list is not modified
    items = sorted(items)
    n = len(items)
    if n % 2 != 0:
        middle = items[n // 2]
    else:
        first_middle = items[n // 2]
        second_middle = items[(n // 2) - 1]
        middle = (first_middle + second_middle) / 2
    return middle

In [30]:
# inspired by https://automatetheboringstuff.com/2e/chapter16/
def process_csv(filename):
    # the with statement closes the file even if reading raises an error
    with open(filename, encoding="utf-8") as example_file:
        example_reader = csv.reader(example_file)
        example_data = list(example_reader)
    return example_data

survey_data = process_csv('cs220_survey_data.csv')
cs220_header = survey_data[0]
cs220_data = survey_data[1:]
cs220_header

['Lecture',
 'Age',
 'Primary major',
 'Other majors',
 'Zip Code',
 'Pizza topping',
 'Pet owner',
 'Runner',
 'Sleep habit',
 'Procrastinator']

In [31]:
def cell(row_idx, col_name):
    """
    Returns the data value (cell) corresponding to the row index and 
    the column name of a CSV file.
    """
    col_idx = cs220_header.index(col_name) 
    val = cs220_data[row_idx][col_idx]  
    
    # handle missing values, by returning None
    if val == '':
        return None
    
    # handle type conversions
    if col_name in ["Age",]:
        return int(val)
    
    return val

def bucketize(bucket_column):
    """
    generates and returns bucketized data based on bucket_column
    """
    # Key: unique bucketize column value; Value: list of lists (rows having that unique 
    # column value)
    buckets = dict()
    for row_idx in range(len(cs220_data)):
        col_value = cell(row_idx, bucket_column)
        if col_value not in buckets:
            buckets[col_value] = []
        buckets[col_value].append(cs220_data[row_idx])
        
    return buckets

### Compute median age per lecture in one step using `dict` and `list` comprehension.

In [32]:
age_by_lecture = {} # Key: lecture; Value: list of ages

lecture_buckets = bucketize("Lecture")

for lecture in lecture_buckets:
    lecture_students = lecture_buckets[lecture]
    ages = []
    for student in lecture_students:
        # note: values read from the CSV are strings, so median compares them as text
        age = student[cs220_header.index("Age")]
        ages.append(age)
    age_by_lecture[lecture] = ages

median_age_by_lecture = {} # Key: lecture; Value: median age of that lecture
for lecture in age_by_lecture:
    median_age = median(age_by_lecture[lecture])
    median_age_by_lecture[lecture] = median_age
    
print(median_age_by_lecture)

{'LEC002': '19', 'LEC001': '19', 'LEC004': '19', 'LEC003': '19'}


In [33]:
median_age_by_lecture2 = {lecture: median([student[cs220_header.index("Age")]
                                           for student in students])
                          for lecture, students in lecture_buckets.items()}
median_age_by_lecture2

{'LEC002': '19', 'LEC001': '19', 'LEC004': '19', 'LEC003': '19'}

### Compute max age per lecture in one step using `dict` and `list` comprehension.

In [34]:
# note: ages are still strings here, so max compares them as text (for example, '9' > '36')
max_age_by_lecture2 = {lecture: max([student[cs220_header.index("Age")]
                                     for student in students])
                       for lecture, students in lecture_buckets.items()}
max_age_by_lecture2

{'LEC002': '28', 'LEC001': '36', 'LEC004': '23', 'LEC003': '27'}
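
Because the ages above are strings, they are compared as text rather than as numbers. The sketch below shows one way to convert them first; it uses made-up sample rows (not the survey data), a hypothetical `age_idx` column position, and `statistics.median` in place of the `median` function defined earlier.

```python
from statistics import median

# Hypothetical rows in the form [lecture, age]; ages are strings, as read from a CSV
sample_buckets = {
    "LEC001": [["LEC001", "19"], ["LEC001", "9"], ["LEC001", "21"]],
    "LEC002": [["LEC002", "18"], ["LEC002", ""], ["LEC002", "20"]],
}
age_idx = 1  # position of the age column in these sample rows

# String comparison: '9' is the "max" because the character '9' sorts after '1' and '2'
max_as_str = {lecture: max([student[age_idx] for student in students])
              for lecture, students in sample_buckets.items()}

# Numeric comparison: skip missing values, convert the rest to int
ages_as_int = {lecture: [int(student[age_idx]) for student in students
                         if student[age_idx] != ""]
               for lecture, students in sample_buckets.items()}
max_as_int = {lecture: max(ages) for lecture, ages in ages_as_int.items()}
median_as_int = {lecture: median(ages) for lecture, ages in ages_as_int.items()}

print(max_as_str)     # {'LEC001': '9', 'LEC002': '20'}
print(max_as_int)     # {'LEC001': 21, 'LEC002': 20}
print(median_as_int)  # {'LEC001': 19, 'LEC002': 19.0}
```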

## Practice problems - comprehensions

### Using range and raise10 functions, generate a list to store 10 to powers 1, 2, 3, 4, and 5

In [None]:
raise10 = lambda num: 10 ** num  # assumed definition; raise10 is not defined in this notebook
[raise10(num) for num in range(1, 6)]

### Generate a new list where each number is the square of the original number in the numbers list

In [None]:
numbers = [44, 33, 56, 21, 19]

[num ** 2 for num in numbers]

### Generate a new list of floats from vac_rates, each rounded to 3 decimal places

In [None]:
vac_rates = [23.329868, 51.28772, 76.12232, 17.2, 10.5]

[round(rate, 3) for rate in vac_rates]

### Generate a new list of ints from words, that contains length of each word

In [None]:
words = ['My', 'very', 'educated', 'mother', 'just', 'served', 'us', 'noodles']

[len(word) for word in words]

### Create 2 dictionaries to map each player to their min and avg score

In [None]:
scores_dict = {"Bob": [18, 72, 61, 5, 83], 
               "Cindy" : [27, 11, 55, 73, 87], 
               "Alice": [16, 33, 42, 89, 90], 
               "Meena": [39, 93, 9, 3, 55]}

{player: min(scores) for player, scores in scores_dict.items()}

In [None]:
{player: sum(scores) / len(scores) for player, scores in scores_dict.items()}