# Comprehensions

In [1]:
# import statements
import math
import csv

### Using `lambda`
- a `lambda` is an anonymous function: a way to write a small function inline, without `def` or a name
- lambdas are simple functions with:
    - zero or more parameters
    - a single expression as the function body (its value is what the lambda returns)
- lambdas are useful abstractions for:
    - mathematical functions
    - lookup operations
- lambdas are often applied to each item of a collection, for example as the `key` argument of `sorted`
- Syntax: 
```python 
lambda parameters: expression
```
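
As a minimal sketch of that syntax, the block below defines the same behavior twice, once with `def` and once with `lambda`. The names `add_tax`, `add_tax_lambda`, `area`, and the 5% rate are made up for illustration.

```python
# A regular function: named, with an explicit return statement
def add_tax(price):
    return round(price * 1.05, 2)

# The equivalent lambda: parameters before the colon, one expression after it
add_tax_lambda = lambda price: round(price * 1.05, 2)

# A lambda may take more than one parameter
area = lambda width, height: width * height

print(add_tax(2.00), add_tax_lambda(2.00))
print(area(3, 5))
```

In practice a lambda is usually passed directly as an argument rather than stored in a variable; the assignments here only make the comparison easier to read.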

### Let's sort the menu in different ways
- to custom sort a dictionary, first convert the dict to a list of (key, value) tuples
- recall that the `items` method (available only on dictionaries) gives you those tuples

In [2]:
menu = { 
        'broccoli': 4.99,
        'orange': 1.19,
        'pie': 3.95, 
        'donut': 1.25,    
        'muffin': 2.25,
        'cookie': 0.79,  
        'milk':1.65, 
        'bread': 5.99}  
menu

{'broccoli': 4.99,
 'orange': 1.19,
 'pie': 3.95,
 'donut': 1.25,
 'muffin': 2.25,
 'cookie': 0.79,
 'milk': 1.65,
 'bread': 5.99}

In [3]:
menu.items()

dict_items([('broccoli', 4.99), ('orange', 1.19), ('pie', 3.95), ('donut', 1.25), ('muffin', 2.25), ('cookie', 0.79), ('milk', 1.65), ('bread', 5.99)])

### Sort menu using item names (keys)
- let's first solve this using an `extract` function
- recall that the extract function receives one inner item of the outer data structure and returns the part to sort by
    - outer data structure is list
    - inner data structure is tuple

In [10]:
def extract(menu_tuple):
    return menu_tuple[0]

In [11]:
sorted(menu.items(), key = extract)

[('bread', 5.99),
 ('broccoli', 4.99),
 ('cookie', 0.79),
 ('donut', 1.25),
 ('milk', 1.65),
 ('muffin', 2.25),
 ('orange', 1.19),
 ('pie', 3.95)]

In [6]:
dict(sorted(menu.items(), key = extract))

{'bread': 5.99,
 'broccoli': 4.99,
 'cookie': 0.79,
 'donut': 1.25,
 'milk': 1.65,
 'muffin': 2.25,
 'orange': 1.19,
 'pie': 3.95}

### Now let's solve the same problem using lambdas
- if you are having trouble thinking through the lambda solution directly:
    - write an extract function
    - then abstract it to a lambda

In [7]:
dict(sorted(menu.items(), key = lambda menu_tuple: menu_tuple[0]))

{'bread': 5.99,
 'broccoli': 4.99,
 'cookie': 0.79,
 'donut': 1.25,
 'milk': 1.65,
 'muffin': 2.25,
 'orange': 1.19,
 'pie': 3.95}

### Sort menu using prices (values)

In [8]:
dict(sorted(menu.items(), key = lambda menu_tuple: menu_tuple[1]))

{'cookie': 0.79,
 'orange': 1.19,
 'donut': 1.25,
 'milk': 1.65,
 'muffin': 2.25,
 'pie': 3.95,
 'broccoli': 4.99,
 'bread': 5.99}

### Sort menu using length of item names (keys)

In [9]:
dict(sorted(menu.items(), key = lambda menu_tuple: len(menu_tuple[0])))

{'pie': 3.95,
 'milk': 1.65,
 'donut': 1.25,
 'bread': 5.99,
 'orange': 1.19,
 'muffin': 2.25,
 'cookie': 0.79,
 'broccoli': 4.99}

### Sort menu using decreasing order of prices - v1

In [10]:
dict(sorted(menu.items(), key = lambda menu_tuple: menu_tuple[1], reverse = True))

{'bread': 5.99,
 'broccoli': 4.99,
 'pie': 3.95,
 'muffin': 2.25,
 'milk': 1.65,
 'donut': 1.25,
 'orange': 1.19,
 'cookie': 0.79}

### Sort menu using decreasing order of prices - v2

In [11]:
dict(sorted(menu.items(), key = lambda menu_tuple: -menu_tuple[1]))

{'bread': 5.99,
 'broccoli': 4.99,
 'pie': 3.95,
 'muffin': 2.25,
 'milk': 1.65,
 'donut': 1.25,
 'orange': 1.19,
 'cookie': 0.79}
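
The two versions are not interchangeable in every case. Negating the key only works when the key is a number, whereas `reverse = True` works for any comparable key, including strings. A small sketch, using a hypothetical `snacks` dict rather than the menu above:

```python
snacks = {'donut': 1.25, 'cookie': 0.79, 'muffin': 2.25}

# reverse = True works with string keys (Z to A)
by_name_desc = sorted(snacks.items(), key = lambda pair: pair[0], reverse = True)
print(by_name_desc)

# Negation works only for numeric keys; negating a str raises TypeError
try:
    sorted(snacks.items(), key = lambda pair: -pair[0])
except TypeError as error:
    print("Cannot negate a string:", error)
```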

### Iterable

- What is an iterable? Anything that you can iterate over with a for loop is called an iterable.
- Examples of iterables:
    - `list`, `str`, `tuple`, `range()` (any sequence)
    - `dict`
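
A quick way to check the definition is to loop over one object of each kind. The sketch below uses made-up sample values; note that looping over a `dict` yields its keys.

```python
samples = [
    [10, 20],            # list
    "hi",                # str
    (1.5, 2.5),          # tuple
    range(2),            # range
    {"a": 1, "b": 2},    # dict (iterating yields the keys)
]

for iterable in samples:
    collected = []
    for item in iterable:
        collected.append(item)
    print(type(iterable).__name__, "->", collected)

# An int is not iterable, so a for loop over it fails
try:
    for item in 5:
        pass
except TypeError as error:
    print("int is not iterable:", error)
```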

## List comprehensions

- concise way of generating a new list by transforming or filtering the items of an existing iterable
- short syntax: easier to read, but harder to debug

<pre>
new_list = [expression for val in iterable if conditional_expression]
</pre>
- iterable: reference to any iterable object instance
- conditional_expression: filters the values in the original list based on a specific requirement
- expression: can simply be val or some other transformation of val
- enclosing [ ] represents new list

Best approach:
- write the for clause first
- add the if condition expression next
- write the expression in front of the for clause last
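
The three steps can be followed literally. In the sketch below, `temps` is a made-up list, and each step is a complete, runnable comprehension so it can be checked before moving on.

```python
temps = [28, 20, 10, 14, 35]

# Step 1: write the for clause (the expression is just val for now)
step1 = [val for val in temps]

# Step 2: add the if condition to filter
step2 = [val for val in temps if val < 25]

# Step 3: replace the expression in front of the for clause
step3 = [val - 32 for val in temps if val < 25]

print(step1)
print(step2)
print(step3)
```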

### Which animals are in all caps?

In [12]:
# Recap: retain animals in all caps
animals = ["lion", "badger", "RHINO", "GIRAFFE"]
caps_animals = []
print("Original:", animals)

for val in animals:
    if val.upper() == val: # Do we want to keep the current animal?
        caps_animals.append(val)
        
print("New list:", caps_animals)

Original: ['lion', 'badger', 'RHINO', 'GIRAFFE']
New list: ['RHINO', 'GIRAFFE']


### Now let's solve the same problem using list comprehension
<pre>
new_list = [expression for val in iterable if conditional_expression]
</pre>
For the example below:
- iterable: animals variable (storing reference to a list object instance)
- conditional_expression: val.upper() == val
- expression: val itself

In [13]:
# List comprehension version
print("Original:", animals)

caps_animals = [val for val in animals if val.upper() == val]
print("New list:", caps_animals)

Original: ['lion', 'badger', 'RHINO', 'GIRAFFE']
New list: ['RHINO', 'GIRAFFE']


### Why is it tougher to debug?
- you cannot easily add print function calls between the steps of a comprehension, as you can inside a for loop body
- you need to decompose each part and test it separately
- it is recommended to first write the comprehension for a simpler example
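
One way to decompose a comprehension, sketched here with a hypothetical helper named `is_all_caps`: move the condition into a named function, test that function on single values, and only then use it in the comprehension.

```python
def is_all_caps(text):
    # Same test as the condition in the all-caps comprehension
    return text.upper() == text

# Test the condition by itself on individual values
print(is_all_caps("RHINO"))
print(is_all_caps("lion"))

# Then plug the tested piece back into the comprehension
zoo = ["lion", "badger", "RHINO", "GIRAFFE"]
caps_only = [val for val in zoo if is_all_caps(val)]
print(caps_only)
```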

### Other than a badger, what animals can you see at Henry Vilas Zoo?

In [14]:
print("Original:", animals)

non_badger_zoo_animals = [val for val in animals if val.upper() != "BADGER"]
print("New list:", non_badger_zoo_animals)

Original: ['lion', 'badger', 'RHINO', 'GIRAFFE']
New list: ['lion', 'RHINO', 'GIRAFFE']


### Can we convert all of the animals to all caps?
- the if clause is optional

In [15]:
print("Original:", animals)

all_caps_animals = [val.upper() for val in animals]
print("New list:", all_caps_animals)

Original: ['lion', 'badger', 'RHINO', 'GIRAFFE']
New list: ['LION', 'BADGER', 'RHINO', 'GIRAFFE']


### Can we generate a list to store length of each animal name?

In [16]:
print("Original:", animals)

animals_name_length = [len(val) for val in animals]
print("New list:", animals_name_length)

Original: ['lion', 'badger', 'RHINO', 'GIRAFFE']
New list: [4, 6, 5, 7]


### Using if ... else ... in a list comprehension
- syntax changes slightly for if ... else ...

<pre>
new_list = [expression if conditional_expression else alternate_expression for val in iterable ]
</pre>

- when an item satisfies the if clause, you don't execute the else clause
    - expression is the item in new list when if condition is satisfied
- when an item does not satisfy the if clause, you execute the else clause
    - alternate_expression is the item in new list when if condition is not satisfied

- if ... else ... clauses need to come before for (not the same as just using if clause)
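
The two forms can also be combined: an `if ... else ...` before `for` chooses the value, and a plain `if` after the iterable filters. A minimal sketch with a made-up `numbers_sample` list:

```python
numbers_sample = [1, 2, 3, 4, 5, 6]

# Filter only: if comes after the iterable, items failing it are dropped
evens = [n for n in numbers_sample if n % 2 == 0]

# Choose a value: if ... else ... comes before for, no item is dropped
labels = ["even" if n % 2 == 0 else "odd" for n in numbers_sample]

# Both: label only the numbers greater than 2
labels_over_2 = ["even" if n % 2 == 0 else "odd" for n in numbers_sample if n > 2]

print(evens)
print(labels)
print(labels_over_2)
```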

### What if we only care about the badger? Replace non-badger animals with "some animal".

In [17]:
animals = ["lion", "badger", "RHINO", "GIRAFFE"]
print("Original:", animals)

non_badger_zoo_animals = [val if val.upper() == "BADGER" else "some animal" \
                          for val in animals]
print("New list:", non_badger_zoo_animals)

Original: ['lion', 'badger', 'RHINO', 'GIRAFFE']
New list: ['some animal', 'badger', 'some animal', 'some animal']


In [12]:
animals = ["lion", "badger", "RHINO", "GIRAFFE"]
print("Original:", animals)

non_badger_zoo_animals = ["some animal" if val.upper() != "BADGER" else val \
                          for val in animals]
print("New list:", non_badger_zoo_animals)

Original: ['lion', 'badger', 'RHINO', 'GIRAFFE']
New list: ['some animal', 'badger', 'some animal', 'some animal']


## Dict comprehensions
- Version 1:
<pre>
{expression for val in iterable if condition}
</pre>
- expression has the form <pre>key: val</pre>
<br/>
- Version 2 --- call the dict function, passing a list comprehension as the argument:
<pre>dict([expression for val in iterable if condition])</pre>
- expression has the form <pre>(key, val)</pre>
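
The `if condition` part filters entries just as it does in a list comprehension. A short sketch of both versions with a filter, using a made-up `word_list`:

```python
word_list = ["pie", "milk", "donut", "bread"]

# Version 1: key: val expression, keeping only words longer than 3 letters
lengths_v1 = {word: len(word) for word in word_list if len(word) > 3}

# Version 2: list of (key, val) tuples passed to dict
lengths_v2 = dict([(word, len(word)) for word in word_list if len(word) > 3])

print(lengths_v1)
print(lengths_v1 == lengths_v2)
```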

### Create a dict to map number to its square (for numbers 1 to 5)

In [18]:
squares_dict = dict()
for val in range(1, 6):
    squares_dict[val] = val * val
print(squares_dict)

{1: 1, 2: 4, 3: 9, 4: 16, 5: 25}


### Dict comprehension --- version 1

In [19]:
square_dict = {val: val * val for val in range(1, 6)}
print(square_dict)

{1: 1, 2: 4, 3: 9, 4: 16, 5: 25}


### Dict comprehension --- version 2

In [20]:
square_dict = dict([(val, val * val) for val in range(1, 6)])
print(square_dict)

{1: 1, 2: 4, 3: 9, 4: 16, 5: 25}


### Tuple unpacking
- you can directly specify variables to unpack the items inside a tuple

In [21]:
scores_dict = {"Bob": "32", "Cindy" : "45", "Alice": "39", "Unknown": "None"}

for tuple_item in scores_dict.items():
    print(tuple_item)
    
print("--------------------")

for key, val in scores_dict.items():
    print(key, val)

('Bob', '32')
('Cindy', '45')
('Alice', '39')
('Unknown', 'None')
--------------------
Bob 32
Cindy 45
Alice 39
Unknown None


### From square_dict, let's generate cube_dict

In [22]:
cube_dict = {key: int(math.sqrt(val)) ** 3 for key, val in square_dict.items()}
print(cube_dict)

{1: 1, 2: 8, 3: 27, 4: 64, 5: 125}


### Convert Madison °F temperature to °C
- <pre>C = 5 / 9 * (F - 32)</pre>

In [23]:
madison_fahrenheit = {'Nov': 28,'Dec': 20, 'Jan': 10,'Feb': 14}
print("Original:", madison_fahrenheit)

madison_celsius = {key: int(5 / 9 * (val - 32)) \
                   for key, val in madison_fahrenheit.items()}
print("New dict:", madison_celsius)

Original: {'Nov': 28, 'Dec': 20, 'Jan': 10, 'Feb': 14}
New dict: {'Nov': -2, 'Dec': -6, 'Jan': -12, 'Feb': -10}
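
Note that `int` truncates toward zero rather than rounding to the nearest integer, which is why Dec maps to -6 even though 20 °F is about -6.67 °C. A sketch of the difference, with `round` shown as an alternative (a variation, not what the cell above does):

```python
fahrenheit = 20
celsius = 5 / 9 * (fahrenheit - 32)

print(celsius)         # about -6.67
print(int(celsius))    # truncates toward zero
print(round(celsius))  # rounds to the nearest integer
```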


### Convert type of values in a dictionary

In [24]:
scores_dict = {"Bob": "32", "Cindy" : "45", "Alice": "39", "Unknown": "None"}
print("Original:", scores_dict)

updated_scores_dict = {key: int(val) if val.isdigit() else None \
                       for key, val in scores_dict.items()}
print("New dict:", updated_scores_dict)

Original: {'Bob': '32', 'Cindy': '45', 'Alice': '39', 'Unknown': 'None'}
New dict: {'Bob': 32, 'Cindy': 45, 'Alice': 39, 'Unknown': None}


### Create a dictionary to map each player to their max score

In [25]:
scores_dict = {"Bob": [18, 72, 61, 5, 83], 
               "Cindy" : [27, 11, 55, 73, 87], 
               "Alice": [16, 33, 42, 89, 90], 
               "Meena": [39, 93, 9, 3, 55]}

{player: max(scores) for player, scores in scores_dict.items()}

{'Bob': 83, 'Cindy': 87, 'Alice': 90, 'Meena': 93}

## Practice problems - sorted + lambda

### Use sorted and lambda function to sort this list of dictionaries based on the score, from low to high

In [26]:
scores = [  {"name": "Bob", "score": 32} ,
            {"name": "Cindy", "score" : 45}, 
            {"name": "Alice", "score": 39}
     ]

sorted(scores, key = lambda d: d["score"])

[{'name': 'Bob', 'score': 32},
 {'name': 'Alice', 'score': 39},
 {'name': 'Cindy', 'score': 45}]

### Now, modify the lambda function part alone to sort the list of dictionaries based on the score, from high to low

In [27]:
sorted(scores, key = lambda d: -d["score"])

[{'name': 'Cindy', 'score': 45},
 {'name': 'Alice', 'score': 39},
 {'name': 'Bob', 'score': 32}]

### Now, go back to the previous lambda function definition and use sorted parameters to sort the list of dictionaries based on the score, from high to low

In [28]:
sorted(scores, key = lambda d: d["score"], reverse = True)

[{'name': 'Cindy', 'score': 45},
 {'name': 'Alice', 'score': 39},
 {'name': 'Bob', 'score': 32}]

## Student Information Survey dataset analysis

In [29]:
def median(items):
    # sort a copy, so that the caller's list is not modified
    items = sorted(items)
    n = len(items)
    if n % 2 != 0:
        middle = items[n // 2]
    else:
        first_middle = items[n // 2]
        second_middle = items[(n // 2) - 1]
        middle = (first_middle + second_middle) / 2
    return middle
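
A quick sanity check of the median logic on an odd-length and an even-length list. The sketch below re-implements the same steps under the hypothetical name `median_sketch` so that it runs on its own, and compares the results against `statistics.median` from the standard library.

```python
import statistics

def median_sketch(items):
    ordered = sorted(items)          # sorted copy; the caller's list is untouched
    n = len(ordered)
    if n % 2 != 0:
        return ordered[n // 2]       # odd length: the single middle item
    # even length: average of the two middle items
    return (ordered[n // 2 - 1] + ordered[n // 2]) / 2

odd_ages = [21, 18, 19]
even_ages = [22, 18, 19, 20]

print(median_sketch(odd_ages), statistics.median(odd_ages))
print(median_sketch(even_ages), statistics.median(even_ages))
```

This also explains why the per-lecture medians later appear as either `19` or `19.0`: an even-length list goes through the division and yields a float, while an odd-length list returns the middle item unchanged.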

In [30]:
# inspired by https://automatetheboringstuff.com/2e/chapter16/
def process_csv(filename):
    # the with statement closes the file even if reading fails
    with open(filename, encoding="utf-8") as exampleFile:
        exampleReader = csv.reader(exampleFile)
        exampleData = list(exampleReader)
    return exampleData

survey_data = process_csv('cs220_survey_data.csv')
cs220_header = survey_data[0]
cs220_data = survey_data[1:]
cs220_header

['Lecture',
 'Age',
 'Major',
 'Zip Code',
 'Latitude',
 'Longitude',
 'Pizza topping',
 'Pet preference',
 'Runner',
 'Sleep habit',
 'Procrastinator']

In [31]:
def cell(row_idx, col_name):
    """
    Returns the data value (cell) corresponding to the row index and 
    the column name of a CSV file.
    """
    col_idx = cs220_header.index(col_name) 
    val = cs220_data[row_idx][col_idx]  
    
    # handle missing values
    if val == '':
        return None
    
    # handle type conversions
    if col_name == "Age":
        val = int(val)
        if 0 < val <= 118:
            return val
        else:
            # Data cleaning
            return None
    elif col_name in ['Zip Code',]:
        return int(val)
    elif col_name in ['Latitude', 'Longitude']:
        return float(val)
    
    return val

In [32]:
def transform(header, data):
    """
    Transform data into a list of dictionaries, while taking care of type conversions
    """
    #should be defined outside the for loop, because it stores the entire data
    dict_list = []     
    for row_idx in range(len(data)):
        row = data[row_idx]
        #should be defined inside the for loop, because it represents one row as a 
        #dictionary
        new_row = {}         
        for i in range(len(header)):
            val = cell(row_idx, header[i])
            new_row[header[i]] = val
        dict_list.append(new_row)
    return dict_list
        
transformed_data = transform(cs220_header, cs220_data)
transformed_data[:2] # top 2 rows

[{'Lecture': 'LEC001',
  'Age': 22,
  'Major': 'Engineering: Biomedical',
  'Zip Code': 53703,
  'Latitude': 43.073051,
  'Longitude': -89.40123,
  'Pizza topping': 'none (just cheese)',
  'Pet preference': 'neither',
  'Runner': 'No',
  'Sleep habit': 'no preference',
  'Procrastinator': 'Maybe'},
 {'Lecture': 'LEC006',
  'Age': None,
  'Major': 'Undecided',
  'Zip Code': 53706,
  'Latitude': 43.073051,
  'Longitude': -89.40123,
  'Pizza topping': 'none (just cheese)',
  'Pet preference': 'neither',
  'Runner': 'No',
  'Sleep habit': 'no preference',
  'Procrastinator': 'Maybe'}]

In [33]:
def bucketize(data, bucket_column):
    """
    data: expects list of dictionaries
    bucket_column: column for bucketization
    generates and returns bucketized data based on bucket_column
    """
    # Key: unique bucketize column value; Value: list of dictionaries 
    # (rows having that unique column value)
    buckets = dict()
    for row_dict in data:
        col_value = row_dict[bucket_column]
        if col_value not in buckets:
            buckets[col_value] = []
        buckets[col_value].append(row_dict)
        
    return buckets
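
To see the shape of the result, here is a sketch on three made-up rows. It uses a compact variant named `bucketize_sketch` (hypothetical, built on `dict.setdefault`) so the block runs on its own; the grouping logic is intended to match the function above.

```python
def bucketize_sketch(data, bucket_column):
    buckets = {}
    for row_dict in data:
        # setdefault returns the existing list, or inserts and returns a new one
        buckets.setdefault(row_dict[bucket_column], []).append(row_dict)
    return buckets

toy_rows = [
    {"Lecture": "LEC001", "Age": 19},
    {"Lecture": "LEC002", "Age": 20},
    {"Lecture": "LEC001", "Age": 21},
]

toy_buckets = bucketize_sketch(toy_rows, "Lecture")
print(list(toy_buckets.keys()))
print([row["Age"] for row in toy_buckets["LEC001"]])
```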

### What is the average age of "LEC001" students?

In [34]:
lecture_buckets = bucketize(transformed_data, "Lecture")
lec001_bucket = lecture_buckets["LEC001"]
lec001_ages = [student_dict["Age"] for student_dict in lec001_bucket \
               if student_dict["Age"] != None]
round(sum(lec001_ages) / len(lec001_ages), 2)

20.05

### What is the average age of "LEC001" students who like "pineapple" pizza topping?

In [35]:
lec001_pineapple_ages = [student_dict["Age"] for student_dict in lec001_bucket \
                         if student_dict["Age"] != None and \
                         student_dict["Pizza topping"] == "pineapple"]
round(sum(lec001_pineapple_ages) / len(lec001_pineapple_ages), 2)

20.86
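
Both averages divide by `len(...)`, which raises `ZeroDivisionError` if the filter leaves no students. A defensive sketch, using a hypothetical helper `average_or_none`:

```python
def average_or_none(values):
    # Return None instead of dividing by zero when nothing passed the filter
    if len(values) == 0:
        return None
    return round(sum(values) / len(values), 2)

print(average_or_none([19, 20]))
print(average_or_none([]))
```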

### What are the sleep habits of the youngest students?

In [36]:
min_age = None

# pass 1: find minimum age
for student_dict in transformed_data:
    age = student_dict["Age"]
    if age == None:
        continue
    if min_age == None or age < min_age:
        min_age = age

# pass 2: find sleep habit of students with minimum age
sleep_habits = [student_dict["Sleep habit"] for student_dict in transformed_data\
               if student_dict["Age"] != None and student_dict["Age"] == min_age]
sleep_habits

['night owl',
 'early bird',
 'no preference',
 'night owl',
 'no preference',
 'night owl',
 'night owl',
 'early bird',
 'early bird',
 'night owl',
 'early bird',
 'night owl',
 'night owl',
 'early bird',
 'night owl',
 'night owl',
 'night owl',
 'night owl',
 'no preference',
 'night owl',
 'no preference',
 'night owl']
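
The first pass can also be written with the built-in `min` over a comprehension, and `collections.Counter` can summarize the resulting list. A sketch on made-up rows (`toy_students` is hypothetical data, not the survey):

```python
from collections import Counter

toy_students = [
    {"Age": 18, "Sleep habit": "night owl"},
    {"Age": None, "Sleep habit": "early bird"},
    {"Age": 17, "Sleep habit": "night owl"},
    {"Age": 17, "Sleep habit": "early bird"},
    {"Age": 17, "Sleep habit": "night owl"},
]

# pass 1: minimum over the ages that are present
youngest_age = min([s["Age"] for s in toy_students if s["Age"] is not None])

# pass 2: sleep habits of students with that age
youngest_habits = [s["Sleep habit"] for s in toy_students if s["Age"] == youngest_age]

print(youngest_age)
print(Counter(youngest_habits))
```

Note that `min` raises `ValueError` on an empty list, so this shortcut assumes at least one student has a recorded age.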

### How many students are there in each lecture?
- Create a `dict` mapping each lecture to the count of students.

In [37]:
# v1
{lecture:len(lecture_buckets[lecture]) for lecture in lecture_buckets}

{'LEC001': 195,
 'LEC006': 78,
 'LEC004': 196,
 'LEC005': 190,
 'LEC002': 197,
 'LEC003': 136}

In [38]:
# v2
{lecture:len(students) for lecture, students in lecture_buckets.items()}

{'LEC001': 195,
 'LEC006': 78,
 'LEC004': 196,
 'LEC005': 190,
 'LEC002': 197,
 'LEC003': 136}

### Are the 15 oldest students in the class runners?

In [39]:
students_with_age = [student_dict for student_dict in transformed_data \
                     if student_dict["Age"] != None]
[student_dict["Runner"] for student_dict in sorted(students_with_age, \
                            key = lambda s_dict: s_dict["Age"], reverse = True)[:15]]

['No',
 'No',
 'No',
 'Yes',
 'No',
 'Yes',
 'No',
 'No',
 'Yes',
 'No',
 'No',
 'No',
 'No',
 'No',
 'No']

In [40]:
students_with_age = [student_dict for student_dict in transformed_data \
                     if student_dict["Age"] != None and 0 < student_dict["Age"] <= 118]
#students_with_age

### Compute median age per lecture in one step using `dict` and `list` comprehension.

In [41]:
age_by_lecture = {} # Key: lecture; Value: list of ages

for lecture in lecture_buckets:
    lecture_students = lecture_buckets[lecture]
    ages = []
    for student in lecture_students:
        age = student["Age"]
        if age == None:
            continue
        ages.append(age)
    age_by_lecture[lecture] = ages

median_age_by_lecture = {} # Key: lecture; Value: median age of that lecture
for lecture in age_by_lecture:
    median_age = median(age_by_lecture[lecture])
    median_age_by_lecture[lecture] = median_age
    
print(median_age_by_lecture)

{'LEC001': 19, 'LEC006': 18.0, 'LEC004': 19, 'LEC005': 19.0, 'LEC002': 19, 'LEC003': 19.0}


In [42]:
median_age_by_lecture2 = {lecture: median([student["Age"] for student in students \
                                           if student["Age"] != None]) \
                          for lecture, students in lecture_buckets.items()}
median_age_by_lecture2

{'LEC001': 19,
 'LEC006': 18.0,
 'LEC004': 19,
 'LEC005': 19.0,
 'LEC002': 19,
 'LEC003': 19.0}

### Compute max age per lecture in one step using `dict` and `list` comprehension.

In [43]:
max_age_by_lecture2 = {lecture: max([student["Age"] for student in students \
                                           if student["Age"] != None]) \
                          for lecture, students in lecture_buckets.items()}
max_age_by_lecture2

{'LEC001': 37,
 'LEC006': 23,
 'LEC004': 53,
 'LEC005': 32,
 'LEC002': 31,
 'LEC003': 25}

## Practice problems - comprehensions

### Generate a new list where each number is the square of the original number in the numbers list

In [44]:
numbers = [44, 33, 56, 21, 19]

[num ** 2 for num in numbers]

[1936, 1089, 3136, 441, 361]

### Generate a new list of floats from vac_rates, each rounded to 3 decimal places

In [45]:
vac_rates = [23.329868, 51.28772, 76.12232, 17.2, 10.5]

[round(rate, 3) for rate in vac_rates]

[23.33, 51.288, 76.122, 17.2, 10.5]

### Generate a new list of ints from words, that contains length of each word

In [46]:
words = ['My', 'very', 'educated', 'mother', 'just', 'served', 'us', 'noodles']

[len(word) for word in words]

[2, 4, 8, 6, 4, 6, 2, 7]

### Create 2 dictionaries to map each player to their min and avg score

In [47]:
scores_dict = {"Bob": [18, 72, 61, 5, 83], 
               "Cindy" : [27, 11, 55, 73, 87], 
               "Alice": [16, 33, 42, 89, 90], 
               "Meena": [39, 93, 9, 3, 55]}

{player: min(scores) for player, scores in scores_dict.items()}

{'Bob': 5, 'Cindy': 11, 'Alice': 16, 'Meena': 3}

In [48]:
{player: sum(scores) / len(scores) for player, scores in scores_dict.items()}

{'Bob': 47.8, 'Cindy': 50.6, 'Alice': 54.0, 'Meena': 39.8}

### Student Information Survey dataset

### Create dict mapping unique age to count of students with that age.
- Order the dictionary based on increasing order of ages
- Make sure to drop student dictionaries which don't have Age column information (we already did this in a previous example)

In [49]:
age_buckets = bucketize(students_with_age, "Age")
dict(sorted({age:len(students) for age, students in age_buckets.items()}.items(), key = \
            lambda a: a[0]))

{17: 22,
 18: 276,
 19: 275,
 20: 164,
 21: 103,
 22: 32,
 23: 14,
 24: 13,
 25: 10,
 26: 7,
 27: 1,
 28: 1,
 29: 2,
 30: 2,
 31: 1,
 32: 2,
 37: 2,
 41: 1,
 53: 1}
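
For comparison, the same age-to-count mapping can be sketched with `collections.Counter` from the standard library. The `toy_ages` list is made up; with the survey data, the input would be the list of non-missing ages.

```python
from collections import Counter

toy_ages = [19, 18, 21, 18, 19, 18]

# Counter maps each distinct value to how many times it occurs
age_counts = Counter(toy_ages)

# Sort the (age, count) pairs by age, then rebuild a dict in that order
ordered_counts = dict(sorted(age_counts.items(), key = lambda pair: pair[0]))
print(ordered_counts)
```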

### Are the 15 youngest students in the class pet owners?

In [50]:
students_with_age = [student_dict for student_dict in transformed_data \
                     if student_dict["Age"] != None]
[student_dict["Pet preference"] for student_dict in sorted(students_with_age, \
                            key = lambda s_dict: s_dict["Age"])[:15]]

['dog',
 'dog',
 'neither',
 'dog',
 'dog',
 'dog',
 'dog',
 'neither',
 'dog',
 'dog',
 'dog',
 'cat',
 'dog',
 'dog',
 'dog']