## Lesson 5 Overview

1. List comprehensions
2. Sorting (ascending and descending order)
3. Slicing
4. Merging

## Let's load today's lesson!

### Open Azure Notebooks library 

Go to https://notebooks.azure.com -> Sign in if needed -> Select **python-codeacademy-sg**

### Update lesson file to latest version

Select **New** -> **From URL** -> input https://raw.githubusercontent.com/viettrung9012/python-codeacademy-sg/master/Lesson5.ipynb (URL is available in **Lesson5.ipynb**) -> Click outside input then select **Upload** (overwrite if needed)

### Open JupyterLab

Open it from your browser's bookmark, or select **Run** and then change the browser URL path from **/nb/tree** to **/nb/lab**

Select **Lesson5.ipynb**

In lesson 2, you learnt about Lists and how to create and manipulate them.
In lesson 3, you learnt about Loops and how to create and update Lists using Loops.
In lesson 5, you will learn more advanced ways to create and manipulate Lists in order to complete more complex tasks. List comprehensions are one of them.

# List comprehensions

`Estimation: 10 minutes`

List comprehensions provide a concise way to create lists.

A list comprehension consists of brackets containing an expression followed by a `for` clause, then zero or more `for` or `if` clauses. The expression can be anything, meaning you can put all kinds of objects in the list.

The result will be a new list resulting from evaluating the expressions in the context of the `for` and `if` clauses which follow it.

The list comprehension always returns a result list.

### Syntax

A list comprehension starts with '[' and ends with ']', which ensures that the end result is a list.

**newList = [expression(x) for x in oldList if filter(x)]**

**newList** is the new list result.

**expression(x)** is the expression based on the variable used for each element in the old list.

**for x in oldList** is a `for` loop.

**if filter(x)** is an if-statement that filters out unwanted results.
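
To connect the template to real code, here is a minimal sketch. The helpers `expression` and `keep` are hypothetical and written only for this illustration; `keep` plays the role of `filter(x)` in the template and is renamed so that it does not hide Python's built-in `filter`.

```python
# Hypothetical helpers, made up for this illustration only
def expression(x):
    return x * 10


def keep(x):
    # plays the role of filter(x) in the template
    return x > 2


oldList = [1, 2, 3, 4]
newList = [expression(x) for x in oldList if keep(x)]
print(newList)  # [30, 40]

# The expression can build any kind of object, for example a tuple
pairs = [(x, str(x)) for x in oldList]
print(pairs)  # [(1, '1'), (2, '2'), (3, '3'), (4, '4')]
```

Only the items for which `keep(x)` is true reach the expression, which is why 1 and 2 do not appear in `newList`.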

Example 1: Create a list of squares for even numbers from 0 to 9

In [None]:
# Normal loop implementation
evenSquares = []
for x in range(10):
    if x % 2 == 0:
        evenSquares.append(x**2)
        
print(evenSquares)

In [None]:
# List comprehensions implementation
[x**2 for x in range(10) if x % 2 == 0]

Example 2: Create a list of tuples containing every combination of two different numbers, each taken from the range 0 to 4.

In [None]:
# Normal loop implementation
combinations = []
for x in range(5):
    for y in range(5):
        if (x != y):
            combinations.append((x, y))
            
print(combinations)

In [None]:
# List comprehensions implementation
combinations = [(x, y) for x in range(5) for y in range(5) if x != y]
print(combinations)

Example 3: Given a 2-dimensional list (like a table), flatten it into a normal list

In [None]:
table = [[1, 2, 3],
         [4, 5, 6],
         [7, 8, 9]]

# Normal loop implementation
flatten = []
for row in table:
    for num in row:
        flatten.append(num)

print(flatten)

In [None]:
# List comprehensions implementation
[num for row in table for num in row]

Example 4: Given a 2-dimensional list (like a table), transpose its rows and columns using nested list comprehensions

In [None]:
# Normal loop implementation
transpose = []
for num in range(len(table[0])):
    fields = []
    for row in table:
        fields.append(row[num])
    transpose.append(fields)

print(transpose)

In [None]:
# List comprehensions implementation
[[row[num] for row in table] for num in range(len(table[0]))]

Exercise: Create a list of tuples that pair each number x from 2 to 5 with every number between 1 and 25 that is divisible by x. `(5 minutes)`

In [None]:
# Normal loop implementation


# List comprehensions implementation


# Expected result:
# [(2, 2), (2, 4), (2, 6), (2, 8), (2, 10), (2, 12), (2, 14), (2, 16), 
#  (2, 18), (2, 20), (2, 22), (2, 24), (3, 3), (3, 6), (3, 9), (3, 12), 
#  (3, 15), (3, 18), (3, 21), (3, 24), (4, 4), (4, 8), (4, 12), (4, 16), 
#  (4, 20), (4, 24), (5, 5), (5, 10), (5, 15), (5, 20), (5, 25)]

# Sorting

`Estimation: 5 minutes`

### Ascending - from small to big, from low to high, from A to Z, from a to z

By default, sorting is done in ascending order.

**sorted(list)** will return a new sorted list, leaving the original list unaffected.

**list.sort()** will sort the list **in-place**: it changes the original list and always returns None.
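
A short sketch of the difference between the two calls. The variable names are made up for illustration.

```python
numbers = [3, 1, 2]

# sorted() builds a new list; the original is left alone
new_numbers = sorted(numbers)
print(new_numbers)  # [1, 2, 3]
print(numbers)      # [3, 1, 2]

# list.sort() reorders the list itself and returns None
result = numbers.sort()
print(result)   # None
print(numbers)  # [1, 2, 3]
```

Because `sort()` returns None, writing `numbers = numbers.sort()` would lose the list, so call it on its own line.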

### Descending - in reverse of Ascending

Both sort() and sorted() accept an additional boolean parameter named reverse.

**sorted(list, reverse=True)**

**list.sort(reverse=True)**

Example 1: Given a list of numbers, create a new list sorted in ascending order, then sort the original list in-place in descending order.

In [None]:
listOfNum = [56, 34, 65, 12, 88, 54, 99, 78]

# new ascending order list named ascending
ascending = sorted(listOfNum)

# in-place descending order
listOfNum.sort(reverse=True)

print("Ascending: ", ascending)
print("Descending: ", listOfNum)

Example 2: Given a list of letters, create a new list sorted in descending order, then sort the original list in-place in ascending order.

In [None]:
listOfAlp = ['g', 'w', 's', 'a', 'k', 'e', 'q', 'd']

# new descending order list named descending
descending = sorted(listOfAlp, reverse=True)

# in-place ascending order
listOfAlp.sort()

print("Ascending: ", listOfAlp)
print("Descending: ", descending)

Exercise: Flatten the given lists, convert every item to a string, and sort in ascending order `(5 minutes)`

In [None]:
# list given
listOfNum = [76, 23, 54, 68]
listOfFloat = [54.1, 53.9, 54.0]
listOfAlp = ['e', 'x', 'p']
listOfCapAlp = ['E', 'D', 'I', 'A']
combined = [listOfNum, listOfFloat, listOfAlp, listOfCapAlp]

# flatten array


# sort by ascending order


# Expected result:
# ['23', '53.9', '54', '54.0', '54.1', '68', '76', 'A', 'D', 'E', 'I', 'e', 'p', 'x']

# Slicing

`Estimation: 5 minutes`

Slicing is used to extract a part of a List, a Tuple or a String.

### Syntax

**list[start:end:step]**

**start** by default is 0

**end** by default is the end of the list, so the last item is included

**step** by default is 1 
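
The same syntax also works on a String or a Tuple. A minimal sketch with made-up values, showing what happens when start, end or step is left out:

```python
word = "Python"             # a String
point = (10, 20, 30, 40)    # a Tuple

# leaving out start means "from the beginning"
print(word[:3])    # Pyt
# leaving out end means "up to and including the last item"
print(word[3:])    # hon
# leaving out everything returns a copy of the whole sequence
print(word[:] == word)  # True

print(point[1:3])  # (20, 30)
print(point[::2])  # (10, 30)
```

A slice of a String is a String and a slice of a Tuple is a Tuple; the original is never changed.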

Example 1: Given a list, slice list with default step.

In [None]:
items = [0, 1, 2, 3, 4, 5, 6, 7, 8, 9, 10]

# all items in list
print("Initial list: ", items)

# first and last item in the array/list
print("First item in list: ", items[0])
print("Last item in list: ", items[-1])

# all items except the last two items
print("All items except last two items in list: ", items[:-2])

# 3rd to 5th items in the array/list
print("3rd to 5th items in list: ", items[2:5])

Example 2: Slice list with different step.

In [None]:
# all items in the array/list reversed
print("All reversed: ", items[::-1])

# first 3 items reversed
print("First 3 items reversed: ", items[2::-1])

# last 3 items reversed
print("Last 3 items reversed: ", items[:-4:-1])

# all items at even indexes (0, 2, 4, ...)
print("All items in even position: ", items[::2])


Exercise: Clean the list and print "Expedia" `(5 minutes)`

In [None]:
listOfExpedia = ['(', '$', 'a', 'i', 'd', 'e', 'p', 'x', 'E', ')']

# hint:
# list convert to string, use: ''.join(list)



# Merging

`Estimation: 1 minute`

Merging newList into existingList.

### Syntax

**existingList.extend(newList)**

Example 1: Given 2 lists, merge them.

In [None]:
existingFruits = ["Apple", "Banana", "Mango"]
newFruits = ["Dragon Fruits", "Kiwi"]

existingFruits.extend(newFruits)
print("Merge new fruits into existing list: ", existingFruits)
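
Like `sort()`, `extend()` changes the existing list and returns None. The sketch below, with made-up variable names, contrasts it with the `+` operator, which is shown here only for comparison and builds a new list instead.

```python
existing = ["Apple", "Banana"]
new = ["Kiwi"]

# extend() changes the existing list and returns None
result = existing.extend(new)
print(result)    # None
print(existing)  # ['Apple', 'Banana', 'Kiwi']

# the + operator builds a new list and leaves both inputs unchanged
combined = existing + ["Mango"]
print(combined)  # ['Apple', 'Banana', 'Kiwi', 'Mango']
print(existing)  # ['Apple', 'Banana', 'Kiwi']
```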

Let's work with Biggest Loser data! `(15 minutes)`

In [None]:
# Prerequisite: these functions are from the previous lesson, run them.
def total_steps_by_row(row, header):
    total_steps = 0
    for header_index, field_name in enumerate(header):
        steps_in_column = row[field_name]
        # the first 4 columns are not step counts; also skip empty cells
        if header_index > 3 and steps_in_column != '':
            total_steps = total_steps + int(steps_in_column)
    return total_steps


def get_team_names(steps_data):
    team_names = []
    for row in steps_data:
        if row['team_name'] not in team_names:
            team_names.append(row['team_name'])
    return team_names


def get_total_steps_by_team_name_from_data(steps_data, header, team_name):
    team_total_steps = 0
    for row in steps_data:
        if row['team_name'] == team_name:
            team_total_steps = team_total_steps + total_steps_by_row(row, header)
    return team_total_steps

In [None]:
# Top 3 steps count, top teams
def total_steps_each_team(header, steps_data, team_names):
    return [[get_total_steps_by_team_name_from_data(steps_data, header, name), name] for name in team_names]


import csv


with open('Biggest Loser 2018.csv') as csvfile:
    readCSV = csv.DictReader(csvfile)
    header = readCSV.fieldnames
    steps_data = list(readCSV)
    team_names = get_team_names(steps_data)
    
    # retrieve total steps for each team
    total_steps_by_team = total_steps_each_team(header, steps_data, team_names)
    print("Total steps for all teams: ", total_steps_by_team)
    
    # sort total steps by team from highest to lowest
    sorted_teams_by_total_steps = sorted(total_steps_by_team, reverse=True)
    print("Sorted teams from highest to lowest number of steps: ", sorted_teams_by_total_steps)
    
    # slice list to top 3 teams
    top_3_teams = sorted_teams_by_total_steps[:3]
    print("Total steps for top 3 teams: ", top_3_teams)

    for key, row in enumerate(top_3_teams):
        print("Number " + str(key+1) + ": " + row[1] + " (" + str(row[0]) + ")")
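
The call `sorted(total_steps_by_team, reverse=True)` ranks teams by steps because Python compares lists item by item, and the step count is the first item of each `[total_steps, team_name]` pair. A minimal sketch with made-up sample data (the team names and numbers below are not from the CSV file):

```python
# made-up sample data in the same [total_steps, team_name] shape
sample = [[500, "Team B"], [900, "Team A"], [500, "Team C"]]

# lists are compared by their first item; the second item only breaks ties
ranked = sorted(sample, reverse=True)
print(ranked)      # [[900, 'Team A'], [500, 'Team C'], [500, 'Team B']]

# slice to keep only the top 2
print(ranked[:2])  # [[900, 'Team A'], [500, 'Team C']]
```

If the team name came first in each pair, the same call would sort by name instead of by steps.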

In [None]:
# Which team is the first to reach 1 million steps, and on which day?
# On which day does each team reach 1 million steps?


# Get total daily steps by team
def get_daily_steps_by_team_name(steps_data, column, team_name):
    team_total_steps = 0
    for row in steps_data:
        if row['team_name'] == team_name:
            if len(row[column]) > 0:
                team_total_steps = team_total_steps + int(row[column])
    return team_total_steps


# total team steps for all days
# output example:
#   ['TBD', [['2018-04-02', 75546], ['2018-04-03', 71297], ['2018-04-04', 86317], ...,
#   ['Sole Striders', [['2018-04-02', 38337], ['2018-04-03', 47547], ['2018-04-04', 38161], ...,
def total_team_steps_for_all_days(steps_data, header, team_names):
    total_daily_steps = []
    # loop through team_names to get total daily steps for each team
    for team_name in team_names:
        total_steps_per_day = []
        # skip the first 4 columns; step counts only start from the 5th column
        date_columns = header[4:]
        for column in date_columns:
            # store date and team daily steps in total_steps_per_day
            total_steps_per_day.append([column, get_daily_steps_by_team_name(steps_data, column, team_name)])
        # store team name and team daily steps with date in total_daily_steps
        total_daily_steps.append([team_name, total_steps_per_day])
    return total_daily_steps


# consolidate teams that have reached total of 1 million steps
# output example:
#   [('2018-04-13', 'TBD', 1028885), ('2018-04-23', 'FuFu', 1002373), 
#    ('2018-04-19', 'Tears for Beers', 1011356), ('2018-04-16', 'Stop when you drop!!', 1034537), 
#    ('2018-04-23', 'Cereal Killers', 1019483), ('2018-04-24', 'Here come the hotsteppers', 1022037)...]
def all_team_date_to_reach_1mil_steps(total_team_steps_for_all_days):
    total_daily_steps_incremental = []
    # loop through all teams
    for team in total_team_steps_for_all_days:
        team_name = team[0]
        team_daily_steps = team[1]
        total_team_steps = 0
        # loop through daily steps
        for daily_steps in team_daily_steps:
            date = daily_steps[0]
            total_team_steps = total_team_steps + daily_steps[1]
            # check whether the running total has reached at least 1 million steps
            if total_team_steps >= 1000000:
                # store date, team name and total steps into list
                total_daily_steps_incremental.append((date, team_name, total_team_steps))
                # break stops the loop, so subsequent days are not processed for this team
                break
    return total_daily_steps_incremental


def first_team_reached_1mil_steps(total_team_steps_for_all_days):
    # retrieve the teams that have reached 1 million steps
    date_of_teams_reached_1mil_steps = all_team_date_to_reach_1mil_steps(total_team_steps_for_all_days)
    # sort the list by date, from earliest to latest
    teams_sorted_by_date = sorted(date_of_teams_reached_1mil_steps)
    # return the first result in the sorted list
    return teams_sorted_by_date[0]


with open('Biggest Loser 2018.csv') as csvfile:
    readCSV = csv.DictReader(csvfile)
    header = readCSV.fieldnames
    steps_data = list(readCSV)
    team_names = get_team_names(steps_data)
    # the variable names differ from the function names, so the functions
    # are not overwritten and this cell can be run more than once
    daily_steps_by_team = total_team_steps_for_all_days(steps_data, header, team_names)
    first_team = first_team_reached_1mil_steps(daily_steps_by_team)
    print("First team to reach 1 million steps: " + first_team[1] + " on " + first_team[0] + " with " + str(first_team[2]) + " steps!")

    print("Date of each team reached 1 million steps:")
    for team in all_team_date_to_reach_1mil_steps(daily_steps_by_team):
        print("Team: " + team[1] + " (" + str(team[2]) + ") on " + team[0])
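
Sorting the `(date, team_name, total_steps)` tuples puts the earliest date first because the dates are strings written as year-month-day with leading zeros, so alphabetical order matches calendar order. A minimal sketch using three tuples copied from the output example in the comments above:

```python
# tuples copied from the output example in the comments above
reached = [('2018-04-13', 'TBD', 1028885),
           ('2018-04-23', 'FuFu', 1002373),
           ('2018-04-19', 'Tears for Beers', 1011356)]

# tuples are compared item by item, so the date string decides the order
by_date = sorted(reached)
print(by_date[0])  # ('2018-04-13', 'TBD', 1028885)
```

This relies on the date format; dates written as day-month-year would not sort correctly as plain strings.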


## Homework (optional)

In [None]:
# Most active day (date of day with most steps in total)

# hint: 
# 1) create a function to calculate total steps a day
# 2) loop through each day and get the highest number of steps

