# List Practice

In [1]:
import csv

In [2]:
# inspired by https://automatetheboringstuff.com/2e/chapter16/
def process_csv(filename):
    # open the file; it's a UTF-8 text file (the csv docs recommend newline="")
    with open(filename, encoding="utf-8", newline="") as example_file:
        # prepare it for reading as a CSV object
        example_reader = csv.reader(example_file)
        # use the built-in list function to convert this into a list of lists
        example_data = list(example_reader)
    # leaving the with block closes the file, even if reading raised an error
    # return the list of lists
    
    return example_data
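
To try the same reading steps without a file on disk, here is a minimal sketch that feeds `csv.reader` an in-memory `io.StringIO` (the reader accepts any iterable of lines). The sample text and the names `sample_text` and `sample_rows` are made up for illustration.

```python
import csv
import io

# Hypothetical CSV contents, standing in for a real file.
sample_text = "Lecture,Age\nLEC001,19\nLEC002,\n"

# csv.reader yields one list of strings per line.
sample_rows = list(csv.reader(io.StringIO(sample_text)))

print(sample_rows[0])   # header row
print(sample_rows[1:])  # data rows; the missing age is an empty string
```

Every value comes back as a string, and a missing value comes back as `''`, which is why the later cells convert types and check for empty strings.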

### Student Information Survey data

In [3]:
# TODO: call the process_csv function and store the list of lists in cs220_csv
cs220_csv = process_csv("cs220_survey_data.csv")

In [4]:
# Store the header row into cs220_header, using indexing
cs220_header = cs220_csv[0]
cs220_header

['Lecture',
 'Age',
 'Primary major',
 'Other majors',
 'Zip Code',
 'Pizza topping',
 'Pet owner',
 'Runner',
 'Sleep habit',
 'Procrastinator']

In [5]:
# TODO: Store all of the data rows into cs220_data, using slicing
cs220_data = cs220_csv[1:]

# TODO: use slicing to display the first 3 rows of data
cs220_data[:3]

[['LEC002',
  '19',
  'Engineering: Mechanical',
  '',
  '53711',
  'pepperoni',
  'Yes',
  'No',
  'night owl',
  'Maybe'],
 ['LEC002',
  '20',
  'Science: Physics',
  'Astronomy-Physics, History',
  '53726',
  'pineapple',
  'Yes',
  'Yes',
  'night owl',
  'Yes'],
 ['LEC001',
  '20',
  'Science: Chemistry',
  '',
  '53703',
  'pepperoni',
  'Yes',
  'No',
  'early bird',
  'No']]

### What is the Sleep habit for the 2nd student?

In [6]:
cs220_data[1][8] # bad example: we hard-coded the column index

'night owl'

What if we later add a new column before "Sleep habit"? The hard-coded index 8 would then point at the wrong column.

Instead of hard-coding the column index, use the list `index` method to look up the column's position in the header. This also makes your code much more readable.

In [7]:
cs220_data[1][cs220_header.index("Sleep habit")]

'night owl'
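
A minimal sketch of why the name-based lookup survives a column change. The header and row below are made-up stand-ins, not the survey data.

```python
# Hypothetical header and data row, for illustration only.
header = ["Lecture", "Age", "Sleep habit"]
row = ["LEC002", "20", "night owl"]

# Look up the column position by name instead of hard-coding it.
before = row[header.index("Sleep habit")]

# Simulate adding a new column in front of "Sleep habit".
header.insert(2, "Runner")
row.insert(2, "Yes")

# The hard-coded index 2 now points at the new column...
hard_coded = row[2]
# ...while the name-based lookup still finds the right cell.
after = row[header.index("Sleep habit")]

print(before, "|", hard_coded, "|", after)
```

Note that `index` raises a `ValueError` if the name is not in the header, so a misspelled column name fails loudly instead of silently returning the wrong cell.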

### What is the Lecture of the 4th student?

In [8]:
cs220_data[3][cs220_header.index("Lecture")]

'LEC004'

### Create a list containing Age of all students 10 years from now

In [9]:
ages_in_ten_years = []

for row in cs220_data:
    age = row[cs220_header.index("Age")]
    
    if age == '':
        continue
        
    age = int(age)
    ages_in_ten_years.append(age + 10)
    
ages_in_ten_years[:3]

[29, 30, 30]

### cell function

- It would be very helpful to define a cell function, which can handle missing data and type conversions

In [10]:
def cell(row_idx, col_name):
    """
    Returns the data value (cell) corresponding to the row index and 
    the column name of a CSV file.
    """
    # TODO: get the index of col_name
    col_idx = cs220_header.index(col_name) 
    
    # TODO: get the value of cs220_data at the specified cell
    val = cs220_data[row_idx][col_idx]  
    
    # TODO: handle missing values, by returning None
    if val == '':
        return None
    
    # TODO: handle type conversions
    if col_name in ["Age",]:
        return int(val)
    
    return val
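
The same idea can be tried without the survey file. This sketch passes the header and data in as arguments; `cell_from`, `sample_header`, and `sample_data` are hypothetical names, and the rows are made up.

```python
def cell_from(header, data, row_idx, col_name):
    """Return one cell: '' becomes None, and "Age" is converted to int."""
    col_idx = header.index(col_name)
    val = data[row_idx][col_idx]
    if val == "":
        return None
    if col_name in ["Age"]:
        return int(val)
    return val

# Hypothetical data: the second student left Age blank.
sample_header = ["Lecture", "Age"]
sample_data = [["LEC001", "19"], ["LEC002", ""]]

print(cell_from(sample_header, sample_data, 0, "Age"))      # an int
print(cell_from(sample_header, sample_data, 1, "Age"))      # None for missing data
print(cell_from(sample_header, sample_data, 1, "Lecture"))  # strings pass through
```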

### Find average age per lecture.

In [11]:
# TODO: initialize 4 lists for the 4 lectures
lec1_ages = []
lec2_ages = []
lec3_ages = []
lec4_ages = []

# Iterate over the data and populate the lists

for row_idx in range(len(cs220_data)):
    age = cell(row_idx, "Age")
    
    if age is not None:
        lecture = cell(row_idx, "Lecture")
        if lecture == "LEC001":
            lec1_ages.append(age)
        elif lecture == "LEC002":
            lec2_ages.append(age)
        elif lecture == "LEC003":
            lec3_ages.append(age)
        elif lecture == "LEC004":
            lec4_ages.append(age)    
            
# TODO: compute average age of each lecture
print("LEC001 average student age:", round(sum(lec1_ages) / len(lec1_ages), 2))
print("LEC002 average student age:", round(sum(lec2_ages) / len(lec2_ages), 2))
print("LEC003 average student age:", round(sum(lec3_ages) / len(lec3_ages), 2))
print("LEC004 average student age:", round(sum(lec4_ages) / len(lec4_ages), 2))

LEC001 average student age: 19.93
LEC002 average student age: 19.8
LEC003 average student age: 19.38
LEC004 average student age: 19.27
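
The four `print` calls above divide by `len(...)`, which raises `ZeroDivisionError` if a lecture has no recorded ages. One possible guard, sketched with a hypothetical helper named `average_age`:

```python
def average_age(ages):
    """Return the mean of ages rounded to 2 places, or None for an empty list."""
    if len(ages) == 0:
        return None
    return round(sum(ages) / len(ages), 2)

print(average_age([19, 20, 21]))
print(average_age([]))  # no ages recorded, so no average
```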


### `sort` method versus `sorted` function

- `sort` is a list method that reorders the original list in place and returns `None`
- `sorted` is a built-in function that leaves its argument unchanged and returns a new sorted list
- default sorting order is ascending (numeric for numbers, character by character for strings)
- the `reverse` parameter works with both the `sort` method and the `sorted` function:
    - passing `reverse=True` sorts in descending order

In [12]:
some_list = [10, 4, 25, 2, -10] # TODO: Initialize some_list with a list of un-ordered integers

In [13]:
# TODO: Invoke sort method
rv = some_list.sort()
print(some_list)

# What does the sort method return? 
# TODO: Capture return value into a variable rv and print the return value.
print(rv)

[-10, 2, 4, 10, 25]
None


The `sort` method returns `None` because it sorts the original list in place instead of creating a new one.

In [14]:
# TODO: invoke sorted function and pass some_list as argument
# TODO: capture return value into sorted_some_list
sorted_some_list = sorted(some_list)

# What does the sorted function return? It returns a brand new list with the values in sorted order
print(sorted_some_list)

[-10, 2, 4, 10, 25]


TODO: go back to `sort` method call and `sorted` function call and pass keyword argument `reverse = True`.
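
One way this could look, sketched on a separate made-up list (`numbers` and `descending_copy` are hypothetical names):

```python
numbers = [10, 4, 25, 2, -10]  # hypothetical un-ordered integers

# sorted returns a new list in descending order; numbers is unchanged.
descending_copy = sorted(numbers, reverse=True)
print(descending_copy)
print(numbers)

# sort reorders numbers itself and returns None.
rv = numbers.sort(reverse=True)
print(numbers)
print(rv)
```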

### set data structure

- **not a sequence**
- no ordering of values, so you cannot index or slice a `set`
- stores only unique values: duplicates are dropped automatically
- very helpful for finding the unique values stored in a `list`
    - easy to convert a `list` to a `set` and vice-versa
    - the original ordering is not guaranteed to survive the conversion

In [15]:
some_set = {10, 20, 30, 30, 40, 50, 10} # use a pair of curly braces to define it
some_set

{10, 20, 30, 40, 50}

In [16]:
some_list = [10, 20, 30, 30, 40, 50, 10] # Initialize a list containing duplicate numbers

# TODO: to find unique values, convert it into a set
print(set(some_list))

# TODO: convert the set back into a list
print(list(set(some_list)))

{40, 10, 50, 20, 30}
[40, 10, 50, 20, 30]


Can you call `sort` method on a set?

In [17]:
# some_set.sort() 
# doesn't work: raises AttributeError because type set has no method named sort
# a set has no ordering, so there is nothing to sort in place

Can you pass a `set` as an argument to the `sorted` function? Yes: `sorted` accepts any iterable, not just lists.

In [18]:
sorted(some_set) # works because sorted accepts any iterable and returns a new sorted list

[10, 20, 30, 40, 50]
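
A short sketch showing that `sorted` always hands back a `list`, whatever iterable goes in. The values are made up for illustration.

```python
from_set = sorted({30, 10, 20})   # a set goes in
from_tuple = sorted((3, 1, 2))    # a tuple goes in
from_string = sorted("cab")       # a string goes in, one character per item

# Each result is a plain list in ascending order.
print(type(from_set).__name__, from_set)
print(type(from_tuple).__name__, from_tuple)
print(type(from_string).__name__, from_string)
```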

Can you index / slice into a `set`?

In [19]:
# some_set[1] # doesn't work - remember set has no order

In [20]:
# some_set[1:] # doesn't work - remember set has no order
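
To see what the commented-out lines actually raise, this sketch catches each error on a made-up set (`some_values`, `errors`, and `ordered` are hypothetical names), then shows one way to get positional access by sorting into a list first.

```python
some_values = {10, 20, 30}  # hypothetical set

errors = []

# Sets have no sort method, so this is an AttributeError.
try:
    some_values.sort()
except AttributeError:
    errors.append("AttributeError")

# Sets do not support indexing, so this is a TypeError.
try:
    some_values[1]
except TypeError:
    errors.append("TypeError")

print(errors)

# Build a sorted list first; lists support indexing and slicing.
ordered = sorted(some_values)
print(ordered[1], ordered[1:])
```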

### Find all unique zip codes. Arrange them in ascending order.

In [21]:
# TODO: initialize a list to keep track of zip codes
zip_codes = []

for row_idx in range(len(cs220_data)):
    zip_code = cell(row_idx, "Zip Code")
    
    if zip_code is not None:
        zip_codes.append(zip_code)
        
zip_codes = list(set(zip_codes))
zip_codes.sort()
zip_codes

['10306',
 '19002',
 '43706',
 '5 3706',
 '52706',
 '52816',
 '53076',
 '53089',
 '53175',
 '53562',
 '53575',
 '53590',
 '53597',
 '53701',
 '53703',
 '53703-1104',
 '53704',
 '53705',
 '53706',
 '53706-1127',
 '53706-1188',
 '53706-1203',
 '53706-1406',
 '53708',
 '53711',
 '53713',
 '53715',
 '53717',
 '53719',
 '53726',
 '54636',
 '55416',
 '57305',
 '59301',
 '83001',
 '92376',
 'internation student']
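
The positions of `'5 3706'` and `'internation student'` in the output above follow from how strings are compared: character by character, not by numeric value. A small sketch with made-up lists (`codes` and `numbers_as_text` are hypothetical names); the `key` argument used in the last line is an optional parameter of `sorted` that this notebook has not introduced.

```python
# A space sorts before any digit, and digits sort before lowercase letters.
codes = ["53706", "internation student", "52706", "5 3706", "53706-1127"]
sorted_codes = sorted(codes)
print(sorted_codes)

# Numbers stored as strings are ordered like text, not like numbers.
numbers_as_text = ["100", "20", "3"]
as_text = sorted(numbers_as_text)
as_numbers = sorted(numbers_as_text, key=int)  # compare by integer value instead
print(as_text)
print(as_numbers)
```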

### Arrange unique zip codes in descending order.

In [22]:
sorted(zip_codes, reverse = True)

['internation student',
 '92376',
 '83001',
 '59301',
 '57305',
 '55416',
 '54636',
 '53726',
 '53719',
 '53717',
 '53715',
 '53713',
 '53711',
 '53708',
 '53706-1406',
 '53706-1203',
 '53706-1188',
 '53706-1127',
 '53706',
 '53705',
 '53704',
 '53703-1104',
 '53703',
 '53701',
 '53597',
 '53590',
 '53575',
 '53562',
 '53175',
 '53089',
 '53076',
 '52816',
 '52706',
 '5 3706',
 '43706',
 '19002',
 '10306']

## Self-practice

### How many students are both a procrastinator and a pet owner?

### What percentage of 18-year-olds have their major declared as "Other"?

### How old is the oldest basil/spinach-loving Business major?