**Outline for Monday, March 8**

More Dictionaries

You will be able to:
 - Select the appropriate data structure for a situation
 - Use the pop() and get() methods with default values
 - "Bucket" data using lists inside a dictionary
 - Simulate a table with a list of dictionaries

Useful methods
 - .get()
 - .pop()
 - .keys()
 - .values()

In [8]:
#Task: Produce suffixes for numbers (note that "th" is the default and is not included!)
suffix = {1:"st", 2:'nd', 3:"rd"}
#suffix.pop(5) #KeyError
#suffix[4] #KeyError
print(suffix.get(4,"th")) #get(key, default_value)
print(suffix.get(1,"th"))
print(suffix.pop(0,"th")) #pop(key, default_value) <-- will remove the matching key-value pair if it exists,
print(suffix)             #    and otherwise will JUST return the default value
print(suffix.pop(2,"th"))
print(suffix)

th
st
th
{1: 'st', 2: 'nd', 3: 'rd'}
nd
{1: 'st', 3: 'rd'}
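The suffix table can be wrapped in a small helper that uses get() with a default. This is a minimal sketch, assuming we want standard English ordinals. The name `ordinal` is hypothetical, and the special case for 11, 12 and 13 goes beyond the original task.

```python
# Suffixes that differ from the default "th"
suffix = {1: "st", 2: "nd", 3: "rd"}

def ordinal(n):
    """Return n with its English ordinal suffix, e.g. 2 -> '2nd' (hypothetical helper)."""
    # 11-13 (and 111-113, ...) always take "th", so check the last two digits first
    if 10 <= n % 100 <= 20:
        return str(n) + "th"
    # Otherwise the last digit decides; get() falls back to "th" for 0 and 4-9
    return str(n) + suffix.get(n % 10, "th")

print([ordinal(n) for n in [1, 2, 3, 4, 11, 12, 13, 21, 22, 103]])
# ['1st', '2nd', '3rd', '4th', '11th', '12th', '13th', '21st', '22nd', '103rd']
```

Unlike pop(), get() never changes the dictionary, so `suffix` can be reused for every call.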


In [11]:
from collections import defaultdict

#defaultdict is a separate type of dictionary (a subclass of dict)
# Restriction: You must give it a "default factory" such as int or list; it is called to make the value for any missing key
# Benefit: Looking up a missing key does not raise a KeyError; the key is added with the factory's value (0 for int, [] for list)

x = defaultdict(int)
print(x["hi"]) #Even though there is no key "hi", we get the default int value of 0 when we look it up

#Use case: The tornado years_counts function we wrote on Friday can be rewritten more elegantly using defaultdict

0
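A sketch of that use case: counting tornados per year with `defaultdict(int)`. The four rows are copied from the sample output further down and stand in for `tornado_data[1:]`; the name `tornado_rows` is hypothetical.

```python
from collections import defaultdict

# A few [year, id, location, speed] rows (stand-ins for tornado_data[1:])
tornado_rows = [
    ["2006", "QPIQPWDP", "site B", "175"],
    ["1996", "MMMHKDDK", "site B", "290"],
    ["2005", "CWGFYTZZ", "site C", "109"],
    ["2005", "FEBIJZIF", "site A", "198"],
]

years_counts = defaultdict(int)
for t in tornado_rows:
    # A missing year starts at int() == 0, so no "if year in ..." check is needed
    years_counts[t[0]] += 1

print(dict(years_counts))  # {'2006': 1, '1996': 1, '2005': 2}
```

One caveat: reading a missing key, as with `x["hi"]` above, also inserts that key into the defaultdict. Use `in` or `.get()` if you only want to check whether a key is there.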

**Data Structures**

A _data structure_ is a collection of *values*, the *relationships* among them, and the functions or *operations* that can be applied to the data.

| | values | relationships | operations |
| :- | :- | :- | :- |
| list | anything | ordered (indexes 0, 1, ...) | len(), indexing, pop(), slicing, iteration (for), ... |
| set | anything hashable, e.g. no lists (BUT no repeats) | no ordering | in, ==, add(), len(), ... |
| dict | key-value pairs (keys must be hashable; values can be anything) | no indexes, BUT lookup values by their keys (insertion order is kept in Python 3.7+) | keys(), values(), len(), lookup, insertion, deletion |
| ... | | | |
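A few rows of the table can be checked directly in the interpreter. The grocery items and site counts below are made up for illustration. The last part shows a detail the "values" column hints at: set members (and dict keys) must be hashable, so a list cannot be one.

```python
groceries = ["milk", "peppers", "milk"]
print(groceries[0], len(groceries))   # list: indexed, keeps repeats -> milk 3

unique = set(groceries)
print(len(unique), "milk" in unique)  # set: repeats collapse -> 2 True

sites = {"site A": 5, "site B": 15}   # made-up counts
print(sites["site B"])                # dict: look up a value by its key -> 15

raised = False
try:
    {["not", "hashable"]}             # a list cannot be a set element (or a dict key)
except TypeError:
    raised = True
print(raised)  # True
```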

**Choosing the right data structure**


 - grocery list (can of tomatoes, 2 peppers, milk)
  - Probably a list? Maybe a set? (How do we want to handle the possibility of repeat items?)
 - player scores in a game (Alexi has 5, Meena has 8, Andy has 20)
  - dict (Notice the natural key-value pairs with name and score)
 - tornado counts by year
  - dict (Key-value pairs again! with year and the count)
 - tornado names used
  - set (If there have been 5 tornados named "bob", I still only care that "bob" is IN my used_names set)
 - tornado entries by year
  - dict where each value is a list of lists (the dict maps each year to a list of tornado entries; each tornado entry is a list)
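Sketches of three of the choices above. The tornado names are made up; the scores come from the example, and the entries are rows copied from the sample output later in these notes, in the `[year, id, location, speed]` layout of tornados.csv. The variable names `used_names` and `entries_by_year` are illustrative.

```python
# Tornado names used: a set ignores repeats
used_names = set()
for name in ["bob", "ana", "bob", "bob", "cy", "bob", "bob"]:
    used_names.add(name)
print("bob" in used_names, len(used_names))  # True 3 (each name stored once)

# Player scores: name -> score
scores = {"Alexi": 5, "Meena": 8, "Andy": 20}
scores["Alexi"] += 1  # update one player's score by key

# Tornado entries by year: year -> list of entries (each entry is a list)
entries_by_year = {
    "2005": [["2005", "CWGFYTZZ", "site C", "109"], ["2005", "FEBIJZIF", "site A", "198"]],
    "2006": [["2006", "QPIQPWDP", "site B", "175"]],
}
print(len(entries_by_year["2005"]))  # 2 entries in 2005
```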

**Bucketing (also known as "binning")**

What is it?
 - Take data (initially in a big list)
 - Append each data entry to a list stored as a value in a dict
 - Choose the key (the "bucket") from some category within the entry itself

Why bucket data?
 - A way to organize our data, without losing information in the process

How is this different from what we did on Friday?
 - The "without losing information in the process" part! Counting kept only one number per year; buckets keep every entry.
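A side-by-side sketch on a tiny list of `[year, id]` pairs (ids copied from the sample output below; the variable names are illustrative). The count keeps only a number per year, while the buckets keep every entry, so the counts can be rebuilt from the buckets but not the other way around.

```python
rows = [["2005", "CWGFYTZZ"], ["2006", "QPIQPWDP"], ["2005", "FEBIJZIF"]]

# Counting (Friday): only the number per year survives
counts = {}
for row in rows:
    counts[row[0]] = counts.get(row[0], 0) + 1

# Bucketing (today): every entry is kept, grouped by year
buckets = {}
for row in rows:
    if row[0] not in buckets:
        buckets[row[0]] = []  # first entry for this year: start an empty bucket
    buckets[row[0]].append(row)

print(counts)           # {'2005': 2, '2006': 1}
print(buckets["2005"])  # [['2005', 'CWGFYTZZ'], ['2005', 'FEBIJZIF']]
# The bucket sizes reproduce the counts exactly
print({year: len(entries) for year, entries in buckets.items()} == counts)  # True
```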

In [13]:
import csv

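#adapted from https://automatetheboringstuff.com/2e/chapter16
def process_csv(filename):
    """Read a CSV file and return its rows as a list of lists of strings."""
    # "with" closes the file even if reading fails; newline="" is what the csv docs recommend
    with open(filename, encoding="utf-8", newline="") as exampleFile:
        exampleReader = csv.reader(exampleFile)
        exampleData = list(exampleReader)
    return exampleData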

**Last time...**

We used tornados.csv to make a dictionary that counted the tornados in each year

In [None]:
#See tornados.csv
tornado_data = process_csv("tornados.csv")
tornado_data

years_counts = {}
for t in tornado_data[1:]:
    #t is each tornado entry in turn
    #use years_counts to keep count of tornados in each year
    year = t[0]
    if year in years_counts: #tests if year is a valid key
        years_counts[year] += 1
    else:
        years_counts[year] = 1
    
print(list(years_counts.keys())) #list of all the keys in the dictionary
print(list(years_counts.values())) #list of all the values in the dictionary
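#Keys come back in insertion order (the order each year first appears in the file), NOT sorted order

for key in years_counts:
    if key.startswith("2"): #only the years 2000 and later (keys are strings)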
        print(key,years_counts[key])
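When a predictable order is needed, sort the keys instead of relying on insertion order. A minimal sketch with made-up counts standing in for the real `years_counts`; `recent` is an illustrative name. Sorting the year strings works here only because every year has exactly four digits.

```python
# Made-up counts, inserted in "file order" rather than year order
years_counts = {"2006": 3, "1996": 2, "2016": 1, "2002": 4}

# sorted() on a dict returns its keys in ascending order
recent = [(year, years_counts[year]) for year in sorted(years_counts) if year.startswith("2")]
for year, count in recent:
    print(year, count)
# 2002 4
# 2006 3
# 2016 1
```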

**Today: Bucket tornado data**

Write a function that buckets the data by a given column name.

In [16]:
tornado_data = process_csv("tornados.csv")
headers = tornado_data[0]
tornado_data = tornado_data[1:]
print(headers)

def bucket_tornados(tornado_data,headers,col):
    """Bucket the tornado data by the given column and return the resulting dict"""
    d = {}
    col_index = headers.index(col)
    for tornado in tornado_data:
        col_value = tornado[col_index]
        if col_value not in d:
            d[col_value] = [tornado] #Creates a new list for the new bucket, with one element (tornado) in that list
        else:
            d[col_value].append(tornado)
    return d

bucket_tornados(tornado_data, headers, "location")

['year', 'id', 'location', 'speed']


{'site B': [['2006', 'QPIQPWDP', 'site B', '175'],
  ['1996', 'MMMHKDDK', 'site B', '290'],
  ['2016', 'QSCAPJBU', 'site B', '290'],
  ['2002', 'EYIVKEWL', 'site B', '199'],
  ['1997', 'NSCJTEAU', 'site B', '222'],
  ['2011', 'AUJLPQWN', 'site B', '243'],
  ['2010', 'SZRUOPIH', 'site B', '201'],
  ['2008', 'FUBXASWR', 'site B', '255'],
  ['2003', 'VFIBJORY', 'site B', '240'],
  ['2001', 'VTBWHKRH', 'site B', '271'],
  ['2009', 'VLEHBLKH', 'site B', '141'],
  ['2010', 'CETAQQXF', 'site B', '202'],
  ['2017', 'YWHUCOUS', 'site B', '194'],
  ['2015', 'YDPQWWGV', 'site B', '130'],
  ['1995', 'EMCQGXEG', 'site B', '155']],
 'site C': [['2014', 'KKGOICYZ', 'site C', '122'],
  ['1996', 'JRHLYGLS', 'site C', '238'],
  ['2001', 'QRYMLENE', 'site C', '174'],
  ['2002', 'IZXLGNRJ', 'site C', '269'],
  ['2005', 'CWGFYTZZ', 'site C', '109'],
  ['2017', 'LIBFCJBB', 'site C', '181'],
  ['2017', 'CNMZERWF', 'site C', '218'],
  ['2003', 'NQSHEURP', 'site C', '155'],
  ['2004', 'KRMJSZGY', 'site C', '13
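The same function can be shortened with `defaultdict(list)`: a missing bucket starts out as an empty list, so the `if`/`else` disappears. This is a sketch; `bucket_tornados_dd` is a hypothetical name, and the three sample rows are copied from the output above. Because no entries are lost, per-bucket summaries such as the fastest speed at each site are one line each.

```python
from collections import defaultdict

def bucket_tornados_dd(tornado_data, headers, col):
    """Bucket rows by the value in column `col` (defaultdict variant)."""
    col_index = headers.index(col)
    d = defaultdict(list)
    for tornado in tornado_data:
        d[tornado[col_index]].append(tornado)  # a new key starts as []
    return dict(d)  # convert back to a plain dict for display

headers = ["year", "id", "location", "speed"]
sample = [
    ["2006", "QPIQPWDP", "site B", "175"],
    ["2014", "KKGOICYZ", "site C", "122"],
    ["1996", "MMMHKDDK", "site B", "290"],
]
by_location = bucket_tornados_dd(sample, headers, "location")
print({loc: len(rows) for loc, rows in by_location.items()})                # {'site B': 2, 'site C': 1}
print({loc: max(int(r[3]) for r in rows) for loc, rows in by_location.items()})  # {'site B': 290, 'site C': 122}
```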

**A different way of organizing data**

Process a CSV file into a _list_ of _dictionaries_.

In [18]:
data_as_list = process_csv("tornados.csv")
header = data_as_list[0]
data_as_list = data_as_list[1:]
data_as_list_of_dicts = []
for tornado in data_as_list:
    new_dict = {}
    data_as_list_of_dicts.append(new_dict)
    for index in range(len(header)):
        new_dict[header[index]] = tornado[index]
data_as_list_of_dicts

[{'year': '2006', 'id': 'QPIQPWDP', 'location': 'site B', 'speed': '175'},
 {'year': '1996', 'id': 'MMMHKDDK', 'location': 'site B', 'speed': '290'},
 {'year': '2016', 'id': 'QSCAPJBU', 'location': 'site B', 'speed': '290'},
 {'year': '2014', 'id': 'KKGOICYZ', 'location': 'site C', 'speed': '122'},
 {'year': '2015', 'id': 'ZDMHZTXL', 'location': 'site A', 'speed': '147'},
 {'year': '2005', 'id': 'FEBIJZIF', 'location': 'site A', 'speed': '198'},
 {'year': '2002', 'id': 'EYIVKEWL', 'location': 'site B', 'speed': '199'},
 {'year': '1995', 'id': 'JDUTRHFQ', 'location': 'site A', 'speed': '281'},
 {'year': '1997', 'id': 'NSCJTEAU', 'location': 'site B', 'speed': '222'},
 {'year': '2005', 'id': 'AWLDIUCW', 'location': 'site A', 'speed': '173'},
 {'year': '1996', 'id': 'JRHLYGLS', 'location': 'site C', 'speed': '238'},
 {'year': '2001', 'id': 'QRYMLENE', 'location': 'site C', 'speed': '174'},
 {'year': '1995', 'id': 'RCAOONFD', 'location': 'site A', 'speed': '198'},
 {'year': '2002', 'id': '
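Two shorter ways to build the same list of dictionaries, sketched on an in-memory CSV string so the example runs without tornados.csv (the two data rows are copied from the output above). `zip` pairs each header with the matching field of a row; `csv.DictReader` does that pairing itself, taking the keys from the first line.

```python
import csv
import io

text = "year,id,location,speed\n2006,QPIQPWDP,site B,175\n1996,MMMHKDDK,site B,290\n"

# Option 1: zip each row with the header row
rows = list(csv.reader(io.StringIO(text)))
header, body = rows[0], rows[1:]
via_zip = [dict(zip(header, row)) for row in body]

# Option 2: let csv.DictReader use the first line as the keys
via_reader = list(csv.DictReader(io.StringIO(text)))

print(via_zip == via_reader)  # True
# Columns are now accessed by name instead of by index
print([t["id"] for t in via_zip if int(t["speed"]) > 200])  # ['MMMHKDDK']
```

With the real file, the same idea is `csv.DictReader(f)` on a file opened with `open("tornados.csv", encoding="utf-8", newline="")`.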

**Challenge Problem**

Bucketing can be used to help sort a list. Write a function that takes a list of 3-digit numbers and returns them in sorted order. It works by:
 1. Bucketing by the 1s digit.
 2. "Flattening" the buckets into a single list (but now the list is "sorted" by 1s digit)
 3. Bucketing by the 10s digit.
 4. Flattening again
 5. Bucketing by the 100s digit.
 6. Flattening one last time
See if you can explain why this approach sorts the list. How does its speed compare to using the built-in list sort() method? (Remember `from time import time` to help you measure speed.)
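One possible solution sketch, for checking your own work afterward. This procedure is usually called least-significant-digit (LSD) radix sort. The helper names `bucket_by_digit`, `flatten` and `radix_sort_3digit` are hypothetical. Two details matter: the buckets are flattened in digit order 0 through 9 (not in the order their keys happened to be inserted), and appending keeps numbers with the same digit in their previous relative order, so each pass preserves the ordering produced by the earlier passes.

```python
import random
from time import time

def bucket_by_digit(nums, place):
    """Bucket nums by the digit at `place` (1, 10, or 100)."""
    buckets = {}
    for n in nums:
        digit = (n // place) % 10
        if digit not in buckets:
            buckets[digit] = []
        buckets[digit].append(n)  # appending keeps earlier order within a bucket
    return buckets

def flatten(buckets):
    """Concatenate the buckets in digit order 0..9."""
    result = []
    for digit in range(10):
        result.extend(buckets.get(digit, []))  # missing buckets contribute nothing
    return result

def radix_sort_3digit(nums):
    # 1s digit, then 10s, then 100s, flattening after each pass
    for place in (1, 10, 100):
        nums = flatten(bucket_by_digit(nums, place))
    return nums

nums = [random.randint(100, 999) for _ in range(100_000)]

start = time()
mine = radix_sort_3digit(nums)
print("bucket sort:", time() - start, "seconds")

builtin = nums.copy()  # sort() works in place, so sort a copy
start = time()
builtin.sort()
print("list.sort():", time() - start, "seconds")

print(mine == builtin)  # True
```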