**Outline for Friday, March 5**

Sets and Dictionaries

You will be able to:
 - Define a "data structure" and recognize three common data structures
 - Use the concept of a data structure to organize your thinking about lists
 - Write code using a **set**
 - Write code using a **dictionary**
 
Definitions
 - Data Structure
 - Dictionary
 - Key-Value pair

**Data Structures**

A _data structure_ is a collection of *values*, the *relationships* among them, and the functions or *operations* that can be applied to the data.

| | values | relationships | operations |
| :- | :- | :- | :- |
| list | anything | ordered (indexes 0, 1, ...) | len(), indexing, pop(), slicing, iteration (for), ... |
| set | almost anything (BUT no repeats) | no ordering | in, == |
| dict | key-value pairs (almost anything) | no positional indexing, BUT lookup values by their keys | keys, values, len(), lookup, insertion, deletion |
| ... | | | |
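
To make the table concrete, here is a minimal sketch that builds one of each structure from the same made-up values. The variable names and sample data are illustrative assumptions, not part of the lecture code.

```python
# Hypothetical sample data, chosen only to illustrate the table above
values = ["a", "b", "a", "c"]

as_list = list(values)        # ordered, repeats kept
as_set = set(values)          # no ordering, repeats dropped
as_dict = {"a": 1, "b": 2}    # key-value pairs, lookup by key

print(as_list[0])      # indexing works on a list
print(len(as_set))     # three distinct items: "a", "b", "c"
print("c" in as_set)   # membership test
print(as_dict["b"])    # look up a value by its key
```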

**Motivation for data structures**

Compare to our motivation for loops:
 - Avoid copy/pasted code
 - Don't always know in advance how many times to repeat
 
Data structure:
 - Avoid creating many similar (redundant) variables
 - Don't always know in advance how many values you will have

**Sets**

Unordered collection of items, with no repeated items!

In [4]:
e = {1,4,"five",1,1} #create a set
f = set() #create an empty set
print("five" in e) #Test membership

True
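
A short follow-up sketch of the "no repeated items" and "no ordering" properties, reusing the set `e` from the cell above. The extra values added to `f` are made up for illustration.

```python
e = {1, 4, "five", 1, 1}     # same set as in the cell above
print(len(e))                # 3, because the repeated 1s are stored once

f = set()                    # start empty, then add items one at a time
f.add("five")
f.add("five")                # adding an existing item changes nothing
f.add(6)
print(len(f))                # 2

print(e == {"five", 4, 1})   # True: order does not matter for equality
print(e & f)                 # intersection: the items found in both sets
```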


**Dictionaries**

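Map (relatively simple) elements called "keys" to other elements called "values". Keys must be hashable, which in practice means immutable values such as strings, numbers, or tuples of these.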

Together: key-value pairs

Access values by their associated keys (this looks like indexing, except we supply the key instead of an index)

In [8]:
nums_dict = {"first":900, "third":500, 2:"600"}
print(nums_dict["first"])
print(nums_dict[2])
print(nums_dict[0]) #Produces a KeyError because 0 is not a key in the dictionary

900
600


KeyError: 0
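
If a key might be missing, there are two common ways to avoid the KeyError. The sketch below reuses `nums_dict` from the cell above; the fallback strings are arbitrary choices for illustration.

```python
nums_dict = {"first": 900, "third": 500, 2: "600"}

# Option 1: test for the key before looking it up
if 0 in nums_dict:
    print(nums_dict[0])
else:
    print("0 is not a key")

# Option 2: get() returns a default instead of raising KeyError
print(nums_dict.get(0))              # None
print(nums_dict.get(0, "missing"))   # missing
print(nums_dict.get("first"))        # 900
```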

**Parentheses, Brackets, Braces**

Small differences in code can make a big difference in effect.

Parentheses (x):
- Specify order: (1+2)*3
- Invoke a function: f(x)

Brackets \[x\]: (Often called "square brackets")
- Create a list: s = \[1,2,3,4\]
- Index into a sequence: s\[2\]
- Slice a sequence: s\[1:3\]
- Lookup in a dict: d\["a"\]

Braces {x}: (Often called "curly braces", sometimes called "curly brackets".)
- Create a dict: d = {"a":1, "b":2, "c":5}
- Create a set: e = {1,1,2,3}
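
As a quick check of how much the delimiter matters, this sketch writes similar-looking values with brackets and with braces and prints what Python actually builds. The values are the ones used in the lists above.

```python
s = [1, 1, 2, 3]        # brackets: a list, repeats kept
e = {1, 1, 2, 3}        # braces around single items: a set, repeats dropped
d = {"a": 1, "b": 2}    # braces around key:value pairs: a dict

print(type(s).__name__, len(s))   # list 4
print(type(e).__name__, len(e))   # set 3
print(type(d).__name__, len(d))   # dict 2

print(s[2])     # index into the list: 2
print(d["a"])   # lookup in the dict: 1
```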

**Reference: Creating empty sets, lists, and dictionaries**

Empty list:
- s = list()
- s = \[\]

Empty set:
- e = set()

Empty dict:
- d = dict()
- d = {}
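
Only one form is listed for the empty set because empty braces already mean "empty dict". A minimal sketch to confirm this:

```python
s = []
e = set()
d = {}

print(type(s).__name__)   # list
print(type(e).__name__)   # set
print(type(d).__name__)   # dict: empty braces never create a set

# All three start with no items
print(len(s), len(e), len(d))   # 0 0 0
```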

In [11]:
#Dictionary Insert

d = {0:"zero", 10:"ten"}
d[20] = "twenty" #Similar to the syntax for a lookup
print(d)

#Dictionary Delete
print(d.pop(10)) #Delete by key, returns the value
print(d)

#Dictionary Update
d[20] = "TWENTY"
print(d)

{0: 'zero', 10: 'ten', 20: 'twenty'}
ten
{0: 'zero', 20: 'twenty'}
{0: 'zero', 20: 'TWENTY'}
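
Two related details, sketched under the assumption that you sometimes delete keys that may not exist: `pop()` accepts a default value, and `del` removes a pair without returning anything. The key 30 and its values are made up for illustration.

```python
d = {0: "zero", 10: "ten", 20: "twenty"}

# pop() on a missing key raises KeyError unless a default is supplied
print(d.pop(30, "no such key"))   # no such key

# del removes a pair without returning the value
del d[10]
print(d)                          # {0: 'zero', 20: 'twenty'}

# Assigning to a key inserts if the key is new and updates if it already exists
d[30] = "thirty"   # insert
d[30] = "THIRTY"   # update
print(len(d))                     # 3
```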


In [13]:
import csv

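#adapted from https://automatetheboringstuff.com/2e/chapter16
def process_csv(filename):
    #Returns the rows of the file as a list of lists of strings
    #newline="" is what the csv module documentation recommends when opening files
    with open(filename, encoding="utf-8", newline="") as exampleFile: #with closes the file even if reading fails
        exampleReader = csv.reader(exampleFile)
        exampleData = list(exampleReader)
    return exampleData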

In [22]:
#Task: How many tornadoes occurred each year?
#See tornados.csv

tornado_data = process_csv("tornados.csv")
tornado_data

years_counts = {}
for t in tornado_data[1:]:
    #t is each tornado entry in turn
    #use years_counts to keep count of tornadoes in each year
    year = t[0]
    if year in years_counts: #tests if year is a valid key
        years_counts[year] += 1
    else:
        years_counts[year] = 1
    
print(list(years_counts.keys())) #list of all the keys in the dictionary
print(list(years_counts.values())) #list of all the values in the dictionary
#DO NOT RELY ON ORDERING: keys come back in insertion order (Python 3.7+), not sorted by year

for key in years_counts:
    if key.startswith("2"):
        print(key,years_counts[key])

['2006', '1996', '2016', '2014', '2015', '2005', '2002', '1995', '1997', '2001', '2011', '2010', '2017', '2008', '2003', '2004', '2013', '2009', '1998', '2007']
[1, 5, 4, 4, 2, 4, 5, 5, 2, 3, 2, 2, 4, 3, 2, 2, 2, 3, 1, 2]
2006 1
2016 4
2014 4
2015 2
2005 4
2002 5
2001 3
2011 2
2010 2
2017 4
2008 3
2003 2
2004 2
2013 2
2009 3
2007 2
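
The counting pattern above can be written a little more compactly with `get()`, and the results can be printed in year order with `sorted()`. This is a sketch, not the lecture's solution: the `rows` list is hypothetical data standing in for `tornado_data[1:]`, since tornados.csv is not reproduced here.

```python
# Hypothetical rows standing in for tornado_data[1:]; only column 0 (the year) is used
rows = [["2006", "x"], ["1996", "x"], ["2006", "x"], ["2016", "x"], ["1996", "x"]]

years_counts = {}
for t in rows:
    year = t[0]
    # get() supplies 0 when the year has not been seen yet
    years_counts[year] = years_counts.get(year, 0) + 1

# sorted() on a dict returns its keys in sorted order
for year in sorted(years_counts):
    print(year, years_counts[year])
```

The standard library's `collections.Counter` does the same kind of counting in one call, but writing the loop by hand shows what the dictionary is doing.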


**Challenge Problem**

Count how often each word appears in the book "The Wizard of Oz". (You will need to use oz.py in today's Code link. You don't need to understand how this module works, but you will need to experiment to figure out what type of values it returns!)

Sidenote: Once you can "read" a text into dictionary form, there are all sorts of things you can do. One thing that I look at in my research is what I call "topic-relevant" words. A topic-relevant word for a chapter appears much more often in that chapter than in the book as a whole. Can you write a program that predicts what each chapter is about? What surprises you about the words your program guesses?