# Lesson 2-Data structures

## I. Lists and For-Loops
**Roughly corresponds to "Unit 5 Python Lists and dictionaries" from Codecademy**

So far, we've only dealt with individual variables. But we might want to keep track of a whole group of values. We can use lists to store several values together and call upon them later.

In [2]:
names=['Winston','Jess','Nick','Schmidt']
heights=[5.0,5.5,6.0,5.8]
weights=[125.0,160.0,168.0,140.0]

print "All the heights"
print heights

print "1st person's name"
print names[0] # note, indices start from 0

print "2nd person's weight"
print weights[1] 

print "Last person's weight"
print weights[-1] 

print "1st-3rd person's weight"
print weights[0:3] # stops right before index 3

All the heights
[5.0, 5.5, 6.0, 5.8]
1st person's name
Winston
2nd person's weight
160.0
Last person's weight
140.0
1st-3rd person's weight
[125.0, 160.0, 168.0]
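Here is a small sketch of a few more ways to slice, reusing the `weights` values from the cell above. The variable names `first_two`, `last_two`, and `middle` are just for illustration, and `print` is written with parentheses only so the snippet runs under both Python 2 and Python 3.

```python
weights = [125.0, 160.0, 168.0, 140.0]

first_two = weights[:2]   # leaving out the start means "from the beginning"
last_two = weights[-2:]   # negative indices count back from the end
middle = weights[1:3]     # index 1 up to, but not including, index 3

print(first_two)
print(last_two)
print(middle)
```

Each slice is a new list, so changing `first_two` afterwards would not change `weights`.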


And we can change elements of the list

In [3]:
weights[1]=165.0
print weights

[125.0, 165.0, 168.0, 140.0]


But accessing each of these elements individually is annoying, especially when lists get long. For-loops are an easy way to go through all the elements.

In [7]:
for name in names:
    print name+' is a character on New Girl'

Winston is a character on New Girl
Jess is a character on New Girl
Nick is a character on New Girl
Schmidt is a character on New Girl


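Behind the scenes, the loop assigns each element of `names` to `name` in turn, as if we had written the assignments out by hand:

In [6]: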
name=names[0]
print name
name=names[1]
print name
name=names[2]
print name
name=names[3]
print name

Winston
Jess
Nick
Schmidt


And we can also incorporate logical statements to filter out certain results

In [4]:
for wt in weights:
    if wt>150:
        print wt

165.0
168.0


We also might want to store these values in a new list

In [5]:
greater_150_wts=[] # Create an empty list
for wt in weights:
    if wt>150:
        greater_150_wts.append(wt) # append adds the given element to the list

print greater_150_wts
print greater_150_wts[1]

[165.0, 168.0]
168.0


## Useful list functions

We can combine two lists using +

In [6]:
print ['a','b']+['c','d']

['a', 'b', 'c', 'd']


We also might want to know how many people fall in a certain weight range

In [7]:
print len(greater_150_wts)

2


The len() and range() functions are also a useful way to get elements out of a list by position (if the positions of elements in the list are meaningful).

In [8]:
print 'Range'
print range(6)
print range(2,6)

print 'Range'
print range(len(weights))

Range
[0, 1, 2, 3, 4, 5]
[2, 3, 4, 5]
Range
[0, 1, 2, 3]


And we can use the output of range to get each index from the list

In [9]:
for i in range(len(weights)):
    print weights[i]

125.0
165.0
168.0
140.0
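Looping over indices pays off when the same position means the same thing in more than one list. The sketch below assumes, as in the cells above, that `names` and `weights` are in the same order; `descriptions` is a name made up for this example.

```python
names = ['Winston', 'Jess', 'Nick', 'Schmidt']
weights = [125.0, 165.0, 168.0, 140.0]

descriptions = []
for i in range(len(names)):
    # position i refers to the same person in both lists
    descriptions.append(names[i] + ' weighs ' + str(weights[i]))

for line in descriptions:
    print(line)
```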


sorted() returns a new, ordered copy of a list (the original list is left as it was)

In [10]:
num_list=[2,3,4,1,3,5]
print sorted(num_list)

char_list=['b','a','c','e','d','c']
print sorted(char_list)

[1, 2, 3, 3, 4, 5]
['a', 'b', 'c', 'c', 'd', 'e']
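A quick sketch to check how `sorted()` behaves, using the same `num_list` as above. It also shows the `reverse=True` option for sorting from largest to smallest; `ascending` and `descending` are illustrative names.

```python
num_list = [2, 3, 4, 1, 3, 5]

ascending = sorted(num_list)
descending = sorted(num_list, reverse=True)  # reverse=True puts the largest first

print(ascending)
print(descending)
print(num_list)  # the original list keeps its original order
```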


Using "in" can tell us whether something is in a list

In [11]:
print 4 in [1,2,3,4]
print 4 in [1,2,3]

True
False


In [12]:
print 'a' in ['a','b','c','d']
print 'a' in ['ab','c','d'] # note this compares 'a'=='ab', not 'a'=='a' and 'a'=='b'

True
False
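Because `in` gives back True or False, it fits naturally inside an if statement. In this sketch the `guests` list (including the name 'Cece') is made up for illustration; only `names` comes from the lesson.

```python
names = ['Winston', 'Jess', 'Nick', 'Schmidt']
guests = ['Jess', 'Cece', 'Nick']  # hypothetical list to check against names

on_list = []
for guest in guests:
    if guest in names:  # True only when the whole string matches an element
        on_list.append(guest)

print(on_list)
```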


There is another way to write for-loops over lists, called a list comprehension. It's a little more complex but is nice and compact

In [13]:
a=range(4)
print a
print [a2*a2 for a2 in a]
print [str(a2)+str(a2) for a2 in a]

[0, 1, 2, 3]
[0, 1, 4, 9]
['00', '11', '22', '33']


This compresses all the parts of a for loop into a single line. If you have more complex operations, you might want to use a regular for loop, or create a function with all the stuff in it and then use that in the list comprehension.
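As a rough sketch of both ideas: a list comprehension can carry an `if` to filter, and it can call a function that holds the more complex logic. The helper `describe_weight` is hypothetical, written only for this example.

```python
def describe_weight(wt):
    # hypothetical helper: holds logic that would be clumsy on one line
    if wt > 150:
        return str(wt) + ' is over 150'
    return str(wt) + ' is 150 or under'

weights = [125.0, 165.0, 168.0, 140.0]

# same result as the earlier for-loop that appended to greater_150_wts
greater_150_wts = [wt for wt in weights if wt > 150]

# call the helper once per element
labels = [describe_weight(wt) for wt in weights]

print(greater_150_wts)
for label in labels:
    print(label)
```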

## Tuples

Briefly, tuples are like lists, but you can't change their contents. This can be useful if you need some constant values (like a fixed set of category labels). They also make good keys in dictionaries.

In [14]:
tup=(1,2,3) # note, tuples use parentheses instead of brackets
print tup[0]
tup[0]=2

1


TypeError: 'tuple' object does not support item assignment
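If you do need a "changed" tuple, one option is to build a new one from pieces of the old one. This is only a sketch; `new_tup` is an illustrative name, and the try/except is there so the snippet keeps running after the error.

```python
tup = (1, 2, 3)

try:
    tup[0] = 2  # not allowed: tuples can't be changed in place
except TypeError:
    print('could not change the tuple')

# Build a new tuple instead: a one-element tuple (2,) plus the rest of the old one
new_tup = (2,) + tup[1:]

print(new_tup)
print(tup)  # the original is untouched
```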

## II. Dictionaries

Dictionaries let us look up values by an intuitive name (a key) instead of by a numeric position, as we did with lists

In [8]:
people={'Winston':[5], 'Jess':[5.5],'Nick':[6.0]}
print people

{'Jess': [5.5], 'Winston': [5], 'Nick': [6.0]}


And I can then call upon their properties by their names

In [9]:
people['Jess']

[5.5]

And we can add and change values, just like we did with lists

In [10]:
people['Jess']=[5.4]
people

{'Jess': [5.4], 'Nick': [6.0], 'Winston': [5]}

In [11]:
people['Schmidt']=[5.8]
people

{'Jess': [5.4], 'Nick': [6.0], 'Schmidt': [5.8], 'Winston': [5]}

Dictionaries are usually described in terms of keys and values. Keys are the words in quotation marks (left of the colons) that we use to call upon values (right of the colons). Note that keys have to be unique.

In [12]:
people={'Winston':[5], 'Jess':[5.5],'Nick':[6.0],'Winston':[4.3]}
people['Winston'] # only the last value given for a repeated key is kept

[4.3]
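Keys don't have to be strings. As mentioned in the tuples section, a tuple can serve as a key, which is handy when one name alone isn't enough to identify a value. The (name, year) pairs and the dictionary name below are made up for illustration and are not data from the lesson.

```python
# keys are (name, year) tuples; the years are hypothetical
height_by_name_year = {('Jess', 2015): 5.5, ('Nick', 2015): 6.0}
height_by_name_year[('Jess', 2016)] = 5.4

print(height_by_name_year[('Jess', 2016)])

# A list can't be used as a key, because lists can be changed after they're made
try:
    height_by_name_year[['Jess', 2017]] = 5.4
except TypeError:
    print('lists cannot be dictionary keys')
```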

Python also gives us some useful functions for easily getting keys and values

In [15]:
sorted(people.keys())

['Jess', 'Nick', 'Winston']

In [None]:
people.values()

In [None]:
for k in people.keys():
    if len(k)<5:
        print people[k]
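A sketch of the same loop with its results collected, assuming the `people` dictionary from the duplicate-key cell above (where 'Winston' ended up as `[4.3]`). Wrapping the keys in `sorted()` gives a predictable order, since a dictionary's own order shouldn't be relied on here; `short_names` is an illustrative name.

```python
people = {'Winston': [4.3], 'Jess': [5.5], 'Nick': [6.0]}

short_names = []
for k in sorted(people.keys()):
    if len(k) < 5:  # keep only names shorter than 5 letters
        short_names.append(k)
        print(k + ': ' + str(people[k]))

print(sorted(people.values()))
```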

## III. Strings

We've been using strings here and there. They're going to be very important as we start using more dictionaries and eventually more complex data structures. Here's a brief summary of methods that might be of use. Pretty straightforward stuff.

In [None]:
tim_str='Timothy Franklin Lew'

In [None]:
print tim_str.lower()

In [None]:
print tim_str.upper()

In [None]:
tim_descr=' is da best'
print tim_str+tim_descr

Strings behave a lot like lists of characters, so you can index them and get their length. (Unlike lists, you can't change a string's characters in place.)

In [None]:
print tim_str[0] # get first letter
print len(tim_str) # get length of string
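A few more list-like things you can do with a string, sketched with the same `tim_str`. One difference worth noting: with strings, `in` checks for a substring rather than a whole element. `capital_count` is an illustrative name.

```python
tim_str = 'Timothy Franklin Lew'

print(tim_str[0:7])        # slicing works just like it does on lists
print(tim_str[-3:])        # the last three characters
print('Frank' in tim_str)  # on a string, "in" looks for a substring

# Loop over the characters one at a time and count the capital letters
capital_count = 0
for ch in tim_str:
    if ch.isupper():
        capital_count += 1

print(capital_count)
```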

And as I mentioned previously, if we want to turn something into a string, we can use the str() function

In [None]:
age=25
print tim_str+' is '+str(age)

Recognizing a certain string pattern is a bit more complicated. I won't go into it fully here, but you have to use something called regex (regular expressions). Regex functions take in a string pattern and a string and apply the pattern to the string.

In [None]:
import re # this is a built in python regex package

sample_text='1990Bob1989Alice1991Olivia'

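# [a-zA-Z] says I want any character from a-z (lowercase) or A-Z (uppercase).
# The + says it has to appear at least once and can repeat any number of times
# (a * would also allow zero repeats, which lets the pattern match the empty string)
str_pattern='[a-zA-Z]+'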

re.split(str_pattern,sample_text)
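Besides splitting, `re.findall` returns every piece of the string that matches a pattern. The sketch below pulls the letters and the digits out of the same `sample_text` and pairs them up by position. Treating each number as belonging to the name that follows it is an assumption about how the sample text is laid out, and `year_by_name` is a name invented for this example.

```python
import re

sample_text = '1990Bob1989Alice1991Olivia'

# + means "one or more", so each match is a whole run of letters or digits
names_found = re.findall('[a-zA-Z]+', sample_text)
years_found = re.findall('[0-9]+', sample_text)

print(names_found)
print(years_found)

# Pair them up by position (assumes each number goes with the name after it)
year_by_name = {}
for i in range(len(names_found)):
    year_by_name[names_found[i]] = int(years_found[i])

print(year_by_name['Alice'])
```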

# Outline example-Added 5/3/2016

Last time, we covered a lot of stuff very rapidly, so I wanted to take some time to discuss how we might do a programming problem. Let's say, based on the information below, I want to find out what proportion of people's experiments are successes. Here, I want the output to be a dictionary mapping people's names onto the proportion of their experiments that were successes.

In [21]:
names=['tim','drew','kristin']
succ_exp=[4,5,7]
fail_exp=[8,7,12]

def succ_proportion(names,succ_exp,fail_exp):
    succ_prop={}
    # your code in here
    return succ_prop

Before I actually mess around with the function, I might start by seeing if I can get the right kind of answer outside of a function: Can I get each person's proportion of successes? Well, a proportion has two parts--a numerator and denominator. I have my numerator (succ_exp, the number of successes). How can I get my denominator?

In [22]:
for i in range(len(succ_exp)):
    print i
    print succ_exp[i]+fail_exp[i]

0
12
1
12
2
19


I can then divide each numerator by its denominator to get a proportion

In [23]:
for i in range(len(succ_exp)):
    sum_exp=succ_exp[i]+fail_exp[i]
    print succ_exp[i]/sum_exp

0
0
0


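Uh oh, every result is 0. In Python 2, dividing one integer by another drops the remainder, so we need to convert one of the numbers to a float first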

In [24]:
for i in range(len(succ_exp)):
    sum_exp=succ_exp[i]+fail_exp[i]
    print succ_exp[i]/float(sum_exp)

0.333333333333
0.416666666667
0.368421052632
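To see the difference side by side, here is a minimal sketch using tim's numbers (4 successes out of 12 experiments). The `//` operator is used for the whole-number division because it drops the remainder in both Python 2 and Python 3, so the snippet behaves the same in either version; the variable names are illustrative.

```python
successes = 4
total = 12

whole_number_result = successes // total   # drops the remainder
float_result = successes / float(total)    # converting the denominator keeps the decimals
also_float = float(successes) / total      # converting the numerator works too

print(whole_number_result)
print(float_result)
print(also_float)
```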


And now we can try putting this into a function

In [25]:
def succ_proportion(names,succ_exp,fail_exp):
    succ_prop={}
    # your code in here
    for i in range(len(succ_exp)):
        sum_exp=succ_exp[i]+fail_exp[i]
        print succ_exp[i]/float(sum_exp)
    
    return succ_prop
succ_proportion(names,succ_exp,fail_exp)

0.333333333333
0.416666666667
0.368421052632


{}

But this isn't giving us what we want. The function is calculating the success proportion--we can see it right there--but its output is still an empty dictionary. So we need to make sure that we store the success proportions into our dictionary

In [26]:
def succ_proportion(names,succ_exp,fail_exp):
    succ_prop={}
    # your code in here
    for i in range(len(succ_exp)):
        sum_exp=succ_exp[i]+fail_exp[i]
        prop=succ_exp[i]/float(sum_exp)
        person_name=names[i]
        succ_prop[person_name]=prop
    
    return succ_prop
succ_proportion(names,succ_exp,fail_exp)

{'drew': 0.4166666666666667,
 'kristin': 0.3684210526315789,
 'tim': 0.3333333333333333}
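Once the proportions are in a dictionary, a loop can answer follow-up questions such as who has the highest success proportion. This is one possible sketch, not the only way to do it. The dictionary below is typed in by hand from the same counts so the snippet runs on its own, and `best_name` and `best_prop` are illustrative names.

```python
# same proportions as the function output above, written out from the counts
succ_prop = {'tim': 4 / 12.0, 'drew': 5 / 12.0, 'kristin': 7 / 19.0}

best_name = None
best_prop = -1.0  # lower than any real proportion, so the first person always replaces it
for person in sorted(succ_prop.keys()):
    if succ_prop[person] > best_prop:
        best_name = person
        best_prop = succ_prop[person]

print(best_name)
print(best_prop)
```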

# Problem

I'm trying to characterize how my labmates talk, so I store some of their most frequent words in a dictionary. Who uses the longest words? Write a function that creates a new dictionary that maps our names (tim, drew, kristin) onto the average length of our common words.

In [27]:
tim_words=['excellent','awesome','great','chipper']
drew_words=['banjo','bag','go','for','walk']
kristin_words=['gluten','instagram','filter']
words={'tim':tim_words,'drew':drew_words,'kristin':kristin_words}

def long_words(words):
    len_words={}
    # your code in here
    return len_words

print long_words(words)

{}
