In [1]:
# import our standard set of packages

import urllib
import pandas as pd
import numpy as np
import seaborn 
import matplotlib.pyplot as plt

# 5-6: Lists, Sets and DataFrames

Most of the additional data types in Python are built by combining the numbers and strings we saw in our last class. Three important ones are:

- lists (which you should think of as sequences of numbers or strings), 
- sets (which are like lists except that order does not matter and duplicates are not allowed), and
- DataFrames.

DataFrames are pandas objects designed specifically for working with data. You should think of them as two-dimensional lists.
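To make the "two-dimensional list" picture concrete, here is a minimal sketch that builds a small DataFrame from two lists. The column names `animal` and `number` and the values in them are made up for illustration; they are not data from the class.

```python
import pandas as pd

# Hypothetical example data: two lists of equal length
animal = ['dog', 'cat', 'pig', 'cow']
number = [2, 3, 4, 5]

# Each list becomes one column of the DataFrame
df = pd.DataFrame({'animal': animal, 'number': number})
print(df)

# The shape is (rows, columns)
print(df.shape)

# A single column can be turned back into an ordinary list
print(df['animal'].tolist())
```

Each row of the DataFrame lines up the entries that sit at the same position in the two lists.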


## Lists

Lists are a data type built from other types. They can hold any other Python object as an element, and you can even mix object types (though you should have a very good reason for doing so). For lists, the order matters and the same object can appear in multiple places.

In [2]:
x = [2, 3, 4, 5]
x

[2, 3, 4, 5]

In [7]:
y = ['dog', 'cat', 'pig', 'cow']
y

['dog', 'cat', 'pig', 'cow']

The elements of a list can be referred to individually. Because we do this so often, Python has some shortcuts, a few of which I will show here.

**Note the important convention that the first element of a list has index 0.**

In [8]:
# first element of x

x[0]

2

In [9]:
# last element of y

y[-1]

'cow'

In [11]:
# first two elements of x, note that the 0 can be left off (try it)

x[0:2]

[2, 3]

In [13]:
# last three elements of y, note Python knows where the list ends.

y[-3:]

['cat', 'pig', 'cow']

In [14]:
# the second and third elements of y

y[1:3]

['cat', 'pig']
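The indexing examples above can be collected into one short recap. This is only a sketch of the slicing rules; the variable names `middle`, `first_two`, `last_three` and `last` are introduced here for illustration.

```python
y = ['dog', 'cat', 'pig', 'cow']

# A slice y[start:stop] includes index start but stops just before index stop
middle = y[1:3]        # indices 1 and 2

# Omitting start means "from the beginning"; omitting stop means "to the end"
first_two = y[:2]
last_three = y[-3:]

# Negative indices count back from the end of the list
last = y[-1]

print(middle)      # ['cat', 'pig']
print(first_two)   # ['dog', 'cat']
print(last_three)  # ['cat', 'pig', 'cow']
print(last)        # cow
```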

Strings behave much like lists of characters, so you can index them the same way. There is one important difference, which we will see in a moment:

In [19]:
full_name = 'Melville, Herman'
full_name[0]

'M'

Lists are mutable. You can change one of the entries and it updates the list:

In [20]:
x

[2, 3, 4, 5]

In [21]:
x[0]=1
x

[1, 3, 4, 5]

However, that feature has been turned off for strings. Strings are immutable, so trying to change a single character raises an error:

In [22]:
full_name[0]='N'

TypeError: 'str' object does not support item assignment
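If you do need a changed version of a string, one common approach is to build a new string from pieces of the old one. A minimal sketch, where the name `new_name` is introduced only for this example:

```python
full_name = 'Melville, Herman'

# Strings cannot be changed in place, so combine a new first
# character with a slice holding the rest of the original string
new_name = 'N' + full_name[1:]

print(new_name)   # Nelville, Herman
print(full_name)  # Melville, Herman (the original is untouched)
```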

### Lists can be combined with each other

In [24]:
x, y

([1, 3, 4, 5], ['dog', 'cat', 'pig', 'cow'])

In [25]:
x + y

[1, 3, 4, 5, 'dog', 'cat', 'pig', 'cow']
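One detail worth checking for yourself: `+` builds a new list and leaves the two original lists alone. A short sketch, with `combined` as a name chosen just for this example:

```python
x = [1, 3, 4, 5]
y = ['dog', 'cat', 'pig', 'cow']

# x + y creates a brand new list holding the elements of x followed by those of y
combined = x + y

print(combined)
print(len(combined))  # 8

# The original lists are unchanged
print(x)  # [1, 3, 4, 5]
print(y)  # ['dog', 'cat', 'pig', 'cow']
```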

We can also do some operations on lists. Here we type a list of temperatures in by hand.

Note that in practice we won't enter data this way. We will read the data into a DataFrame and then make a list from that.

In [26]:
temperature = [54, 54, 52, 52, 52, 50, 48, 50, 48, 48, 48, 50, 50, 50, 50, 48, 48, 48, 50, 
               52, 54, 54, 54, 54, 57, 61, 63, 66, 68, 70, 72, 73, 75, 75, 79, 79, 81, 81,
               81, 81, 82, 84, 84, 86, 86, 86, 86, 88, 84, 84, 84, 82, 84, 82, 82, 82, 82,
               81, 79, 79, 73, 70, 70, 70, 70, 68, 66, 64, 63, 63, 63]

In [34]:
# Every fourth element, starting from the first
# (leaving start and stop empty covers the whole list)

temperature[::4]

[54, 52, 48, 50, 48, 54, 57, 68, 75, 81, 82, 86, 84, 84, 82, 73, 70, 63]
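A slice can take a third number, the step, written as `list[start:stop:step]`. The sketch below uses a short made-up list (`values` is hypothetical, not class data) to show how the step works and why a stop of `-1` can drop the final element.

```python
# Hypothetical short list so the results are easy to check by eye
values = [10, 20, 30, 40, 50, 60, 70]

# Start and stop left empty: step through the whole list
every_third = values[::3]

# A stop of -1 excludes the last index, so the final 70 is left out
every_third_short = values[0:-1:3]

# A start other than 0 shifts which elements are picked
every_third_from_one = values[1::3]

print(every_third)           # [10, 40, 70]
print(every_third_short)     # [10, 40]
print(every_third_from_one)  # [20, 50]
```

For the temperature list the two forms happen to give the same answer, because the last element does not fall on a multiple of four.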

In [35]:
# Length of the list

len(temperature)

71

In [36]:
# sum of the list

sum(temperature)

4787

In [37]:
# two ways to get the mean temperature

sum(temperature)/len(temperature), np.mean(temperature)

(67.4225352112676, 67.4225352112676)

In [38]:
# max and min

max(temperature), min(temperature)

(88, 48)

## Sets and DataFrames

We will not do too much with sets. Their main use in our class is to find all of the unique entries in a column of a DataFrame, i.e. we make use of the fact that sets do not allow duplicates.
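Here is a minimal sketch of that idea, using a plain list with made-up values to stand in for a column. Passing a real DataFrame column to `set()` works the same way, for example `set(df['animal'])` for a hypothetical DataFrame `df` with an `animal` column.

```python
# Hypothetical column values, with repeats
animals = ['dog', 'cat', 'dog', 'cow', 'cat', 'dog']

# Converting to a set throws away the duplicates
unique_animals = set(animals)

print(len(animals), len(unique_animals))  # 6 3

# Sets have no order, so sort them when you want a predictable display
print(sorted(unique_animals))  # ['cat', 'cow', 'dog']

# Order does not matter when comparing sets, but it does for lists
print({'dog', 'cat'} == {'cat', 'dog'})  # True
print(['dog', 'cat'] == ['cat', 'dog'])  # False
```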

We have already had an introduction to DataFrames. There is too much in pandas for us to cover completely, so I recommend checking the [pandas documentation](https://pandas.pydata.org/docs/).