# Python Data Types and Methods: Tuples, Dictionaries and Arrays
********

## Past Projects - Continued

Here are two more projects to browse as examples:

[Jessica](https://www.ocf.berkeley.edu/~jcamacho/2017/12/06/access-to-transit-and-bicycle-infrastructure-in-six-latin-american-cities/)

[Alex](https://www.ocf.berkeley.edu/~garbier/2017/12/08/62/)

Continuing from numeric types, strings and lists, we now cover three more useful data types in Python: tuples, dictionaries, and arrays.  We will cover how to create them, what they are used for, and how to use some of their methods.

## Tuples

Tuples are like lists, but are **immutable**.  The syntax is similar except tuples use parentheses instead of square brackets.

In [None]:
d = ('a', 'b', 'c')
print(d)

In [None]:
d[2] = 'z'

See?  It really is immutable.  You'll just get a traceback if you try.  Use immutables only when you don't want to allow them to be modified.

In [None]:
d.remove('c')

If you want to remove or update an element, you can convert the tuple to a list first.

In [None]:
print(d)
e = list(d)
e.remove('c')
print(e)

But notice that e is a list, not a tuple.  If we want the result to be a tuple, we have to convert it back from a list.

In [None]:
f = tuple(e)
print(f)
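
The list round trip above can be wrapped in a small helper. This is only a sketch: `without_item` is a hypothetical name, not a built-in, and it returns a new tuple rather than modifying the original one.

```python
def without_item(items, unwanted):
    """Return a new tuple with the first occurrence of `unwanted` removed."""
    as_list = list(items)      # tuples are immutable, so work on a list copy
    as_list.remove(unwanted)   # raises ValueError if `unwanted` is absent
    return tuple(as_list)      # convert back so the result is a tuple again

d = ('a', 'b', 'c')
f = without_item(d, 'c')
print(d)  # the original tuple is untouched
print(f)
```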

The `zip` function takes two or more collections (such as lists) and combines them element by element, producing tuples of the items that share the same index. If the collections differ in length, `zip` stops at the shortest one. Here we create two lists of integers, zip them, and convert the result to a list of tuples:

In [None]:
x = [1,2,3]
y = [4,5,6]
zipped = zip(x,y)
print(list(zipped))
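
As a small sketch of two `zip` behaviors that are easy to miss: with inputs of unequal length it stops at the shortest one, and `zip(*pairs)` reverses the pairing. The variable names `pairs`, `xs`, and `ys` are only for illustration.

```python
x = [1, 2, 3]
y = [4, 5, 6, 7]          # one element longer than x

pairs = list(zip(x, y))   # the extra 7 is silently dropped
print(pairs)

# zip(*pairs) reverses the pairing, giving one tuple per original list
xs, ys = zip(*pairs)
print(xs)
print(ys)
```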

## Dictionaries

A Dictionary (or "dict") is a way to store data just like a list, but instead of using only numbers to get the data, you can use almost anything. This lets you treat a dict like it's a database for storing and organizing data.

A Python dictionary is a collection of key-value pairs. The **key** names the data, and the **value** is the data itself.

Dictionaries are a very handy data type for managing data you need to look up by a key.  Each entry is a key-value pair, with the key and value separated by a colon.  They are much more general than the word : definition kind of pairing, since the value can be many different kinds of objects.  A dictionary is written with curly braces containing comma-separated key-value pairs.

### Creating Dictionaries


There are a few different ways to create dictionaries.  The first two create an empty dictionary.

In [None]:
newdict = {}

In [None]:
newdict=dict()

Another way to create a dictionary is to write comma-separated key: value pairs inside curly braces:

In [None]:
antonyms = {'hot': 'cold', 'fast': 'slow', 'good': 'bad'}
print(antonyms)

We can then add items to a dictionary using update, or assigning a value to a new key:

In [None]:
newdict.update({'new': 'item'})

In [None]:
newdict["next"] = "thing"

In [None]:
newdict

Another way to create a dictionary is by converting lists.  This is a convenient thing to do with real data that comes from files, compared to the simple data we are using here.  The zip function is a bit advanced -- we will come back to it later when we talk about loops and iterables.  For now, just understand that it creates an iterable (think list) of tuples, containing the paired entries from the Keys and Values lists.

Notice that we can use the zip function to combine the keys and values to make the dictionary, making tuples of key-value pairs:

In [None]:
Keys = ['hot', 'fast', 'good']
Values = ['cold', 'slow', 'bad']
antonyms2 = dict(zip(Keys,Values))
print(antonyms2)

### Working with Dictionaries
As usual, find the methods available for this class by typing its name, then a dot, then pressing Tab:

In [None]:
dict.

We can retrieve the value of any dictionary entry by its key:

In [None]:
antonyms['hot']

In [None]:
antonyms.get('hot')
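
The two lookups above behave differently when the key is missing. A minimal sketch, using a key (`'tall'`) that is deliberately not in the dictionary: square brackets raise a `KeyError`, while `get` returns `None` or a default you supply.

```python
antonyms = {'hot': 'cold', 'fast': 'slow', 'good': 'bad'}

# get() returns None when the key is absent...
missing = antonyms.get('tall')
print(missing)

# ...or the second argument, if you provide one
fallback = antonyms.get('tall', 'unknown')
print(fallback)

# square brackets raise KeyError for an absent key
try:
    antonyms['tall']
except KeyError as err:
    print('KeyError for key:', err)
```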

We can get the length, keys, and values of a dictionary:

In [None]:
len(antonyms)

To see all the keys in a dictionary, use the `keys` method:

In [None]:
print(antonyms.keys())

The same thing works to get the values:

In [None]:
print(antonyms.values())

### Dictionaries are mutable

We already saw that we can add elements to a dictionary. We can change the value associated with a particular key by just assigning a value:

In [None]:
antonyms['fast'] = 'gorge'
antonyms

As you can see, working with dictionaries is kind of like working with
lists and tuples, except that you can’t join dicts with the plus operator
(+). If you try to do that, you’ll get a `TypeError`:

In [None]:
antonyms = {'hot': 'cold', 'fast': 'slow', 'good': 'bad'}

synonyms = {'hot': 'very warm', 'fast': 'quick', 'good': 'fine'}

antonyms+synonyms

OK, merging antonyms and synonyms into a single dictionary doesn't make a lot of sense, but we're just learning how to use dictionaries... Here is one way to merge the dictionaries.  But notice the result has only three elements, not six. Why?

In [None]:
newdict = {}
newdict.update(antonyms)
newdict.update(synonyms)
newdict

Maybe the result is different if we ensure the keys are unique?

In [None]:
antonyms = {'hot': 'cold', 'fast': 'slow', 'good': 'bad'}
antonyms2 = {'blue': 'cold', 'red': 'hot'}
newdict = {}
newdict.update(antonyms)
newdict.update(antonyms2)
newdict
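
A sketch of what is going on in the two merges: keys are unique, so when both dictionaries contain the same key, the value from the later one replaces the earlier one. The `{**a, **b}` form used here is an alternative to `update` that this notebook does not otherwise use, and the outer keys `'antonyms'` and `'synonyms'` are hypothetical names chosen to show one way of keeping both sets of values.

```python
antonyms = {'hot': 'cold', 'fast': 'slow', 'good': 'bad'}
synonyms = {'hot': 'very warm', 'fast': 'quick', 'good': 'fine'}

# Same keys in both: the right-hand dictionary wins for each key
merged = {**antonyms, **synonyms}
print(len(merged))
print(merged['hot'])

# Nesting the two dictionaries keeps every value
by_kind = {'antonyms': antonyms, 'synonyms': synonyms}
print(by_kind['antonyms']['hot'], '/', by_kind['synonyms']['hot'])
```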

If you want to delete a dictionary entry, use del:

In [None]:
del newdict['red']
newdict

What happens if you try to rerun the cell above after you have already run it?
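
If a deletion needs to be safe to run more than once, one option is `pop` with a default value. This is a sketch on a small stand-in dictionary, not the only way to handle a missing key.

```python
newdict = {'hot': 'cold', 'red': 'hot'}

# pop removes the key and returns its value
removed = newdict.pop('red', None)
print(removed)

# a second pop finds nothing and returns the default instead of raising KeyError
removed_again = newdict.pop('red', None)
print(removed_again)

print(newdict)
```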

In [None]:
cityPlanners_dict = {"name": "Jane Jacobs", \
                     "year of birth": 1916, \
                     "year of death": 2006, \
                     "place of birth": "Pennsylvania"}

- The keys have to be **unique** and **immutable** (more precisely, hashable). The usual suspects are strings and integers.
- The values can be anything, including lists, and even other dictionaries (nested dictionaries):

In [None]:
cityPlanners_dict = {"name": "Jane Jacobs", \
                     "year of birth": 1916, \
                     "year of death": 2006, \
                     "place of birth": "Pennsylvania", \
                     "books": ["The Death and Life of Great American Cities",\
                               "Cities and the Wealth of Nations","Dark Age Ahead",\
                               "Eyes on the Street: The Life of Jane Jacobs",\
                               "The Economy of Cities"]}


- Key/value pairs are accessed by key, not by position. Since Python 3.7, dictionaries preserve the order in which keys were inserted, but you still cannot index into them by number the way you can with a list.

In [None]:
print(cityPlanners_dict)

### Use dictionary keys to access the values

- Instead of using indices to extract items, dictionaries use keys to find and retrieve values.

In [None]:
print(cityPlanners_dict.keys(),'\n')
print(cityPlanners_dict.values())

- If you wanted the value of a particular key:

In [None]:
cityPlanners_dict["name"]

- Or perhaps you wanted the last element of the `books` list:

In [None]:
cityPlanners_dict["books"][-1]

Now is a good time to take attendance and see how things are going: go to [bitly.com/cp255](https://bitly.com/cp255) and answer the question there. (This is for attendance only and to wake you up!)

### Dictionaries compared to lists

In general, if you need data to be ordered or you have only simple data not needing to be subset, use a list.

If the data is complex or hierarchical, the dictionary's `key` / `value` structure can be very helpful. If you are only concerned about membership in a collection, looking up a key in a dictionary is typically much faster than searching a list, because a dictionary locates keys by hashing instead of scanning every element. And to make a hierarchical or nested data structure, you can put a list (or even another dictionary!) inside a dictionary as the `value`.
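
A rough way to see the membership difference for yourself is to time it. This is a sketch with arbitrary sizes (`n`, the repeat count) chosen only for illustration; the exact timings depend on your machine, so no expected output is shown.

```python
import timeit

n = 100_000
as_list = list(range(n))
as_dict = {i: None for i in range(n)}
target = n - 1  # worst case for the list: the last element

# Each call repeats the membership test 100 times and returns total seconds
list_time = timeit.timeit(lambda: target in as_list, number=100)
dict_time = timeit.timeit(lambda: target in as_dict, number=100)

print(f"list: {list_time:.5f} s")
print(f"dict: {dict_time:.5f} s")
```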

Note: when you begin looking at data embedded in websites, it will often be in JSON format, which maps naturally onto, guess what? Nested dictionaries (and lists)!
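
As a sketch of that correspondence, Python's standard `json` module turns a JSON string into nested dictionaries and lists. The JSON text below is made up for illustration and does not come from a real website.

```python
import json

# A hypothetical JSON document, written as a Python string
text = '''
{
  "name": "Jane Jacobs",
  "books": ["The Economy of Cities", "Dark Age Ahead"],
  "born": {"year": 1916, "place": "Pennsylvania"}
}
'''

record = json.loads(text)       # JSON objects become dicts, arrays become lists
print(type(record))
print(record["books"][0])       # index into the nested list
print(record["born"]["year"])   # look up a key in the nested dict
```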

### Once a dictionary has been created, you can change the values of the data. 

This is because it's a *mutable* object.

In [None]:
cityPlanners_dict["place of birth"] = "San Francisco"
print(cityPlanners_dict)

### You can also add new keys to the dictionary.  

- Note that dictionaries are "indexed" with square brackets, just like lists--they look the same, even though they're very different.

In [None]:
cityPlanners_dict["gender"] = "Female"
print(cityPlanners_dict)

### You can loop through dictionaries

- There are several ways to loop through dictionaries. Looping over `.keys()` using a 'for' loop is an easy method.
- Note that the keys come back in the order they were inserted, not sorted by key.

In [None]:
race = {'white': 0.643, 'african_american': 0.068, 'asian': 0.21, 'other': 0.079}

for key in race.keys():
    print(key, race[key])

Using a for loop makes it really easy to change the value of items in the dictionary, like transforming fractions to percentages:

In [None]:
# translate fractions to percentages 
race = {'white': 0.643, 'african_american': 0.068, 'asian': 0.21, 'other': 0.079}
for key in race.keys():
    race[key] = round(100 * race[key], 2)

print(race)
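
Two related idioms are worth knowing, shown here as a sketch: `items()` hands you each key and value together, and a dictionary comprehension builds a new dictionary without changing the original. The names `group`, `share`, and `percentages` are only for illustration.

```python
race = {'white': 0.643, 'african_american': 0.068, 'asian': 0.21, 'other': 0.079}

# items() yields (key, value) pairs, so no second lookup is needed
for group, share in race.items():
    print(group, share)

# Build a new dictionary of percentages; race itself keeps its fractions
percentages = {group: round(100 * share, 2) for group, share in race.items()}
print(percentages)
```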

To see if something is in a collection like a list or a dictionary, use the `in` operator:

In [None]:
countries = ["Afghanistan", "Canada", "Denmark", "Japan"]
race = {'white': 0.643, 'african_american': 0.068, 'asian': 0.21, 'other': 0.079}

print('Japan' in countries)
print('Iran' in countries)
print('asian' in race)
print('asian' not in race)
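
One detail to keep in mind, sketched here with the antonyms dictionary from earlier: on a dictionary, `in` checks the keys only. To test for a value, check against `.values()`.

```python
antonyms = {'hot': 'cold', 'fast': 'slow', 'good': 'bad'}

key_check = 'hot' in antonyms                # 'hot' is a key
value_as_key = 'cold' in antonyms            # 'cold' is a value, not a key
value_check = 'cold' in antonyms.values()    # search the values explicitly

print(key_check, value_as_key, value_check)
```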

*****

### Dictionary Summary

1. A Python dictionary is a collection of key-value pairs.
2. Use dictionary keys to access the values.
3. Once a dictionary has been created, you can change the values of the data and assign new keys.
4. You can loop through key/value pairs in a dictionary.

## Arrays

An incredibly valuable data type for numeric processing is the array.  It is provided by the NumPy library, so we have to import NumPy in order to use it and its many methods.  We will compare it to lists of numbers to get some insight into why it is useful.  In short, it provides a way to vectorize your calculations instead of iterating over a list and doing the computations element by element.  When datasets are large, the efficiency gain from using vectorized calculations instead of for loops is very significant.  In addition to speed, NumPy also provides many numerical methods that make complex math, linear algebra, and other scientific computing used in data science much easier.

In [None]:
import numpy as np

Let's start by creating a list, and then creating an array from that list.  Then let's compare how the list of integers works compared to the array.

In [None]:
x = list(range(1,6))
y = np.array(x)

In [None]:
print(x)

In [None]:
print(y)

In [None]:
type(x)

In [None]:
type(y)

Let's see how we can do math operations on these two versions of our data.

In [None]:
sum(x)

In [None]:
sum(y)

In [None]:
min(x)

In [None]:
min(y)

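So far so good -- it is not easy to tell the difference between lists and arrays... but some operations that work on arrays are not available for plain lists. For example, Python has no built-in `mean` function, so the next cell raises a `NameError`: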

In [None]:
mean(x)

In some cases, we can use a Numpy method and apply it to a list of numbers like we have in this case:

In [None]:
np.mean(x)

In [None]:
np.mean(y)

In [None]:
np.median(x)

In [None]:
np.median(y)

In [None]:
np.size(x)

In [None]:
np.size(y)

In [None]:
x / 10

In [None]:
y / 10

Dividing the list by 10 raises a `TypeError`, while dividing the array works element by element. Doing this operation on the list requires iterating over its values and doing the operation element by element:

In [None]:
xscaled = [ z/10 for z in x] 
xscaled
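
To get a feel for the speed difference on larger data, here is a rough timing sketch. The size `n` and repeat count are arbitrary choices for illustration, and the timings will vary by machine, so no expected output is shown.

```python
import timeit
import numpy as np

n = 100_000
x = list(range(1, n + 1))
y = np.array(x)

# Both approaches compute the same numbers
scaled_list = [z / 10 for z in x]   # explicit loop over the list
scaled_array = y / 10               # one vectorized expression
print(np.allclose(scaled_list, scaled_array))

# Time 10 repetitions of each approach
loop_time = timeit.timeit(lambda: [z / 10 for z in x], number=10)
vector_time = timeit.timeit(lambda: y / 10, number=10)
print(f"list comprehension: {loop_time:.5f} s")
print(f"numpy array:        {vector_time:.5f} s")
```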

We can create arrays and initialize them with zeros or ones:

In [None]:
Z = np.zeros(10)
print(Z)

And we can set values in the arrays by index value -- meaning they are mutable.

In [None]:
Z[4] = 1
print(Z)

In [None]:
Z = np.arange(9).reshape(3,3)
print(Z)

Of course if you have a 2-dimensional array, your indexing into the array becomes two dimensional as well, with row, then column index values:

In [None]:
Z[0,2] = 9
Z

In [None]:
Z.shape

In [None]:
Z.size

You might or might not have noticed that when we do calculations on arrays like adding two arrays together, the default behavior is element by element.  Look at the result of adding Z to itself:

In [None]:
Z+Z

Or multiplying it by itself:

In [None]:
Z*Z

In this example we create a 10 x 10 array of random numbers and find the min and max of the array:

In [None]:
Z = np.random.random((10,10))
Zmin, Zmax = Z.min(), Z.max()
print(Zmin, Zmax)

Some operations are available only as Numpy functions, not as methods on the array itself, like median and percentile:

In [None]:
Z.mean()

In [None]:
Z.median()

In [None]:
np.median(Z)

In [None]:
Z.percentile(75)

In [None]:
np.percentile(Z, 75)
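
Note the scale that `np.percentile` expects: its second argument runs from 0 to 100, so `0.75` means the 0.75th percentile, not the 75th. The sketch below uses a simple array of the integers 0 through 100 (chosen so the answers are easy to check) and also shows `np.quantile`, which takes a 0 to 1 scale.

```python
import numpy as np

data = np.arange(101)              # the values 0, 1, ..., 100

p75 = np.percentile(data, 75)      # q is on a 0-100 scale
q75 = np.quantile(data, 0.75)      # q is on a 0-1 scale
tiny = np.percentile(data, 0.75)   # the 0.75th percentile, close to the minimum

print(p75, q75, tiny)
```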

Remember this example from the first class?  It was using Numpy arrays and the Matplotlib library for plotting.

In [None]:
import matplotlib.pyplot as plt
%matplotlib inline

In [None]:
x = np.arange(100)
y = np.sin(x)
plt.plot(x * y)

### Numpy for Linear Algebra

If you have used a program like Matlab, R, Gauss, Octave or any other matrix-based language for doing linear algebra or statistics, the element-by-element result of `Z*Z` shown earlier may not be what you expect.  Instead, you might expect multiplying two matrices to produce the matrix product.  Numpy can do that too, but it uses a different syntax:

In [None]:
Z = np.arange(9).reshape(3,3)
print(Z)

In [None]:
Z.dot(Z)
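
Since Python 3.5, the `@` operator is another way to write matrix multiplication on Numpy arrays. The sketch below compares it with `dot` and with the element-by-element `*`; the variable names are only for illustration.

```python
import numpy as np

Z = np.arange(9).reshape(3, 3)

matrix_product = Z @ Z        # matrix multiplication, same as Z.dot(Z)
elementwise = Z * Z           # each entry multiplied by itself

print(matrix_product)
print(elementwise)
print(np.array_equal(matrix_product, Z.dot(Z)))
```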

You can also do a matrix transpose (switching axes):

In [None]:
np.transpose(Z)

And easily create an identity matrix:

In [None]:
np.eye(3,3)

### Numpy Summary

Numpy is a very powerful multi-dimensional array processing library for Python, and it is very fast because the underlying implementation is actually in the C programming language.

The Scientific Python ecosystem we will be using in this course uses Numpy heavily, but usually it is 'under the hood', and we use it through the Pandas library which makes it much easier to use and to handle data as tables.  But you might find significant value in learning more about Numpy if you need lower level functionality or want to code something very computationally intensive.  It is not expected that you use it heavily in this course, however.