# Collections

**Prerequisites**

- [Core data types](basics.ipynb)  


**Outcomes**

- Ordered Collections  
  
  - Know what a list is and a tuple is  
  - Know how to tell a list from a tuple  
  - Understand the `range`, `zip` and `enumerate` functions  
  - Be able to use common list methods like `append`, `sort`, and `reverse`  
  
- Associative Collections  
  
  - Understand what a `dict` is  
  - Know the distinction between a dict’s keys and values  
  - Understand when `dict`s are useful  
  - Be familiar with common `dict` methods  
  
- Sets  (optional)  
  
  - Know what a set is  
  - Understand how a set differs from a list and a tuple  
  - Know when to use a set vs a list or a tuple  

## Ordered Collections

### Lists

A Python list is an ordered collection of items

We can create lists using the following syntax

```python3
[item1, item2, ...,  itemN]
```


where the `...` represents any number of additional items

Each item can be of any type

Let’s create some lists

In [None]:
# created, but not assigned to a variable
[1, "hello", 3.0]

In [None]:
# stored as the variable `x`
x = [1, "hello", 3.0]
print("x has type", type(x))
x

#### What can we do with lists?

We can access items in a list called `mylist` using `mylist[N]`
where `N` is an integer

Note: Anytime that we use the syntax `x[i]` we are doing what is
called indexing – it means that we are selecting a particular element
of a *collection* `x`

In [None]:
x[1]

Wait! Why did `x[1]` return `'hello'` when the first element in `x` is
actually `1`?

This happened because Python starts counting at zero!

Let’s repeat that one more time for emphasis: **Python starts counting at zero**!

To access the first element of x we must use `x[0]`:

In [None]:
x[0]

We can also determine how many items are in a list using the `len` function

In [None]:
len(x)
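Because counting starts at zero, the last item of a list sits at position `len(x) - 1`. Python also accepts negative indices, which count backward from the end, so `-1` refers to the last item. A minimal sketch, using a fresh list named `letters` (introduced only for this example) so that `x` is left untouched:

```python3
letters = ["a", "b", "c"]

# the last valid index is one less than the length
last_index = len(letters) - 1
print(letters[last_index])  # c

# negative indices count from the end: -1 is the last item, -2 the one before it
print(letters[-1])  # c
print(letters[-2])  # b
```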

What happens if we try to index with a number greater than or equal to the
number of items in a list?

In [None]:
# uncomment the line below and run
# x[4]

We can check if a list contains an element using the `in` keyword

In [None]:
3.0 in x

In [None]:
"foobar" in x

<blockquote>

**Check for understanding**

Define two lists `y` and `z`

They can contain anything you want

What happens when you do `y + z`? Talk about the answer to that question
with your neighbor

When you have finished that, try `2*x` and `x*2`

What happened? Have the other partner explain


</blockquote>

In [None]:
y = [] # fill me in!
z = [] # fill me in!

For our list `x`, other common operations we might want to do are…

In [None]:
x.reverse()
x

In [None]:
x.append(10)
x

In [None]:
number_list = [10, 25, 42, 1.0]
print(number_list)
number_list.sort()
print(number_list)

Note that in order to `sort`, all elements in our list had to be numbers
(notice, though, that we mixed `int` and `float`).

We could actually do the same with a list of strings. In this case sort
will put the items in alphabetical order.

In [None]:
str_list = ["NY", "AZ", "TX"]
print(str_list)
str_list.sort()
print(str_list)

When trying to sort, we can’t mix numbers and strings because there is
no unambiguous way to compare them to determine which is smaller

In [None]:
# uncomment the line below and see what happens!
# x.sort()
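`sort` (like `reverse` and `append`) changes the list in place and returns `None`. When you want a sorted copy and need to keep the original order, the built-in `sorted` function returns a new list instead; both accept `reverse=True` for descending order. A short sketch with a fresh list `prices` (made-up numbers, for illustration only):

```python3
prices = [3.5, 1.25, 2.0]

# sorted() builds a new list and leaves `prices` unchanged
ascending = sorted(prices)
print(ascending)  # [1.25, 2.0, 3.5]
print(prices)     # [3.5, 1.25, 2.0]

# .sort() reorders `prices` itself and returns None
result = prices.sort(reverse=True)
print(result)  # None
print(prices)  # [3.5, 2.0, 1.25]
```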

<blockquote>

**Check for understanding**

Work with your neighbor on this question

In the first cell, try `y.append(z)` — have one partner explain what happened

In the second cell try `y.extend(z)` — have the other partner explain
what happened

HINT: when you are trying to explain use `y.append?` and `y.extend?` to
see a description of what these methods are supposed to do


</blockquote>

In [None]:
y = ["a", "b", "c"]
z = [1, 2, 3]
# YOUR CODE HERE
print(y)

In [None]:
y = ["a", "b", "c"]
z = [1, 2, 3]
# YOUR CODE HERE
print(y)

### The `range` function

One function you will see often in Python is the `range` function

There are three versions:

- `range(N)`: goes from 0 to N-1  
- `range(a, N)`: goes from a to N-1  
- `range(a, N, d)`: starts at a and counts by d, stopping before it reaches N  


When I call the range function, I get back something that has type `range`:

In [None]:
r = range(5)
print("type(r)", type(r))

To see what is inside the `range` we can call the list function with our
range:

In [None]:
list(r)
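Converting a few more ranges to lists shows the pattern: the stopping value is never included, a step can skip over it, and a negative step counts downward. The particular numbers below are arbitrary choices for illustration:

```python3
print(list(range(4)))         # [0, 1, 2, 3]
print(list(range(2, 6)))      # [2, 3, 4, 5]
print(list(range(1, 10, 3)))  # [1, 4, 7]
print(list(range(5, 0, -2)))  # [5, 3, 1]
```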

<blockquote>

**Check for understanding**

Experiment with the other two versions of the `range` function


</blockquote>

In [None]:
# try list(range(a, N)) -- you pick `a` and `N`

In [None]:
# try list(range(a, N, d)) -- you pick `a`, `N`, and `d`

### What are tuples?

Tuples are very similar to lists

They also hold ordered collections of items

However, there are two main differences between tuples and lists:

1. tuples are created using parentheses, `(` and `)`, instead of
  square brackets, `[` and `]`  
1. tuples are *immutable*, which is a fancy computer science word
  meaning that they can’t be changed or altered after they are created  

In [None]:
t = (1, "hello", 3.0)
print("t is a", type(t))
t
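Strictly speaking, it is the commas that make a tuple; the parentheses mostly group the items visually. This matters for a tuple with only one item, which needs a trailing comma. The names `a`, `b`, `c`, and `d` below are just for this example:

```python3
a = (1, 2)  # a tuple with two items
b = 1, 2    # the same tuple, written without parentheses
c = (1,)    # a one-item tuple: the trailing comma is required
d = (1)     # just the integer 1 in parentheses, not a tuple

print(type(a), type(b), type(c), type(d))
# <class 'tuple'> <class 'tuple'> <class 'tuple'> <class 'int'>
print(a == b)  # True
```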

We can *convert* a list to a tuple by calling the `tuple` function on
a list

In [None]:
print("x is a", type(x))
print("tuple(x) is a", type(tuple(x)))
tuple(x)

We can also convert a tuple to a list using the `list` function

In [None]:
list(t)

As with a list, we access items in a tuple `t` using `t[N]` where
`N` is an int

In [None]:
t[0]  # still start counting at 0

In [None]:
t[2]

<blockquote>

**Check for understanding**

Verify that tuples are indeed immutable by attempting the following:

- Changing the first element of `t` to be `100`  
- Appending a new element `"!!"` to the end of `t` (remember that with a
  list `x` we would use `x.append("!!")` to do this)  
- Sorting `t`  
- Reversing `t`  



</blockquote>

In [None]:
# change first element of t

In [None]:
# appending to t

In [None]:
# sorting t

In [None]:
# reversing t

### List vs tuple: which to use?

Should you use a list or tuple?

In general, my rule of thumb is to use a list unless for some reason I *need* to
use a tuple

The cases in which a tuple would be useful are when:

- I want to make sure the *order* of elements can’t change  
- I want to make sure that the actual values of the elements can’t
  change  
- I intend to use the collection as a key in a dict (we will learn what this
  means soon, in the section on dictionaries)  

### Bonus Material: `zip` and `enumerate`

Two functions that can be extremely useful are `zip` and `enumerate`

Both of these functions are best understood by example, so let’s see
them in action and then talk about what they do:

In [None]:
z = zip([1, 2, 3], ("a", "b", "c"))
print("type(z)", type(z))
z

To see what is inside `z`, let’s convert it to a list:

In [None]:
list(z)

Notice that we now have a list, where each item is a tuple

Within each tuple we have one item from each of the collections we
passed to the zip function

In particular, the first item in `z` contains the first item from
`[1,2,3]` and the first item from `("a", "b", "c")`

The second item in `z` contains the second item from each collection
and so on
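Two further properties of `zip` are worth knowing. First, a `zip` object can only be stepped through once: after it has been converted to a list, converting it again gives an empty list. Second, `zip` stops as soon as the shortest input runs out. A short sketch (the variable names are only for illustration):

```python3
zipped = zip([1, 2, 3], ("a", "b", "c"))
first = list(zipped)
second = list(zipped)
print(first)   # [(1, 'a'), (2, 'b'), (3, 'c')]
print(second)  # []: the zip object is already used up

# with inputs of different lengths, the extra items are silently dropped
uneven = list(zip([1, 2, 3], ["a", "b"]))
print(uneven)  # [(1, 'a'), (2, 'b')]
```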

Now let’s experiment with `enumerate`

In [None]:
e = enumerate(["a", "b", "c"])
print("type(e)", type(e))
e

Again, to see what is inside we call `list(e)`

In [None]:
list(e)

We again have a list of tuples, but this time the first element in each
tuple is the *index* of the second tuple element in the initial
collection

Notice that the third item is `(2, 'c')` because
`["a", "b", "c"][2]` is `'c'`
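`enumerate` also accepts an optional `start` argument for when you want the count to begin at something other than zero, for example when numbering items for display starting at 1:

```python3
# count from 1 instead of the default 0
numbered = list(enumerate(["a", "b", "c"], start=1))
print(numbered)  # [(1, 'a'), (2, 'b'), (3, 'c')]
```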

<blockquote>

**Check for understanding**

**Challenging** For the tuple `foo` below, use a combination of `zip`,
`range`, and `len` to mimic `enumerate(foo)`

Verify that your proposed solution is correct by converting each to a list
and checking equality with `==`

HINT: You can see what the answer should look like by starting with
`list(enumerate(foo))`


</blockquote>

In [None]:
foo = ("good", "luck!")

## Associative Collections

### Dictionaries

A dictionary (or dict) associates `key`s with `value`s

It will feel similar to a dictionary for words, where the keys are words and
the values are the associated definitions

The most common way to create a `dict` is to use curly braces — `{`
and `}` — like this:

```python3
{"key1": value1, "key2": value2, ..., "keyN": valueN}
```


where the `...` indicates that we can have any number of additional
terms

The crucial part of the syntax is that each key-value pair is written
`key: value` and that these pairs are separated by commas — `,`

Let’s see an example

In [None]:
mydict = {"a": 1, 2: "b", "a_list": [1, 2, 3], (4, 2): "tuple!"}

Often it is easier to read the code that makes a dict if we put each
`key: value` pair on its own line (recall our earlier comment on
using whitespace effectively to improve readability!)

The code below is equivalent to what we saw above:

In [None]:
mydict = {
    "a": 1,
    2: "b",
    "a_list": [1, 2, 3],
    (4, 2): "tuple!",
}

Notice that the keys and values can have different types

Most often the keys will be strings, but we could also use numbers
(`int` or `float`) or even tuples

Values can be **any** type
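Keys, on the other hand, are restricted: they must be immutable (more precisely, *hashable*). This is why a tuple can serve as a key while a list cannot. A minimal sketch using a dict named `points`, introduced only for this example:

```python3
points = {(0, 0): "origin", (1, 2): "a point"}  # tuple keys are fine
print(points[(1, 2)])  # a point

# uncomment the line below to see the TypeError raised for a list key
# points[[1, 2]] = "a point"
```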

<blockquote>

**Check for understanding**

Create a new dict which associates stock tickers with their stock prices.

Here are some tickers and their prices

- AAPL: 175.96  
- GOOGL: 1047.43  
- TVIX: 8.38  



</blockquote>

This next example is meant to drive home the fact that values can be
*anything*, including another dictionary.

In [None]:
companies = {"AAPL": {"bid": 175.96, "ask": 175.98},
             "GOOGL": {"bid": 1047.03, "ask": 1048.40},
             "TVIX": {"bid": 8.38, "ask": 8.40}}

In [None]:
companies
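Because each value in `companies` is itself a dict, lookups can be chained: the first `[...]` picks a company and the second picks a field from that company’s dict. The sketch below uses a smaller dict, `quotes` (a name introduced here), with the same shape and the AAPL numbers from above:

```python3
quotes = {"AAPL": {"bid": 175.96, "ask": 175.98}}

aapl = quotes["AAPL"]  # the inner dict for AAPL
print(aapl["bid"])     # 175.96

# chaining both lookups in one expression
print(quotes["AAPL"]["ask"])  # 175.98

# e.g. the bid-ask spread, rounded to remove floating-point noise
spread = quotes["AAPL"]["ask"] - quotes["AAPL"]["bid"]
print(round(spread, 2))  # 0.02
```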

#### Getting, setting, and updating dict items

We can now ask Python to tell us what the value for a particular key is using
the syntax `d[k]`, where `d` is our `dict` and `k` is the key we want to
find the value for

Here are some examples

In [None]:
mydict["a"]

In [None]:
mydict[(4, 2)]

If we ask for the value of a key that is not in the dict, we will get an error (a `KeyError`)

In [None]:
# uncomment the line below to see the error
# mydict["hello"]
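To check whether a key is present before looking it up, we can use the same `in` keyword we used with lists. For a dict, `in` checks the *keys*, not the values. A sketch with a hypothetical dict `inventory`:

```python3
inventory = {"apples": 3, "pears": 0}

print("apples" in inventory)  # True
print("kiwis" in inventory)   # False
print(3 in inventory)         # False: 3 is a value, not a key
```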

We can also add new items to a dict using the syntax `d[new_key] = new_value`:

Let’s see some examples

In [None]:
mydict["Hello"] = "World!"

In [None]:
mydict["Frodo"] = "Baggins"

By typing `mydict` or `print(mydict)` we can have Python display all
key-value pairs (**warning**: all key-value pairs will be displayed, no
matter how many there are!):

In [None]:
mydict

If we try to add a new item for a key that already exists, the value
will be updated

In [None]:
mydict["a"] = 100
mydict

<blockquote>

**Check for understanding**

Create a dict named `family` where there is one `key: value` pair for
each member of your family

The keys should be first names and the values should be tuples of the form
`(age, "Last name")` – if you don’t know the exact age, just guess


</blockquote>

#### Common `dict` functionality

There are a handful of common things we can do with dicts

We will demonstrate them with examples below

In [None]:
# number of key-value pairs in a dict
len(mydict)

In [None]:
# get a list of all the keys
list(mydict.keys())

In [None]:
# get a list of all the values
list(mydict.values())
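A related method, `items`, gives the keys and values together as `(key, value)` tuples, much like what `zip` produced earlier. Since Python 3.7, dicts remember insertion order, so the output below is predictable. The dict `ticker_prices` is a small example introduced here:

```python3
ticker_prices = {"AAPL": 175.96, "TVIX": 8.38}

# each item is a (key, value) tuple
pairs = list(ticker_prices.items())
print(pairs)  # [('AAPL', 175.96), ('TVIX', 8.38)]

# the same pairs we would get by zipping the keys with the values
print(pairs == list(zip(ticker_prices.keys(), ticker_prices.values())))  # True
```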

In [None]:
mydict2 = {"a": "new a value!", "c": "I'm totally new"}

# Add all key-value pairs in mydict2 to mydict.
# if the key already appears in mydict, overwrite the
# value with the value in mydict2
mydict.update(mydict2)
mydict

In [None]:
# Get the value associated with a key, or return a default value
# use this to avoid the KeyError we saw above if you have a reasonable
# default value
mydict.get(2, "Default Value")

In [None]:
mydict.get("invalid key", "Default value!")
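If the default argument is left out, `get` returns `None` for a missing key instead of raising an error. A sketch with a hypothetical dict `settings`:

```python3
settings = {"color": "blue"}

print(settings.get("color"))  # blue
print(settings.get("size"))   # None
```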

<blockquote>

**Check for understanding**

Use Jupyter’s help facilities to learn how to use the `pop` method to
remove the key `"c"` (and its value) from the dict.


</blockquote>

<blockquote>

**Check for understanding**

Turn to your neighbor and have one of you explain what happens to the value
you popped

Have the other explain what happens if you try to `pop` that same key
twice


</blockquote>

### Sets

Python has an additional way to represent collections of items: sets

If you are familiar with the mathematical concept of sets, then you will
understand the majority of Python sets already

If you don’t know the math behind sets, don’t worry: we’ll cover the
basics of Python’s sets here

A set is an *unordered* collection of *unique* elements

The syntax for creating a set uses curly braces `{` and `}`

```python3
{item1, item2, ..., itemN}
```


Here is an example:

In [None]:
s = {1, "hello", 3.0}
print("s has type", type(s))
s
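One caveat: because dicts also use curly braces, `{}` on its own creates an empty *dict*, not an empty set. To start with an empty set, call `set()` instead:

```python3
empty_braces = {}
empty_set = set()

print(type(empty_braces))  # <class 'dict'>
print(type(empty_set))     # <class 'set'>
```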

<blockquote>

**Check for understanding**

Try creating a set with repeated elements (e.g. `{1, 2, 1, 2, 1, 2}`)

What happens?

Why?


</blockquote>

As with lists and tuples, we can check if something is `in` the set
and check the set’s length:

In [None]:
print("len(s) =", len(s))
"hello" in s

Unlike lists and tuples, we can’t extract elements of a set `s` using
`s[N]` where `N` is a number

```python3
# Uncomment the line below to see what happens
# s[1]
```


This is because sets are unordered, so the notion of getting the
second element (`s[1]`) is not well defined

We add elements to a set `s` using `s.add`

In [None]:
s.add(100)
s

In [None]:
s.add("hello") # nothing happens, why?
s

We can also perform the standard mathematical set operations

Consider the set `s` from above and the set
`s2 = {"hello", "world"}`

- `s.union(s2)`: returns a set with all elements in either `s` or
  `s2`  
- `s.intersection(s2)`: returns a set with all elements in both `s`
  and `s2`  
- `s.difference(s2)`: returns a set with all elements in `s` that
  aren’t in `s2`  
- `s.symmetric_difference(s2)`: returns a set with all elements in
  only one of `s` and `s2`  
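Each of these methods also has an operator form: `|` for union, `&` for intersection, `-` for difference, and `^` for symmetric difference. A sketch with two small sets of numbers, `set_a` and `set_b`, chosen only for illustration:

```python3
set_a = {1, 2, 3}
set_b = {2, 3, 4}

print(set_a | set_b)  # equals {1, 2, 3, 4}, same as set_a.union(set_b)
print(set_a & set_b)  # equals {2, 3}, same as set_a.intersection(set_b)
print(set_a - set_b)  # equals {1}, same as set_a.difference(set_b)
print(set_a ^ set_b)  # equals {1, 4}, same as set_a.symmetric_difference(set_b)
```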


<blockquote>

**Check for understanding**

Test out two of the operations described above using the original set we
created, `s`, and the set created below `s2`


</blockquote>

In [None]:
s2 = {"hello", "world"}

In [None]:
# Operation 1

In [None]:
# Operation 2

As with tuples and lists, there is a `set` function to convert other
collections to sets

In [None]:
x = [1, 2, 3, 1]
set(x)

In [None]:
t = (1, 2, 3, 1)
set(t)

Likewise we can convert sets to lists and tuples

In [None]:
list(s)

In [None]:
tuple(s)

#### List/tuple vs set: which to use?

Should you use a list or tuple versus a set?

I would use a set when:

- I don’t care about the order  
- I want to limit my analysis to unique items  
- I plan to check whether `something in set` frequently  


All other cases are usually good fits for either a list or tuple (see above for a comparison between list and tuple)
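The reason frequent `in` checks favor a set: a set is backed by a hash table, so a membership test typically takes about the same time however large the set grows, whereas `in` on a list may have to compare against every item. A rough sketch using the standard-library `timeit` module (the sizes are arbitrary, and timings vary by machine, so no numbers are shown):

```python3
import timeit

numbers_list = list(range(100_000))
numbers_set = set(numbers_list)

# look for the last item, which is the worst case for a list
target = 99_999

list_time = timeit.timeit("target in numbers_list", globals=globals(), number=100)
set_time = timeit.timeit("target in numbers_set", globals=globals(), number=100)

print("list:", list_time, "seconds")
print("set: ", set_time, "seconds")
```

On most machines the set lookup should come out far faster, but treat the exact ratio as illustrative only.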