UM MSBA - BGEN632

# Week 6: Collections

As a budding data scientist, you have learned many of the foundational components of programming, including syntax, operators, conditional statements, and iterative statements. In this module you will learn about one of the last fundamental components of Python: collections.

Collections are a data type that many programming languages implement. The purpose of a collection is to house multiple values in a single variable. Python's own documentation uses more specific terms: lists and arrays are *sequences*, dictionaries are *mappings*, and both are kinds of *containers*. To maintain consistency with other programming languages, we will refer to all of them as collections.

Python, like many other programming languages, provides several different types of collections. Some of those include:
* Arrays
* Lists
* Dictionaries (i.e., key-value pairs)

## Arrays

To begin our discussion of collections, we will focus on arrays. While we will not use arrays in this course, understanding arrays is vital to understanding the other types of collections we will use. Arrays are a special type of object in programming. They are containers, or collections, for storing multiple data values, and all of the values must be the same type. Arrays behave like a storage bin.

<div><center><img src = "assets/m&m_sorted.jpg" width = "500"></center></div>

In the picture above, several jars contain different colored M&Ms. While the jars hold different colors, they all hold the same type of candy. Similarly, an array can store many values (which may differ from one another), so long as they are all the same type.

While arrays may appear similar to variables, they are distinct. Often, arrays tend to be difficult for programming novices to grasp. Some of the confusion stems from conflating the syntax and function of arrays with those of variables.

Recall, declaring a variable follows this simple pattern:

> *`name`* = `value`

You may end up with many different variables:

> *`name_one`* = `value one`

> *`name_two`* = `value two`

> *`name_three`* = `value three`

> *`name_four`* = `value four`

This may look something like this:

```Python
int_a = 3
int_b = 6
int_c = 33
int_d = 96
int_e = 123
```

Tracking these variables may become tedious or cumbersome. A simpler approach is to use an array to store all of these values in a single variable.

When creating an array, a similar pattern for creating a variable is used:

> *`name`* = array('*`type`*', [`value 1`, `value 2`,...,`value n`])

You can see the similarities between creating a variable and an array:
* Both are given a name.
* Each has a value assigned (arrays have one or more values assigned).

An array requires you to specify the data type *`type`*, which is not very Pythonic. While the syntax looks similar, keep in mind that arrays and variables behave differently (more on that later).
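
As a minimal sketch, here are the five integers from the example above gathered into one array instead of five separate variables. The name `int_values` is a hypothetical choice for illustration, and the `array` module and the type code `'i'` are covered in more detail below.

```python
import array as arr

# The five separate variables from the earlier example
int_a = 3
int_b = 6
int_c = 33
int_d = 96
int_e = 123

# The same values stored together in one array of signed integers ('i')
int_values = arr.array('i', [int_a, int_b, int_c, int_d, int_e])

print(int_values)       # array('i', [3, 6, 33, 96, 123])
print(len(int_values))  # 5
```

One name, `int_values`, now stands in for five, which makes the group easier to pass around as a unit.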

Again, an array is a container (or *collection*) of values. Returning to the image above, think of each jar as an array: the jar has a name (its color), and the candies inside it are its values. This shows how using an array differs from using a variable. Just like you use a jar of M&Ms to easily transport, sort, and organize pieces, an array provides the means to transport, sort, and organize values in code. Programs that are thousands of lines long and contain multiple files have lots of variables. Arrays provide an easy solution for organizing and passing groups of values around.

Arrays and lists are *indexed* by position (more on that below), while dictionaries (i.e., key-value pairs) are not. In Python, arrays possess the following characteristics:
* Arrays are zero indexed: an array with *n* elements is indexed from 0 to *n*-1.
* Arrays are declared much like variables, but every element must be of the single data type given by the array's type code (e.g., `'i'` for signed integers). Unlike a list, an array cannot mix types or hold other arrays.
  * Python's `array` type has exactly one dimension.
  * The initial length is established by the values supplied when the array is created.

### Array Attributes

#### Indexing

The first property of an array was mentioned above: "Arrays are zero indexed: an array with *n* elements is indexed from 0 to *n*-1." This is foundational to arrays, so we will begin our discussion here.

Have you lived with someone or known someone who is obsessive over organization? That person may own a label maker that allows them to print out labels. They place those sticker labels on plastic bins, drawers, shelves, etc.

<div><center><img src = "assets/m&m_labeled.png"></center></div>

*Labeling* is important for organization. In the image above, the two jars are labeled according to their color. If we want a yellow piece, we simply pull the bin labeled "yellow."

Arrays have a similar system to "label" the "bins" contained inside of them. This system uses a simple numerical index starting at zero and ending at *n*-1, where *n* is the number of elements held by the array. So, if your array has 15 "bins," or elements, then the last bin will be labeled 14, or *n*-1 (i.e., 15 - 1 = 14).

The bins in an array are referred to as *elements*. Thus, each element in an array has an assigned "label" or index value. The first element is assigned 0, the second element is assigned 1, with each subsequent element assigned the next index value. *This is important to remember!* 

> The index, or label, always starts at 0 in Python. 

Why does indexing begin at 0 instead of 1?

First, the historical reason is related to the programming language C. In C, an array index is an offset from the memory address where the array begins: the first element sits 0 positions from the start, so no extra arithmetic is needed to find it. When memory and processing time were expensive, this made indexing at 0 "cheaper" than indexing at 1 (feel free to search online if you want to go deeper). Other programming languages, like C#, followed the pattern largely out of convention. Many early programming languages let the programmer choose the starting index, and 1 was a common choice.

Second, conceptually, Python is actually two separate things packaged together: Python the programming language (i.e., the syntax, the code) and Python the implementation (i.e., the program that processes the code). The original and default implementation is written in the programming language C. Online, you may see people refer to it as *CPython*. This differentiates it from *Jython* (originally called *JPython*), an implementation of Python written in Java.

> *CPython* is implemented using the C language. *CPython* and *Jython* use the same programming language (i.e., syntax) but implement it in their own respective ways. Because C was the language of the original implementation back in the early 1990s, Python picked up a lot of its habits and behavior.

You can think of CPython as the reference implementation of Python; it is the main development platform. When you download Python from [https://www.python.org](https://www.python.org), you are actually downloading CPython. Other flavors exist besides CPython and Jython: PyPy, IronPython, etc. You can read more about those alternative implementations at [https://www.python.org/download/alternatives/](https://www.python.org/download/alternatives/).
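
If you are curious which implementation is running your own notebook, the standard-library `platform` module can report it. This is a minimal sketch; on an installation downloaded from python.org it should print `CPython`.

```python
import platform

# Name of the Python implementation executing this code,
# e.g. 'CPython', 'PyPy', 'IronPython', or 'Jython'
impl = platform.python_implementation()
print(impl)
```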

Third, `counting != indexing`. Remember, indexing is a means of labeling an element within an array. When we count the number of elements in an array, we are enumerating how many elements exist. *That is not the purpose of indexing.* Again:

> Indexing is a method of labeling the elements in an array, not of counting them.

Think of indexing like the mile markers on a highway. When you cross into a new region or state, sometimes you will see a mile marker on the border labeled with 0. The image below shows the 0 mile marker for [U.S. Highway 1 (US 1)](https://en.wikipedia.org/wiki/U.S._Route_1_in_Florida) which begins at the southern end of Florida in Key West.

<div><center><img src = "assets/Florida-Highway-1-Mile-0.jpg"></center></div>

This represents the origin. When starting at the origin, you are at 0, not 1. This is similar to a 2-dimensional graph with a horizontal *x*-axis and a vertical *y*-axis, where both *x* and *y* begin at 0. Returning to the example, mile marker 0 is your origin: standing there, you have not yet traveled any distance. As you move away from it, you are headed toward mile marker 1, which you reach once you have traveled 1 mile. Yet the stretch from 0 to 1 is often referred to as the first mile.

With an array, the first index value is 0 because it is the origin, the beginning of the array. As you move to the next element, you traverse 1 unit of the array, reaching index value 1. The index is referred to as the *ordinal* position.

Another way to think of this is like time on a clock. Say we are 33 seconds into a movie. If I asked you, "How far into the movie are we?" what would you answer? Would you round up to 1 minute? A digital clock on a computer would show `00h:00m:33s`. In sports, the time between 0 and 59 seconds is referred to as the first minute of play, and the time between 60 and 119 seconds is referred to as the second minute of play.

The span from 0 to 59 seconds is not yet a full minute. Yet, it is still referred to as the first minute of play. This is because one refers to indexing and the other refers to counting. Arrays in Python behave the same way. *Indexing starts at 0 while counting starts at 1.*

#### Declaration and Syntax

The second property of an array is as follows:
* Arrays are declared much like variables, but every element must be of the single data type given by the array's type code (e.g., `'i'` for signed integers). Unlike a list, an array cannot mix types or hold other arrays.
  * Python's `array` type has exactly one dimension.
  * The initial length is established by the values supplied when the array is created.

As mentioned previously, initializing (i.e., declaring) an array follows a similar pattern to that of declaring a variable:

> *`name`* = array('*`type`*', [`value1`, `value2`,...,`valueN`])

For example, to create an array of numbers, we could do it like so:

```Python
import array as arr

num_array = arr.array('i', [3, 6, 99, 300])
```

We import the standard-library module `array` (under the alias `arr`) and call its `array()` constructor to create the array object.

> It is best practice to import all libraries used in a notebook at the very start of the notebook.

We then specify the data type using the code `i` which indicates a signed integer (do not worry about signed vs unsigned). If you want a table with all the possible data types, please [see this webpage](https://docs.python.org/3/library/array.html).

To access an element in an array, we need to specify the name of the array and designate the index value. For example, if we want to obtain the value of `6` from the array above, we do it like so:

```Python
num_array[1]
```

```
6
```

We use the number `1` as the index value because `6` is one position away from the origin.

We can also use this notation to assign a new value. Let's say we want to change the value `6` to `36`. We can perform that like so:

```Python
num_array[1] = 36
```

Let's confirm the change to our array by inspecting it:

```Python
num_array
```

```
array('i', [3, 36, 99, 300])
```
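
Because `num_array` was created with the type code `'i'`, an assignment like the one above only accepts integers. A minimal sketch of what happens when we try to store a float instead (the flag `error_raised` is a hypothetical helper added for illustration):

```python
import array as arr

num_array = arr.array('i', [3, 36, 99, 300])
error_raised = False  # hypothetical helper: records whether the assignment failed

# Type code 'i' means every element must be a signed integer,
# so trying to store a float raises a TypeError
try:
    num_array[1] = 2.5
except TypeError as err:
    error_raised = True
    print(f"TypeError: {err}")

# The failed assignment left the array unchanged
print(num_array)
```

A list, by contrast, would accept `2.5` at that position without complaint.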

#### Other Properties of Arrays

Arrays inherently have many properties we can obtain information about. Additionally, we can manipulate arrays to some extent. One of the most important properties is the length, or count, of your array. To obtain the number of elements in an array, simply use `len()` like so:


```Python
len(num_array)
```

```
4
```

This returns the number, or count, of elements in the array. Remember, indexing and counting are not the same: the last index in `num_array` is `3`, but the array contains `4` elements, so `len()` returns `4`. We will return to this shortly, but `len()` will be important when we discuss iterating over collections with loops.
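
A short sketch connecting the count to the index: the last index is always `len()` minus 1. The names `count` and `last_index` are hypothetical, chosen for illustration.

```python
import array as arr

num_array = arr.array('i', [3, 36, 99, 300])

count = len(num_array)    # counting starts at 1, so count is 4
last_index = count - 1    # indexing starts at 0, so the last index is 3

print(count, last_index)       # 4 3
print(num_array[last_index])   # 300
```

Asking for `num_array[count]` (index `4`) would raise an `IndexError`, because no element sits at that position.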

## Lists

Thus far you have learned about arrays. As mentioned at the start of this module, Python provides various types of collections:
* Arrays
* Lists
* Dictionaries (i.e., key-value pairs)

In this part of the tutorial, we will cover lists. Specifically:

1. The purpose of lists
1. Initializing lists
1. Common list methods and properties
1. Indexing
1. Iterating over lists

### Purpose: List vs. Array

For all intents and purposes, `lists` behave and operate like `arrays`: both are zero indexed, and both can grow and shrink after they are created. The most important distinction is that arrays are strongly typed. All elements in an array must be the same data type, which you declare with a type code when you create the array. If you create an array with 12 elements, all 12 must be that data type.

As an alternative, we can rely on `lists`. With a list object, we do not need to declare a data type, and the elements may be of different types. Lists are more Pythonic: they are a built-in data type, while arrays must be imported from the standard-library `array` module.
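
A minimal sketch of that flexibility, using the built-in `type()` function to show each element's type. The list `mixed` is a hypothetical example, not one used elsewhere in this tutorial.

```python
# A list needs no type code, and its elements may be of different types
mixed = [3, "Missoula", 2.5, ["Bitterroot", "Sapphire"]]

print(mixed)
# <class 'int'> <class 'str'> <class 'float'> <class 'list'>
print(type(mixed[0]), type(mixed[1]), type(mixed[2]), type(mixed[3]))
```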

### Initializing Lists

The syntax for initializing a list is comparable to an array. The basic syntax is as follows:

> *`name`* = [`value 1`, `value 2`,...,`value N`]

Compare this to initializing an array:

> *`name`* = array('*`type`*', [`value 1`, `value 2`,...,`value N`])

Let's use mountain ranges as an example. We will create a list containing a couple of the mountain ranges that surround Missoula:

```Python
missoula_mountains = ["Bitterroot", "Sapphire"]

missoula_mountains
```

```
['Bitterroot', 'Sapphire']
```

### Methods and Properties

Next, let's discuss a few methods and properties of lists. This is where lists really shine. Specifically, we will discuss the following:

* `append()`
* `insert()`
* `remove()`
* `pop()`

#### Append

The method `append()` appends a new value to the end of a list. For example, if we want to add `Garnet` to the end of `missoula_mountains` we would use `missoula_mountains.append("Garnet")`. The code below illustrates the difference between the original list and the updated one:

```Python
# The original list
missoula_mountains = ["Bitterroot", "Sapphire"]

missoula_mountains
```

```
['Bitterroot', 'Sapphire']
```

```Python
# The updated list: Garnet added
missoula_mountains.append("Garnet")

missoula_mountains
```

```
['Bitterroot', 'Sapphire', 'Garnet']
```

#### Insert

If you would like to add a new value at a specific location in the list, then you would need to use `insert()` instead of `append()`. We will insert a new value at index location `1`:

```Python
# The updated list: Rattlesnake added at index location 1
missoula_mountains.insert(1, "Rattlesnake")

missoula_mountains
```

```
['Bitterroot', 'Rattlesnake', 'Sapphire', 'Garnet']
```

The call `missoula_mountains.insert(1, "Rattlesnake")` requires us to specify the index value for the insertion point. In this example, we wanted the new value to be the second item; therefore, we used index value `1`.

#### Remove

We can also remove elements from a list. This requires us to use the method `remove()`. Let's say we made a mistake and would like to remove `Sapphire` from `missoula_mountains`. We simply use the following approach:

```Python
missoula_mountains.remove("Sapphire")

missoula_mountains
```

```
['Bitterroot', 'Rattlesnake', 'Garnet']
```
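
`remove()` only works when the value is actually in the list; otherwise it raises a `ValueError`. A minimal sketch of guarding against that with the `in` operator, using the list as it stands after the cell above:

```python
missoula_mountains = ["Bitterroot", "Rattlesnake", "Garnet"]

# remove() raises a ValueError if the value is not in the list,
# so check with the in operator before removing
if "Sapphire" in missoula_mountains:
    missoula_mountains.remove("Sapphire")
else:
    print("Sapphire is not in the list; nothing removed.")

print(missoula_mountains)  # ['Bitterroot', 'Rattlesnake', 'Garnet']
```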

What if we have duplicate items? Which of the two will Python remove? Let's experiment and find out.

```Python
missoula_mountains.insert(1, "Sapphire")
missoula_mountains.insert(3, "Sapphire")

missoula_mountains
```

```
['Bitterroot', 'Sapphire', 'Rattlesnake', 'Sapphire', 'Garnet']
```

```Python
missoula_mountains.remove('Sapphire')
missoula_mountains
```

```
['Bitterroot', 'Rattlesnake', 'Sapphire', 'Garnet']
```

As the output shows, only the first matching entry is removed from the list. What if we want to remove the second entry? Or, what if we do not know the value of the element, but we know its index value? We can use `pop()` instead.

#### Pop

The method `remove()` deletes the first entry for a given value. If we would like to remove an element at a given index, we can use `pop()`. For example, let's say we would now like to remove `Sapphire` at index `2`. We would use the code like so:

```Python
missoula_mountains.pop(2)
```

```
'Sapphire'
```

The `pop()` method returns the element it removes, so the notebook displays `'Sapphire'` as confirmation of what was deleted.
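
When called with no index at all, `pop()` removes and returns the *last* element. Because the removed value is returned, you can also store it in a variable. A minimal sketch (the name `last` is hypothetical):

```python
missoula_mountains = ["Bitterroot", "Rattlesnake", "Garnet"]

# With no index, pop() removes and returns the last element
last = missoula_mountains.pop()

print(last)                # Garnet
print(missoula_mountains)  # ['Bitterroot', 'Rattlesnake']
```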

Recall that the default behavior of Jupyter notebooks is to always show explicit output, and to show implicit output only *if* it is at the end of the code cell. The examples below highlight explicit vs. implicit output.

To delete an element at a specific index and display the updated list:

```Python
missoula_mountains.pop(2)  # implicit output, not at the end of the cell - will NOT be displayed
missoula_mountains  # implicit output at the end of the code cell - will be displayed
```

To delete an element at a specific index and display the deleted element AND updated list:

```Python
print(missoula_mountains.pop(2))  # print() makes this explicit output; display() also works
missoula_mountains  # implicit output at the end of the code cell - will be displayed
```

Alternatively, you can change the default behavior of your notebook to always display the full output by [changing node interactivity settings which we covered in Week 3 on Canvas](https://canvas.umt.edu/courses/18274/pages/jupyter-notebook-behavior?module_item_id=1261737). 

### Indexing

Indexing for a list operates just like it does for an array. It is a 0-index, so all indexing begins at 0, *not 1*. The index values increment one number at a time. Since lists are dynamic, when you remove, add, insert, or otherwise change the list, the index of each element is updated accordingly. The list will never have a gap or missing index value.

For example, let's say we insert a new element into our list at index `1`. If we output the value of the element at index `1` before and after the insertion, we receive different values:


```Python
mexican_food = ["enchiladas", "flautas", "pozole"]

mexican_food[1]
```

```
'flautas'
```

```Python
mexican_food.insert(1, "elote")

mexican_food[1]
```

```
'elote'
```

This is also true if we use `pop()` or `remove()`: the remaining elements shift so that no gaps are present. This is a convenient feature, because in some other programming languages (for example, with C's fixed-size arrays) you must shift the elements yourself.
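
A minimal sketch of that re-indexing after `pop()`, reading the same index before and after the removal (the names `before`, `removed`, and `after` are hypothetical):

```python
mexican_food = ["enchiladas", "elote", "flautas", "pozole"]

before = mexican_food[2]   # 'flautas'

# Removing the element at index 1 shifts every later element one position left
removed = mexican_food.pop(1)  # 'elote'

after = mexican_food[2]    # 'pozole'

print(before, removed, after)  # flautas elote pozole
print(mexican_food)            # ['enchiladas', 'flautas', 'pozole']
```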

Next, we will cover looping over lists using the `for` loop.

## Iterating over collections

In this section, we will learn about another iteration statement, the `for` loop, which differs from the `while` loop. The `for` loop is intended to work with collections such as `arrays`, `lists`, `dictionaries`, `sets`, and `tuples`. In many other programming languages (e.g., C), a `for` loop is essentially a compact `while` loop built around a counter. Python is stricter: its `for` loop always steps through the items of a collection. We will discuss the differences in depth later in this section.

### `for` statement

Let's start off by reviewing some important concepts of `while` loops. When implementing a `while` loop, you will implement the following three components:

* `initializer`: executed only one time before beginning the loop; e.g., `i = 0`.
* `condition`: determines if the next iteration of the loop should be executed; must always be a boolean expression; e.g., `i < 2001`.
* `iterator`: determines what happens after the successful completion of each loop; e.g., `i+=1` or `i = i + 1`. 

As mentioned, a `for` statement iterates over a collection. Unlike the `while` loop, *it does not evaluate a boolean expression*. Rather, you iterate sequentially over each element in the list. Thus, in Python, the `for` loop does not have an initializer, a conditional expression, or an iterator.

In a previous example, we created a list of Mexican food. We can use `for` to iterate through each element like so:

```Python
mexican_food = ["enchiladas", "elote", "flautas", "pozole", "tortillas"]

for j in mexican_food:
    print(j)
```

```
enchiladas
elote
flautas
pozole
tortillas
```

Observe that the variable `j` takes on the value of each element in the list. That is, each time Python moves sequentially from one element to the next, the variable `j` is assigned a new value.

Here is another way to think about this. We can perform the same process without using a `for` loop with the code below. The output is the same as when we implemented the `for` loop:

```Python
j = mexican_food[0]
print(j)
j = mexican_food[1]
print(j)
j = mexican_food[2]
print(j)
j = mexican_food[3]
print(j)
j = mexican_food[4]
print(j)
```

```
enchiladas
elote
flautas
pozole
tortillas
```

Inside the loop, the variable `j` can be used just like any other variable. Again, its value will change each time the loop progresses to the next element.

To further reinforce this idea, we will adjust the code slightly:

```Python
for j in mexican_food:
    print(f"The variable j is assigned the value {j}")
```

```
The variable j is assigned the value enchiladas
The variable j is assigned the value elote
The variable j is assigned the value flautas
The variable j is assigned the value pozole
The variable j is assigned the value tortillas
```

### `while` loop with a list?

Sometimes you may need to rely on a `while` loop *and* iterate over a list. This works just as it does in other programming languages: the variable set by the initializer serves as the index into your list.

We will use the same list `mexican_food` and rely on a `while` loop to iterate over its contents.

```Python
k = 0

while k < len(mexican_food):
    print(mexican_food[k])
    k += 1
```

```
enchiladas
elote
flautas
pozole
tortillas
```

While the `for` loop is preferred for sequences such as lists, you may sometimes need a `while` loop instead.
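
If the only reason for a `while` loop is that you need the index, the built-in `enumerate()` function offers a middle ground: it hands a `for` loop each element together with its index, so no manual counter is required. A minimal sketch:

```python
mexican_food = ["enchiladas", "elote", "flautas", "pozole", "tortillas"]

# enumerate() yields (index, element) pairs; the index starts at 0
for k, food in enumerate(mexican_food):
    print(k, food)
```

The printed indexes run from `0` to `4`, even though the list holds `5` elements, which is the indexing-versus-counting distinction from earlier.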

### Jump Statements

In the previous tutorial I discussed jump statements such as `break` and `continue`. These jump statements apply to `for` loops as well. 

Here is an example from the previous tutorial:

```Python
i = 0
j = 20

while i < j:
    print(f"i is {i}, j is {j}")
    # i + j always equals 20 here, so this condition is never true;
    # the loop ends when i < j becomes False instead
    if i == 3 and j == 9:
        break
    i += 1
    j -= 1

print("All done!")
```

```
i is 0, j is 20
i is 1, j is 19
i is 2, j is 18
i is 3, j is 17
i is 4, j is 16
i is 5, j is 15
i is 6, j is 14
i is 7, j is 13
i is 8, j is 12
i is 9, j is 11
All done!
```

If we want to implement `break` in a `for` loop, what might that look like? In this example, we will iterate over the list `mexican_food`. When our loop is iterating over the element with the value `flautas`, we want to stop the loop from progressing and end.

```Python
for j in mexican_food:
    print(f'The variable j is assigned the value {j}.')
    if j == "flautas":
        print("\nThat's enough food for now!")
        break
```

```
The variable j is assigned the value enchiladas.
The variable j is assigned the value elote.
The variable j is assigned the value flautas.

That's enough food for now!
```
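
The other jump statement, `continue`, also works in a `for` loop: it skips the rest of the current iteration and moves on to the next element instead of ending the loop. A minimal sketch that skips `elote` (the list `served` is a hypothetical helper for illustration):

```python
mexican_food = ["enchiladas", "elote", "flautas", "pozole", "tortillas"]
served = []  # hypothetical helper: collects the elements that were not skipped

for j in mexican_food:
    if j == "elote":
        continue  # skip the rest of this iteration and move to the next element
    served.append(j)
    print(f"The variable j is assigned the value {j}.")
```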


## Dictionary: Key-Value Pairs

You are already familiar with the collections `arrays` and `lists`. One of the important concepts related to these objects is they rely on an index to store, sort, and order the values they contain. You can think of an index like a primary key. In a database system, a table contains rows of data. Each row is differentiated from another by its primary key. This concept, using an `index`, presents a couple of problems:

1. You are limited to the value of the index itself, which means the index is not a flexible key.
2. The index starts at 0 and does not allow for other starting values.

A `dictionary`, or collection of `key-value pairs` as it is sometimes referred to in programming, provides many of the same properties as `arrays` and `lists`. The key is a unique identifier while the value is some element you wish to store. Think of this like an array, except instead of using numbers as the index you define the index however you wish. The `key` is the index and the `value` is the element's value.

This is where it differs as a collection: it provides a link between a `key` and a `value`. You define the `key`. If you did not want to use a numerical `key`, you could use a `string`. For example, many organizations use the social security number as a primary key. It is not a true number because 1) it makes no sense to perform mathematical operations on it and 2) it contains hyphens (`-`), so it is really just a string.

For example, say we have a list containing the ages of students.

```python
student_age = [36, 22, 32, 64, 25, 56, 42, 33]
```

If we want to retrieve the value for the 3rd student, we simply use the index to obtain the value like so:

```python
student_age[2]
```

A downside to this, though, is that we have to know the index value for the student. If we have a list with over 2,000 values, this would become cumbersome. How do we know which index value belongs to a student of interest? We do not! A dictionary is an excellent solution to this issue.

### Basic Syntax

Creating a dictionary is fairly straightforward. While lists use square brackets, a dictionary uses curly braces. In this example, we will create a dictionary housing the first names and ages of students.

```Python
student_age = {
    "Theo": 36,
    "Josie": 22,
    "Lili": 32
}

student_age
```

```
{'Theo': 36, 'Josie': 22, 'Lili': 32}
```

In this example, the student's first name is the key. The value is the age. 

We can also use the dictionary constructor method `dict()` to create a new dictionary like so:

```Python
student_age = dict([('Theo', 36), ('Josie', 22), ('Lili', 32)])
```

When every key is a valid Python identifier (a simple name such as `Theo`), we can also use the constructor with keyword arguments:

```Python
student_age = dict(Theo=36, Josie=22, Lili=32)
```
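
A quick sketch confirming that the three forms build the same dictionary, and showing why the keyword form is limited to simple names. The variable names here are hypothetical.

```python
# Three ways to build the same dictionary
literal = {"Theo": 36, "Josie": 22, "Lili": 32}
from_pairs = dict([("Theo", 36), ("Josie", 22), ("Lili", 32)])
from_keywords = dict(Theo=36, Josie=22, Lili=32)

print(literal == from_pairs == from_keywords)  # True

# A key such as "233-83-9073" is not a valid Python name, so it cannot be
# passed as a keyword argument; the list-of-pairs form still works
ssn_lookup = dict([("233-83-9073", "Harmony Cobel")])
print(ssn_lookup)  # {'233-83-9073': 'Harmony Cobel'}
```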

### Adding and Updating Elements

Now that we have created a dictionary of student ages, we would like to add additional students to it. The syntax is simple:

> *dictionary_name*[*key*] = *value*

We specify the name of our dictionary object, give it the new `key`, and assign its `value` using `=`. Here is an example:

```Python
student_age['Danny'] = 43
```

An alternative approach uses the dictionary's `update()` method. The syntax looks like this:

> *dictionary_name*.update(*new_items*)

The value that goes inside `update()` varies, depending on how you want to use it. For example, we can create a new dictionary and update the original with the new one:

```Python
new_ages = {
    'Christy': 21, 
    'King': 21
}

student_age.update(new_ages)
student_age
```

```
{'Theo': 36, 'Josie': 22, 'Lili': 32, 'Danny': 43, 'Christy': 21, 'King': 21}
```

Or, we can perform the entire operation on a single line:

```Python
student_age.update([('Christy', 21), ('King', 21)])
```

Just like in arrays and lists, the values stored in a dictionary can be duplicated.

```Python
employee = {
    "233-83-9073":"Harmony Cobel",
    "839-29-1893":"Mark Scout",
    "118-28-8462":"Ricken Hale",
    "544-24-7631":"Mark Scout"
}

employee
```

```
{'233-83-9073': 'Harmony Cobel',
 '839-29-1893': 'Mark Scout',
 '118-28-8462': 'Ricken Hale',
 '544-24-7631': 'Mark Scout'}
```

In this dictionary, we have four employees. We have added two employees with the same name, Mark Scout. 

What happens when I try to change the key for Harmony Cobel to be equal to Ricken's?

```Python
employee = {
    "118-28-8462":"Harmony Cobel",
    "839-29-1893":"Mark Scout",
    "118-28-8462":"Ricken Hale",
    "544-24-7631":"Mark Scout"
}

employee
```

```
{'118-28-8462': 'Ricken Hale',
 '839-29-1893': 'Mark Scout',
 '544-24-7631': 'Mark Scout'}
```

Harmony's record is not in the output. Duplicate keys are *not allowed*. If we attempt to add an entry with the exact same key as an already-existing element, Python keeps a single entry for that key and assigns it the last value given; here, `'118-28-8462'` maps to Ricken Hale.
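
The same rule applies when assigning to a key that already exists: the old value is replaced and no new entry is created. A minimal sketch:

```python
student_age = {"Theo": 36, "Josie": 22, "Lili": 32}

# "Theo" already exists, so this replaces his value instead of adding an entry
student_age["Theo"] = 37

print(student_age)       # {'Theo': 37, 'Josie': 22, 'Lili': 32}
print(len(student_age))  # 3
```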

### Retrieving Elements

Once we have elements stored inside the dictionary, how do we retrieve the values? The process is similar to arrays and lists, except instead of a numerical indexer we simply use the `key`. 

Let's return to a prior example for dictionary `student_age`. This contains the following elements:

```
{'Theo': 36, 'Josie': 22, 'Lili': 32, 'Danny': 43, 'Christy': 21, 'King': 21}
```

We want the age for Josie. To access it, we simply use her first name as the key:

```Python
f"Josie is {student_age['Josie']} years old."
```

```
'Josie is 22 years old.'
```

Thus, we can reference a single element by using the key like the indexer in an `array` or `list`. What if we wanted to try and use an index value like we did for `array` or `list`?

```Python
print(f"{student_age[1]}")
```

```
KeyError: 1
```

The error states that the key we specified, `1`, does not exist in the dictionary `student_age`.

### Existing Key

You may need to determine if a key exists in your dictionary. For example, if we attempt to determine if the student Danny exists in our dictionary, how might we do so? Python provides a few different options. For the first option, we can use the `in` operator in conjunction with an if statement.

```Python
if 'Danny' in student_age:
    print("Dictionary student_age contains Danny.")
else:
    print("Dictionary student_age does not contain Danny.")
```

```
Dictionary student_age contains Danny.
```

The second option relies on the `get()` method. This method returns the value of the key if present or `None` if the key does not exist. In this next example, we want to first determine if a key exists before we add a new entry.

```Python
if student_age.get('Samir') is None:
    student_age['Samir'] = 53
    print('Student not found. Adding new student.')
else:
    print('That student already exists.')
```

```
Student not found. Adding new student.
```

If we run the code a second time, we receive the following output: `That student already exists.`

```Python
if student_age.get('Samir') is None:
    student_age['Samir'] = 53
    print('Student not found. Adding new student.')
else:
    print('That student already exists.')
```

```
That student already exists.
```

As you can see, the program skipped adding the entry because the key already exists. When using `dictionaries`, it is good practice to check whether a key already exists before adding a new key-value pair.
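
`get()` also accepts an optional second argument: a default value to return when the key is missing, in place of `None`. A minimal sketch, where `"Zoe"` is a hypothetical student who is not in the dictionary:

```python
student_age = {"Theo": 36, "Josie": 22, "Lili": 32}

josie = student_age.get("Josie", 0)   # key exists: returns 22
zoe = student_age.get("Zoe", 0)       # key missing: returns the default, 0
nobody = student_age.get("Zoe")       # no default given: returns None

print(josie, zoe, nobody)    # 22 0 None
print("Zoe" in student_age)  # False: get() never adds the key
```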

A third, more roundabout option is to inspect the keys directly. The `keys()` method returns a view of all the keys in a dictionary (the `dict_keys` object shown below). You can use it to display all the existing keys (with or without the `print` function in Jupyter notebooks).

```Python
student_age.keys()
```

```
dict_keys(['Theo', 'Josie', 'Lili', 'Danny', 'Christy', 'King', 'Samir'])
```

### Removing Key-Value Pairs

Similar to `lists`, we can add and remove elements in a `dictionary`. You have already seen how to add a new element. To remove one, we have two options. The first option relies on using `del` while the second uses `pop()`. As an example of using `del`, if we want to remove the student we just added, Samir, we would do so like this:

```Python
del student_age['Samir']
```

The downside to this approach is that `del` does not check whether the key `Samir` exists. If the key does not exist, Python raises this error:

```
KeyError                                  Traceback (most recent call last)
  Cell In[124], line 1
----> 1 del student_age['Samir']

KeyError: 'Samir'
```

We can use the method `get()` that we just learned about to first check the existence of the key and then subsequently remove it.

```Python
if student_age.get('Samir') is not None:
    del student_age['Samir']
```

This method requires more than one line of code, but it is safer and avoids errors. If you want a one-line solution, then you should use `pop()`. You already learned about this method for removing elements from a list. You can also use it for dictionaries.

```Python
student_age['Samir'] = 53  # re-add Samir (deleted above) so pop() has a key to remove
student_age.pop('Samir', None)
```

```
53
```

Why pass `None` to the method? The second argument is a default value. If the key exists, `pop()` removes it and returns its value (here, `53`). If the key does not exist, `pop()` returns the default, `None`, instead of raising a `KeyError`.
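
A minimal sketch of both cases side by side, popping the same key twice (the names `removed` and `missing` are hypothetical):

```python
student_age = {"Theo": 36, "Josie": 22, "Samir": 53}

removed = student_age.pop("Samir", None)  # key exists: removed and its value returned
missing = student_age.pop("Samir", None)  # key already gone: the default is returned

print(removed, missing)  # 53 None
print(student_age)       # {'Theo': 36, 'Josie': 22}
```

Calling `student_age.pop("Samir")` a third time *without* the default would raise a `KeyError`, just like `del`.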

### Iterating over Dictionaries

Like an array or list, we can iterate over the elements contained within a dictionary. Just like the other collections, we can loop over our `dictionary` and output the contents by relying on the `for` loop.

As an example, we will iterate over the dictionary `student_age`.

```Python
for s in student_age:
    print(s)
```

```
Theo
Josie
Lili
Danny
Christy
King
```

We chose the letter `s` because the name of our dictionary, `student_age`, starts with the letter "s."


Unlike with `arrays` and `lists`, when looping over a `dictionary` as seen above, the loop variable holds just the `key`, not the `value`. In some other programming languages, each iteration provides both the `key` and the `value`; this is simply a difference in convention. If we want the values, then we use the key to access each value:

```Python
for s in student_age:
    print(student_age[s])
```

```
36
22
32
43
21
21
```

What if we want to output a combination of both the `key` and the `value`? That's simple too.

```Python
for s in student_age:
    print(f'{s} is {student_age[s]} years old.')
```

```
Theo is 36 years old.
Josie is 22 years old.
Lili is 32 years old.
Danny is 43 years old.
Christy is 21 years old.
King is 21 years old.
```
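
Python dictionaries also provide an `items()` method that yields each key together with its value, so the loop does not need to look the value up by key. There is also a `values()` method when only the values are needed. A minimal sketch with a shortened `student_age`:

```python
student_age = {"Theo": 36, "Josie": 22, "Lili": 32}

# items() yields (key, value) pairs that unpack into two loop variables
for name, age in student_age.items():
    print(f"{name} is {age} years old.")

# values() yields just the values, in the same order as the keys
ages = list(student_age.values())
print(ages)  # [36, 22, 32]
```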


## Arrays, Lists, or Dictionaries?

At this point, you may be wondering why you would bother using a dictionary when an array or list could easily contain the data. For example, the employee records above pair a unique identifier with a name. We could store such data in lists, adding and removing elements at will. What advantages does a dictionary have compared to these other collections?

Let's compare dictionaries to arrays and lists. First, a comparison against arrays. The advantage a dictionary has over an array is flexibility. The elements in an array must all be of the same type, and they are labeled only by their position. In a dictionary, neither the keys nor the values are constrained to a single type. In the example below, integer keys are added to a dictionary whose existing keys are strings:

```Python
new_dict = {56:21, 876:21}

student_age.update(new_dict)

for s in student_age:
    print(f'Key:{s}\tValue:{student_age[s]}')
```

```
Key:Theo	Value:36
Key:Josie	Value:22
Key:Lili	Value:32
Key:Danny	Value:43
Key:Christy	Value:21
Key:King	Value:21
Key:56	Value:21
Key:876	Value:21
```

Second, let's compare dictionaries to lists. Lists are dynamic like dictionaries and allow multiple data types. Unlike with a list, for a dictionary we can define our own index. The list, on the other hand, is limited to integers as the indexer. This does not mean relying on an integer index is a poor choice. But if you want a non-contiguous or non-numeric index, a dictionary is the better choice.

Ultimately, in programming, the idea of *parsimony* is important. Keep it as simple as possible. Adding complexity for the sake of complexity is not necessary. If you do not need a non-numeric index, then a list will do.

That concludes this tutorial covering collections in Python. Congrats on reaching the end!