<a href="https://colab.research.google.com/github/ucheblessed/ucheblessed/blob/main/02_Data_Structures_in_Python.ipynb" target="_parent"><img src="https://colab.research.google.com/assets/colab-badge.svg" alt="Open In Colab"/></a>

# **Data Structures**
<p>Welcome to this lesson on data structures! In the previous lesson, you learned about many of the basic building blocks, including data types, for programming in Python. In this lesson, you will learn to make use of new data structures, which group and order data in different ways, to help you solve problems.</p>

<img class="image--26lOQ" src="https://video.udacity-data.com/topher/2021/May/609ecc16_intro-to-python-overarching-diagram-2/intro-to-python-overarching-diagram-2.jpg" alt="The second main topic, data structures, is called out in the overarching diagram of the course." width="960px">

### Specifically, you'll learn about:

* Types of Data Structures: Lists, Tuples, Sets, Dictionaries, Compound Data Structures
* Operators: Membership, Identity
* Built-In Functions and Methods

## **What Are Data Structures?**
**Data structures** are containers or collections of data that organize and group data types together in different ways. You can think of data structures as file folders that have organized files of data inside them.

<img class="image--26lOQ" src="https://video.udacity-data.com/topher/2021/May/60a2d23c_noun-folder-3674139-1/noun-folder-3674139-1.png" alt="File folder" width="128px">

## Why Do We Need Data Structures?
<p>Let's talk about why we need a data structure and when to use it. We will borrow an example from the world of Wall Street for this discussion.  </p>
<p>Companies listed on the NASDAQ exchange have ticker symbols or abbreviations for each company name. For example, the ticker symbol for Alphabet, Inc. is GOOGL.</p>
<p>Imagine now that you own stocks for one company, say Microsoft, and want to be able to print out the ticker symbol of your stock. Since it is one value,  you can store it in the variable <code>microsoft</code>, and assign it the value of MSFT. Like this:</p>
<p><code>microsoft = "MSFT"</code> </p>
<p>Well, that's convenient! So, now when you want to print the ticker symbol for the company you hold stocks for, you use the print command.</p>
<pre><code>print(microsoft)
&gt;&gt;&gt; MSFT
</code></pre>

<p>Let's now consider that you are an investment fund manager, and you want to print out the stocks (or holdings) you own in an index fund (e.g., <a href="https://www.marketwatch.com/investing/fund/vinix/holdings" target="_blank">Vanguard Institutional Index Fund</a>). An index fund includes stocks (also called holdings) for a large number of companies. Turns out Vanguard Institutional Index Fund has <a href="https://investor.vanguard.com/mutual-funds/profile/VINIX" target="_blank">506 holdings</a>!</p>
<p>Printing the tickers for all 506 holdings using individual strings would require 506 variables. <em>Not ideal!</em> We'd need to remember the name of each variable to print it.</p>
<p>You also have to think about how to group the 506 strings under the same index fund. <em>Not convenient at all!</em></p>
<p><strong>This is where the beauty of data structures comes into play! You can use a list. </strong></p>
<p>Let's learn more about what a list is on the next page.</p>


# **List and Membership Operators**
There are three sections in this module. Be sure to read each of them, along with the additional helpful reminders!

## **Lists!**
A list is one of the most common and basic data structures in Python.

You can create a list with square brackets. Lists can contain any mix and match of the data types you have seen so far.

`list_of_random_things = [1, 3.4, 'a string', True]`

This is a list of 4 elements. All ordered containers (like lists) are indexed in Python using a starting index of 0. Therefore, to pull the first value from the above list, we can write:

```
>>> list_of_random_things[0]
1
```

It might seem like you can pull the last element with the following code, but this actually won't work:

```
>>> list_of_random_things[len(list_of_random_things)]
---------------------------------------------------------------------------
IndexError                                Traceback (most recent call last)
<ipython-input-34-f88b03e5c60e> in <module>()
----> 1 list_of_random_things[len(list_of_random_things)]

IndexError: list index out of range
```

However, you can retrieve the last element by reducing the index by 1. Therefore, you can do the following:

```
>>> list_of_random_things[len(list_of_random_things) - 1] 
True
```

Alternatively, you can index from the end of a list by using negative values, where -1 is the last element, -2 is the second to last element and so on.

```
>>> list_of_random_things[-1] 
True
>>> list_of_random_things[-2] 
'a string'
```
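
As a quick check, the sketch below confirms that a negative index `-k` refers to the same element as the index `len(...) - k`. It reuses the list from above; the helper variable names are only illustrative.

```python
list_of_random_things = [1, 3.4, 'a string', True]
n = len(list_of_random_things)  # 4 elements, valid indices 0..3

# -1 and n - 1 both refer to the last element
last_by_length = list_of_random_things[n - 1]
last_by_negative = list_of_random_things[-1]

# -2 and n - 2 both refer to the second-to-last element
second_to_last = list_of_random_things[-2]

print(last_by_length, last_by_negative, second_to_last)
```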

## **Slice and Dice with Lists**

You saw that we can pull more than one value from a list at a time by using slicing. When using **slicing**, it is important to remember that the `lower` index is `inclusive` and the `upper` index is `exclusive`.

Therefore, this:

```
>>> list_of_random_things = [1, 3.4, 'a string', True]
>>> list_of_random_things[1:2]
[3.4]
```

will only return **3.4** in a list. Notice this is still different than just indexing a single element, because you get a list back with this indexing. The colon tells us to go from the starting value on the left of the colon up to, but not including, the element on the right.

If you know that you want to start at the beginning of the list, you can leave out the starting value.

```
>>> list_of_random_things[:2]
[1, 3.4]
```

Or, to return all of the elements up to the end of the list, you can leave out the ending value.

```
>>> list_of_random_things[1:]
[3.4, 'a string', True]
```

This type of indexing works exactly the same on strings, where the returned value will be a string.
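
For instance, here is a small sketch of slicing a string. The string `ticker_line` is a made-up example, not part of the lesson data.

```python
ticker_line = "MSFT,AAPL"  # hypothetical example string

first = ticker_line[:4]    # from the start up to, but not including, index 4
last = ticker_line[-4:]    # the final four characters
middle = ticker_line[2:6]  # lower index inclusive, upper index exclusive

print(first)   # MSFT
print(last)    # AAPL
print(middle)  # FT,A
```

Each result is itself a string, just as slicing a list returns a list.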

## **Are You `in` or `not in`?**
We can also use `in` and `not in` to return a **bool** of whether an element exists within our list, or if one string is a substring of another.

```
>>> 'this' in 'this is a string'
True
>>> 'in' in 'this is a string'
True
>>> 'isa' in 'this is a string'
False
>>> 5 not in [1, 2, 3, 4, 6]
True
>>> 5 in [1, 2, 3, 4, 6]
False
```

## **Membership Operators**

<div class="index-module--table-responsive--1zG6k"><table class="index-module--table--8j68C index-module--table-striped--3HHC-">
<thead>
<tr>
<th>Operator</th>
<th>Description</th>
</tr>
</thead>
<tbody>
<tr>
<td><code>in</code></td>
<td>evaluates if an element exists within our list</td>
</tr>
<tr>
<td><code>not in</code></td>
<td>evaluates if an element does not exist within our list</td>
</tr>
</tbody>
</table>
</div>


## **Using Lists in the Index Fund Example**
Let's return to our index fund example we discussed earlier. Since index funds have ticker symbols, for example, VINIX for the [Vanguard Institutional Index Fund](https://www.marketwatch.com/investing/fund/vinix/holdings), you can use that as the name for the list. Then we can add the ticker symbols for all the holdings into that list. Let's populate the list with the top holdings listed for [Vanguard Institutional Index Fund](https://www.marketwatch.com/investing/fund/vinix/holdings).

```
VINIX = ['C', 'MA', 'BA', 'PG', 'CSCO', 'VZ', 'PFE', 'HD', 'INTC', 'T', 'V', 'UNH', 'WFC', 'CVX', 'BAC', 'JNJ', 'GOOGL', 'GOOG', 'BRK.B', 'XOM', 'JPM', 'FB', 'AMZN', 'MSFT', 'AAPL']
```

Now, printing the tickers becomes slightly easier. And you don't have to remember the names of the strings!

```
>>> print(VINIX[0])
C
>>> print(VINIX[1])
MA
```

*Later you will learn about more efficient ways to print the elements in a list.*

You can even use the list to see if a particular stock **is in** the index fund `VINIX` or not.

Like this:

```
>>> 'GE' in VINIX
False

>>> 'GOOGL' in VINIX
True
```

*We'll revisit this example of Wall Street later in the lesson to show how data structures can be even more helpful!*

# **Mutability And Order**
**Mutability** refers to whether or not we can change an object once it has been created. If an object can be changed, it is called **mutable**. However, if an object cannot be changed after it has been created, then the object is considered **immutable**.

Examples - Lists are mutable, and strings are immutable.

```
>>> my_lst = [1, 2, 3, 4, 5]
>>> my_lst[0] = 'one'
>>> print(my_lst)
['one', 2, 3, 4, 5]
```

With the list above, you are able to replace `1` with `'one'`. This is because lists are **mutable**.

However, trying to change the string below does not work:


```
>>> greeting = "Hello there"
>>> greeting[0] = 'M'
```

Running these two lines of code will cause an error: "'str' object does not support item assignment."

This is because strings are **immutable**. This means you can't change a string once it's been created - you will need to instead create a completely new string.

What we mean by this is, using the same example as above, it is perfectly fine to do the following to change the value of the entire string `greeting`:

```
>>> greeting = "Hello there"
>>> greeting = "Mello there"
```

The second line creates a new string object in memory and rebinds the name `greeting` to it. The original string `"Hello there"` is not modified; the same name simply refers to a different object now.
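
To see that reassignment builds a new object rather than editing the old one, the sketch below keeps a second name, `original` (an illustrative variable, not part of the lesson), pointing at the first string.

```python
greeting = "Hello there"
original = greeting            # a second name for the same string object

# Build a brand-new string from a new first letter plus a slice of the old one,
# then rebind the name `greeting` to it
greeting = "M" + greeting[1:]

print(greeting)              # Mello there
print(original)              # Hello there (the first string is untouched)
print(greeting is original)  # False (two different objects)
```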

There are two things to keep in mind for each of the data types you are using:

1. Are they mutable?
2. Are they ordered?

**Order** is about whether the position of an element in the object can be used to access the element.

**Both strings and lists are ordered.** We can use the order to access parts of a list and string.

However, some of the data types in the next sections are unordered. For each upcoming data structure, it is useful to understand how you index it, whether it is mutable, and whether it is ordered.

You will also see that each data structure has different methods, so the choice of one data structure over another depends largely on these properties and on what you can easily do with it.

# **Quiz: List Indexing**

### **QUESTION 1 OF 3**

Use list indexing to determine how many days are in a particular month based on the integer variable `month`, and store that value in the integer variable `num_days`. For example, if `month` is 8, `num_days` should be set to 31, since the eighth month, August, has 31 days.

Remember to account for zero-based indexing!

```
month = 8
days_in_month = [31,28,31,30,31,30,31,31,30,31,30,31]

# use list indexing to determine the number of days in month
num_days = days_in_month[month - 1]

print("The number of days in the 8th month is '{}' days".format(num_days))
```

Output:
```
The number of days in the 8th month is '31' days
```


## **Quiz: Slicing Lists**
### **QUESTION 2 OF 3**
Select the three most recent dates from this list using list slicing notation. Hint: negative indexes work in slices!

```
eclipse_dates = ['June 21, 2001', 'December 4, 2002', 'November 23, 2003',
                 'March 29, 2006', 'August 1, 2008', 'July 22, 2009',
                 'July 11, 2010', 'November 13, 2012', 'March 20, 2015',
                 'March 9, 2016']

# TODO: Modify this line so it prints the last three elements of the list
print(eclipse_dates[-3:])
```

Output:
```
['November 13, 2012', 'March 20, 2015', 'March 9, 2016']
```


### **QUESTION 3 OF 3**
Suppose we have the following two variables, `sentence1` and `sentence2`:

```
sentence1 = "I wish to register a complaint."
sentence2 = ["I", "wish", "to", "register", "a", "complaint", "."]
```

For each line of Python code below, give the value of the modified `sentence1` or `sentence2`, or state that the code results in an error.

A. `sentence2[6] = "!"`

B. `sentence2[0]= "Our Majesty"`

C. `sentence1[30]="!"`

D. `sentence2[0:2] = ["We", "want"]`



```
# Write Your Answer Option Below.
#
```

* `sentence1` is a string, and is therefore an immutable object. That means that while you can refer to individual characters in `sentence1` (e.g., you can write things like `sentence1[5]`), you cannot assign values to them (you cannot write things like `sentence1[5] = 'a'`). Therefore the third expression will result in an error.

* `sentence2` is a list, and lists are mutable, meaning that you can change the value of individual items in `sentence2`:

* In the first expression we changed the value of the last item in `sentence2` from `"."` to `"!"`.
* In the second expression we changed the value of the first item in `sentence2` from `"I"` to `"Our Majesty"`.
* In the last expression we used slicing to simultaneously change the value of both the first and the second item in `sentence2` from `"I"` and `"wish"` to `"We"` and `"want"`.

# **Useful Functions for Lists I**
1. `len()` returns how many elements are in a list.
2. `max()` returns the greatest element of the list. How the greatest element is determined depends on the type of objects in the list. The maximum element in a list of numbers is the largest number. The maximum element in a list of strings is the element that would occur last if the list were sorted alphabetically. This works because the `max()` function is defined in terms of the greater-than comparison operator. Calling `max()` on a list that contains elements of different, incomparable types (for example, a number and a string) raises a `TypeError`.
3. `min()` returns the smallest element in a list. min is the opposite of max, which returns the largest element in a list.
4. `sorted()` returns a copy of a list in order from smallest to largest, leaving the list unchanged. Note again that for string objects, sorted smallest to largest means sorting in alphabetical order.

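The four functions above can be tried out together. The sketch below uses made-up lists (`scores` and `names` are illustrative, not lesson data) and also shows the `TypeError` raised for incomparable types. Note that Python compares strings character by character and places uppercase letters before lowercase ones, so "alphabetical" order is case-sensitive.

```python
scores = [72, 95, 88]               # hypothetical example data
names = ["Carol", "albert", "Ben"]  # hypothetical example data

print(len(scores))     # 3
print(max(scores))     # 95
print(min(scores))     # 72
print(sorted(scores))  # [72, 88, 95]
print(scores)          # [72, 95, 88] (sorted() left the original unchanged)

# Uppercase letters sort before lowercase ones
print(max(names))      # albert
print(sorted(names))   # ['Ben', 'Carol', 'albert']

# Mixing incomparable types raises a TypeError
mixed_failed = False
try:
    max([1, "two", 3.0])
except TypeError as error:
    mixed_failed = True
    print("TypeError:", error)
```
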
# **Useful Functions for Lists II**
## **`join` method**

Join is a string method that takes a list of strings as an argument, and returns a string consisting of the list elements joined by a separator string.

```
new_str = "\n".join(["fore", "aft", "starboard", "port"])
print(new_str)
```

Output:
```
fore
aft
starboard
port
```

In this example we use the string `"\n"` as the separator so that there is a newline between each element. We can also use other strings as separators with .join. Here we use a hyphen.
```
name = "-".join(["García", "O'Kelly", "Davis"])
print(name)
```

Output:
```
García-O'Kelly-Davis
```
It is important to remember to separate each of the items in the list you are joining with a comma (`,`). Forgetting one will not trigger an error, but Python will silently concatenate the two adjacent string literals into a single item, giving you unexpected results.
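
The sketch below illustrates what happens when a comma goes missing. The variable names are illustrative only.

```python
with_commas = ["fore", "aft", "starboard", "port"]
missing_comma = ["fore", "aft" "starboard", "port"]  # no comma after "aft"

print(len(with_commas))         # 4
print(len(missing_comma))       # 3 ("aft" and "starboard" merged into one item)
print("-".join(missing_comma))  # fore-aftstarboard-port
```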

## **`append` method**
A helpful method called `append` adds an element to the end of a list.
```
letters = ['a', 'b', 'c', 'd']
letters.append('z')
print(letters)
```

Output:
```
['a', 'b', 'c', 'd', 'z']
```
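
Two details about `append` are easy to miss: it changes the list in place and returns `None`, and it always adds its argument as a single element. A minimal sketch, reusing the `letters` list from above (`result` is an illustrative name):

```python
letters = ['a', 'b', 'c', 'd']

result = letters.append('z')  # modifies `letters` in place
print(letters)  # ['a', 'b', 'c', 'd', 'z']
print(result)   # None (append does not return the list)

# Appending a list adds that whole list as ONE new element
letters.append(['x', 'y'])
print(len(letters))  # 6
print(letters[-1])   # ['x', 'y']
```

Because `append` returns `None`, writing `letters = letters.append('z')` would replace the list with `None`.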

## **Quiz: `len`, `max`, `min`, and Lists**
There is a Python environment for you to run test code at the bottom of this page related to any of the quizzes on this page!

### QUESTION 1 OF 4
What would the output of the following code be? (Treat the commas in the multiple choice answers as newlines.)
```
a = [1, 5, 8]
b = [2, 6, 9, 10]
c = [100, 200]

print(max([len(a), len(b), len(c)]))
print(min([len(a), len(b), len(c)]))
```


Output:
```
4
2
```


## **Quiz: sorted, join, and Lists**
### QUESTION 2 OF 4
What would the output of the following code be? (Treat the commas in the multiple choice answers as newlines.)

```
names = ["Carol", "Albert", "Ben", "Donna"]
print(" & ".join(sorted(names)))
```


Output:
```
Albert & Ben & Carol & Donna
```


## **Quiz: `append` and Lists**
### QUESTION 3 OF 4
What would the output of the following code be? (Treat the commas in the multiple choice answers as newlines.)
```
names = ["Carol", "Albert", "Ben", "Donna"]
names.append("Eugenia")
print(sorted(names))
```


Output:
```
['Albert', 'Ben', 'Carol', 'Donna', 'Eugenia']
```


# **Tuples**
A tuple is another useful container. It's a data type for immutable ordered sequences of elements. They are often used to store related pieces of information. Consider this example involving latitude and longitude:
```
location = (13.4125, 103.866667)
print("Latitude:", location[0])
print("Longitude:", location[1])
```

Tuples are similar to lists in that they store an ordered collection of objects which can be accessed by their indices. Unlike lists, however, tuples are immutable: you can't add or remove items from tuples, or sort them in place.

Tuples can also be used to assign multiple variables in a compact way.
```
dimensions = 52, 40, 100
length, width, height = dimensions
print("The dimensions are {} x {} x {}".format(length, width, height))
```

The parentheses are optional when defining tuples, and programmers frequently omit them if parentheses don't clarify the code.

In the second line, three variables are assigned from the content of the tuple dimensions. This is called **tuple unpacking**. You can use tuple unpacking to assign the information from a tuple into multiple variables without having to access them one by one and make multiple assignment statements.

If we won't need to use `dimensions` directly, we could shorten those two lines of code into a single line that assigns three variables in one go!
```
length, width, height = 52, 40, 100
print("The dimensions are {} x {} x {}".format(length, width, height))
```
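
The sketch below illustrates both points from this section: assigning to a tuple element fails because tuples are immutable, and packing plus unpacking lets you swap two variables in one line. The flag `assignment_failed` is an illustrative helper.

```python
dimensions = (52, 40, 100)

# Tuples are immutable, so item assignment raises a TypeError
assignment_failed = False
try:
    dimensions[0] = 60
except TypeError as error:
    assignment_failed = True
    print("TypeError:", error)

# The right-hand side is packed into a tuple, then unpacked into the names
length, width = 52, 40
length, width = width, length
print(length, width)  # 40 52
```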



```
location = (13.4125, 103.866667)
print("Latitude:", location[0])
print("Longitude:", location[1])
```

Output:
```
Latitude: 13.4125
Longitude: 103.866667
```


What would the output of the following code be? (Treat the commas in the multiple choice answers as newlines.)
```
tuple_a = 1, 2
tuple_b = (1, 2)

print(tuple_a == tuple_b)
print(tuple_a[1])
```



Output:
```
True
2
```


# **Sets**
A set is a data type for mutable unordered collections of unique elements. One application of a set is to quickly remove duplicates from a list.
```
numbers = [1, 2, 6, 3, 1, 1, 6]
unique_nums = set(numbers)
print(unique_nums)
```

This would output:

`{1, 2, 3, 6}`

Sets support the `in` operator the same way lists do. You can add elements to a set using the `add` method and remove elements using the `pop` method, similar to lists. However, when you `pop` an element from a set, an arbitrary element is removed. Remember that sets, unlike lists, are unordered, so there is no "last element".

```
fruit = {"apple", "banana", "orange", "grapefruit"}  # define a set

print("watermelon" in fruit)  # check for element

fruit.add("watermelon")  # add an element
print(fruit)

print(fruit.pop())  # remove an arbitrary element
print(fruit)
```

This outputs:
```
False
{'grapefruit', 'orange', 'watermelon', 'banana', 'apple'}
grapefruit
{'orange', 'watermelon', 'banana', 'apple'}
```

Other operations you can perform with sets include those of mathematical sets. Methods like `union`, `intersection`, and `difference` are easy to perform with sets, and are much faster than the equivalent operations on other containers.
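
As a minimal sketch of these three methods, consider two made-up sets of tickers (`fund_a` and `fund_b` are hypothetical, not real fund holdings). The results are passed through `sorted()` only so that the printed order is predictable.

```python
fund_a = {"AAPL", "MSFT", "AMZN"}   # hypothetical holdings
fund_b = {"MSFT", "GOOGL", "AMZN"}  # hypothetical holdings

in_either = fund_a.union(fund_b)       # elements in at least one set
in_both = fund_a.intersection(fund_b)  # elements common to both sets
only_in_a = fund_a.difference(fund_b)  # elements in fund_a but not fund_b

print(sorted(in_either))  # ['AAPL', 'AMZN', 'GOOGL', 'MSFT']
print(sorted(in_both))    # ['AMZN', 'MSFT']
print(sorted(only_in_a))  # ['AAPL']
```

The operators `|`, `&`, and `-` are shorthand for the same three operations on sets.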

```
fruit = {"apple", "banana", "orange", "grapefruit"}  # define a set

print("watermelon" in fruit)  # check for element

fruit.add("watermelon")  # add an element
print(fruit)

print(fruit.pop())  # remove an arbitrary element
print(fruit)
```

Output:
```
False
{'orange', 'apple', 'grapefruit', 'watermelon', 'banana'}
orange
{'apple', 'grapefruit', 'watermelon', 'banana'}
```


```
numbers = [1, 2, 6, 3, 1, 1, 6]
unique_nums = set(numbers)
print(unique_nums)
```

Output:
```
{1, 2, 3, 6}
```


## **Quiz: list to set**
### QUESTION 1 OF 3
What would the output of the following code be?
```
a = [1, 2, 2, 3, 3, 3, 4, 4, 4, 4]
b = set(a)
print(len(a) - len(b))
```

```
a = [1, 2, 2, 3, 3, 3, 4, 4, 4, 4]
b = set(a)
print(b)
print(len(a))
print(len(a) - len(b))
```

Output:
```
{1, 2, 3, 4}
10
6
```


## **Quiz: add and pop**
### QUESTION 2 OF 3
Consider the following code:
```
a = [1, 2, 2, 3, 3, 3, 4, 4, 4, 4]
b = set(a)
b.add(5)
b.pop()
```

After executing this code, will the number 5 be a part of the set b?

```
a = [1, 2, 2, 3, 3, 3, 4, 4, 4, 4]
b = set(a)
b.add(5)
b.pop()
print(b)
```

Output:
```
{2, 3, 4, 5}
```


The answer is neither a definite yes nor a definite no.
When you pop an element from a set, an arbitrary element is removed (remember that sets, unlike lists, are unordered, so there is no "last element"). The number 5 may or may not be removed.

# **Dictionaries And Identity Operators**
### Dictionaries
A dictionary is a mutable data type that stores mappings of unique keys to values. Here's a dictionary that stores elements and their atomic numbers.
```
elements = {"hydrogen": 1, "helium": 2, "carbon": 6}
```

In general, dictionaries look like key-value pairs, separated by commas:

`{key1:value1, key2:value2, key3:value3, key4:value4, ...}`

Dictionaries are mutable, but their keys must be of an immutable type, such as strings, integers, or tuples. It's not even necessary for every key in a dictionary to have the same type! For example, the following dictionary is perfectly valid:

`random_dict = {"abc": 1, 5: "hello"}`

This dictionary has two keys: "abc" and 5. The first key has a string type, and the second key has an integer type. And the dictionary has two values: 1 and "hello".

We can look up values in the dictionary using square brackets `[]` around the key, like this:

`dict_name[key]`.

For example, in our random dictionary above, the value for `random_dict["abc"]` is `1`, and the value for `random_dict[5]` is `"hello"`.

In our elements dictionary above, we could print out the atomic number mapped to helium like this:

`print(elements["helium"])`

This would print out 2.

We can also insert a new element into a dictionary as in this example:

`elements["lithium"] = 3`

If we then executed `print(elements)`, the output would be:

`{'hydrogen': 1, 'helium': 2, 'carbon': 6, 'lithium': 3}`

This illustrates how dictionaries are mutable.

What if we try to look up a key that is not in our dictionary, using the square brackets, like `elements['dilithium']`? This will give you a `KeyError`.

We can check whether a key is in a dictionary the same way we check whether an element is in a list or set, using the `in` keyword. Dictionaries have a related method that's also useful, `get`. `get` looks up values in a dictionary, but unlike square brackets, `get` returns None (or a default value of your choice) if the key isn't found.
```
print("carbon" in elements)
print(elements.get("dilithium"))
```

This would output:
```
True
None
```

`"carbon"` is in the dictionary, so `True` is printed. `"dilithium"` isn’t in our dictionary so `None` is returned by `get` and then printed. So if you expect lookups to sometimes fail, `get` might be a better tool than normal square bracket lookups, because errors can crash your program.

### **Identity Operators**
<div class="index-module--table-responsive--1zG6k"><table class="index-module--table--8j68C index-module--table-striped--3HHC-">
<thead>
<tr>
<th>Operator</th>
<th>Description</th>
</tr>
</thead>
<tbody>
<tr>
<td><code>is</code></td>
<td>evaluates if both sides have the same identity</td>
</tr>
<tr>
<td><code>is not</code></td>
<td>evaluates if both sides have different identities</td>
</tr>
</tbody>
</table>
</div>

You can check whether a lookup returned `None` with the `is` operator. You can check for the opposite using `is not`.
```
n = elements.get("dilithium")
print(n is None)
print(n is not None)
```

This would output:
```
True
False
```
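
Checking `is None` is not the same as checking truthiness. A stored value such as `0` is falsy, so a plain `if` test on the result of `get` would treat an existing key as missing. The sketch below uses a made-up `inventory` dictionary to illustrate the difference.

```python
inventory = {"apples": 0, "pears": 12}  # hypothetical example data

count = inventory.get("apples")
print(count is not None)  # True  (the key exists)
print(bool(count))        # False (but its value, 0, is falsy)

missing = inventory.get("plums")
print(missing is None)    # True  (the key really is absent)
```

When a value could legitimately be `0`, an empty string, or an empty container, comparing with `is None` (or using `in`) is the safer way to test for a missing key.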

```
elements = {"hydrogen": 1, "helium": 2, "carbon": 6, "lithium": 3}
```

```
if elements.get("lithium"):
    print("Lithium exists")
else:
    print("Not available")
```

Output:
```
Lithium exists
```

```
n = elements.get("dilithium")
print(n is None)
print(n is not None)
```

Output:
```
True
False
```


# **Quiz: Define a Dictionary**
Define a dictionary named population that contains this data:

<div class="_3iP2TrDXtmCaMBMDFgM-Os"><table class="_31OI8Tmq3yywD_20_EuBtM _1pZkqGvd1RdMO5JXUl_2oA">
<thead>
<tr>
<th><strong>Keys</strong></th>
<th><strong>Values</strong></th>
</tr>
</thead>
<tbody>
<tr>
<td>Shanghai</td>
<td>17.8</td>
</tr>
<tr>
<td>Istanbul</td>
<td>13.3</td>
</tr>
<tr>
<td>Karachi</td>
<td>13.0</td>
</tr>
<tr>
<td>Mumbai</td>
<td>12.5</td>
</tr>
</tbody>
</table>
</div>
<br>
<br>

```
# Define a Dictionary, population,
# that provides information
# on the world's largest cities.
# The key is the name of a city
# (a string), and the associated
# value is its population in
# millions of people.

#   Key     |   Value
# Shanghai  |   17.8
# Istanbul  |   13.3
# Karachi   |   13.0
# Mumbai    |   12.5

population = {"Shanghai": 17.8, "Istanbul": 13.3, "Karachi": 13.0, "Mumbai": 12.5}
print(population)
```

Output:
```
{'Shanghai': 17.8, 'Istanbul': 13.3, 'Karachi': 13.0, 'Mumbai': 12.5}
```


## **`get` with a Default Value**
The `get()` method looks up values in a dictionary, but unlike a lookup with square brackets, `get()` returns `None` (or a default value of your choice) if the key isn't found. If you expect lookups to sometimes fail, `get()` might be a better tool than normal square bracket lookups.
```
>>> print(elements.get('dilithium'))
None
>>> elements['dilithium']
KeyError: 'dilithium'
>>> elements.get('kryptonite', 'There\'s no such element!')
"There's no such element!"
```

In the last example we specified a default value (the string 'There's no such element!') to be returned instead of `None` when the key is not found.
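
A default value also makes `get` handy for keeping running counts, because a key that has not been seen yet can be treated as `0`. This is a minimal sketch with a made-up `counts` dictionary; it is one common idiom, not the only way to count.

```python
counts = {}  # hypothetical tally of how often each ticker was seen

# On first sight the key is missing, so get() supplies the default 0
counts["MSFT"] = counts.get("MSFT", 0) + 1
counts["MSFT"] = counts.get("MSFT", 0) + 1
counts["AAPL"] = counts.get("AAPL", 0) + 1

print(counts)  # {'MSFT': 2, 'AAPL': 1}
```

With square brackets, the first line would raise a `KeyError` because `"MSFT"` is not yet in the dictionary.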

## **Checking for Equality vs. Identity: `==` vs. `is`**

### QUESTION
What will the output of the following code be? (Treat the commas in the multiple choice answers as newlines.)
```
a = [1, 2, 3]
b = a
c = [1, 2, 3]

print(a == b)
print(a is b)
print(a == c)
print(a is c)
```



Output:
```
True
True
True
False
```


List `a` and list `b` are equal and identical. List `c` is equal to `a` (and `b` for that matter) since they have the same contents. But `a` and `c` (and `b` for that matter, again) point to two different objects, i.e., they aren't identical objects. That is the difference between checking for equality vs. identity.
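
Identity matters as soon as a list is modified. In the sketch below, which continues the quiz example, a change made through `b` is visible through `a` because both names refer to the same object, while `c` is unaffected.

```python
a = [1, 2, 3]
b = a          # b is another name for the same list object
c = [1, 2, 3]  # c is a separate list with equal contents

b.append(4)    # modify the list through the name b

print(a)       # [1, 2, 3, 4] (a sees the change, since a is b)
print(c)       # [1, 2, 3]    (c is a different object)
print(a == c)  # False        (the contents now differ)
```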

# **When to use Dictionaries?**
Let's revisit our Wall Street example from before. Previously we created a list for the index fund, [Vanguard Institutional Index Fund](https://www.marketwatch.com/investing/fund/vinix/holdings), because we wanted to print the names of the holdings (or stocks) in the index fund.

Now, let's say that, as the investment fund manager for VINIX, you also want to print a few more details for each holding. For example, what is your rate of return on each of the holdings?

A dictionary will work well here as there is a key: value association. In other words, there is a linkage between each holding and the information (e.g., rate of return), and it can be organized under one index fund, VINIX.
```
VINIX =  {'C': 0.74, 'MA': 0.78, 'BA': 0.79, 'PG': 0.85, 'CSCO': 0.88, 'VZ': 0.9, 'PFE': 0.92, 'HD': 0.97, 'INTC': 1.0, 'T': 1.01, 'V': 1.02, 'UNH': 1.02, 'WFC': 1.05, 'CVX': 1.05, 'BAC': 1.15, 'JNJ': 1.41, 'GOOGL': 1.46, 'GOOG': 1.47, 'BRK.B': 1.5, 'XOM': 1.52, 'JPM': 1.53, 'FB': 2.02, 'AMZN': 2.96, 'MSFT': 3.28, 'AAPL': 3.94}
```

You can even add other details, such as the rate of return YTD. To do that, we can store a list of details as the value associated with each key, i.e., the ticker symbol for the holding.
```
VINIX = {'C': [0.74, -6.51],  'MA': [0.78, 34.77],  'BA': [0.79, 17.01],  'PG': [0.85, -8.81],  'CSCO': [0.88, 18.56],  'VZ': [0.9, 2.16],  'PFE': [0.92, 13.96],  'HD': [0.97, 3.2],  'INTC': [1.0, 2.61],  'T': [1.01, -15.19],  'V': [1.02, 24.0],  'UNH': [1.02, 19.32],  'WFC': [1.05, -3.59],  'CVX': [1.05, -5.77],  'BAC': [1.15, 4.27],  'JNJ': [1.41, -5.58],  'GOOGL': [1.46, 17.84],  'GOOG': [1.47, 17.03],  'BRK.B': [1.5, 4.54],  'XOM': [1.52, -6.87],  'JPM': [1.53, 7.66],  'FB': [2.02, 0.91], 'AMZN': [2.96, 62.75], 'MSFT': [3.28, 26.61], 'AAPL': [3.94, 26.01]}
```
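
Reading a detail back combines a key lookup with a list index. The sketch below uses only three entries copied from the dictionary above, and assumes, as described in the text, that the first number in each list is the rate of return and the second is the rate of return YTD. The names `rate` and `ytd` are illustrative.

```python
# A three-entry subset of the VINIX dictionary shown above
VINIX = {'AMZN': [2.96, 62.75], 'MSFT': [3.28, 26.61], 'AAPL': [3.94, 26.01]}

# Key lookup first, then list indexing
print(VINIX['MSFT'][0])  # 3.28
print(VINIX['MSFT'][1])  # 26.61

# Unpack both values for one holding into two variables
rate, ytd = VINIX['AAPL']
print(rate, ytd)         # 3.94 26.01
```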

As you can see, data structures are very useful in collecting, storing and working with more information than simple strings or integers.

You will soon learn how to use dictionary methods to perform tasks, such as pull values from keys, sort values by keys, add values to the dictionary, and many other tasks that make data structures critical for data science.

# **Check for Understanding**

A **tuple** is an immutable, ordered data structure that can be indexed and sliced like a list. Tuples are defined by listing a sequence of elements separated by commas, optionally contained within parentheses: `()`.

A **set** is a mutable data structure - you can modify the elements in a set with methods like `add` and `pop`. A set is an unordered data structure, so you can't index and slice elements like a list; there is no sequence of positions to index with!

One of the key properties of a set is that it only contains **unique** elements. So even if you create a new set with a list of elements that contains duplicates, Python will remove the duplicates when creating the set automatically.

A set is defined with curly braces, `{}`, but it isn't the only data structure that uses them; dictionaries do as well! The difference is that a set is defined as a sequence of elements separated by commas:
`set_example = {element1, element2, element3}`
while a dictionary is defined as a sequence of key, value pairs marked with colons, separated by commas:
`dict_example = {key1: value1, key2: value2, key3: value3}`.

**Note:** if you define a variable with an empty set of curly braces like this: `a = {}`, Python will assign an empty dictionary to that variable. You can always use `set()` and `dict()` to define empty sets and dictionaries as well.
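
A quick way to confirm this is to ask Python for the type of each object. The variable names in this sketch are illustrative.

```python
empty_braces = {}    # empty curly braces create a dictionary, not a set
empty_set = set()    # the way to create an empty set
empty_dict = dict()  # equivalent to {}

print(type(empty_braces))          # <class 'dict'>
print(type(empty_set))             # <class 'set'>
print(empty_braces == empty_dict)  # True
```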

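A dictionary is a mutable data structure that contains mappings of keys to values. Its values are looked up by key rather than by position, so it cannot be indexed or sliced like a list (although, since Python 3.7, a dictionary does remember the order in which its keys were inserted). Because these keys are used to index values, they must be unique and immutable. For example, a string or tuple can be used as the key of a dictionary, but if you try to use a list as a key of a dictionary, you will get an error.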

## **Identify the Problem**
Run the code below - it should break. Take a look at the error message and try to figure out what the issue is. Then, answer the quiz question below the editor.
```
# invalid dictionary - this should break
room_numbers = {
    ['Freddie', 'Jen']: 403,
    ['Ned', 'Keith']: 391,
    ['Kristin', 'Jazzmyne']: 411,
    ['Eugene', 'Zach']: 395
}
```

It should return the error:
```
Traceback (most recent call last):
  File "vm_main3.py", line 47, in <module>
    import main
  File "/tmp/vmuser_dntlxdbxvx/main.py", line 2, in <module>
    import studentMain
  File "/tmp/vmuser_dntlxdbxvx/studentMain.py", line 1, in <module>
    import dictionary_keys
  File "/tmp/vmuser_dntlxdbxvx/dictionary_keys.py", line 3, in <module>
    ['Freddie', 'Jen']: 403,
TypeError: unhashable type: 'list'
```

The lists used in the code above are mutable, and thus cannot be hashed and used as dictionary keys.
Can you try modifying the data type of the keys in the dictionary above to make the code run without errors? Hint: What other data structure can you use to store a sequence of values, but is immutable?
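
To illustrate hashability on a different, made-up example (so as not to give away the exercise), the sketch below uses tuples as dictionary keys and shows that hashing a list fails. The dictionary `seat_owner` and the flag `list_is_hashable` are hypothetical names.

```python
# Tuples of (row, seat number) used as keys in a hypothetical seating chart
seat_owner = {("A", 1): "Carol", ("A", 2): "Ben"}
print(seat_owner[("A", 2)])  # Ben

# A list with the same contents cannot be hashed, so it cannot be a key
list_is_hashable = True
try:
    hash(["A", 1])
except TypeError as error:
    list_is_hashable = False
    print("TypeError:", error)
```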

# **Compound Data Structures**
We can include containers in other containers to create compound data structures. For example, this dictionary maps keys to values that are also dictionaries!
```
elements = {"hydrogen": {"number": 1,
                         "weight": 1.00794,
                         "symbol": "H"},
              "helium": {"number": 2,
                         "weight": 4.002602,
                         "symbol": "He"}}
```

We can access elements in this nested dictionary like this.
```
helium = elements["helium"]  # get the helium dictionary
hydrogen_weight = elements["hydrogen"]["weight"]  # get hydrogen's weight
```
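
Square-bracket lookup raises a `KeyError` when a key is missing. The sketch below assumes the same `elements` dictionary and a hypothetical missing key, `"lithium"`, to show how `dict.get` can be chained to fall back to a default instead.

```python
elements = {"hydrogen": {"number": 1, "weight": 1.00794, "symbol": "H"},
            "helium": {"number": 2, "weight": 4.002602, "symbol": "He"}}

# Square brackets work when every key on the path exists.
helium_symbol = elements["helium"]["symbol"]
print(helium_symbol)  # prints He

# "lithium" is not in the dictionary (hypothetical key, for illustration).
# .get() returns the default ({} here) instead of raising KeyError,
# so the second .get() can run safely and returns None.
lithium_weight = elements.get("lithium", {}).get("weight")
print(lithium_weight)  # prints None
```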

You can also add a new key to the `elements` dictionary.
```
oxygen = {"number": 8, "weight": 15.999, "symbol": "O"}  # create a new oxygen dictionary
elements["oxygen"] = oxygen  # add it to the elements dictionary under the key "oxygen"
print('elements = ', elements)
```

Output (line breaks added for readability; Python prints the dictionary on a single line):
```
elements =  {'hydrogen': {'number': 1,
                          'weight': 1.00794,
                          'symbol': 'H'},
               'helium': {'number': 2,
                          'weight': 4.002602,
                          'symbol': 'He'},
               'oxygen': {'number': 8,
                          'weight': 15.999,
                          'symbol': 'O'}}
```
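
Nested dictionaries can also be traversed with a loop. As a small sketch (the variable names `name`, `properties`, and `summary` are illustrative), `items()` yields each element name together with its inner dictionary:

```python
elements = {"hydrogen": {"number": 1, "weight": 1.00794, "symbol": "H"},
            "helium": {"number": 2, "weight": 4.002602, "symbol": "He"},
            "oxygen": {"number": 8, "weight": 15.999, "symbol": "O"}}

# Collect "symbol: number" strings by walking the outer dictionary.
summary = []
for name, properties in elements.items():
    summary.append(f"{properties['symbol']}: {properties['number']}")

print(summary)  # prints ['H: 1', 'He: 2', 'O: 8']
```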



## **Quiz: Adding Values to Nested Dictionaries**
Try your hand at working with nested dictionaries. Add another entry, `'is_noble_gas'`, to each dictionary in the `elements` dictionary. After inserting the new entries you should be able to perform these lookups:

```
>>> print(elements['hydrogen']['is_noble_gas'])
False
>>> print(elements['helium']['is_noble_gas'])
True
```


```
elements = {'hydrogen': {'number': 1, 'weight': 1.00794, 'symbol': 'H'},
            'helium': {'number': 2, 'weight': 4.002602, 'symbol': 'He'}}

# todo: Add an 'is_noble_gas' entry to the hydrogen and helium dictionaries
# hint: helium is a noble gas, hydrogen isn't
```

```
elements = {'hydrogen': {'number': 1, 'weight': 1.00794, 'symbol': 'H'},
            'helium': {'number': 2, 'weight': 4.002602, 'symbol': 'He'}}

elements['hydrogen']['is_noble_gas'] = False
elements['helium']['is_noble_gas'] = True
```

The last two lines are the solution: they add the `is_noble_gas` key to each of the inner dictionaries, so the lookups shown above return the expected values.
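
With only two elements, two assignments are enough. For a larger dictionary, a loop may be more convenient. The sketch below assumes a helper set named `noble_gases`, which is not part of the quiz, listing the elements that are noble gases.

```python
elements = {'hydrogen': {'number': 1, 'weight': 1.00794, 'symbol': 'H'},
            'helium': {'number': 2, 'weight': 4.002602, 'symbol': 'He'}}

# Hypothetical helper: names of the elements that are noble gases.
noble_gases = {'helium'}

# Add the new key to every inner dictionary in one pass.
for name, properties in elements.items():
    properties['is_noble_gas'] = name in noble_gases

print(elements['hydrogen']['is_noble_gas'])  # prints False
print(elements['helium']['is_noble_gas'])    # prints True
```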

# **Practice Questions**
The following questions are based on the same text you saw in the last lesson, the first verse of the poem *If* by Rudyard Kipling. We've converted all letters to lowercase, removed punctuation marks from the text, and stored this modified text in the string variable `verse`.

## **Quiz: Count Unique Words**
Your task for this quiz is to find the number of unique words in the text. In the code editor below, complete these three steps to get your answer.

1. Split `verse` into a list of words. **Hint**: You can use a string method you learned in the previous lesson.
2. Convert the list into a data structure that would keep only the unique elements from the list.
3. Print the length of the container.

```
verse = "if you can keep your head when all about you are losing theirs and blaming it on you   if you can trust yourself when all men doubt you     but make allowance for their doubting too   if you can wait and not be tired by waiting      or being lied about  don’t deal in lies   or being hated  don’t give way to hating      and yet don’t look too good  nor talk too wise"
print(verse, '\n')

# split verse into list of words
verse_list =
print(verse_list, '\n')

# convert list to a data structure that stores unique elements
verse_set =
print(verse_set, '\n')

# print the number of unique words
num_unique = 
print(num_unique, '\n')
```

```
verse = "if you can keep your head when all about you are losing theirs and blaming it on you   if you can trust yourself when all men doubt you     but make allowance for their doubting too   if you can wait and not be tired by waiting      or being lied about  don’t deal in lies   or being hated  don’t give way to hating      and yet don’t look too good  nor talk too wise"
print(verse, "\n")

# split verse into list of words
verse_list = verse.split()
print(verse_list, '\n')

# convert list to set to get unique words
verse_set = set(verse_list)
print(verse_set, '\n')

# print the number of unique words
num_unique = len(verse_set)
print(num_unique)
```

Output (truncated):
```
if you can keep your head when all about you are losing theirs and blaming it on you   if you can trust yourself when all men doubt you     but make allowance for their doubting too   if you can wait and not be tired by waiting      or being lied about  don’t deal in lies   or being hated  don’t give way to hating      and yet don’t look too good  nor talk too wise 

['if', 'you', 'can', 'keep', 'your', 'head', 'when', 'all', 'about', 'you', 'are', 'losing', 'theirs', 'and', 'blaming', 'it', 'on', 'you', 'if', 'you', 'can', 'trust', 'yourself', 'when', 'all', 'men', 'doubt', 'you', 'but', 'make', 'allowance', 'for', 'their', 'doubting', 'too', 'if', 'you', 'can', 'wait', 'and', 'not', 'be', 'tired', 'by', 'waiting', 'or', 'being', 'lied', 'about', 'don’t', 'deal', 'in', 'lies', 'or', 'being', 'hated', 'don’t', 'give', 'way', 'to', 'hating', 'and', 'yet', 'don’t', 'look', 'too', 'good', 'nor', 'talk', 'too', 'wise'] 

{'are', 'by', 'waiting', 'tired', 'good', 'can', 'your', 'blaming', '
```

## **Quiz: Verse Dictionary**
In the code editor below, you'll find a dictionary containing the unique words of `verse` stored as keys and the number of times they appear in `verse` stored as values. Use this dictionary to answer the following questions. Submit these answers in the quiz below the code editor.

Try to answer these using code, rather than inspecting the dictionary manually!

1. How many unique words are in `verse_dict`?
2. Is the key `"breathe"` in `verse_dict`?
3. What is the first element in the list created when `verse_dict` is sorted by keys? **Hint**: Use the appropriate dictionary method to get a list of its keys, and then sort that list. Use this sorted list to answer the next question as well.
4. Which key (word) has the highest value in the sorted list of keys, that is, which key comes last alphabetically?


```
verse_dict =  {'if': 3, 'you': 6, 'can': 3, 'keep': 1, 'your': 1, 'head': 1, 'when': 2, 'all': 2, 'about': 2, 'are': 1, 'losing': 1, 'theirs': 1, 'and': 3, 'blaming': 1, 'it': 1, 'on': 1, 'trust': 1, 'yourself': 1, 'men': 1, 'doubt': 1, 'but': 1, 'make': 1, 'allowance': 1, 'for': 1, 'their': 1, 'doubting': 1, 'too': 3, 'wait': 1, 'not': 1, 'be': 1, 'tired': 1, 'by': 1, 'waiting': 1, 'or': 2, 'being': 2, 'lied': 1, 'don\'t': 3, 'deal': 1, 'in': 1, 'lies': 1, 'hated': 1, 'give': 1, 'way': 1, 'to': 1, 'hating': 1, 'yet': 1, 'look': 1, 'good': 1, 'nor': 1, 'talk': 1, 'wise': 1}
print(verse_dict, '\n')


# find number of unique keys in the dictionary
num_keys = 
print(num_keys)

# find whether 'breathe' is a key in the dictionary
contains_breathe = 
print(contains_breathe)

# create and sort a list of the dictionary's keys
sorted_keys = 

# get the first element in the sorted list of keys
print()

# find the element with the highest value in the list of keys
print() 
```



```
verse_dict =  {'if': 3, 'you': 6, 'can': 3, 'keep': 1, 'your': 1, 'head': 1, 'when': 2, 'all': 2, 'about': 2, 'are': 1, 'losing': 1, 'theirs': 1, 'and': 3, 'blaming': 1, 'it': 1, 'on': 1, 'trust': 1, 'yourself': 1, 'men': 1, 'doubt': 1, 'but': 1, 'make': 1, 'allowance': 1, 'for': 1, 'their': 1, 'doubting': 1, 'too': 3, 'wait': 1, 'not': 1, 'be': 1, 'tired': 1, 'by': 1, 'waiting': 1, 'or': 2, 'being': 2, 'lied': 1, 'don\'t': 3, 'deal': 1, 'in': 1, 'lies': 1, 'hated': 1, 'give': 1, 'way': 1, 'to': 1, 'hating': 1, 'yet': 1, 'look': 1, 'good': 1, 'nor': 1, 'talk': 1, 'wise': 1}
print(verse_dict, '\n')

# find number of unique keys in the dictionary
num_keys = len(verse_dict)
print(num_keys)

# find whether 'breathe' is a key in the dictionary
contains_breathe = "breathe" in verse_dict
print(contains_breathe)

# create and sort a list of the dictionary's keys
sorted_keys = sorted(verse_dict.keys())

# get the first element in the sorted list of keys
print(sorted_keys[0])

# find the element with the highest value in the list of keys
print(sorted_keys[-1])
```

Output is:
```
{'if': 3, 'you': 6, 'can': 3, 'keep': 1, 'your': 1, 'head': 1, 'when': 2, 'all': 2, 'about': 2, 'are': 1, 'losing': 1, 'theirs': 1, 'and': 3, 'blaming': 1, 'it': 1, 'on': 1, 'trust': 1, 'yourself': 1, 'men': 1, 'doubt': 1, 'but': 1, 'make': 1, 'allowance': 1, 'for': 1, 'their': 1, 'doubting': 1, 'too': 3, 'wait': 1, 'not': 1, 'be': 1, 'tired': 1, 'by': 1, 'waiting': 1, 'or': 2, 'being': 2, 'lied': 1, "don't": 3, 'deal': 1, 'in': 1, 'lies': 1, 'hated': 1, 'give': 1, 'way': 1, 'to': 1, 'hating': 1, 'yet': 1, 'look': 1, 'good': 1, 'nor': 1, 'talk': 1, 'wise': 1} 

51
False
about
yourself
```
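
A dictionary like `verse_dict` can be built from a list of words with a short counting loop. The sketch below uses a shortened, hypothetical word list rather than the full verse, and the names `words`, `word_counts`, and `most_frequent` are illustrative. It also shows one way to find the most frequent word, which is a different question from the last key in sorted order.

```python
# Shortened sample for illustration, not the full verse.
words = "if you can keep your head when all about you".split()

# Count occurrences: .get(word, 0) returns 0 the first time a word is seen.
word_counts = {}
for word in words:
    word_counts[word] = word_counts.get(word, 0) + 1

print(word_counts['you'])  # prints 2

# max() with key=word_counts.get compares the counts, not the words.
most_frequent = max(word_counts, key=word_counts.get)
print(most_frequent)  # prints you
```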


A good understanding of data structures is integral to programming and data analysis. As a data analyst, you will be working with data and code all the time, so knowing which data types and data structures are available, and when to use each one, will help you write more efficient code.

Remember, you can get more practice on sites like [HackerRank](https://www.hackerrank.com/domains/python).

In this lesson, we covered four important data structures in Python:

<div class="_3iP2TrDXtmCaMBMDFgM-Os"><table class="_31OI8Tmq3yywD_20_EuBtM _1pZkqGvd1RdMO5JXUl_2oA">
<thead>
<tr>
<th><strong>Data Structure</strong></th>
<th><strong>Ordered</strong></th>
<th><strong>Mutable</strong></th>
<th><strong>Constructor</strong></th>
<th><strong>Example</strong></th>
</tr>
</thead>
<tbody>
<tr>
<td>List</td>
<td>Yes</td>
<td>Yes</td>
<td><code>[ ]</code> or <code>list()</code></td>
<td><code>[5.7, 4, 'yes', 5.7]</code></td>
</tr>
<tr>
<td>Tuple</td>
<td>Yes</td>
<td>No</td>
<td><code>( )</code> or <code>tuple()</code></td>
<td><code>(5.7, 4, 'yes', 5.7)</code></td>
</tr>
<tr>
<td>Set</td>
<td>No</td>
<td>Yes</td>
<td><code>{}</code>* or <code>set()</code></td>
<td><code>{5.7, 4, 'yes'}</code></td>
</tr>
<tr>
<td>Dictionary</td>
<td>Insertion order (Python 3.7+)</td>
<td>Yes**</td>
<td><code>{ }</code> or <code>dict()</code></td>
<td><code>{'Jun': 75, 'Jul': 89}</code></td>
</tr>
</tbody>
</table>
</div>

\* You can use curly braces to define a set like this: `{1, 2, 3}`. However, if you leave the curly braces empty like this: `{}`, Python will instead create an empty dictionary. So to create an empty set, use `set()`.

\*\* A dictionary itself is mutable, but each of its individual keys must be immutable.
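
The two footnotes can be checked directly. This is a small sketch; the variable names and the `'Aug'` value are made up for illustration.

```python
empty_braces = {}
empty_set = set()
filled_braces = {1, 2, 3}

print(type(empty_braces).__name__)   # prints dict
print(type(empty_set).__name__)      # prints set
print(type(filled_braces).__name__)  # prints set

# A dictionary is mutable: entries can be added or changed after creation.
temperatures = {'Jun': 75, 'Jul': 89}
temperatures['Aug'] = 91  # hypothetical value, for illustration
print(len(temperatures))  # prints 3
```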

## **Collections**
When we have a group of data we can think about it as a collection (of data elements). In this lesson, we have seen many different data structures that Python provides for storing, accessing and manipulating collections of data. In particular, we have seen lists, sets, and dictionaries.



*    Lists are ordered and sortable. You add an item to a list with `.append`, and list items are always indexed with numbers starting at 0.

*    Sets are not ordered, so the order in which items appear can be inconsistent. You add items to a set with `.add`. Like dictionaries and lists, sets are mutable. No item can appear more than once in a set, and a set cannot be sorted in place (`sorted()` returns a new list instead). When you need duplicates or a fixed order, a list is more appropriate.

*    Each item in a dictionary contains two parts (a key and a value), and values are looked up by key rather than by position. We have seen examples of nested dictionaries in this lesson. Dictionaries keep insertion order in Python 3.7+, but they have no `.sort` method and no `.append`; you add an item by assigning a value to a new key.
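
The differences summarized above can be seen side by side in a short sketch. The sample words and variable names are made up for illustration.

```python
# List: ordered, indexed from 0, grows with .append().
words = ['pear', 'apple']
words.append('fig')
print(words[0])  # prints pear

# Set: no duplicates, grows with .add().
unique_words = {'pear', 'apple'}
unique_words.add('pear')  # already present, so nothing changes
print(len(unique_words))  # prints 2

# sorted() accepts any iterable and always returns a new list.
print(sorted(unique_words))  # prints ['apple', 'pear']

# Dictionary: new items are added by assigning to a key.
counts = {'pear': 1}
counts['apple'] = 3
print(sorted(counts))  # prints ['apple', 'pear'] (the keys, sorted)
```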


The first two verse dictionary questions can also be approached in other ways:

```
verse_dict =  {'if': 3, 'you': 6, 'can': 3, 'keep': 1, 'your': 1, 'head': 1, 'when': 2, 'all': 2, 'about': 2, 'are': 1, 'losing': 1, 'theirs': 1, 'and': 3, 'blaming': 1, 'it': 1, 'on': 1, 'trust': 1, 'yourself': 1, 'men': 1, 'doubt': 1, 'but': 1, 'make': 1, 'allowance': 1, 'for': 1, 'their': 1, 'doubting': 1, 'too': 3, 'wait': 1, 'not': 1, 'be': 1, 'tired': 1, 'by': 1, 'waiting': 1, 'or': 2, 'being': 2, 'lied': 1, 'don\'t': 3, 'deal': 1, 'in': 1, 'lies': 1, 'hated': 1, 'give': 1, 'way': 1, 'to': 1, 'hating': 1, 'yet': 1, 'look': 1, 'good': 1, 'nor': 1, 'talk': 1, 'wise': 1}
print(verse_dict, '\n')

# find number of unique keys in the dictionary
# (keys are already unique, so set() is redundant; len(verse_dict) gives the same result)
num_keys = len(set(verse_dict))
print(num_keys)

# look up 'breathe' with .get(), which returns None when the key is missing
# (use `"breathe" in verse_dict` to get True or False instead)
contains_breathe = verse_dict.get("breathe")
print(contains_breathe)
```

Output is:
```
{'if': 3, 'you': 6, 'can': 3, 'keep': 1, 'your': 1, 'head': 1, 'when': 2, 'all': 2, 'about': 2, 'are': 1, 'losing': 1, 'theirs': 1, 'and': 3, 'blaming': 1, 'it': 1, 'on': 1, 'trust': 1, 'yourself': 1, 'men': 1, 'doubt': 1, 'but': 1, 'make': 1, 'allowance': 1, 'for': 1, 'their': 1, 'doubting': 1, 'too': 3, 'wait': 1, 'not': 1, 'be': 1, 'tired': 1, 'by': 1, 'waiting': 1, 'or': 2, 'being': 2, 'lied': 1, "don't": 3, 'deal': 1, 'in': 1, 'lies': 1, 'hated': 1, 'give': 1, 'way': 1, 'to': 1, 'hating': 1, 'yet': 1, 'look': 1, 'good': 1, 'nor': 1, 'talk': 1, 'wise': 1} 

51
None
```
