# Introduction to Advanced Python Data Types

## Learning Goals

  - Objects and data containers
  - Specific Python containers
    - Strings
    - Lists
    - Tuples
    - Sets
    - Dictionaries
   

In the previous tutorial, we focused on Python's simple data types, like `int`, `float`, and `bool`. In general, these simple object types store individual values (e.g., `a = 10`). Today, we will talk about objects that can contain multiple values, such as `list`s and `tuple`s and, yes, `str`s, which we've already met.

All of these data types, whether they hold single values or multiple things, have one important aspect in common: they are all "**objects**" in Python. When we do an assignment, like `py = 3.14`, Python creates an object in your computer's memory, in this case of type `float` and then tags it with the label `py`. The object contains the value (3.14 in this case), and some other goodies we'll learn about later. The tag is not the object and vice versa. The tag, what we'll often call the "variable name", is the way we grab the value stored in the object in order to use it.
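To make the tag-versus-object distinction concrete, here is a minimal sketch using the built-in `id()` function and the `is` operator. The names `first_tag` and `second_tag` are made up for this illustration.

```python
# Two tags can be tied to one and the same object.
first_tag = 3.14
second_tag = first_tag          # no new object is made; this just adds a second tag

# "is" asks whether two tags are tied to the very same object
same_object = first_tag is second_tag
print(same_object)

# id() gives a number that identifies the object (not the tag)
print(id(first_tag) == id(second_tag))

# Re-tying first_tag to a new object leaves second_tag where it was
first_tag = 2.71
print(second_tag)
```

Both comparisons print `True`, and `second_tag` still gives `3.14` after `first_tag` has moved on to a different object.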

### Python Strings

Even though we covered them in the last tutorial, a `str` is a little bit different than, say, an `int`. While an `int` contains a single value, a `str` actually contains a bunch of characters, or strings of length one. For example, let's make a `str` tagged `mystring`:

In [1]:
mystring = 'This string contains letters, spaces, etc., and 68 total characters!'
print(mystring)
print("mystring is a ", type(mystring))

This string contains letters, spaces, etc., and 68 total characters!
mystring is a  <class 'str'>


While normal people care about the words and the punctuation, computers don't, Python doesn't, and often programmers don't either. To Python, `mystring` has no meaning; it's just an arbitrary sequence of little things (letters, spaces, etc.). So another way to look at this object is to see how many things it contains, and we can do this with the `len()` (length) function:

In [2]:
len(mystring)

68

So `mystring` actually contains 68 things: each letter, space, numerical digit, and punctuation mark, generally called "characters".

We can even look inside our string using "**indexing**" (much more on this both later in this tutorial and in the weeks and months to follow). We index using square brackets `[ ]` so, for example, try this:

In [3]:
mystring[3]

's'

Here, `mystring[3]` gave us one of the *elements* of `mystring`.

Python `strings`, unlike `int`s or `float`s, are a type of [*container*](https://en.wikipedia.org/wiki/Container_(abstract_data_type)). Containers are data types that can contain multiple objects. Whereas a simple variable has a name, like `a`, attached to a single value, like `2`, a container holds multiple values and often (as we will see later) values of different types (`int`, `str`, etc.).

In fact, the very reason `mystring` has a *length*, which we got with the `len()` function, is that it is a container. An `int`, not being a container, doesn't have a length. Let's verify this:

In [4]:
myint = 42
len(myint)

TypeError: object of type 'int' has no len()

Whoops, we get an error! Because an `int` is always a *single* object, Python sees no need to keep track of its length.

Containers are the primary way we handle data in Python (as real data aren't often single values). Later on, we will learn about specialized containers for data science, like the [Pandas DataFrame](https://pandas.pydata.org/docs/reference/api/pandas.DataFrame.html).

First, however, we will learn about the built-in Python containers: *strings*, *lists*, *tuples*, *sets*, and *dictionaries*.

#### Simple indexing

As we saw above, a string is a sequence of characters, generally a human-readable chunk of text.

In [5]:
mystring = 'An ordered sequence of characters!'

This is a *sequence* so, by definition, it has an *order*. (The tiles that make a word on a Scrabble board are an example of a sequence, while the tiles in the Scrabble bag are an example of an unordered collection.) We can pluck out individual characters or "elements" by indexing, as we saw briefly above.

In [None]:
mystring[6]

A single element of a string is still a string:

In [8]:
a_char = mystring[6]
type(a_char)

str

And since it's still a string, it still has a length:

In [9]:
len(a_char)

1

So any string of length > 1 is really a container of other strings of length 1 – how meta!

---

In the cell below, try `mystring[1]`:

In [10]:
mystring[1]

'n'

Was that what you expected?

---

Indexing in Python is *zero-based*. So what you might expect to be the first element of something is actually the "zeroth" element. The way to think about it is that the "index" is really an *offset* from the beginning of the container, i.e., from the first element. So `mystring[1]` is saying "whatever is 1 element over from the beginning of the string", which is an "**n**".
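As a small sketch of the offset idea, the example below picks out the first, second, and last characters of a short string. The string `word` is a hypothetical example, not one used elsewhere in this tutorial.

```python
word = 'Python'   # hypothetical example string

first = word[0]                 # offset 0 from the start
second = word[1]                # offset 1 from the start
last = word[len(word) - 1]      # the last valid offset is the length minus one

print(first, second, last)
```

Because offsets start at 0, the largest valid offset is always one less than the length; asking for `word[len(word)]` would raise an `IndexError`.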

As we mentioned before, we've got *much* more indexing to go – it's a huge part of data science!

#### Strings are immutable

Like some other data containers we'll meet below, strings are *immutable* – so once you create one, you cannot change it. For example, based on what we just learned, you might reasonably think that, if we can get a specific value using indexing with `[index]`, we should be able to set a value the same way. Let's try:

In [11]:
mystring[0] = '1'

TypeError: 'str' object does not support item assignment

That throws an error, and Python tells us that strings do not support item assignment (directly setting the value at an index).

Instead, if you want to "change" your string, you need to make a new string, which is a *new* immutable container object.

In [15]:
mystring = 'Another sequence of characters'

The tag `mystring` is now tied to a *new object* containing "Another sequence of characters". What happened to the object that `mystring` was tied to before? Once an object has no tag tied to it, you can no longer reach it, and Python is free to clear it out of memory. But don't be sad! You can always create a fresh object with the same contents and tie a tag to it. Here we do that for the very first string we made:

In [16]:
new_tag_same_object = "This string contains letters, spaces, etc., and 68 total characters!"
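Since a string cannot be edited in place, "changing" one always means building a new string from pieces of the old one. Here is a minimal sketch, assuming slice notation (`[1:]`, meaning "everything from offset 1 to the end"), which this tutorial has not covered yet. The names `greeting` and `changed` are hypothetical.

```python
greeting = 'An ordered sequence of characters!'

# Build a new string: the character '1' followed by everything from offset 1 onward.
# (The [1:] slice is an assumption here; slicing has not been introduced yet.)
changed = '1' + greeting[1:]

print(greeting)   # the original string is untouched
print(changed)    # the new string starts with '1'
```

The original object is never modified; `changed` is simply a tag on a brand new string.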

### Python Lists

#### Lists are containers of indexed, ordered and *mutable* data. 

Lists can be used to contain multiple elements of *any* kind. Lists are created by writing a set of comma-separated values inside square brackets:

In [17]:
mylist = [2, 3, 4, 5] # this list contains 4 numbers
print(mylist)

[2, 3, 4, 5]


#### Lists can contain elements of any type

In [18]:
list_of_int = [10, 4, 2, 5]                                  # This list contains integers
print(list_of_int)
type(list_of_int)

[10, 4, 2, 5]


list

In [19]:
list_of_float = [2.3, 4.3, 5.5, 6.1]                         # This list contains floating point numbers
print(list_of_float)
type(list_of_float)

[2.3, 4.3, 5.5, 6.1]


list

In [20]:
list_of_boolean = [True, False, True, True]                  # This list contains boolean values 
print(list_of_boolean)
type(list_of_boolean)

[True, False, True, True]


list

In [21]:
list_of_strings = ['this', 'is', 'a', 'list','of','strings'] # This list contains strings
print(list_of_strings)
type(list_of_strings)

['this', 'is', 'a', 'list', 'of', 'strings']


list

Notice that, because strings are containers, `list_of_strings` is actually a *container of containers*!

---

In the cell below, see if Python will let you create a list with elements of *different types*:

In [25]:
diff_types_string = [1, '3', 'Hi :)']
print(diff_types_string)

[1, '3', 'Hi :)']


---

#### Lists are mutable

As mentioned in passing above, unlike strings, a list is *mutable*, meaning that we can change its elements if we want.

Let's remind ourselves what our list of `floats` was:

In [26]:
list_of_float

[2.3, 4.3, 5.5, 6.1]

Now let's try to change one of the elements by indexing it:

In [27]:
list_of_float[3] = 10.1
list_of_float

[2.3, 4.3, 5.5, 10.1]

So, unlike a `str`, a `list` will allow you to change its elements!

Also, remember the distinction between objects themselves and their tags. We gave the above `list` the tag `list_of_float`, but that tag is just a description for human readers. To Python, that's just an arbitrary tag. So, if we wished, we could change one of the elements of `list_of_float` to a `str` like this:

In [28]:
list_of_float[3] = "I'm not a float"
list_of_float

[2.3, 4.3, 5.5, "I'm not a float"]

Now our "`list_of_float`" contains a `str`, which we can verify:

In [30]:
print(type(list_of_float[3]))
print(type(list_of_float))

<class 'str'>
<class 'list'>
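Mutability and the tag-versus-object distinction interact in a way that is worth seeing once. In the sketch below (with the hypothetical names `scores` and `same_scores`), two tags are tied to one list, so a change made through either tag is visible through both.

```python
scores = [2.3, 4.3, 5.5]       # hypothetical list
same_scores = scores           # a second tag on the same list object

same_scores[0] = 99.9          # change the object through one tag...
print(scores)                  # ...and the change shows up through the other tag too
print(scores is same_scores)   # both tags are tied to one object
```

This cannot happen with strings or other immutable objects, because there is no way to change them in place.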


#### Lists can contain lists

Finally (for now), one crucial thing about lists is that *lists can contain lists!* So if we do the following:

In [31]:
multi_list = [[1, 2, 3], [4, 5, 6], [7, 8, 9]]
multi_list

[[1, 2, 3], [4, 5, 6], [7, 8, 9]]

Now `multi_list` is a list of lists; we can get the first list like this:

In [32]:
multi_list[0]

[1, 2, 3]

Now, what if we wanted the 2nd element of the first list? We could do it using a "temporary variable" like this:

In [33]:
tmp = multi_list[0]  # get the first list in multi_list, and store it in "tmp"
tmp[1]               # get the second element of the list in "tmp"

2

But Python lets us do this in one go like this:

In [34]:
multi_list[0][1]

2

What's happening here is that Python first evaluates `multi_list[0]` to give you the first list in `multi_list`, and then the `[1]` gives you the second element of that list.

---

In the cell below, use this compact indexing technique (`list[index][index]`) to get the "i" in "list" from `list_of_strings`.

In [36]:
print(list_of_strings)
list_of_strings[3][1]

['this', 'is', 'a', 'list', 'of', 'strings']


'i'

---

#### A list of lists can be thought of as a table or matrix

Now, here's a really cool thing: we can think of the values in `multi_list` as being laid out in a table or *matrix* (like data in a spreadsheet) like this:

|  multi_list | Column 1  | Column 2  | Column 3  |
|:----------|:----------|:----------|:----------|
| **Row 1** |   1    |   2    |   3    |
| **Row 2** |   4    |   5    |   6    |
| **Row 3** |   7    |   8    |   9    |

in which every ***row*** is one of the lists in `multi_list`!

So now we can think of, say, this:

In [37]:
multi_list[1][2]

6

as specifying the *row* and *column indexes*, like  
`multi_list[row index = 1][column index = 2]`

In fact, if you want to think of a list of lists like a table or matrix, you can make this more obvious when you first make your object:

In [38]:
like_a_matrix = [[1, 1, 2],
                 [3, 5, 8],
                 [13, 21, 34]]
like_a_matrix

[[1, 1, 2], [3, 5, 8], [13, 21, 34]]

To print it like a matrix though, you'd have to get a little more cute with `print()`.

In [39]:
print(like_a_matrix[0], "\n", like_a_matrix[1], "\n", like_a_matrix[2], "\n")

[1, 1, 2] 
 [3, 5, 8] 
 [13, 21, 34] 



Here, the "\n" is how you tell `print()` to "hit return", that is, to start a "**n**"ew line.
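Another way to get one row per line is to let Python walk through the rows for us. This is only a sketch, and it assumes the `for` loop, which this tutorial has not introduced yet; the name `center` is made up for the example.

```python
like_a_matrix = [[1, 1, 2],
                 [3, 5, 8],
                 [13, 21, 34]]

# Visit each row (each inner list) in turn and print it on its own line
for row in like_a_matrix:
    print(row)

# Row and column indexes still work as before
center = like_a_matrix[1][1]
print(center)
```

The loop prints each inner list on its own line without having to index the rows by hand, so it works the same way for a table with three rows or three thousand.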

### Python Tuples

Tuples are ordered collections of data. They are similar to lists, but they are *immutable*. Whereas you can add elements to, or change elements of, a previously defined list, you cannot do that with tuples. Tuples are thus great for data that you don't want anyone to mess with. So, for example, raw data from an experiment, being sacred, would go in a tuple. The results of calculations done on the data, however, would go in a list because you might want to change the calculations without having to make a new list every time.

Tuples are defined with parentheses:

In [40]:
mytuple = (9,4,5)
print(mytuple)
type(mytuple)

(9, 4, 5)


tuple

#### Tuples are just like lists in some key ways

##### *Tuples can hold any other object (just like lists)*

In [41]:
mytuple2 = (4, 'four', 'IV', [1, 0, 0])
mytuple2

(4, 'four', 'IV', [1, 0, 0])

##### *Tuples are indexed just like lists*

In [43]:
mytuple[0]

9

In [None]:
mytuple2[3][0]

#### Tuples differ from lists in one key way

##### *Tuples are immutable*

Because tuples are immutable, we can't just change a value in a tuple if we wish.

So this will work just fine:

In [44]:
mylist = [10, 'ten', 'X', 1010]
mylist

[10, 'ten', 'X', 1010]

In [45]:
mylist[3] = "ten is 1010 in binary"
mylist[3]

'ten is 1010 in binary'

But this will not:

In [47]:
mytup = (10, 'ten', 'X', 1010)
mytup

(10, 'ten', 'X', 1010)

In [48]:
mytup[3] = "ten is 1010 in binary"  
mytup[3]

TypeError: 'tuple' object does not support item assignment
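As with strings, the way to "change" a tuple is to build a new one. The sketch below assumes two things this tutorial has not covered yet: `try`/`except` (used only so the example keeps running after the error) and slice notation (`[:3]`, meaning "the elements before offset 3"). The names `message` and `new_tup` are hypothetical.

```python
mytup = (10, 'ten', 'X', 1010)

# Item assignment on a tuple raises TypeError; catch it so the example keeps running
try:
    mytup[3] = "ten is 1010 in binary"
except TypeError as err:
    message = str(err)
    print("TypeError:", message)

# To "change" a tuple, build a new one from the pieces you want to keep.
# Note the trailing comma: ("...",) is a tuple with one element.
new_tup = mytup[:3] + ("ten is 1010 in binary",)
print(new_tup)
print(mytup)   # the original tuple is untouched
```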

### Python Sets

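A set is an unordered, unindexed collection of *unique* items. Whereas lists are defined by `[]`, and tuples are defined by `()`, sets are defined by `{}`. (A set itself can be changed by adding or removing items, but you cannot index into it or replace an item by position, as we will see below.)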

In [49]:
myset = {"A", "B", "C", "D"}
print(myset)
type(myset)

{'A', 'B', 'D', 'C'}


set

Unordered means that items in the set do not have an assigned order, so they cannot be indexed. Back to our Scrabble analogy, it doesn't make any sense to talk about the "third" tile in a Scrabble bag of letters; the letters in the bag have no order, they're just jumbled in a bag. Notice above that the elements of the set did *not* print in the order we used to make the set.

We can demonstrate the lack of order for sets by testing two sets for equality:

In [50]:
{"A", "B", "C", "D"} == {"D", "C", "A", "B"}

True

So two sets are the same *as long as they contain the same elements*. For lists (and tuples) to be the same, however, the elements also have to be in the same order. So this is `False`: 

In [51]:
["A", "B", "C", "D"] == ["D", "C", "A", "B"]

False

Here's an interesting riddle about sets: What will the following give you? (Think about it before you run it.)

In [52]:
{"A", "B", "C", "D"} == {"D", "C", "A", "B", "D"}

True

What's going on here? By definition, each item in a set is *unique*. If you try to specify duplicates in a set, they will be ignored:

In [53]:
myset2 = {"C", "K", "E", "D", "D"}
print(myset2)

{'E', 'K', 'D', 'C'}


So `myset2` contains only one "D".

Because sets are unordered, it doesn't make any sense to try to index them. If you try to ask for the element of a set at offset 1, you'll get an error:

In [54]:
myset[1]

TypeError: 'set' object is not subscriptable

For the same reason, elements of a set cannot be replaced by position:

In [55]:
myset[2] = "F"

TypeError: 'set' object does not support item assignment
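Note that the failure of item assignment does not mean a set can never change. What a set lacks is *positions*, not the ability to change: Python's built-in `set` type has methods such as `add()` and `discard()` that modify a set in place. A minimal sketch, using a hypothetical set named `letters`:

```python
letters = {"A", "B", "C", "D"}   # hypothetical example set

letters.add("F")        # put a new element into the set
letters.discard("C")    # take an element out (no error if it is absent)
letters.add("A")        # adding an element that is already there changes nothing

print(sorted(letters))  # sorted() returns a list in a predictable order
```

We print `sorted(letters)` rather than `letters` only because the printing order of a set is arbitrary.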

You can make a set from a list. Let's say, for example, people signed up for something on your organization's website, and their names were automatically stored in a list. But, for various reasons, some people signed up twice.

In [57]:
# A real list would be much longer, but...
name_list = ["John", "Xie", "Julia", "Kat", "Ahmed", "John"]
name_list

['John', 'Xie', 'Julia', 'Kat', 'Ahmed', 'John']

In [58]:
# Convert list to set
name_set = set(name_list)
print(name_set) 

{'Ahmed', 'Kat', 'Julia', 'Xie', 'John'}


The duplicates are now removed! Also, compare the order of the list with the (arbitrary) order in which the members of set were printed.
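Two small follow-ups are often useful after removing duplicates this way: counting how many repeats there were, and getting the names back into a list with a predictable order. The sketch below uses the built-in `len()` and `sorted()` functions; the names `n_duplicates` and `unique_names` are hypothetical.

```python
name_list = ["John", "Xie", "Julia", "Kat", "Ahmed", "John"]

name_set = set(name_list)
n_duplicates = len(name_list) - len(name_set)   # how many repeat sign-ups there were
print(n_duplicates)

# sorted() turns the set back into a list with a predictable (alphabetical) order
unique_names = sorted(name_set)
print(unique_names)
```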

Sets have some cool properties; they are the Python implementation of the "sets" you learned about in high school or college. Remember Venn diagrams, the overlapping circles with stuff in them?

#### Set operations

Python has special operators for combining sets:

| operator  | action  |
|:----------:|:----------|
|   `\|`   |   union   |
|   `&`     |   intersection   |
|   `-`     |   difference   |
|   `^`     |   symmetric difference   |

Let's remind ourselves what `myset` and `myset2` contain:

In [59]:
print(myset, "\n", myset2)


{'A', 'B', 'D', 'C'} 
 {'E', 'K', 'D', 'C'}


Now we can check out (or remember from high school) what each of the set operators does.

##### *Union* - all elements in either set

In [60]:
myset | myset2

{'A', 'B', 'C', 'D', 'E', 'K'}

##### *Intersection* - only elements in *both* sets

In [61]:
myset & myset2

{'C', 'D'}

##### *Difference* - only elements in the first set but *not* the second set

In [66]:
print(myset - myset2)
print(myset2 - myset)

{'A', 'B'}
{'E', 'K'}


##### *Symmetric Difference* - only elements that are in one set but *not* the other

In [63]:
myset ^ myset2

{'A', 'B', 'E', 'K'}

You might not use `sets` that frequently, but don't forget about them! Many people have fallen into the trap of writing code to compare `lists` when the task could have easily been accomplished by just comparing `sets`.
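As an illustration of that last point, here is a sketch that compares two lists by converting them to sets first. The sign-up lists `monday` and `friday` are made up for this example.

```python
# Hypothetical sign-up lists for two events
monday = ["John", "Xie", "Julia", "Kat"]
friday = ["Kat", "Ahmed", "John", "John"]

both_days = set(monday) & set(friday)        # signed up for both events
only_monday = set(monday) - set(friday)      # signed up for Monday only
same_people = set(monday) == set(friday)     # do the lists name exactly the same people?

print(sorted(both_days))
print(sorted(only_monday))
print(same_people)
```

Each question is answered with a single operator, and neither the order of the names nor the duplicate "John" gets in the way.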

### Python Dictionaries

Dictionaries are a very powerful container in Python. They allow you to bundle data together in a way that is easy and intuitive to access. When you think of a dictionary, you think of a list of words, each of which is accompanied by a definition. In computing, a "dictionary" is more general: it is a collection of "keys" (often words, as in the examples here), each of which is accompanied by data, or a "value".

This is best illustrated by example.

#### **Creating a Dictionary**

You can create a dictionary by placing comma-separated key-value pairs inside curly braces `{}`. Let's say we wanted to store stuff about various people at UT. A "dictionary" storing the information about one person, `UT_person_1`, might look like this:

In [71]:
UT_person_1 = {
    'name': 'Matthew McConaughey',
    'age': 53,
    'born': 'Uvalde, Texas',
    'role': "Faculty"
}
print(UT_person_1)  

{'name': 'Matthew McConaughey', 'age': 53, 'born': 'Uvalde, Texas', 'role': 'Faculty'}


Now we have different bits of information about a person stored in a nice, organized package. Let's see how to retrieve the information.

#### **Accessing Dictionary Values**

To retrieve a value, refer to its key using square brackets `[]`.

In [72]:
print(UT_person_1['name'])  

Matthew McConaughey


Yes, we could have stored the same information in a `list`, but then we would have to access it using indexes, which means we'd have to remember which index went with which piece of information. Accessing values using descriptive names, the `keys`, is much more intuitive and easier to remember.
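To see the difference side by side, the sketch below stores the same facts in a list and in a dictionary and retrieves the birthplace from each. The names `person_as_list` and `person_as_dict` are hypothetical.

```python
# The same facts stored two ways (values taken from the example above)
person_as_list = ['Matthew McConaughey', 53, 'Uvalde, Texas', 'Faculty']
person_as_dict = {'name': 'Matthew McConaughey', 'age': 53,
                  'born': 'Uvalde, Texas', 'role': 'Faculty'}

birthplace_from_list = person_as_list[2]        # you must remember that offset 2 means "born"
birthplace_from_dict = person_as_dict['born']   # the key says what you are asking for

print(birthplace_from_list == birthplace_from_dict)
```

Both lookups return the same value, but only the dictionary version documents itself.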

#### **Modifying a Dictionary**

You can add new key-value pairs or modify existing ones easily.

##### *Modifying an entry*

In [73]:
UT_person_1['role'] = 'Minister of Culture'
print(UT_person_1) 

{'name': 'Matthew McConaughey', 'age': 53, 'born': 'Uvalde, Texas', 'role': 'Minister of Culture'}


##### *Adding an entry*

You can also make a new key-value pair (or "entry") by just specifying it.

In [74]:
# Adding a new key-value pair
UT_person_1['first film'] = 'Dazed and Confused'
print(UT_person_1) 

{'name': 'Matthew McConaughey', 'age': 53, 'born': 'Uvalde, Texas', 'role': 'Minister of Culture', 'first film': 'Dazed and Confused'}


That was easy!

#### **Deleting Key-Value Pairs**

We can delete any entry using a `del` (`del`ete) statement.

In [75]:
del UT_person_1['age']
print(UT_person_1)  

{'name': 'Matthew McConaughey', 'born': 'Uvalde, Texas', 'role': 'Minister of Culture', 'first film': 'Dazed and Confused'}


#### **Checking for a Key**

To determine if a key is in a dictionary, we can simply inquire using the `in` keyword.

In [76]:
'name' in UT_person_1

True

This can be read as "Is there an entry named 'name' `in` the dictionary `UT_person_1`?" If so, the statement returns `True`. If not, it returns `False`:

In [77]:
'salary' in UT_person_1

False
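Checking for a key matters because asking for a missing key with square brackets raises a `KeyError`. The sketch below shows two common ways to look up a key that may be absent: testing with `in` first, or using the dictionary's `get()` method, which returns a default value when the key is missing. It assumes the `if`/`else` statement, which has not been covered yet, and the dictionary `person` is a shortened, hypothetical entry.

```python
person = {'name': 'Matthew McConaughey', 'role': 'Faculty'}   # hypothetical, shortened entry

# Check first, then look up
if 'salary' in person:
    salary = person['salary']
else:
    salary = None        # no such key, so we choose a stand-in value ourselves

# dict.get() does the same in one step: it returns the default when the key is missing
salary_via_get = person.get('salary', None)
role_via_get = person.get('role', 'unknown')

print(salary, salary_via_get, role_via_get)
```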

## Summary

In this tutorial, we have learned about Python objects called containers. These objects are very powerful, and becoming adept at working with them will help you build a good programming base. Moreover, getting comfortable with the basic Python objects will make learning to work with more data-science-specific objects (such as Pandas data frames and NumPy arrays) much easier. In the next tutorial, we'll learn more about using and manipulating these objects.