# Python data types and structures

In the [Intro notebook](1_PY_Intro.ipynb), we introduced among other things the idea of variables, which we could use to store several different kinds of information. We could store text, numbers, or truth values. These different kinds of information correspond with different Python data `type`s.

We also briefly introduced the Python `list`, which can be used to store a collection of data.

Sometimes we will store information in individual variables, but often we will be working with several pieces of information that we want to group together because of their relationship or similarity. For example, if we were shopping for clothing, we could store each item we are going to buy in separate variables or we could store all of the items in one `list`.

In [4]:
clothing_a = 'shirts'
clothing_b = 'sandals'
clothing_c = 'shoes'
clothing_d = 'singlets'
clothing_e = 'boxers'

clothing_list = ['shirts', 'sandals', 'shoes', 'singlets', 'boxers']

Which of these approaches seems more useful to you? Let's write two short example functions, one for each approach, that "buy" each item of clothing we need.

In [3]:
def buy_clothing_individual(item_a, item_b, item_c, item_d, item_e):
    print('Buying %s...' % item_a)
    print('Buying %s...' % item_b)
    print('Buying %s...' % item_c)
    print('Buying %s...' % item_d)
    print('Buying %s...' % item_e)

def buy_clothing_list(items):
    for item in items:
        print('Buying %s...' % item)

In [5]:
buy_clothing_individual(clothing_a, clothing_b, clothing_c, clothing_d, clothing_e)        

Buying shirts...
Buying sandals...
Buying shoes...
Buying singlets...
Buying boxers...


In [6]:
buy_clothing_list(clothing_list)

Buying shirts...
Buying sandals...
Buying shoes...
Buying singlets...
Buying boxers...


By using a `list`, we could use a `for` loop to write a much shorter function. (In The Zen of Python, by Tim Peters, the third aphorism is _"Simple is better than complex"_. You can run `import this` to see them all.) Even more importantly, `buy_clothing_list` is much more flexible. What if, instead of buying five items, we wanted to buy more or fewer?

In [7]:
# let's try to buy just three items
buy_clothing_individual(clothing_a, clothing_b, clothing_c)        

TypeError: buy_clothing_individual() missing 2 required positional arguments: 'item_d' and 'item_e'

In [8]:
# or a sixth item
clothing_f = 'trousers'

buy_clothing_individual(clothing_a, clothing_b, clothing_c, clothing_d, clothing_e, clothing_f)        

TypeError: buy_clothing_individual() takes 5 positional arguments but 6 were given

We encounter an error both times we try to use `buy_clothing_individual`, because it expects exactly five arguments. We don't run into that problem with `buy_clothing_list`, because our `for` loop can work with lists of any length.

In [9]:
short_clothing_list = ['shoes', 'singlets', 'boxers']
buy_clothing_list(short_clothing_list)

Buying shoes...
Buying singlets...
Buying boxers...


### Data Structure Intro Exercises

Create a corresponding `long_clothing_list` (one with more than five items) and pass it to the `buy_clothing_list` function.

## `lists` of Power

Python has a number of compound data types, used to group together other values. The ***most versatile*** is the `list`, which can be written as a _list of comma-separated values (items) between square brackets_ `[]`. Lists _might_ contain items of *different types*, but usually the items all have the same type.

>Like strings (and all other built-in sequence types), lists can be indexed and sliced

In [2]:
names = ['Ebuka', 'Ekene', 'James', 'Chukwdi', 'Ralu', 'Bosah', 'Dogo', 'Olumide', 'Emeka', 'Tochukwu', 'Nnamdi']
# this is just a list of strings
names

['Ebuka',
 'Ekene',
 'James',
 'Chukwdi',
 'Ralu',
 'Bosah',
 'Dogo',
 'Olumide',
 'Emeka',
 'Tochukwu',
 'Nnamdi']

In [11]:
names[4]   # indexing returns the item

'Ralu'

In [12]:
names[2:6]       # slicing returns a new list

['James', 'Chukwdi', 'Ralu', 'Bosah']

Lists also support operations like concatenation just like strings. However, unlike strings, which are ***immutable***, `list`s are a ***mutable*** type, i.e. it is possible to change their content. 

In [13]:
names += ['Arinze', 'Arthur']
names

['Ebuka',
 'Ekene',
 'James',
 'Chukwdi',
 'Ralu',
 'Bosah',
 'Dogo',
 'Olumide',
 'Emeka',
 'Tochukwu',
 'Nnamdi',
 'Arinze',
 'Arthur']
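
The difference between a mutable `list` and an immutable string can be sketched as follows. This is only an illustration: `colours`, `word` and `new_word` are hypothetical names that are not used elsewhere in this notebook.

```python
# Hypothetical example data, separate from the `names` list above.
colours = ['red', 'green', 'blue']
word = 'green'

# Lists are mutable: item assignment changes the list in place.
colours[1] = 'yellow'
print(colours)

# Strings are immutable: the same kind of assignment raises a TypeError.
try:
    word[0] = 'G'
except TypeError as err:
    print('TypeError:', err)

# To "change" a string we have to build a new one instead.
new_word = 'G' + word[1:]
print(new_word)
```

Note that `word` itself is untouched at the end; only the new string `new_word` carries the change.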

### List Methods

A `list` has about 11 methods of its own. We will only consider three of them here (`append`, `pop` and `remove`); please look up the others.

In [12]:
names.append('Ben')    # .append is used to add new items to the list, takes only 1 argument
names

['Ebuka',
 'Ekene',
 'James',
 'Chukwdi',
 'Ralu',
 'Bosah',
 'Dogo',
 'Olumide',
 'Emeka',
 'Tochukwu',
 'Nnamdi',
 'Arinze',
 'Arthur',
 'Ben']

In [15]:
names.append('Ben', 'Chinedu')    # .append takes only 1 argument

TypeError: append() takes exactly one argument (2 given)

In [16]:
# .pop removes an item from the list (and returns it); with no argument it removes the last item
names.pop()    
names

['Ebuka',
 'Ekene',
 'James',
 'Chukwdi',
 'Ralu',
 'Bosah',
 'Dogo',
 'Olumide',
 'Emeka',
 'Tochukwu',
 'Nnamdi',
 'Arinze',
 'Arthur']

In [5]:
# however, you can pass an index (negative indices count from the end) to remove the item at that position
names.pop(-8)    
names

['Ebuka',
 'Ekene',
 'James',
 'Chukwdi',
 'Bosah',
 'Olumide',
 'Emeka',
 'Tochukwu',
 'Nnamdi',
 'Ben',
 'Chinedu']

In [7]:
names.remove('Ben')     # removes the first occurrence of a value; takes exactly one argument
names

['Ebuka',
 'Ekene',
 'James',
 'Chukwdi',
 'Bosah',
 'Olumide',
 'Emeka',
 'Tochukwu',
 'Nnamdi',
 'Chinedu']

In [21]:
names.remove('Nnamdi', 'Tochukwu')     # remove takes exactly one argument, so passing two raises an error
names

TypeError: remove() takes exactly one argument (2 given)

### List Exercises 1

Find out what each of the following list methods does:

1. `clear`
2. `copy`
3. `count`
4. `extend`
5. `index`
6. `insert`
7. `reverse`
8. `sort`

In a Jupyter environment you can run `names.clear?` to see the documentation for a method.

So far we've worked with `list`s of strings, but we're not limited to that data type. A list can hold items of any data type. It is highly versatile!

In [22]:
int_list = [2, 6, 3049, 18, 37]                          # list of ints
float_list = [3.7, 8.2, 178.245, 63.1]                   # list of floats
mixed_list = [26, False, 'some words', 1.264, 3/16]      # weird list of mixed stuff

print(int_list)
print(float_list)
print(mixed_list)

[2, 6, 3049, 18, 37]
[3.7, 8.2, 178.245, 63.1]
[26, False, 'some words', 1.264, 0.1875]


We can store any `type` of data in a `list`. We can even put a `list` inside of a `list`. There are very few restrictions on how we structure a `list` or what we put in it. This can lead to a very complicated nested structure.

In [None]:
list_of_lists = [['a', 'list', 'of', 'words'], [1, 5, 209], [True, True, False]]
print(list_of_lists)
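
Items inside a nested list are reached by chaining indexes, one per level. The sketch below repeats the same structure under the hypothetical name `nested` so that it runs on its own.

```python
# Same shape as list_of_lists above, repeated so this block is self-contained.
nested = [['a', 'list', 'of', 'words'], [1, 5, 209], [True, True, False]]

inner = nested[1]        # the first index selects one of the inner lists
number = nested[1][2]    # a second index selects an item inside that inner list

print(inner)
print(number)

# Looping over the outer list visits each inner list in turn.
for row in nested:
    print(len(row), row)
```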

In [23]:
# never create stuff like this
confusing_list = [[23, 73, 50], 'some words', 12.308, [[False, True], 'more words', False], 
                  {'this': 'is', 'not': 'ok', 'noted': True}]
print(confusing_list)

[[23, 73, 50], 'some words', 12.308, [[False, True], 'more words', False], {'this': 'is', 'not': 'ok', 'noted': True}]


In [24]:
print(type(confusing_list[0]))
print(type(confusing_list[1]))
print(type(confusing_list[2]))
print(type(confusing_list[3]))
print(type(confusing_list[4]))

print(type(confusing_list[3][0]))
print(type(confusing_list[3][1]))
print(type(confusing_list[3][2]))

<class 'list'>
<class 'str'>
<class 'float'>
<class 'list'>
<class 'dict'>
<class 'list'>
<class 'str'>
<class 'bool'>


We describe the Python `list` as _heterogeneous_ because it can hold a collection of mixed objects. This is one of the major defining properties of the Python `list`.

You may have also noticed that when we put data into a `list` in a particular order, it stays in that order when we `print` the `list` or use it in a `for` loop. Because a `list` preserves order, we say it is _ordered_. We can use this property to retrieve particular items from a `list` based on their position (or **index**) in the list.
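
As a small sketch of what "ordered" means in practice, the example below uses a hypothetical list called `steps`. Positions start at 0, and negative positions count back from the end.

```python
steps = ['wake', 'wash', 'eat', 'work']   # hypothetical example list

first = steps[0]      # positions start at 0
last = steps[-1]      # negative positions count from the end

# enumerate pairs each item with its position, in the order the items were stored.
for position, step in enumerate(steps):
    print(position, step)

print(first, last)
```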

### List Exercises 2

1. In your own words, what are some of the properties of a Python `list`?
2. Make a list of 10 elements and select only the last 2 elements
3. Take that same list of 10 elements and select every other element starting with the very first element.
4. Select every other element starting with the second element.

## `tuple` the Government?

Another standard sequence data type is the `tuple`.

>Like strings (and all other built-in sequence types), `tuple`s can be indexed and sliced.

A `tuple` consists of a number of values separated by commas, for instance:

In [2]:
t = 'otu', 1, 'abuou', 2, 'ato', 3, False
t

('otu', 1, 'abuou', 2, 'ato', 3, False)

In [26]:
# Tuples may be indexed:
t[4]

'ato'

In [27]:
# Tuples may be sliced:
t[4:]

('ato', 3, False)

In [28]:
# Tuples may be nested
t_nested = t, (1, 2, 3, 4, 5), t
t_nested

(('otu', 1, 'abuou', 2, 'ato', 3, False),
 (1, 2, 3, 4, 5),
 ('otu', 1, 'abuou', 2, 'ato', 3, False))

In [13]:
# Tuples are immutable
t[2] = 'abou'

TypeError: 'tuple' object does not support item assignment

In [16]:
# deletion also fails
del t[-1]

TypeError: 'tuple' object doesn't support item deletion

Both the Python `list` and `tuple` are ordered and heterogeneous. However, unlike the `list`, the `tuple` is immutable, meaning it cannot be modified after it is created. Therefore, a `list` might be better for representing data that is expected to change over the course of a program, like a to-do list. A `tuple` might be better for representing data that is expected to be fixed, like the responses of an individual subject to a survey.
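
The to-do list and survey examples above might look like this in code. The names `todo` and `responses`, and the values in them, are made up for illustration.

```python
# A to-do list is expected to change, so a list fits.
todo = ['buy shirts', 'wash shoes']
todo.append('iron trousers')     # new tasks get added
todo.remove('buy shirts')        # finished tasks get removed

# One subject's survey responses are fixed once recorded, so a tuple fits.
responses = ('agree', 'disagree', 'neutral')

try:
    responses.append('agree')    # tuples have no append method
except AttributeError as err:
    print('AttributeError:', err)

print(todo)
print(responses)
```

Choosing a `tuple` here also documents intent: anyone reading the code can see that the responses are not meant to be edited.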

#### Gotcha

One common mistake people make with immutability, and especially with tuples, is to assume that data structures inside a tuple are immutable because the tuple is immutable. Let's see an example.

In [1]:
# tuples can contain mutable objects
t_mutated = [1, 2, 3], [4, 5, 6], [7, 8, 9]      # you can be explicit and use ()
t_mutated

([1, 2, 3], [4, 5, 6], [7, 8, 9])

In [33]:
# which could be mutated
print(t_mutated)
t_mutated[2][2] = 0
print(t_mutated)

([1, 2, 3], [4, 5, 6], [7, 8, 9])
([1, 2, 3], [4, 5, 6], [7, 8, 0])


In [3]:
t_mutated[2] = [2,4,5]
print(t_mutated)

TypeError: 'tuple' object does not support item assignment

The `tuple` itself is immutable, meaning we cannot change which objects it contains. However, those objects themselves can still be mutated if they are mutable. As with anywhere mutability shows up, this requires the programmer to be careful and not assume that data has stayed unmodified in some context.
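
One way to convince yourself of this is to check the identity of the contained object before and after mutating it. The sketch below uses hypothetical names (`inner`, `pair`) and the built-in `id` function.

```python
inner = [7, 8, 9]
pair = ([1, 2, 3], inner)      # hypothetical tuple holding two lists

before = id(pair[1])           # identity of the object in position 1
pair[1].append(10)             # mutate that list through the tuple
after = id(pair[1])

print(pair)
print(before == after)         # the tuple still holds the very same object
print(inner)                   # the change is visible through the other name too
```

The tuple never changed which list it refers to; only the contents of that list changed.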

### Tuple Exercises 

Create a `tuple` object containing:
1. Exactly 0 items
2. Exactly 1 item

The statement `'otu', 1, 'abuou', 2, 'ato', 3, False` is an example of ***tuple packing***. The values 'otu', 1, 'abuou', 2, 'ato', 3, and False are packed together in a `tuple`. The reverse operation, ***sequence unpacking***, is also possible.


In [3]:
a, b, c, d, e, f, g = t
print(a, type(a))
print(b, type(b))
print(c, type(c))

# get the value and type of the others

otu <class 'str'>
1 <class 'int'>
abuou <class 'str'>
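
Packing and unpacking together give a compact way to swap two variables. This is a minimal sketch with hypothetical names (`left`, `right`, `first`, `second`); it also shows what happens when the number of names does not match the number of items.

```python
left, right = 'one', 'two'     # hypothetical values

# The right-hand side is packed into a tuple, then unpacked into the names on the left.
left, right = right, left
print(left, right)

# The number of names must match the number of items being unpacked.
try:
    first, second = (1, 2, 3)
except ValueError as err:
    print('ValueError:', err)
```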


The implicit use of a `tuple` comes up most often when working with functions that return multiple outputs. For example, we might have a function that returns the first and last letter of a string.

In [34]:
def first_last(s):
    return s[0], s[-1]

chars = first_last('hello!')
print(chars)

('h', '!')


In [35]:
first_char, last_char = first_last('hello!')

print(first_char)
print(last_char)

h
!


## `set` to Make a Difference

Python also includes a data type for `set`s. A `set` is an ***unordered*** _collection_ with ***no duplicate elements***. Basic uses include membership testing and eliminating duplicate entries. `set` objects also support mathematical operations like union, intersection, difference, and symmetric difference.

A `set` can store heterogeneous data (as long as each item is hashable, so a `list` cannot be an item) and it is mutable.

In [4]:
basket = {'apple', 'orange', 'apple', 'pear', 'orange', 'banana'}      # use {} or set()
print(basket)            # show that duplicates have been removed

{'apple', 'banana', 'orange', 'pear'}


Even though we entered the data in one order, the `set` printed out in a different order. Even more significantly, we cannot index or slice a `set`. ___Try it and see what error you get___.
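
Both points, that a `set` is mutable and that it cannot be indexed, can be sketched with a small hypothetical set called `fruits`:

```python
fruits = {'apple', 'pear'}      # hypothetical example set

fruits.add('banana')            # sets are mutable: items can be added
fruits.add('apple')             # adding a duplicate changes nothing
fruits.discard('pear')          # items can also be removed

print(len(fruits))

try:
    fruits[0]                   # there is no "first" item in a set
except TypeError as err:
    print('TypeError:', err)
```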

In [19]:
'orange' in basket                 # fast membership testing

True

In [20]:
'wife' in basket

False

In [21]:
a = set('abracadabra')
b = set('alacazam')

In [22]:
a               # unique letters in a

{'a', 'b', 'c', 'd', 'r'}

In [23]:
b               # unique letters in b

{'a', 'c', 'l', 'm', 'z'}

In [24]:
a - b                              # letters in a but not in b

{'b', 'd', 'r'}

In [25]:
a | b                              # letters in a or b or both

{'a', 'b', 'c', 'd', 'l', 'm', 'r', 'z'}

In [26]:
a & b                              # letters in both a and b

{'a', 'c'}

In [27]:
a ^ b                              # letters in a or b but not both

{'b', 'd', 'l', 'm', 'r', 'z'}

### Set Exercises 

1. Create a `set` containing 5 or more items
2. Perform any 3 different actions on the set object using any of the `set` methods

_**Why is `set` useful?**_

It seems strange that we might want an _unordered_ data structure. We can't access or modify the data through indexing. How does giving up order benefit us? The answer is that it gives us flexibility about how the data is stored in memory, and that flexibility can make data retrieval much faster.

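This speed comes from a technique called hashing: Python computes a hash of each item and uses it to decide where the item is stored, so it can check for membership without scanning the whole collection.

Because searching for data is very fast in a `set`, sets are also very useful for making comparisons between collections of data.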
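
A rough way to see the difference is to time the same membership test on a `list` and on a `set`. The sketch below uses the standard-library `timeit` module; the size `n` and the variable names are arbitrary choices, and the exact timings will vary from machine to machine.

```python
import timeit

n = 100_000
as_list = list(range(n))
as_set = set(as_list)
target = n - 1      # worst case for the list: the item is at the very end

# Run each membership test 100 times and report the total time taken.
list_time = timeit.timeit(lambda: target in as_list, number=100)
set_time = timeit.timeit(lambda: target in as_set, number=100)

print(f'list: {list_time:.5f} s')
print(f'set:  {set_time:.5f} s')
```

On a typical machine the `set` lookup should come out far faster, because the `list` has to be scanned item by item.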

In [1]:
student_a_courses = {'history', 'english', 'biology', 'theatre'}
student_b_courses = {'biology', 'english', 'mathematics', 'computer science'}

print(student_a_courses.intersection(student_b_courses))
print(student_a_courses.union(student_b_courses))
print(student_a_courses.difference(student_b_courses))
print(student_b_courses.difference(student_a_courses))
print(student_a_courses.symmetric_difference(student_b_courses))

{'biology', 'english'}
{'biology', 'history', 'computer science', 'english', 'theatre', 'mathematics'}
{'history', 'theatre'}
{'mathematics', 'computer science'}
{'history', 'computer science', 'theatre', 'mathematics'}


In [2]:
student_a_courses & student_b_courses 

{'biology', 'english'}

In [3]:
student_a_courses ^ student_b_courses 

{'computer science', 'history', 'mathematics', 'theatre'}

In [4]:
student_a_courses - student_b_courses

{'history', 'theatre'}

In [5]:
student_a_courses | student_b_courses

{'biology', 'computer science', 'english', 'history', 'mathematics', 'theatre'}

## Search a `dict`

Another useful data type built into Python is the dictionary or `dict` for short. Dictionaries are sometimes found in other languages as ***“associative memories” or “associative arrays”***. 

Unlike sequences, which are indexed by a range of numbers, dictionaries are indexed by keys, which can be any ***immutable type***.

1. Strings and numbers can always be keys. 
2. Tuples can be used as keys if they contain only strings, numbers, or tuples; if a tuple contains any mutable object either directly or indirectly, it cannot be used as a key.
3. Lists can’t be used as keys, since lists can be modified in place using index assignments, slice assignments, or methods like append() and extend().
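
These three rules can be checked with the built-in `hash` function, which is what a `dict` relies on for its keys. The helper `can_be_key` below is a hypothetical function written for this illustration, not part of Python.

```python
def can_be_key(obj):
    """Return True if obj is hashable and so can be used as a dict key."""
    try:
        hash(obj)
    except TypeError:
        return False
    return True

print(can_be_key('dogo'))            # a string
print(can_be_key(42))                # a number
print(can_be_key(('a', 1)))          # a tuple of strings and numbers
print(can_be_key(('a', [1, 2])))     # a tuple that contains a list
print(can_be_key(['a', 1]))          # a list
```

The first three calls return `True` and the last two return `False`, matching the rules above.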

In [7]:
# common way to define dicts
staff_rank = {'dogo': 'H16-S32', 'ebuka': 'H16-S31', 'olamide': 'H16-S30', 'nnaemeka': 'H6-S12'} 
staff_rank

{'dogo': 'H16-S32',
 'ebuka': 'H16-S31',
 'olamide': 'H16-S30',
 'nnaemeka': 'H6-S12'}

In [1]:
# another way to create a dict when the keys are SIMPLE strings, i.e. valid Python names with no whitespace
staff_married = dict(dogo=True, ebuka=False, olamide=True, nnaemeka=None)   
staff_married

{'dogo': True, 'ebuka': False, 'olamide': True, 'nnaemeka': None}

In [2]:
print(staff_married.values())
print(staff_married.keys())
# find out other methods of the dict class

dict_values([True, False, True, None])
dict_keys(['dogo', 'ebuka', 'olamide', 'nnaemeka'])


It is best to think of a dictionary as a `set` of `key: value` pairs, with the requirement that the keys are unique (within one dictionary). 
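
As a small sketch of key uniqueness, the hypothetical `ranks` dictionary below deliberately lists the key `'dogo'` twice. Python keeps only one entry per key, and the later value wins. The rank `'H16-S40'` is made up for this example.

```python
# 'dogo' appears twice on purpose; only one entry survives.
ranks = {'dogo': 'H16-S32', 'ebuka': 'H16-S31', 'dogo': 'H16-S40'}

print(len(ranks))
print(ranks['dogo'])

# .items() yields each key: value pair as a tuple, which we can unpack in a loop.
for name, rank in ranks.items():
    print(name, rank)
```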

A pair of braces creates an empty dictionary: `{}`.

In [3]:
dt = {}
print(type(dt))

<class 'dict'>


The main operations on a `dict` are storing a value with some key and extracting the value given the key.

In [8]:
staff_rank['dogo']    # view a value given the key

'H16-S32'

In [11]:
print(staff_rank)
staff_rank['tochukwu'] = 'H16-S22'    # add a key:value pair
print(staff_rank)

{'dogo': 'H16-S32', 'ebuka': 'H16-S31', 'olamide': 'H16-S30', 'nnaemeka': 'H6-S12'}
{'dogo': 'H16-S32', 'ebuka': 'H16-S31', 'olamide': 'H16-S30', 'nnaemeka': 'H6-S12', 'tochukwu': 'H16-S22'}


In [12]:
staff_rank['emeka', 'arinze'] = 'H16-S24', 'H16-S25'    # add a key:value pair, note that the key is a tuple
staff_rank

{'dogo': 'H16-S32',
 'ebuka': 'H16-S31',
 'olamide': 'H16-S30',
 'nnaemeka': 'H6-S12',
 'tochukwu': 'H16-S22',
 ('emeka', 'arinze'): ('H16-S24', 'H16-S25')}

In [17]:
staff_rank[['emeka_2', 'arinze_2']] = 'H16-S24', 'H16-S25'    # a list cannot be used as a key, so this raises an error

TypeError: unhashable type: 'list'

It is also possible to delete a ***key:value*** pair with `del`.

If you store a value using a key that is already in use, the old value associated with that key is forgotten. 

> It is an error to extract a value using a non-existent key.

In [18]:
print(staff_rank)
del staff_rank['emeka', 'arinze']        # note that 'emeka', 'arinze' is actually a tuple 
                                         # we could make it clearer by doing ('emeka', 'arinze')
print(staff_rank)    

{'dogo': 'H16-S32', 'ebuka': 'H16-S31', 'olamide': 'H16-S30', 'nnaemeka': 'H6-S12', 'tochukwu': 'H16-S22', ('emeka', 'arinze'): ('H16-S24', 'H16-S25')}
{'dogo': 'H16-S32', 'ebuka': 'H16-S31', 'olamide': 'H16-S30', 'nnaemeka': 'H6-S12', 'tochukwu': 'H16-S22'}


In [2]:
print(staff_married)
staff_married['olamide'] = False
print(staff_married)

{'dogo': True, 'ebuka': False, 'olamide': True, 'nnaemeka': None}
{'dogo': True, 'ebuka': False, 'olamide': False, 'nnaemeka': None}


In [20]:
print(staff_married)
staff_married['olumide']

{'dogo': True, 'ebuka': False, 'olamide': False, 'nnaemeka': None}


KeyError: 'olumide'

To check whether a single key is in the `dict`, use the `in` keyword.

In [14]:
'olumide' in staff_married

False
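
Checking with `in` first is one way to avoid the `KeyError` shown earlier. Another is the `dict.get` method, which returns a default value when the key is missing. The sketch below uses a smaller hypothetical dictionary called `married`, and `'unknown'` is an arbitrary default chosen for this example.

```python
married = {'dogo': True, 'ebuka': False}   # hypothetical, smaller than staff_married

name = 'olumide'

# Option 1: test for the key before looking it up.
if name in married:
    status = married[name]
else:
    status = 'unknown'
print(status)

# Option 2: let .get supply the default when the key is missing.
print(married.get(name, 'unknown'))
print(married.get('dogo', 'unknown'))
```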

## More on `dict`s

### `zip`

The `zip` function can be very handy for creating a `dict`. Say we have two `list`s containing logically related data. We can `zip` them into pairs (each pair is a `tuple`) and then turn those pairs into a `dict`.

In [3]:
value_list = ['Arthur', 32, 177.5, 68.5, 'black', 'brown', False, True, 'H16-S35']
key_list = ['name', 'age', 'height', 'weight', 'hair', 'eyes', 'has dog', 'married', 'level']    # problem with has dog?

print(value_list)
print(key_list)

['Arthur', 32, 177.5, 68.5, 'black', 'brown', False, True, 'H16-S35']
['name', 'age', 'height', 'weight', 'hair', 'eyes', 'has dog', 'married', 'level']


In [24]:
key_value_pairs = dict(zip(key_list, value_list))

print(key_value_pairs)
print(type(key_value_pairs))

{'name': 'Arthur', 'age': 32, 'height': 177.5, 'weight': 68.5, 'hair': 'black', 'eyes': 'brown', 'has dog': False, 'married': True, 'level': 'H16-S35'}
<class 'dict'>
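
It can help to look at the intermediate pairs that `zip` produces. The sketch below uses shorter hypothetical lists, `keys` and `values`, and leaves one value out on purpose to show that `zip` stops at the end of the shorter input.

```python
keys = ['name', 'age', 'height']
values = ['Arthur', 32]          # one value short on purpose

pairs = list(zip(keys, values))  # each pair is a (key, value) tuple
print(pairs)

record = dict(pairs)
print(record)
print('height' in record)        # the unmatched key was silently dropped
```

Because the extra key disappears without any error, it is worth checking that both lists have the same length before zipping them.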


### Basic Data Structures Summary

|Data Structure | Ordered | Mutable | Indexable | Sliceable|
|---|---|---|---|---|
|__list__ | True | True | True | True|
|__tuple__ | True | False | True | True|
|__set__ | False | True | False | False|
|__dict__ | Insertion order (Python 3.7+) | True | By key | False|


*Copyright &copy; 2019 MICTU, UNIZIK.*
*Adapted for use in MICTU, UNIZIK Python Classes by [Arthur Ezenwanne](https://github.com/ArthurEzenwanne). All rights freely un-reserved.*